1. Using suffix trees, give an algorithm to find a longest common substring shared among three input strings: s_{1} of length n_{1}, s_{2} of length n_{2} and s_{3 }of length n_{3}.
2. A non-empty string α is called a minimal unique substring of s if and only if it satisfies:
(i) α occurs exactly once in s (uniqueness),
(ii) all proper prefixes of α occur at least twice in s (mimimality), and
(iii) α >= l for some constant l.Give an optimal algorithm to enumerate all minimal unique substrings of s.
3. Redundant sequence identification: Given a set of k DNA sequences, S = {s_{1}, s_{2}........s_{k }}, give an optimal algorithm to identify all sequences that are completely contained in (i.e., substrings of) at least one other sequence in S.
4. Let S = {s_{1}, s_{2,}......,s_{k}} denote a set of k genomes. The problem of fingerprinting is the task of identifying a shortest possible substring α_{i} from each string s_{i} such that α_{i} is unique to s_{i} i.e., no other genome in the set S has α_{i}. Such an α_{i} will be called a fingerprint of s_{i}. (Note that it is OK for α_{i} to be present more than once within s_{i}.) Give an algorithm to enumerate a fingerprint for each input genome, if one exists. Assume that no two input genomes are identical.
5. A string s is said to be periodic with a period α , if s is α^{k} for some k>=2. (Note that α^{k} is the string formed by concatenating α k times.) A DNA sequence s is called a tandem repeat if it is periodic. Given a DNA sequence s, determine if it is periodic, and if so, the values for α and k. Note that there could be more than one period for a periodic string. In such a case, you need to report the shortest period.
6. A non-empty string β is called a repeat prefix of a string s if ββ is a prefix of s. Give a linear time algorithm to find the longest repeat prefix of s.
7. Given strings s_{1} and s_{2} of lengths m and n respectively, a minimum cover of s_{1} by s_{2} is a decomposition s_{1} = w_{1}w_{2}......w_{k}, where each w_{i} is a non-empty substring of s_{2} and k is minimized. Eg., given s_{1} = accgtatct and s_{2} = cgtactcatc, there are several covers of s_{1} by s_{2} possible, two of which are: (i) cover_{1}: s_{1} = w_{1}w_{2}w_{3}w_{4} (where w_{1} = ac, w_{2} = cgt, w_{3} = atc, w_{4} = t) , and (ii) cover_{2}: s_{1} = w_{1}w_{2}w_{3}w_{4}w_{5} (where w_{1} = ac, w_{2} = c, w_{3} = gt, w_{4 }= atc, w_{5} = t). However, only cover_{1} is a minimal cover. Give an algorithm to compute a minimum cover (if one exists) in O(m + n) time and space. If a minimum cover does not exist your algorithm should state so and terminate within the same time and space bounds. Give a brief justification of why you think your algorithm is correct | meaning, how it guarantees finding the minimial cover (if one exists).