Ask Computer Engineering Expert

Suffix table

Please read the following text carefully and use it to reply to the six queries at the end of the problem.

Finding a given character string (a so-called pattern) in a longer character string (a text) is one of the basic applications of computer science. You can implement the search by going through the whole text from start to finish while checking whether the pattern you are searching for is present. This is a very slow way of doing it if the text is very large. You can make the search quicker by creating an index structure based on the text beforehand. With the help of the index structure, you can avoid going through the whole text. One simple and frequently used index structure is called a suffix table. This method has been used for efficient searching in large DNA data collections, for example, such as the human genome with some 3 billion characters.

A character string consists of characters following each other. You can refer to a character string with a character string variable. You can refer to certain characters in the string by giving the index of the character within square brackets following the character string variable. For instance, x[1] refers to the first character in the string x. The annotation x[i..j] refers to the partial character string in x that starts at index i and ends at index j. A partial character string that continues to the end of the character string is called a suffix. The starting index of the suffix is called the index of the suffix. We use <, ≤, =, ≥, > for the character strings x and y to compare their alphabetical order. For example, x ≤ y means that x is alphabetically smaller than or as large as y, and x = y means that x and y are the same character string.

Example 1.

If the character string x is "group" it is true that:
• x[1] = " g", x[2] = " r" and x[4] = " u".
• x[1..3] = " gro", x[3..3] = " o" and x[2..5] = " roup".

• The suffixes to the character string x given in order according to the suffix indexes, i.e. according to the start indexes 1, 2, 3, 4 and 5, are x[1..5] = " group", x[2..5] = " roup", x[3..5] = " oup", x[4..5] = " up" and x[5..5] = " p".

We refer to a text with the character string variable t and to the pattern we are looking for in the text with the string variable p. Furthermore, we assume that the length of the text t is n.

The suffix table S for text t is an integer table with n elements, which lists the indexes of the suffixes in the text in alphabetical order of the suffixes. The value of index i in table S, annotated as S[i], indicates from which position in the text the ith smallest index suffix starts: S[1] indicates the index of the smallest suffix (in alphabetical order) in the text, S[2] the index for the next suffix in alphabetical order, etc. You can also express this as follows: for index i = 2, . . . , n it is true that t[S[i - 1]..n] ≤ t[S[i]..n].

Example 2. The suffixes for text t = " abababba" are " abababba", " bababba", " ababba", " babba", " abba", " bba", " ba" and " a". Below to the left are the suf- fixes for text t in alphabetical order, and to the right is the suffix table S for text t. Please note that the values of the suffix table are the same as the suf- fix indexes to the left. For example, the value S[5] = 7 tells us that the fifth largest suffix alphabetically starts from index 7, i.e. t[7..8] = is " ba".

Suffix   (alphabetical order)

a

Suffix  index

8

Index i

1

S[i]

8

abababba

1

2

1

ababba

3

3

3

abba

5

4

5

ba

7

5

7

bababba

2

6

2

babba

4

7

4

bba

6

8

6

A basic suffix feature is that, if pattern p occurs anywhere in the text, then p occurs as the start of the suffix that starts at that point in the text. In that case, we say that pattern p matches the suffix in question. We can search for pattern p in the text by searching for a suffix that matches p. With the help of the suffix table, this search can be made efficient by using a so-called binary search.

The binary search maintains information on a suffix table interval that, in accordance with our current information, could contain a suffix that matches pattern p. We use the annotation start for the start index of the search interval, and end for the end index. Further, we will determine the middle index middle with the formula middle = (start + end)/2, which we will round upwards if the sum of (start + end) is uneven. At first, all the suffixes are possible, so start = 1 and end = n.

The binary search compares the pattern and suffix in the middle of the search interval t[S[middle]..n] to each other. If the pattern matches, the search can end1 . Otherwise, either p < t[S[middle]..n] or p > t[S[middle]..n] is true.

If p < t[S[middle]..n], i.e. the pattern is alphabetically smaller than the suffix in index S[middle], none of the suffixes in interval middle, . . . , end can match the pattern. This follows directly from the suffix table containing the suffixes in alphabetical order. In that case, the binary search updates the upper bound of the search interval to end = middle - 1 and compares the pattern to the suffixes in the middle of the new search interval during the next search round.

In the same way, if p > t[S[middle]..n], we can say that none of the suffixes in the interval start, . . . , middle can match the pattern. Then we can update the lower bound to start = middle + 1.

If the search interval remains empty during the search, i.e. the criterion in start > end is met, the search ends with no result: the text does not contain pattern p. Below we see the search for pattern p in text t with the help of suffix table S in more detail, step by step:

1. Set for the search interval start index start = 1 and end index end = n.
2. If start > end i.e. the search interval is empty, end the search: pattern p does not occur in the text.
3. Calculate the middle index middle between start and end index: middle = (start + end)/2, round off upwards if necessary.
4. Compare pattern p with middle suffix t[S[middle]..n] of the interval.
• If p matches the start of t[S[middle]..n], end the search with the information that pattern p was found starting at index S[middle] of the text.
• If p does not match the middle suffix t[S[middle]..n] of the interval, then:
- If p < t[S[middle]..n], update end = middle - 1 and continue searching by going back to .
- If p > t[S[middle]..n], update start = middle + 1 and continue the search by going back to 4.

Queries

Query 1. Give the steps a binary search makes when searching for the pattern p = " babaa" in the text in the example 3 t = " abbaababbababbab". Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 2. Write the suffix table for the character string t = " assignment".

Query 3.

a) Write the suffix table for the character string t = " aacatcgatagctagaacat".

b) Write the steps a binary search takes when searching for the pattern p = " cga" in the text in a). Please write your answer in the same format as the example 3, i.e. for each step, write the search interval values start, end and middle.

Query 4. Please write a character string consisting of characters in the English alphabet with a suffix table like the one below. This means that characters in the alphabet" a", " b", " c", . . ., " z" are acceptable.

i

S[i]

1

8

2

7

3

6

4

5

5

4

6

3

7

2

8

1

Query 5. Write a character string that contains only the characters " a" and" b" with a suffix table like the one below.

i

S[i]

1

16

2

13

3

3

4

14

5

11

6

9

7

4

8

6

9

15

10

12

11

2

12

10

13

8

14

5

15

1

16

7

Query 6. Program the algorithm given on page 3. Test your program using the example 3 and queries 2-5.

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91583149
  • Price:- $95

Guranteed 48 Hours Delivery, In Price:- $95

Have any Question?


Related Questions in Computer Engineering

Does bmw have a guided missile corporate culture and

Does BMW have a guided missile corporate culture, and incubator corporate culture, a family corporate culture, or an Eiffel tower corporate culture?

Rebecca borrows 10000 at 18 compounded annually she pays

Rebecca borrows $10,000 at 18% compounded annually. She pays off the loan over a 5-year period with annual payments, starting at year 1. Each successive payment is $700 greater than the previous payment. (a) How much was ...

Jeff decides to start saving some money from this upcoming

Jeff decides to start saving some money from this upcoming month onwards. He decides to save only $500 at first, but each month he will increase the amount invested by $100. He will do it for 60 months (including the fir ...

Suppose you make 30 annual investments in a fund that pays

Suppose you make 30 annual investments in a fund that pays 6% compounded annually. If your first deposit is $7,500 and each successive deposit is 6% greater than the preceding deposit, how much will be in the fund immedi ...

Question -under what circumstances is it ethical if ever to

Question :- Under what circumstances is it ethical, if ever, to use consumer information in marketing research? Explain why you consider it ethical or unethical.

What are the differences between four types of economics

What are the differences between four types of economics evaluations and their differences with other two (budget impact analysis (BIA) and cost of illness (COI) studies)?

What type of economic system does norway have explain some

What type of economic system does Norway have? Explain some of the benefits of this system to the country and some of the drawbacks,

Among the who imf and wto which of these governmental

Among the WHO, IMF, and WTO, which of these governmental institutions do you feel has most profoundly shaped healthcare outcomes in low-income countries and why? Please support your reasons with examples and research/doc ...

A real estate developer will build two different types of

A real estate developer will build two different types of apartments in a residential area: one- bedroom apartments and two-bedroom apartments. In addition, the developer will build either a swimming pool or a tennis cou ...

Question what some of the reasons that evolutionary models

Question : What some of the reasons that evolutionary models are considered by many to be the best approach to software development. The response must be typed, single spaced, must be in times new roman font (size 12) an ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As