Ask Computer Engineering Expert

Assignment: Classi?cation trees

1. Introduction

A classification tree is used for classifying inputs into one of several categories. Each node in the classification tree consists of a criterion that is applied to the input to classify it into one of (potentially) many sets. The following tree illustrates this for classifying 12 TV (s1,...,s12) shows into various categories:

156_Tree.jpg

As can be seen there are 3 shows that are half-hour mysteries, and two shows that are hour comedies. The tree allows one to find all shows that satisfy various criteria, by starting at the top of the tree, and descending through the tree based upon supplied criteria, eventually arriving at the appropriate category (shown in rectangles) at the leaves of the tree.

In this project you will build (by hand) a binary classification tree for a fixed set of inputs and analyze its behavior.

2. Binary classi?cation trees

Each criterion in a binary classification tree classifies the input into either one of two categories. It can therefore be implemented as a binary tree in which the interior nodes contained the classification criterion and each leaf contains all elements that satisfy the all the criteria along the path from the root to the leaf. Any classification tree can be built using a binary classification tree, by splitting any non-binary criteria into a sequence of binary criteria.

For this project, the leaves in the binary tree will contain single elements. This means that the criteria nodes must uniquely select a single element from the set of elements that comprise the leaves in the tree. Ideally a search tree will be balanced, in the sense that for every node in the tree, the heights of its left and right subtrees are with k| of one another, for some small k| (ideally, 1). call such a tree a balanced binary classification search tree, or a search tree for short.

3. Constructing a search tree

Let us assume that keys we are interested in are 2 characters long, e.g.

CD
ac
kc
ol

Choose 10 such codes, all different. Write each code out in binary. Now perform this recursive algorithm:

1. If you have but one code, append = to it and return the augmented pattern.

2. Total each column, counting the number of 1's. Call the leftmost column 0. Let b[i] designate this number for column-i and let a[i] = 10-b[i] designate the number of zeros.

3. Find a collection of a's or b's, such that the sum of the collection is closest to n/2 where n is the number of elements in your list.

a. If your collection is of b's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of & (the bitwise "and").

b. If your collection is of a's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of ^ (the bitwise "exclusive or").

4. Apply your chosen operator one-by-by to your bit patterns. Those that return "true" (i.e., a non-zero number) remove and place into a separate collection. These will form the starting block for the left-hand side of the tree. Those that are not selected will form the starting block fro the right-hand side of the tree.

5. You now have two starting sets. Recursively apply steps 1-5 to form two subtrees, with root being the bit pattern + operator you formed in step 3 above.

Here is an example

HEX

Binary

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-Totals

3

1

3

0

2

1

3

4

3

1

3

0

1

2

2

2

a-Totals

1

3

1

4

2

3

1

0

1

3

1

4

3

2

2

2

In our case n=4, so we want to form a subset of size n/2=2 of either a's exclusively, or b's, exclusively.

For b's we note there are several columns that are candidates for inclusion in our subset of (approximately) size n/2.

b[1]
b[1]
b[4]
b[5]
b[9]
b[12]
b[13]
b[14]
b[15]

These are all 2 or less.

The goal is to form a subset from the list above that will match (as close to) two of the four input rows. Clearly one could pick either b[4], b[13], b[14] or `b[15]. because matching a 1 in any of these columns would pick two rows. Say we pick b[14]. Then we form the selection pattern:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

When this pattern is matched against the four inputs (using an &), it will select exactly two:

                                                     *
ac 61 63 1 0 1 0 0 0 0 1  1 0 1 0 0 0 1 1
kc 6B 63 1 0 1 0 1 0 1 1  1 0 1 0 0 0 1 1

Had b[4] been chosen, then we would form the pattern:

0 0 0 0 1 0 0 0  0 0 0 0 0 0 0 0

When this pattern is matched against the four inputs, it will select exactly two:

kc  6B  63  1 0 1 0 1 0 1 1   1 0 1 0 0 0 1 1
ol  6F   6C  1 0 1 0 1 1 1 1   1 0 1 0 1 1 0 0

In the case where there is no n/2 in the b-totals, then we can pick combinations of bits, which, when applied will choose close to n/2 inputs. This is done as follows.

Choose b[1], which will pick one row:

Pattern:      0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CD  43  44  0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0

Having chosen that row, eliminate it from consideration and recalculate the b-totals (which can be computed by subtracting CD from the b-totals).

ac  61  63   1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1
kc  6B  63   1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1
ol  6F  6C    1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0
b-Totals     3 0 3 0 2 1 2 3 3 0 3 0 1 1 2 2

Now choose another column with a 1 in it (say b[12]) and "or' this with the previous pattern:

Pattern: 0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0

Now and-ing this against the original four rows

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

will choose any row with a 1 in column 1 or column 10, giving:
=>

CD  43  44  0 1 0 0 0 0 1 1  0 1 0 0 0 1 0 0
ol    6F  6C  1 0 1 0 1 1 1 1  1 0 1 0 1 1 0 0

Of course, selecting different combinations of b-totals may choose different rows, but that does not matter, because the goal is to choose n/2 rows, which is accomplished.
Writing our pattern in hex we have:

Pattern:  0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0
Pattern:      4         0           0          8

At this point we have formed:

1085_Tree1.jpg

The designation &:4008 in the root node means that when deciding which side of the tree a candidate key would be found, perform key & 4008. If it matches, take the T branch, otherwise take the F branch.

Having split the set in half, the process is repeated recursively on both subsets.

The left set b-totals are:

CD  43  44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ol    6F  6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-totals:

1

1

1

0

1

1

2

2

1

1

1

0

1

2

0

0

In this case n/2=1, so we choose any b-total with a 1, say (b[1]), forming the pattern:

Pattern:   0 1 0

0

0

0 0

0

0

0 0

0

0

0 0

0

Pattern:               4

 

 

0

 

 

0

 

 

0

 

which chooses:

which chooses:

 

 

 

 

 

 

 

 

 

 

The b-totals for the left-hand set are:

ac  61 63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc  6B 63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

b-totals:

2

0

2

0

1

0

1

2

2

0

2

0

0

0

2

2

Again we choose any b-total with a 1, say b[4],

Pattern:   0 0 0

0

1

0 0

0

0

0 0

0

0

0 0

0

Pattern:               0

 

 

8

 

 

0

 

 

0

 

which chooses:

kc  6B 63  1 0 1

0

1

0 1

1

1

0 1

0

0

0 1

1

Our search tree now looks like:

825_Tree2.jpg

Note that the nodes contain patterns to be matched against a search key. If the search key is key = "ol" then key & 0x4008 matches so the left branch is selected. key & 0x4000 does not match (evaluates to 0), so the right branch is taken. Last, key =

"ol" matches at a leaf node, so ol is in the set.

Searching for "oZ' will follow the same path, but the final comparison of "oZ" = "ol" fails, so oz is not in the set.

Analysis

Now that you have worked through an example, it is time analyze both the algorithm for building the tree and the algorithm for searching the tree.

Build a tree

To demonstrate that you understand the algorithm, choose 6 two-character keys, translate them into binary (using ASCII mapping), and build a balanced search tree. Draw your tree.

Analyze the construction algorithm

Suppose we have n| records and that each record contains a key of k| bytes and information associated with the key of r| bytes. Assume also that the tree is built using 4-byte pointers.

1. The algorithm for building the tree proceeds by splitting the list in two each time, based on the b-counts to try and create a balanced tree. However, what would happen in the following pathological case?

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

a. There are 8 members of this set. Can the b-counts be used to split the set into two equally-sized subsets?

b. Suggest a way of using the a-counts in a situation like this. Specifically,

c. How would you use the a-counts to create a pattern that would split the list in two almost equal-size sets? List a particular pattern that would accomplish this for the 8 rows above.

d. What operation(s) would need to be applied to match your pattern to select 4 rows? Which four rows would be selected?

2. Suppose the b-counts are all the same value c|. This then means that all the values for the a-counts are n-c|. For what value c| would this cause the most uneven splitting of the set? (This is tricky. Spend some time on it).

3. How much space will the tree consume? Your answer should be in terms of n|, k| and r|. You may assume the r| bytes for records are stored in the leaves of the search tree.

a. What is the best possible case? (Hint consider a perfectly balanced search tree. Further hint: Your discrete math background should be sufficient to know that count of interior nodes given the count of leaf nodes. If it makes things easier for you, assume n| is a power of 2.)

b. Bonus: What is the worst case? (Hint: consider how badly the tree might be unbalanced. Calculate the number of nodes required in such a case. Consider your answer to the problem labeled "tricky" above.)

4. What is the running time of the algorithm that creates the tree? where the measure of the size of the program is the number of rows (n|) and the number of bytes (k|) in the key, and the size of the record information (r|) in the situation in which the list can be split perfectly in half each time? This is a very hard problem and it is likely that few will get it right. It is your thinking processes that will be judged rather than perfect mathematical analysis. Big-O answer without constant coefficients is good enough.

a. How many steps will it take to calculate the b-counts from a collection of rows?
(Note that this depends on k| and n|.)

b. How many steps to find the pattern? (Worst case: you have to form the pattern from a collection of 1's.)

c. How many steps does it take to reduce the b-counts for each selected pattern?

d. How many times will three steps above have to be repeated in a perfectly split binary tree?

e. Combine your results together to give your estimate of the running time of the tree-construction algorithm.

5. What is the running time of the search routine for a perfectly balanced search tree?

6. What is the running time of the search routine for the worst-balanced search tree?

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M92063027
  • Price:- $120

Priced at Now at $120, Verified Solution

Have any Question?


Related Questions in Computer Engineering

Does bmw have a guided missile corporate culture and

Does BMW have a guided missile corporate culture, and incubator corporate culture, a family corporate culture, or an Eiffel tower corporate culture?

Rebecca borrows 10000 at 18 compounded annually she pays

Rebecca borrows $10,000 at 18% compounded annually. She pays off the loan over a 5-year period with annual payments, starting at year 1. Each successive payment is $700 greater than the previous payment. (a) How much was ...

Jeff decides to start saving some money from this upcoming

Jeff decides to start saving some money from this upcoming month onwards. He decides to save only $500 at first, but each month he will increase the amount invested by $100. He will do it for 60 months (including the fir ...

Suppose you make 30 annual investments in a fund that pays

Suppose you make 30 annual investments in a fund that pays 6% compounded annually. If your first deposit is $7,500 and each successive deposit is 6% greater than the preceding deposit, how much will be in the fund immedi ...

Question -under what circumstances is it ethical if ever to

Question :- Under what circumstances is it ethical, if ever, to use consumer information in marketing research? Explain why you consider it ethical or unethical.

What are the differences between four types of economics

What are the differences between four types of economics evaluations and their differences with other two (budget impact analysis (BIA) and cost of illness (COI) studies)?

What type of economic system does norway have explain some

What type of economic system does Norway have? Explain some of the benefits of this system to the country and some of the drawbacks,

Among the who imf and wto which of these governmental

Among the WHO, IMF, and WTO, which of these governmental institutions do you feel has most profoundly shaped healthcare outcomes in low-income countries and why? Please support your reasons with examples and research/doc ...

A real estate developer will build two different types of

A real estate developer will build two different types of apartments in a residential area: one- bedroom apartments and two-bedroom apartments. In addition, the developer will build either a swimming pool or a tennis cou ...

Question what some of the reasons that evolutionary models

Question : What some of the reasons that evolutionary models are considered by many to be the best approach to software development. The response must be typed, single spaced, must be in times new roman font (size 12) an ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As