Ask Question, Ask an Expert

+61-413 786 465

info@mywordsolution.com

Ask Computer Engineering Expert

Assignment: Classi?cation trees

1. Introduction

A classification tree is used for classifying inputs into one of several categories. Each node in the classification tree consists of a criterion that is applied to the input to classify it into one of (potentially) many sets. The following tree illustrates this for classifying 12 TV (s1,...,s12) shows into various categories:

156_Tree.jpg

As can be seen there are 3 shows that are half-hour mysteries, and two shows that are hour comedies. The tree allows one to find all shows that satisfy various criteria, by starting at the top of the tree, and descending through the tree based upon supplied criteria, eventually arriving at the appropriate category (shown in rectangles) at the leaves of the tree.

In this project you will build (by hand) a binary classification tree for a fixed set of inputs and analyze its behavior.

2. Binary classi?cation trees

Each criterion in a binary classification tree classifies the input into either one of two categories. It can therefore be implemented as a binary tree in which the interior nodes contained the classification criterion and each leaf contains all elements that satisfy the all the criteria along the path from the root to the leaf. Any classification tree can be built using a binary classification tree, by splitting any non-binary criteria into a sequence of binary criteria.

For this project, the leaves in the binary tree will contain single elements. This means that the criteria nodes must uniquely select a single element from the set of elements that comprise the leaves in the tree. Ideally a search tree will be balanced, in the sense that for every node in the tree, the heights of its left and right subtrees are with k| of one another, for some small k| (ideally, 1). call such a tree a balanced binary classification search tree, or a search tree for short.

3. Constructing a search tree

Let us assume that keys we are interested in are 2 characters long, e.g.

CD
ac
kc
ol

Choose 10 such codes, all different. Write each code out in binary. Now perform this recursive algorithm:

1. If you have but one code, append = to it and return the augmented pattern.

2. Total each column, counting the number of 1's. Call the leftmost column 0. Let b[i] designate this number for column-i and let a[i] = 10-b[i] designate the number of zeros.

3. Find a collection of a's or b's, such that the sum of the collection is closest to n/2 where n is the number of elements in your list.

a. If your collection is of b's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of & (the bitwise "and").

b. If your collection is of a's, then write a bit pattern that consists of 16 bits, where the bit-i is set to 1 if b[i] is in your collection. Append to your result an operator of ^ (the bitwise "exclusive or").

4. Apply your chosen operator one-by-by to your bit patterns. Those that return "true" (i.e., a non-zero number) remove and place into a separate collection. These will form the starting block for the left-hand side of the tree. Those that are not selected will form the starting block fro the right-hand side of the tree.

5. You now have two starting sets. Recursively apply steps 1-5 to form two subtrees, with root being the bit pattern + operator you formed in step 3 above.

Here is an example

HEX

Binary

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-Totals

3

1

3

0

2

1

3

4

3

1

3

0

1

2

2

2

a-Totals

1

3

1

4

2

3

1

0

1

3

1

4

3

2

2

2

In our case n=4, so we want to form a subset of size n/2=2 of either a's exclusively, or b's, exclusively.

For b's we note there are several columns that are candidates for inclusion in our subset of (approximately) size n/2.

b[1]
b[1]
b[4]
b[5]
b[9]
b[12]
b[13]
b[14]
b[15]

These are all 2 or less.

The goal is to form a subset from the list above that will match (as close to) two of the four input rows. Clearly one could pick either b[4], b[13], b[14] or `b[15]. because matching a 1 in any of these columns would pick two rows. Say we pick b[14]. Then we form the selection pattern:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

When this pattern is matched against the four inputs (using an &), it will select exactly two:

                                                     *
ac 61 63 1 0 1 0 0 0 0 1  1 0 1 0 0 0 1 1
kc 6B 63 1 0 1 0 1 0 1 1  1 0 1 0 0 0 1 1

Had b[4] been chosen, then we would form the pattern:

0 0 0 0 1 0 0 0  0 0 0 0 0 0 0 0

When this pattern is matched against the four inputs, it will select exactly two:

kc  6B  63  1 0 1 0 1 0 1 1   1 0 1 0 0 0 1 1
ol  6F   6C  1 0 1 0 1 1 1 1   1 0 1 0 1 1 0 0

In the case where there is no n/2 in the b-totals, then we can pick combinations of bits, which, when applied will choose close to n/2 inputs. This is done as follows.

Choose b[1], which will pick one row:

Pattern:      0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CD  43  44  0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0

Having chosen that row, eliminate it from consideration and recalculate the b-totals (which can be computed by subtracting CD from the b-totals).

ac  61  63   1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 1
kc  6B  63   1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1
ol  6F  6C    1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0
b-Totals     3 0 3 0 2 1 2 3 3 0 3 0 1 1 2 2

Now choose another column with a 1 in it (say b[12]) and "or' this with the previous pattern:

Pattern: 0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0

Now and-ing this against the original four rows

CD

43

44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ac

61

63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc

6B

63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

ol

6F

6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

will choose any row with a 1 in column 1 or column 10, giving:
=>

CD  43  44  0 1 0 0 0 0 1 1  0 1 0 0 0 1 0 0
ol    6F  6C  1 0 1 0 1 1 1 1  1 0 1 0 1 1 0 0

Of course, selecting different combinations of b-totals may choose different rows, but that does not matter, because the goal is to choose n/2 rows, which is accomplished.
Writing our pattern in hex we have:

Pattern:  0 1 0 0 0 0 0 0  0 0 0 0 1 0 0 0
Pattern:      4         0           0          8

At this point we have formed:

1085_Tree1.jpg

The designation &:4008 in the root node means that when deciding which side of the tree a candidate key would be found, perform key & 4008. If it matches, take the T branch, otherwise take the F branch.

Having split the set in half, the process is repeated recursively on both subsets.

The left set b-totals are:

CD  43  44

0

1

0

0

0

0

1

1

0

1

0

0

0

1

0

0

ol    6F  6C

1

0

1

0

1

1

1

1

1

0

1

0

1

1

0

0

b-totals:

1

1

1

0

1

1

2

2

1

1

1

0

1

2

0

0

In this case n/2=1, so we choose any b-total with a 1, say (b[1]), forming the pattern:

Pattern:   0 1 0

0

0

0 0

0

0

0 0

0

0

0 0

0

Pattern:               4

 

 

0

 

 

0

 

 

0

 

which chooses:

which chooses:

 

 

 

 

 

 

 

 

 

 

The b-totals for the left-hand set are:

ac  61 63

1

0

1

0

0

0

0

1

1

0

1

0

0

0

1

1

kc  6B 63

1

0

1

0

1

0

1

1

1

0

1

0

0

0

1

1

b-totals:

2

0

2

0

1

0

1

2

2

0

2

0

0

0

2

2

Again we choose any b-total with a 1, say b[4],

Pattern:   0 0 0

0

1

0 0

0

0

0 0

0

0

0 0

0

Pattern:               0

 

 

8

 

 

0

 

 

0

 

which chooses:

kc  6B 63  1 0 1

0

1

0 1

1

1

0 1

0

0

0 1

1

Our search tree now looks like:

825_Tree2.jpg

Note that the nodes contain patterns to be matched against a search key. If the search key is key = "ol" then key & 0x4008 matches so the left branch is selected. key & 0x4000 does not match (evaluates to 0), so the right branch is taken. Last, key =

"ol" matches at a leaf node, so ol is in the set.

Searching for "oZ' will follow the same path, but the final comparison of "oZ" = "ol" fails, so oz is not in the set.

Analysis

Now that you have worked through an example, it is time analyze both the algorithm for building the tree and the algorithm for searching the tree.

Build a tree

To demonstrate that you understand the algorithm, choose 6 two-character keys, translate them into binary (using ASCII mapping), and build a balanced search tree. Draw your tree.

Analyze the construction algorithm

Suppose we have n| records and that each record contains a key of k| bytes and information associated with the key of r| bytes. Assume also that the tree is built using 4-byte pointers.

1. The algorithm for building the tree proceeds by splitting the list in two each time, based on the b-counts to try and create a balanced tree. However, what would happen in the following pathological case?

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

0

1

1

1

1

1

1

1

1

a. There are 8 members of this set. Can the b-counts be used to split the set into two equally-sized subsets?

b. Suggest a way of using the a-counts in a situation like this. Specifically,

c. How would you use the a-counts to create a pattern that would split the list in two almost equal-size sets? List a particular pattern that would accomplish this for the 8 rows above.

d. What operation(s) would need to be applied to match your pattern to select 4 rows? Which four rows would be selected?

2. Suppose the b-counts are all the same value c|. This then means that all the values for the a-counts are n-c|. For what value c| would this cause the most uneven splitting of the set? (This is tricky. Spend some time on it).

3. How much space will the tree consume? Your answer should be in terms of n|, k| and r|. You may assume the r| bytes for records are stored in the leaves of the search tree.

a. What is the best possible case? (Hint consider a perfectly balanced search tree. Further hint: Your discrete math background should be sufficient to know that count of interior nodes given the count of leaf nodes. If it makes things easier for you, assume n| is a power of 2.)

b. Bonus: What is the worst case? (Hint: consider how badly the tree might be unbalanced. Calculate the number of nodes required in such a case. Consider your answer to the problem labeled "tricky" above.)

4. What is the running time of the algorithm that creates the tree? where the measure of the size of the program is the number of rows (n|) and the number of bytes (k|) in the key, and the size of the record information (r|) in the situation in which the list can be split perfectly in half each time? This is a very hard problem and it is likely that few will get it right. It is your thinking processes that will be judged rather than perfect mathematical analysis. Big-O answer without constant coefficients is good enough.

a. How many steps will it take to calculate the b-counts from a collection of rows?
(Note that this depends on k| and n|.)

b. How many steps to find the pattern? (Worst case: you have to form the pattern from a collection of 1's.)

c. How many steps does it take to reduce the b-counts for each selected pattern?

d. How many times will three steps above have to be repeated in a perfectly split binary tree?

e. Combine your results together to give your estimate of the running time of the tree-construction algorithm.

5. What is the running time of the search routine for a perfectly balanced search tree?

6. What is the running time of the search routine for the worst-balanced search tree?

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M92063027
  • Price:- $120

Priced at Now at $120, Verified Solution

Have any Question?


Related Questions in Computer Engineering

Algorithms assignment -task -1 design and implement an

Algorithms Assignment - TASK - 1. Design and implement an efficient algorithm for determining the best route between two given stations in a given rail network. 2. The rail network should be passed to your program as an ...

Which sentence would work better as a thesis statement for

Which sentence would work better as a thesis statement for a three-to-five-page college paper? Remember that a thesis should be a central idea that requires supporting evidence; it should be of adequate scope for essay's ...

Question suppose user process application p1 of one

Question : Suppose user process (application) P1 of one computer wishes to transfer data (file) to process (application) P2 on another computer in the Internet. What addressing information about P2 is necessary for P1 to ...

Question responsive frameworks and libraries provide

Question : Responsive frameworks and libraries provide convenience when creating a modern website. Discuss at least three design options, such as hiding content or layout changes, that would be beneficial for a responsiv ...

Link changes in unemployment inflation wages and gdp to one

Link changes in unemployment, inflation, wages, and GDP to one another and how they impacted each other during periods of economic decline (recessions) and periods of economic growth (expansion) over the past 10 years.

Show a schedule with two transactions that share a single

Show a schedule with two transactions that share a single data item (A in block BA) and that is not serializable, but that is equivalent to a serial schedule of the two transactions for certain initial values of A. Indic ...

Please do it in c programrevisit the matrix addition

Please do it in C program Revisit the matrix addition function Write the program so that it would not need to take a third pointer for the result matrix Instead, have the function return a pointer to the results matrix P ...

Question subject digital securitywhat do you think about

Question : Subject : Digital Security What do you think about how cloud makes DLP (Data Loss Prevention) more difficult? The response must be typed, single spaced, must be in times new roman font (size 12) and must follo ...

What is the types of cost fixed variable and marginal in

What is the types of cost: fixed, variable, and marginal in economics, and methods that market power alters the relationship between a firm's costs and the price at which it sells its product?

Question add referenceswrite 1 page that respond to the

Question: Add References Write 1 page that respond to the following questions with your thoughts, ideas, and comments. This will be the foundation for future discussions by your classmates. Be substantive and clear, and ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As