
Q1.

Based on the data in the following table,

(1) estimate a Bernoulli Naive Bayes classifier (using add-one smoothing)

(2) apply the classifier to the test document.

(3) estimate a multinomial Naive Bayes classifier (using the add-one smoothing)

(4) apply the classifier to the test document

You do not need to estimate parameters that you don't need for classifying the test document.


| | docID | words in document | Class = China? |
|---|---|---|---|
| training set | 1 | Taipei Taiwan | Yes |
| | 2 | Macao Taiwan Shanghai | Yes |
| | 3 | Japan Sapporo | No |
| | 4 | Sapporo Osaka Taiwan | No |
| test set | 5 | Taiwan Taiwan Taiwan Sapporo Bangkok | ? |
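The two models in Q1 can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function names are made up for this example, the vocabulary is assumed to be the seven distinct training words, and the unseen test word "Bangkok" is assumed to be ignored by both models. Add-one smoothing uses a denominator of N_c + 2 for the Bernoulli model and (tokens in class) + |V| for the multinomial model.

```python
from collections import Counter

# Training and test data copied from the table above.
train = [
    (["Taipei", "Taiwan"], "Yes"),
    (["Macao", "Taiwan", "Shanghai"], "Yes"),
    (["Japan", "Sapporo"], "No"),
    (["Sapporo", "Osaka", "Taiwan"], "No"),
]
test_doc = ["Taiwan", "Taiwan", "Taiwan", "Sapporo", "Bangkok"]

# Assumption: the vocabulary is the set of words seen in training.
vocab = sorted({w for words, _ in train for w in words})
classes = sorted({label for _, label in train})


def prior(c):
    """Fraction of training documents that belong to class c."""
    return sum(1 for _, label in train if label == c) / len(train)


def bernoulli_score(doc, c):
    """P(c) times the product over ALL vocabulary terms (present or absent)."""
    docs_c = [set(words) for words, label in train if label == c]
    present = set(doc)
    score = prior(c)
    for term in vocab:
        # Add-one smoothing on document frequencies: (df + 1) / (N_c + 2).
        p = (sum(term in d for d in docs_c) + 1) / (len(docs_c) + 2)
        score *= p if term in present else 1 - p
    return score


def multinomial_score(doc, c):
    """P(c) times the product over the tokens of the document."""
    counts = Counter(w for words, label in train if label == c for w in words)
    total = sum(counts.values())
    score = prior(c)
    for w in doc:
        if w in vocab:  # assumption: out-of-vocabulary tokens are skipped
            # Add-one smoothing on token counts: (tf + 1) / (total + |V|).
            score *= (counts[w] + 1) / (total + len(vocab))
    return score


for name, fn in [("Bernoulli", bernoulli_score), ("multinomial", multinomial_score)]:
    scores = {c: fn(test_doc, c) for c in classes}
    print(name, scores, "->", max(scores, key=scores.get))
```

Under these assumptions the Bernoulli model favours "No" (the five absent vocabulary terms count against "Yes"), while the multinomial model favours "Yes" (the three occurrences of Taiwan dominate).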

Q2:

Algorithm 1: k-means(D, k)
Data: D is a dataset of n d-dimensional points; k is the number of clusters.
 1  Initialize k centers C = [c1, c2, ..., ck];
 2  canStop ← false;
 3  while canStop = false do
 4      Initialize k empty clusters G = [g1, g2, ..., gk];
 5      for each data point p ∈ D do
 6          cx ← NearestCenter(p, C);
 7          g_cx.append(p);
 8      for each group g ∈ G do
 9          ci ← ComputeCenter(g);
10  return G;

Consider the (slightly incomplete) k-means clustering algorithm as depicted in Algorithm 1.

(1) Assume that the stopping criterion is that the algorithm has converged to the final k clusters. Insert several lines of pseudo-code after Line 8 of the algorithm to implement this logic.

(2) The cost of k clusters is

$cost(g_1, g_2, \ldots, g_k) = \sum_{i=1}^{k} cost(g_i)$

where $cost(g_i) = \sum_{p \in g_i} dist(p, c_i)$ and dist() is the Euclidean distance. Now show that the cost of k clusters as evaluated at the end of each iteration (i.e., after Line 11 in the current algorithm) never increases. (You may assume d = 2.)

(3) Prove that the cost of clusters obtained by the k-means algorithm always converges to a local minimum. (Hint: you can make use of the previous conclusion even if you have not proved it.)
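A runnable sketch of Algorithm 1 with one possible stopping rule may help when experimenting with parts (1) and (2). It rests on assumptions that the question does not state: ComputeCenter is taken to be the arithmetic mean, an empty cluster keeps its old center, and the loop stops when no center moves. The helper names mirror the pseudo-code but are otherwise illustrative. The cost follows the definition above (sum of Euclidean distances) and is recorded once per iteration so it can be inspected on small inputs.

```python
import math


def nearest_center(p, centers):
    """Index of the center closest to p (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def compute_center(group):
    """Assumption: the center of a group is the coordinate-wise mean."""
    d = len(group[0])
    return tuple(sum(p[j] for p in group) / len(group) for j in range(d))


def k_means(points, initial_centers):
    centers = list(initial_centers)
    costs = []  # cost of the clustering at the end of each iteration
    while True:
        # Assignment step (Lines 4-7 of Algorithm 1).
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Update step (Lines 8-9); an empty group keeps its previous center.
        new_centers = [compute_center(g) if g else c
                       for g, c in zip(groups, centers)]
        costs.append(sum(math.dist(p, c)
                         for g, c in zip(groups, new_centers) for p in g))
        # Stopping rule (assumed): no center moved, so assignments are stable.
        if new_centers == centers:
            return groups, centers, costs
        centers = new_centers


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
groups, centers, costs = k_means(points, [(0, 0), (5, 5)])
print(groups, centers, costs)
```

Comparing centers rather than cluster memberships is a design choice; checking that no point changed its cluster between two iterations would serve equally well as a convergence test.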

Q3. Consider the given similarity matrix. You are asked to perform group average hierarchical clustering on this dataset.

You need to show the steps and final result of the clustering algorithm. You will show the final results by drawing a dendrogram. The dendrogram should clearly show the order in which the points are merged.

 

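| | p1 | p2 | p3 | p4 | p5 |
|---|---|---|---|---|---|
| p1 | 1.00 | 0.10 | 0.41 | 0.55 | 0.35 |
| p2 | 0.10 | 1.00 | 0.64 | 0.47 | 0.98 |
| p3 | 0.41 | 0.64 | 1.00 | 0.44 | 0.85 |
| p4 | 0.55 | 0.47 | 0.44 | 1.00 | 0.76 |
| p5 | 0.35 | 0.98 | 0.85 | 0.76 | 1.00 |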
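The merge steps for Q3 can be traced with a short Python sketch. It assumes that "group average" means the mean similarity over all cross-cluster pairs of points (some textbooks also include within-cluster pairs, which would change the numbers), and the function names are illustrative only. Because the input is a similarity matrix, the pair with the highest average similarity is merged at each step.

```python
from itertools import combinations

labels = ["p1", "p2", "p3", "p4", "p5"]
sim = [
    [1.00, 0.10, 0.41, 0.55, 0.35],
    [0.10, 1.00, 0.64, 0.47, 0.98],
    [0.41, 0.64, 1.00, 0.44, 0.85],
    [0.55, 0.47, 0.44, 1.00, 0.76],
    [0.35, 0.98, 0.85, 0.76, 1.00],
]


def group_average(a, b, sim):
    """Mean similarity over all pairs (i, j) with i in cluster a, j in cluster b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerate(sim):
    """Return the list of merges as (cluster_a, cluster_b, similarity)."""
    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest group-average similarity.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: group_average(pair[0], pair[1], sim))
        merges.append((a, b, group_average(a, b, sim)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


for a, b, s in agglomerate(sim):
    names_a = "+".join(labels[i] for i in a)
    names_b = "+".join(labels[i] for i in b)
    print(f"merge {names_a} with {names_b} at similarity {s:.4f}")
```

Under this definition the merge order is p2 with p5 (0.98), then p3 (0.745), then p4 (about 0.557), and finally p1 (0.3525); the dendrogram heights follow from these values.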

Q4. Play several rounds of the Akinator game at http://au.akinator.com/.

(1) It is not uncommon for users to give completely or partially wrong answers during a game. Assume the site maintains a large table, where each row is about a person, each column is a Boolean-type question, and each cell value is the correct answer ("Yes" or "No"), and that the core algorithm the site uses is a decision tree. To accommodate possible errors, let's assume the site allows up to one error in a game. That is, a person will still be a candidate if at most one answer the user provided does not match the correct answer in the data table. Now describe how you will modify the ID3 decision tree construction algorithm to build a decision tree for the site while allowing up to one error in a game.

(2) Assume that you do not think the site uses decision trees as the backbone algorithm. What reason(s) support this conjecture? You may list more than one reason. If you design some experiments and will refer to them, please include the setup and the details of the experiments (e.g., something like Figure 1).
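One way to picture the error tolerance in part (1) is to let each candidate carry a count of mismatches used so far when a node is split. The sketch below is only an illustration of that idea, not a full ID3 implementation and not necessarily the intended answer: `split`, `table`, and the (person, errors) pairs are hypothetical names, and attribute selection by information gain is left out.

```python
def split(candidates, table, question, max_errors=1):
    """Split (person, errors_used) pairs on one Boolean question.

    A person always follows the branch matching the table. The person is
    also copied into the opposite branch, with one more error charged,
    as long as the error budget is not exceeded.
    """
    branches = {True: [], False: []}
    for person, errors in candidates:
        truth = table[person][question]
        branches[truth].append((person, errors))
        if errors + 1 <= max_errors:
            branches[not truth].append((person, errors + 1))
    return branches


# Hypothetical two-person, two-question table.
table = {
    "A": {"q1": True, "q2": False},
    "B": {"q1": False, "q2": False},
}
root = [("A", 0), ("B", 0)]
after_q1 = split(root, table, "q1")
after_q2 = split(after_q1[True], table, "q2")
print(after_q1)
print(after_q2)
```

In this toy table, answering "Yes" to q1 keeps B as a candidate with one error charged, and a subsequent "Yes" to q2 drops B because a second mismatch would exceed the budget.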

Q5.

We consider the linear counting estimator that estimates the number of distinct elements in a data stream. Using this as a building block, we shall derive methods to estimate the number of distinct elements after some common set operations on several data streams.

Let S1 and S2 be two data streams, and let C(Si) be the linear counting estimator for Si, where both estimators use the same hash function h() and the same bit-array length (i.e., m bits; the bit array is denoted C(Si).B).

(1) Prove that C(S1 ∪ S2) = C(S1) ∨ C(S2). Here ∪ is the multiset union operator, and the ∨ operator on two linear counting estimators C1 and C2 returns a new estimator (with the same hash function) with an m-bit bit array whose j-th entry is the bitwise OR of the corresponding bits in C1 and C2, i.e., C1.B[j] | C2.B[j].

(2) Prove that C(S1 ∩ S2) ≠ C(S1) ∧ C(S2). Here ∩ is the multiset intersection operator, and the ∧ operator is defined similarly to ∨ except that it uses bitwise AND instead of bitwise OR, i.e., C1.B[j] & C2.B[j].

(3) Derive a method to estimate the number of distinct elements in S1 ∩ S2, based only on linear counting estimators.
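The three parts of Q5 can be explored with a small Python sketch. It assumes the usual linear counting estimate, n ≈ -m ln(z/m) with z the number of zero bits, and it estimates the intersection by inclusion-exclusion, which is one possible approach to part (3) rather than a confirmed answer. The function names, the SHA-256 based hash, and the toy modulo hash are all illustrative choices.

```python
import hashlib
import math


def default_hash(item, m):
    """Illustrative hash: first 8 bytes of SHA-256 of repr(item), modulo m."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def build(stream, m, h=default_hash):
    """Bit array C(S).B of a linear counting estimator with m bits."""
    bits = [0] * m
    for x in stream:
        bits[h(x, m)] = 1
    return bits


def bit_or(b1, b2):
    return [x | y for x, y in zip(b1, b2)]


def bit_and(b1, b2):
    return [x & y for x, y in zip(b1, b2)]


def estimate(bits):
    """Linear counting estimate: -m * ln(fraction of zero bits)."""
    m, zeros = len(bits), bits.count(0)
    if zeros == 0:
        raise ValueError("bit array is full; the estimate is undefined")
    return -m * math.log(zeros / m)


def estimate_intersection(b1, b2):
    """Assumed approach: |S1 ∩ S2| ≈ n(S1) + n(S2) - n(S1 ∪ S2)."""
    return estimate(b1) + estimate(b2) - estimate(bit_or(b1, b2))


# Part (1): OR of the bit arrays equals the bit array of the union stream.
s1, s2 = list(range(300)), list(range(200, 500))
b1, b2 = build(s1, 4096), build(s2, 4096)
print(build(s1 + s2, 4096) == bit_or(b1, b2))
print(estimate_intersection(b1, b2))  # true overlap is 100 distinct elements

# Part (2): with a toy hash x mod m, the items 1 and 9 collide when m = 8,
# so the AND has a bit set although the intersection is empty.
mod_hash = lambda x, m: x % m
print(bit_and(build([1], 8, mod_hash), build([9], 8, mod_hash)))
```

The collision example shows why AND overcounts: two different elements that hash to the same position leave a 1 in the AND even though neither element is shared.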

