Vector Model

This question requires you to use the following data. Assume a repository of 10 documents over eight key terms. Table 2.1 is the document-term table: it gives the raw frequencies with which the eight key terms appear in each of the 10 documents, as well as the TF values for a query document.

Exercise 1. Using the information from Table 2.1, which documents would be returned by the following queries:

a) Term2 AND Term7
b) Term4 OR Term2
c) (Term2 OR Term7) AND (NOT Term7)

Table 2.1: A2: Document-Term and Query-Term Table

 

Document  Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  Term 8
Doc 1          4       8       9       0      10       8       0       9
Doc 2          1       5       0       0      12       0       1       3
Doc 3          0       3       0       0       0       4       2       0
Doc 4          1       0       4       3       9       0       0       0
Doc 5          0       4       0       0       0       5       1       0
Doc 6          1       2       2       0       3       1       0       1
Doc 7          0       5       3       4       0       0       4       2
Doc 8          0       7       0       3       0       0       3       3
Doc 9          0       5       0       0       0       4       1       2
Doc 10         0       3       4       0       0       2       4       0
Query          2       3       1       2       2       0       1       0

Is it possible to rank the documents returned in (a) to (c)? If it is possible, then supply the rankings in each case. If it is not possible, then state why.
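One way to evaluate the Boolean queries in Exercise 1 is with set operations, treating a term as present in a document whenever its raw frequency in Table 2.1 is non-zero. The following is a minimal sketch under that assumption; the names `TF`, `docs_with`, and `ALL_DOCS` are illustrative and not part of the exercise.

```python
# Raw term frequencies from Table 2.1 (key: document number, values: Term 1..Term 8).
TF = {
    1: [4, 8, 9, 0, 10, 8, 0, 9],
    2: [1, 5, 0, 0, 12, 0, 1, 3],
    3: [0, 3, 0, 0, 0, 4, 2, 0],
    4: [1, 0, 4, 3, 9, 0, 0, 0],
    5: [0, 4, 0, 0, 0, 5, 1, 0],
    6: [1, 2, 2, 0, 3, 1, 0, 1],
    7: [0, 5, 3, 4, 0, 0, 4, 2],
    8: [0, 7, 0, 3, 0, 0, 3, 3],
    9: [0, 5, 0, 0, 0, 4, 1, 2],
    10: [0, 3, 4, 0, 0, 2, 4, 0],
}
ALL_DOCS = set(TF)


def docs_with(term):
    """Return the documents in which the given term (1-based) has a non-zero frequency."""
    return {doc for doc, row in TF.items() if row[term - 1] > 0}


# ASSUMPTION: a term "matches" a document when its frequency is greater than zero.
result_a = docs_with(2) & docs_with(7)                               # Term2 AND Term7
result_b = docs_with(4) | docs_with(2)                               # Term4 OR Term2
result_c = (docs_with(2) | docs_with(7)) & (ALL_DOCS - docs_with(7))  # (Term2 OR Term7) AND (NOT Term7)

for label, result in (("a", result_a), ("b", result_b), ("c", result_c)):
    print(label, sorted(result))
```

Each result is a plain set of document numbers, so the sketch itself imposes no order on the documents it returns.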

Exercise 2. Answer the following questions.

a) Using the information from Table 2.1, calculate the ranking score for each of the ten documents based on each of the following query-document similarity measures:

i) dot product, using TF weights for both the document and query vectors;
ii) cosine coefficient, using TF weights for both the document and query vectors.

b) Compare the rankings that you obtained using the two similarity measures. If there are differences between the rankings, then discuss why you think these differences occurred.
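The two similarity measures in Exercise 2 can be computed directly from the raw frequencies in Table 2.1. This is a sketch rather than a model answer; `DOCS`, `QUERY`, `dot`, and `cosine` are hypothetical names introduced for illustration.

```python
import math

# Raw term frequencies from Table 2.1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
DOCS = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    """Dot product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine coefficient: dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


dot_scores = {i + 1: dot(QUERY, doc) for i, doc in enumerate(DOCS)}
cos_scores = {i + 1: cosine(QUERY, doc) for i, doc in enumerate(DOCS)}

# Rank document numbers from highest to lowest score under each measure.
dot_ranking = sorted(dot_scores, key=dot_scores.get, reverse=True)
cos_ranking = sorted(cos_scores, key=cos_scores.get, reverse=True)
print(dot_ranking)
print(cos_ranking)
```

The cosine coefficient divides by the vector lengths, so a long document with large raw counts is not favoured simply for being long, whereas the dot product rewards large counts directly.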

Exercise 3. Answer the following questions.

a) Using the information in Table 2.1, calculate the idf (inverse document frequency) weight vector. Make sure you show how your calculation was performed.

b) Construct a table similar to Table 2.1 that shows the tf-idf weights instead of the raw term frequencies.

c) Using tf weights for the query vector, tf-idf weights for the document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores. Show how your calculations were performed for the first document only.

d) How does this ranking compare with the ranking obtained using the cosine similarity measure in Exercise 2? If there are differences between the rankings, then discuss why you think these differences occurred.
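The steps of Exercise 3 can be sketched as follows. The exercise does not state which idf formula to use, so this sketch ASSUMES idf = log10(N / df), where N is the number of documents and df is the number of documents containing the term; other variants (natural log, smoothed forms) would change the numbers. All variable names are illustrative.

```python
import math

# Raw term frequencies from Table 2.1 (rows: Doc 1..Doc 10, columns: Term 1..Term 8).
DOCS = [
    [4, 8, 9, 0, 10, 8, 0, 9],
    [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0],
    [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0],
    [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2],
    [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2],
    [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]
N = len(DOCS)

# Document frequency: number of documents in which each term occurs at least once.
df = [sum(1 for doc in DOCS if doc[t] > 0) for t in range(8)]

# ASSUMPTION: idf = log10(N / df); the exercise does not fix the formula.
idf = [math.log10(N / n) for n in df]

# tf-idf table: each raw frequency multiplied by the idf of its term.
tfidf_docs = [[tf * w for tf, w in zip(doc, idf)] for doc in DOCS]


def cosine(u, v):
    """Cosine coefficient between two weight vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


# Query keeps raw tf weights; documents use tf-idf weights.
scores = {i + 1: cosine(QUERY, doc) for i, doc in enumerate(tfidf_docs)}
ranking = sorted(scores, key=scores.get, reverse=True)
print([round(w, 3) for w in idf])
print(ranking)
```

Under this formula a term that occurs in nearly every document (Term 2 here) receives an idf close to zero, so it contributes little to the tf-idf scores even where its raw frequency is high.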

Exercise 4. Answer the following questions.

a) This time, using tf-idf weights for both the query and document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores. Show how your calculations were performed for the first document only.

b) How does this ranking compare with the ranking obtained in Exercise 3? If there are differences between the rankings, then discuss why you think these differences occurred.
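For reference, the score asked for in Exercise 4 can be written out as below. The symbols are notation introduced here rather than taken from the exercise: tf_{t,d} and tf_{t,q} are the raw frequencies of term t in document d and in the query, and idf_t is the inverse document frequency of term t under whichever idf definition is adopted.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}_t ,
\qquad
w_{t,q} = \mathrm{tf}_{t,q} \cdot \mathrm{idf}_t

\cos(q, d) =
  \frac{\sum_{t=1}^{8} w_{t,q}\, w_{t,d}}
       {\sqrt{\sum_{t=1}^{8} w_{t,q}^{2}} \; \sqrt{\sum_{t=1}^{8} w_{t,d}^{2}}}
```

Because idf_t now multiplies both vectors, it enters each term of the numerator squared, which increases the influence of rarer terms relative to the case where only the document vectors are idf-weighted.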

Exercise 5. This time, use tf weights for the query vector, tf-idf weights for the document vectors, and the Dice coefficient rather than the Cosine coefficient as the similarity measure.

a) Compute the ranking scores for all documents. Show how your calculations were performed for the first document only.

b) How does the ranking compare with the ranking obtained in Exercise 3? If there are differences between the rankings, then discuss why you think these differences occurred.
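A sketch of the Dice calculation for the first document is given below. It rests on two ASSUMPTIONS that the exercise does not spell out: the Dice coefficient is taken in its weighted-vector form, 2(q·d) / (|q|² + |d|²), and idf is taken as log10(N / df). The document frequencies in `DF` were counted from Table 2.1; the names are illustrative.

```python
import math

N = 10                                # number of documents in the repository
DF = [4, 9, 5, 3, 4, 6, 7, 6]         # documents containing Term 1..Term 8 (counted from Table 2.1)
DOC1_TF = [4, 8, 9, 0, 10, 8, 0, 9]   # raw frequencies for Doc 1
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]   # raw frequencies for the query

# ASSUMPTION: idf = log10(N / df); the exercise does not fix the formula.
IDF = [math.log10(N / n) for n in DF]


def dice(q, d):
    """Dice coefficient for weighted vectors: 2 * (q . d) / (|q|^2 + |d|^2)."""
    numerator = 2 * sum(a * b for a, b in zip(q, d))
    denominator = sum(a * a for a in q) + sum(b * b for b in d)
    return numerator / denominator if denominator else 0.0


# Document vector uses tf-idf weights; query vector keeps raw tf weights.
doc1_tfidf = [tf * w for tf, w in zip(DOC1_TF, IDF)]
doc1_score = dice(QUERY_TF, doc1_tfidf)
print(round(doc1_score, 4))
```

Unlike the cosine coefficient, this form adds the squared lengths in the denominator instead of multiplying the lengths, so the score also depends on how different the two vector lengths are, not only on the angle between them.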

2.2. IR Evaluation

Exercise 6. The following data displays retrieval results for two different algorithms (Algorithm 1 and Algorithm 2) in response to two distinct queries (Query 1 and Query 2). An expert has manually labelled each of the documents as being either relevant or not relevant to the queries.

Algorithm 1 returns the following results:

Query 1 : d4 , d15 , d1 , d3 , d8 , d76 , d2 , d33 , d30 , d5 , d11 , d29 , d66 , d10
Query 2 : d9 , d91 , d2 , d87 , d13 , d52 , d92 , d16 , d17 , d22 , d20 , d71 , d48 , d60 , d56

Algorithm 2 returns the following results:

Query 1 : d8 , d29 , d6 , d5 , d15 , d17 , d20 , d65 , d2 , d33 , d44 , d41 , d7 , d77 , d13 , d14 , d90 , d80 , d70 , d4
Query 2 : d3 , d87 , d2 , d28 , d15 , d14 , d12 , d10 , d41 , d11 , d85 , d89 , d1 , d49 , d52 , d76 , d55 , d9 , d91 , d99 , d30 , d17 , d13 , d26 , d94 , d18 , d86 , d72 , d48 , d8 , d93 , d42 , d79 , d43 , d88 , d7 , d98 , d51 , d50 , d6

The documents known to be relevant are as follows:

Query 1 : d2 , d4 , d7 , d15 , d29
Query 2 : d1 , d2 , d3 , d7 , d8 , d9 , d11 , d12 , d13 , d15 , d16 , d20

a) For Algorithm 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 1 (all three curves should be on a single chart).

b) For Algorithm 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c) Plot the averages for Algorithm 1 and Algorithm 2 on a separate chart, and compare the algorithms in terms of precision and recall. Do you think one of the algorithms is superior? Why?
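The interpolated curves in Exercise 6 can be derived as sketched below, shown for Algorithm 1 only. The sketch ASSUMES the usual interpolation rule, in which the precision at a standard recall level is the highest precision observed at any recall greater than or equal to that level, and 0 where that recall is never reached. Function and variable names are hypothetical, and plotting is left out.

```python
def interpolated_precision(ranking, relevant):
    """Return interpolated precision at the 11 recall levels 0.0, 0.1, ..., 1.0."""
    points = []  # (relevant documents found so far, precision) at each relevant hit
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits, hits / rank))
    curve = []
    for tenth in range(11):
        # Recall hits/len(relevant) >= tenth/10, compared in integers to avoid float rounding.
        reachable = [p for h, p in points if h * 10 >= tenth * len(relevant)]
        curve.append(max(reachable, default=0.0))
    return curve


# Algorithm 1 results and relevance judgements from the exercise (document numbers only).
A1_Q1 = [4, 15, 1, 3, 8, 76, 2, 33, 30, 5, 11, 29, 66, 10]
A1_Q2 = [9, 91, 2, 87, 13, 52, 92, 16, 17, 22, 20, 71, 48, 60, 56]
REL_Q1 = {2, 4, 7, 15, 29}
REL_Q2 = {1, 2, 3, 7, 8, 9, 11, 12, 13, 15, 16, 20}

q1_curve = interpolated_precision(A1_Q1, REL_Q1)
q2_curve = interpolated_precision(A1_Q2, REL_Q2)

# Average curve: mean of the two query curves at each recall level.
avg_curve = [(a + b) / 2 for a, b in zip(q1_curve, q2_curve)]

for tenth, (p1, p2, avg) in enumerate(zip(q1_curve, q2_curve, avg_curve)):
    print(f"recall {tenth / 10:.1f}: Q1={p1:.3f} Q2={p2:.3f} avg={avg:.3f}")
```

The same function can be applied to the Algorithm 2 result lists, and the three curves per algorithm can then be passed to any charting tool.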
