The basic task is to measure the similarity between any two files in our collection. To do this, we need a suitable universe of words. It consists of all words in the collection that (a) are more than four letters long, (b) occur no more than 20 times overall, and (c) occur in no more than 7 files in the collection. Now we construct a vector (in the mathematical sense) corresponding to each file. The vector has one coordinate for each word in the universe. If the word occurs in the file, the corresponding coordinate is 1; otherwise it is 0.
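The three filtering rules can be sketched as below. This is only an illustration under stated assumptions: the names `tokenize` and `build_universe` are hypothetical, the collection is assumed to be a dictionary mapping file names to their text, and words are assumed to be lowercase runs of letters (the task does not say how to split text into words).

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase alphabetic words (an assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())

def build_universe(docs, min_len=5, max_total=20, max_files=7):
    """docs maps a file name to its text; returns the universe as a sorted list."""
    total = Counter()     # occurrences of each word across the whole collection
    in_files = Counter()  # number of files in which each word appears
    for text in docs.values():
        words = tokenize(text)
        total.update(words)
        in_files.update(set(words))  # count each file at most once per word
    return sorted(
        w for w in total
        if len(w) >= min_len            # (a) more than four letters
        and total[w] <= max_total       # (b) at most 20 occurrences overall
        and in_files[w] <= max_files    # (c) appears in at most 7 files
    )
```

Sorting the universe fixes the order of the coordinates, so every file's vector uses the same word at the same position.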

For example, suppose the universe consists of five words, in this order: apple, grapes, banana, doctor, program. Suppose file1 contains apple, banana, and program. Then the vector for file1 is (1,0,1,0,1).

We need to normalize each of the vectors so that it has unit length. The vector above has three coordinates equal to 1, so its length is the square root of 3, and each coordinate gets divided by the square root of 3.
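A minimal sketch of building and normalizing the vector for the file1 example follows. The function names `file_vector` and `normalize` are hypothetical, and the handling of a file that contains no universe words (left as the zero vector) is an assumption, since the task does not cover that case.

```python
import math

def file_vector(universe, words_in_file):
    """Return the 0/1 vector with one coordinate per universe word."""
    present = set(words_in_file)
    return [1 if w in present else 0 for w in universe]

def normalize(vec):
    """Scale vec to unit length by dividing each coordinate by the length."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        # Assumption: a file with no universe words keeps the zero vector.
        return [0.0] * len(vec)
    return [x / length for x in vec]

universe = ["apple", "grapes", "banana", "doctor", "program"]
v = file_vector(universe, ["apple", "banana", "program"])  # [1, 0, 1, 0, 1]
u = normalize(v)  # each nonzero coordinate becomes 1 / sqrt(3)
```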

The similarity of two files is defined to be the scalar product of the corresponding two normalized vectors. The scalar product of two vectors is obtained by multiplying corresponding components and adding the results. For example, the scalar product of (2,1,3) and (0,5,6) is 2 * 0 + 1 * 5 + 3 * 6 = 23.
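The scalar product can be written in a few lines. The function name `dot` is a hypothetical choice, and the length check is an added safeguard rather than part of the task statement.

```python
def dot(a, b):
    """Multiply corresponding components of a and b and add the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of coordinates")
    return sum(x * y for x, y in zip(a, b))

print(dot((2, 1, 3), (0, 5, 6)))  # 2*0 + 1*5 + 3*6 = 23
```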

Your task is to write a program that prints the names of the two files with the highest similarity among the files in the collection, and the names of the two files with the lowest similarity.
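One possible way to find the two extreme pairs is to score every unordered pair of files and keep the maximum and minimum. The sketch below assumes the unit vectors have already been computed and stored in a dictionary keyed by file name; `extreme_pairs` is a hypothetical name, and ties are broken by file-name order, which the task does not specify.

```python
from itertools import combinations

def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def extreme_pairs(unit_vectors):
    """Return ((pair, score), (pair, score)) for the most and least similar files."""
    scored = [
        ((a, b), dot(unit_vectors[a], unit_vectors[b]))
        for a, b in combinations(sorted(unit_vectors), 2)
    ]
    if not scored:
        raise ValueError("need at least two files")
    most = max(scored, key=lambda item: item[1])
    least = min(scored, key=lambda item: item[1])
    return most, least

vectors = {"a.txt": [1.0, 0.0], "b.txt": [1.0, 0.0], "c.txt": [0.0, 1.0]}
(most_pair, most_score), (least_pair, least_score) = extreme_pairs(vectors)
print("Most similar:", *most_pair)
print("Least similar:", *least_pair)
```

Checking all pairs takes time quadratic in the number of files, which should be acceptable for a small collection.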

  • Category:- Computer Engineering
  • Reference No.:- M9651524
