Ask Question, Ask an Expert

+61-413 786 465

info@mywordsolution.com

Ask Computer Engineering Expert

1. Objective

In this project, you will implement and experiment with several algorithms to approxi- mate historical item counts in a space-efficient manner.

Note that it will take you quite some time to complete this project even if you are familiar with Java programming and have good programming experience. Therefore, we earnestly recommend that you start working on this project as early as possible.

2. Background

This project will be based on the paper "Hokusai - Sketching Streams in Real Time".

You are required to implement the two algorithms in the paper - the time aggregation and the item aggregation. Both the Count-Min sketch (denoted as cm-sketch in the rest of this specification) [CM04] and the two algorithms have been introduced in the Data Stream Mining lecture slides; nevertheless, you are exepcted to read the corresponding sections of the paper [MSA12] for further details. Note that the paper has several typos and inconsistencies, and you need to fix them. See also Section 4.

Finally, in addition to the implementation, you will also need to submit a report on corrections to the paper and ideas for possible improvement.

3. Task I: Implementation

Your first task is to write a Java class named Proj1, which receives six command line arguments:
• DATAFILE: the file that contains the data stream items. Its format will be described shortly.
• QUERYFILE: the file that contains the queries. Its format will be described shortly.
• METHOD: it is a string of either time or item.
• WIDTH: it is the width of each cm-sketch.
• DEPTH: it is the depth of each cm-sketch.
• SEED: it is an integer that initiates the random generator in Java (i.e., java.util.Random).

Your program should use time or item aggregation as specified in the METHOD to process the data stream in the DATAFILE, and then computes the estimates for each query specified in the QUERYFILE; the cm-sketches constructed in your algorithms should use the parameter WIDTH, DEPTH, and SEED.

Query File Format. The QUERYFILE consists of one or more lines. Each line consists of three fields separated by a white space character.

The three fields are:

• a string specifying the query item (denoted as Q)
• an integer specifying the start timestamp (denoted as t1)
• an integer specifying the end timestamp (denoted as t2)
Hence each query asks for (t2 - t1 + 1) estimates of the total count of Q at timestamp i, for t1 ≤ i ≤ t2.

Output Format. If there are m queries in the QUERYFILE, you need to output m lines, where each line is the estimated numbers for each corresponding query. The estimated numbers should be separated by a comma (i.e., ,); if the estimated number is a floating number, you should round it down to an integer and display it as the estimated total count.

Hash Functions and Random Seed. We will source the hash functions used in the cm-sketches from the following universal hashing function family:

h(x) = ((a • x + b) mod p) mod w where a ∈ [1, p - 1] and b ∈ [0, p - 1] are random integers (deterministically derived from the given random seed)1, p is a large prime number, and w is the width of the cm-sketch. To make sure we can reliably test your program, you have to use the SEED given from the command line to initiate your random hash functions. The logic in the following code should be used:

// p = 2147483647

Random r = new Random(SEED); for (int i = 0; i < depth; ++i)
{
a[i] = 1 + r.nextInt(p-1);
b[i] = r.nextInt(p);
}
For example, if given the seed 1234, the first two hash functions should be:
h1(x) = ((1388524629 • x + 557894633) mod p) mod w h2(x) = ((2043025134 • x + 509900220) mod p) mod w

You perhaps need to beware certain care is needed as Java does not support unsigned data types.

To convert a string into an integer, we use String::hashCode() method.

4. Task II: Report

You need to write a report with the following components:

• Present the correct version of Algorithms 2 and 3 in [MSA12]. You should follow the style and typesetting of the algorithms as much as possible, while addressing the following goals:

- For Algorithm 2: fix the error(s) in the algorithm, and also make sure there is no useless computation in the pseudo-code level.

- For Algorithm 3: fix the algorithm so that it starts with timestamp 0 (rather than 1). Your corrected version should work as that shown in the lecture slides, i.e., it has two full-width cm-sketches, followed by two half-width, four quarter-width sketches, and so on and so forth.2

• Describe the estimation algorithm for sketched obtained via time aggregation.

• Describe at least one way to improve the methods to deliver a possible more accurate solution to the problem. You should clearly describe (1) the problem(s) of the current solution(s), and (2) your proposed method.

Automarking Your Implementation. Auto-marking will be used to mark your implementation on a linux machine. Specifically, we will

(1) Untar the proj1.tar and run javac *.java to compile your .java files.

(2) Run your program with some valid parameters and record your output and compare it with sample output.

(3) Repeat Step b) several times, each with a different set of parameters.

Therefore, you need to make sure:

• Your java program can be correctly compiled. Make sure you do not put some of your java source files at a sub-directory.

• Your java program can be run by java Proj1 DATAFILE QUERYFILE METHOD WIDTH DEPTH SEED. Make sure you do not use package for your code.

• Your java program output results in a format exactly identical to that described in Section 3.3. Make sure you do not output does not contain other messages (e.g., debugging information).

Be warned that you will recieve 0 mark if your program cannot be compiled or run correctly in our testing environment (e.g., program exit due to some Exception).

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91597006
  • Price:- $120

Guranteed 48 Hours Delivery, In Price:- $120

Have any Question?


Related Questions in Computer Engineering

Quesiton an important principle in information security is

Quesiton: An important principle in information security is the concept of layers of security, which is often referred to as layered security, or defense in depth. 1) Please explain the concept of layers of security. 2) ...

Question goal setting please respond to the

Question: "Goal Setting" Please respond to the following: • Early computers were only usable by experts with strong technical knowledge. Examine how interactive systems have changed throughout the years to accommodate av ...

Fully explain at least one reason why many developing

Fully explain at least one reason why many developing countries suffered serious debt crisis in the early 1980s. Does this reason you explained in debt support Krueger & Srinivasan's argument? Why or why not? How could t ...

A chemistry student needsnbsp600 mlnbspof carbon

A chemistry student needs 60.0 mL of carbon tetrachloride for an experiment. By consulting the  CRC Handbook of Chemistry and Physics , the student discovers that the density of carbon tetrachloride is 1.59 g.cm^-3. Calc ...

If unemployment rate is 55 and underemployed unemployed and

If unemployment rate is 5.5% and underemployed, unemployed and discouraged workers is 8.4%. What is % of underemployed and discouraged. Is it as easy as just 8.4-5.5?

Explain how the following industries should adapt their

Explain how the following industries should adapt their businesses to the ever expanding use of social networks and mobile computing (smart phones, tablet computers, etc.): 1) Media and Entertainment, 2) Department store ...

Regional blocs like the eu are straining the british have

Regional blocs like the EU are straining. The British have voted to Brexit! What has caused the tension and what does the future hold - for Brexit and beyond?

Describe a ping of death attack as an attack that causes

Describe a ping of death attack as an attack that causes the victim computer to freeze and malfunction.

Suppose circle is a class derived from the class point now

Suppose, Circle is a class derived from the class Point. Now consider the following statements: Point p[] = new Point[12]; P[0] = new Circle(); Now suppose Circle has a method Area() computing are of a circle, and Point ...

Security and network discussion questionsa discuss the pros

Security and Network Discussion Questions a) Discuss the pros and cons of an organization regularly engaging in penetration testing. b) What are the motivations of the ethical hacker?

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As