Ask Question, Ask an Expert

+61-413 786 465

info@mywordsolution.com

Ask Java Expert


Home >> Java

COMP9318 (13S2) PROJECT

 

1. Objective

 In this project, you will implement and experiment with several algorithms to approximate historical item counts in a space efficient manner. Note that it will take you quite some time to complete this project even if you are familiar with Java programming and have good programming experience. Therefore, we earnestly recommend that you start working on this project as early as possible.

2. Background

This project will be based on the paper \Hokusai - Sketching Streams in Real Time" [MSA12].You are required to implement the two algorithms in the paper | the time aggregation and the item aggregation. Both the Count-Min sketch (denoted as cm-sketch in the rest

of this specification) [CM04] and the two algorithms have been introduced in the Data Stream Mining lecture slides; nevertheless, you are expected to read the corresponding sections of the paper [MSA12] for further details. Note that the paper has several typos and inconsistencies, and you need to _x them. See also Section 4.Finally, in addition to the implementation, you will also need to submit a report on corrections to the paper and ideas for possible improvement.

3. Task I: Implementation

 Your  first task is to write a Java class named Proj1, which receives six command line arguments:

  • DATAFILE: the file that contains the data stream items. Its format will be described
  • shortly.
  • QUERYFILE: the file that contains the queries. Its format will be described shortly.
  • METHOD: it is a string of either time or item.
  • WIDTH: it is the width of each cm-sketch.
  • DEPTH: it is the depth of each cm-sketch.
  • SEED: it is an integer that initiates the random generator in Java (i.e., java.util.Random).

Your program should use time or item aggregation as specified in the METHOD to process

the data stream in the DATAFILE, and then computes the estimates for each query specified

in the QUERYFILE; the cm-sketches constructed in your algorithms should use the parameter

WIDTH, DEPTH, and SEED.

2 DUE DATE: 23:59 15 OCT 2013 (TUE)

3.1. Data File Format. The DATAFILE consists of one or more lines. Each line species

the items and their frequencies in the current timestamp; the first line corresponds to

Timestamp 0 and the second line corresponds to Timestamp 1, and so on and so forth.

Each line consists of multiple fields separated by a white space character. Each field has

the format of ITEM:COUNT, where ITEM is a non-empty string, and COUNT is a non-negative

integer specifying the number of times the string ITEM has been observed at the current

timestamp.

3.2. Query File Format. The QUERYFILE consists of one or more lines. Each line consists

of three fields separated by a white space character. The three fields are:

  • a string specifying the query item (denoted as Q)
  • an integer specifying the start timestamp (denoted as t1)
  • an integer specifying the end timestamp (denoted as t2)

Hence each query asks for (t2 ?? t1 + 1) estimates of the total count of Qattimesamp

i, for t1 _ i _ t2.

3.3. Output Format. If there are m queries in the QUERYFILE, you need to output m

lines, where each line is the estimated numbers for each corresponding query. The estimated

numbers should be separated by a comma (i.e., ,); if the estimated number is a routing

number, you should round it down to an integer and display it as the estimated total count.

3.4. Hash Functions and Random Seed. We will source the hash functions used in

the cm-sketches from the following universal hashing function family:

h(x) = ((a _ x + b) mod p) mod w

where a 2 [1; p ?? 1] and b 2 [0; p ?? 1] are random integers (deterministically derived from

the given random seed)1, p is a large prime number, and w is the width of the cm-sketch.

To make sure we can reliably test your program, you have to use the SEED given from

the command line to initiate your random hash functions. The logic in the following code

should be used:

// p = 2147483647

Random r = new Random(SEED);

for (int i = 0; i < depth; ++i)

{

a[i] = 1 + r.nextInt(p-1);

b[i] = r.nextInt(p);

}

For example, if given the seed 1234, the first two hash functions should be:

h1(x) = ((1388524629 _ x + 557894633) mod p) mod w

h2(x) = ((2043025134 _ x + 509900220) mod p) mod w

1Check out \hashing integers" in https://en.wikipedia.org/wiki/Universal_hashing if interested.

COMP9318 (13S2) PROJECT 3

You perhaps need to beware certain care is needed as Java does not support unsigned datatypes.

To convert a string into an integer, we use String::hashCode() method.

4. Task II: Report

You need to write a report with the following components:

  • _ Present the correct version of Algorithms 2 and 3 in [MSA12]. You should follow the style and typesetting of the algorithms as much as possible while addressing the following goals:

Ø  For Algorithm 2: fix the error(s) in the algorithm, and also make sure there is no useless computation in the pseudo-code level.

Ø  For Algorithm 3: fix the algorithm so that it starts with timestamp 0 (rather than 1). Your corrected version should work as that shown in the lecture slides, i.e., it has two full-width cm-sketches, followed by two half-width, four quarter-width sketches, and so on and so forth.2

Describe the estimation algorithm for sketched obtained via time aggregation.

  • Describe at least one way to improve the methods to deliver a possibly more accurate solution to the problem. You should clearly describe (1) the problem(s) of the current solution(s), and (2) your proposed method.

5. Example

Let input.txt be

a:1 b:2 c:3

a:2 b:4 c:6

a:4 b:8 c:12

Let query.txt be

a 0 2

c 1 2

If we run your implementation as

java Proj1 input.txt query.txt item 16 2 1234

the output should be

1,2,4

6,12

6. Marking

 Your implementation will be auto-marked and contribute 70% of the final mark; the rest

30% comes from your report, with each component scoring 10%.

2This interpretation is exactly what the pseudocode of Algorithm 3 produces, except that it starts from timestamp 1. Do not modify the algorithm such that there is only one full-width cm-sketch.

4 DUE DATE: 23:59 15 OCT 2013 (TUE)

6.1. Auto marking Your Implementation. Auto-marking will be used to mark your

implementation on a Linux machine. Especially, we will

(1) Unbar the proj1.tar and run javac *.java to compile your .java _les.

(2) Run your program with some valid parameters and record your output and compare

it with sample output.

(3) Repeat Step b) several times, each with a deferent set of parameters.

Therefore, you need to make sure:

Your java program can be correctly compiled. Make sure you do not put some of

your java source files at a sub-directory.

  • Your java program can be run by java Proj1 DATAFILE QUERYFILE METHOD WIDTH DEPTH SEED. Make sure you do not use package for your code.
  • Your java program output results in a format exactly identical to that described in Section 3.3. Make sure you do not output does not contain other messages (e.g., debugging information).Be warned that you will receive 0 mark if your program cannot be compiled or run correctly in our testing environment (e.g., program exit due to some Exception).

7. Submission

Please tear your source codes and a report _le name proj1.pdf into a _le named proj1.tar. All your source codes must be in the same directory (i.e., no sub-directory allowed).

Your submission should contain at least a _le named Proj1.java. You are free to add any other Java classes.

You can submit your _le by giving cs9318 proj1 proj1.tar

Late Penalty. -10% for each of the first five days, and -20% for each of the following days.

Plagiarism. Make sure you read \8. Academic honesty and plagiarism" in http://www.cse.unsw.edu.au/~cs9318/13s2/intro.html

References

[CM04] Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min

sketch and its applications. In LATIN, pages 29{38, 2004.

[MSA12] Sergiy Matusevych, Alexander J. Smola, and Amr Ahmed. Hokusai - sketching streams in real

time. In UAI, pages 594{603, 2012.

Java, Programming

  • Category:- Java
  • Reference No.:- M91592975

Have any Question?


Related Questions in Java

In ruby the hash class inherits from enumerable suggesting

In Ruby, the Hash class inherits from Enumerable, suggesting to a programmer that Hashes are collections. In Java, however, the Map classes are not part of the JCF (Java Collections Framework). For each language, provide ...

Assignment taskwrite a java console application that allows

Assignment task Write a java console application that allows the user to read, validate, store, display, sort and search data such as flight departure city (String), flight number (integer), flight distance (integer), fl ...

Can someone help me please with those question1what is the

Can someone help me please with those question 1:what is the best data type for student id datatime,currency,number,decimal 2:which relationshipis preferable? one to one,one to many,many to many 3:if you add table A's pr ...

Project descriptionwrite a java program to traverse a

Project Description: Write a java program to traverse a directory structure (DirWalker.java) of csv files that contain csv files with customer info. A simple sample in provided in with the sample code but you MUST will r ...

Assessment database and multithread programmingtasktask 1

Assessment: Database and Multithread Programming Task Task 1: Grade Processing University grading system maintains a database called "GradeProcessing" that contains number of tables to store, retrieve and manipulate stud ...

Assignment game prototypeoverviewfor this assessment task

Assignment: Game Prototype Overview For this assessment task you are expected to construct a prototype level/area as a "proof of concept" for the game that you have designed in Assignment 1. The prototype should function ...

Can someone kindly help me to consider whether java

Can someone kindly help me to consider whether Java provides the facility of operator overloading? If it does, may you kindly describe how overloading operators can be accomplished? If not, may you kindly describe why yo ...

Assessment instructionsin this assessment you will complete

Assessment Instructions In this assessment, you will complete the programming of two Java class methods in a console application that registers students for courses in a term of study. The application is written using th ...

Operating systems assignment -problem 1 sharing the bridgea

Operating Systems Assignment - Problem 1: Sharing the Bridge A new single lane bridge is constructed to connect the North Island of New Zealand to the South Island of New Zealand. Farmers from each island use the bridge ...

Assignment taskwrite a java console application that allows

Assignment task Write a java console application that allows the user to read, validate, store, display, sort and search data such as flight departure city (String), flight number (integer), flight distance (integer), fl ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As