Ask Question, Ask an Expert

+1-415-315-9853

info@mywordsolution.com

Ask Computer Engineering Expert

Huffman Coding based Compression Library
Background

Huffman code is used to compress data file, where the data is represented as a sequence of characters. Huffman's greedy algorithm uses a table giving how often each character occurs; it then uses this table to build up an optimal way of representing each character as a binary string. We call the binary string the codeword for that character. A property of Huffman code is that it is a prefix code, i.e., in Huffman coding, no codeword is a prefix of some other codeword. The advantage of prefix code is that it makes decoding easier, as we do not need to use delimiter between two successive codewords. Given the frequency of each of the character, we can devise a greedy algorithm for finding the optimal Huffman codeword of each of the characters. For details of the greedy algorithm, INTRODUCTION TO ALGORITHMS 3rd edition by CORMEN please read Section 16.3 of the textbook.

In this assignment, we will build a compression library that compress text files using Huffman coding scheme. This library will have two programs: compress, and decompress; compress accepts a text file and produces a compressed representation of that text file; decompress accepts a file that was compressed with the compress program, and recovers the original file.

Implementation

Input to the compress is a text le with arbitrary size, but for this assignment we will sippose that data structure of the file fits in the main memory of a computer. Output of the program is a compressed representation of the original file. You will have to save the codetable in the header of the compressed file, so that you could use the codetable for decompressing the compressed file. Input to the decompress is a compressed file, from that the program recovers the original file. For sanity check, you must have a specific magic word at some position in the header of the compressed file, so that decompress can recognize whether the given file is a valid Huffman compressed file.

You must pay attention to the following issues:

The file that we would use for testing can be very large, having size in Gigabytes, so make sure that your program is bug-free and it works for large input file.

prepare efficient algorithm, we will take off as much as 20 points if we feel that the program is taking unusually long time.

You must make sure that your program runs on a Linux Machine, and identically follows the formatting instructions. For formatting error, as much as 15 points can be taken off .

You should provide a Make file to compile your programs. Also, a README.txt file must be provided that will have the instruction to compile and run the programs.

Command-Line options

Compression:

C++:  ./compress  -f  myfile.txt  [-o  myfile.hzip  -s

Java:  sh  compress.sh  -f  myfile.txt  [-o  myfile.hzip  -s]

Decompression:

C++:  ./decompress  -f  myfile.hzip[-r  -v]

Java:  sh  decompress.sh  -f  myfile.hzip  [-o  myfile.txt  -s]

The command-line options that are within the square bracket are optional. The option \-f" precedes the input le name, which always has a .txt extension. The \-s" option prints statistics, such as for compression it prints, how many distinct characters are there, what is the compression ratio, and the wall clock time that it took for performing the compression task. For decompression, it prints how many character were written, and the wall clock time it took for performing the decompression task. The \-o" option precedes the name of an output le. If the output file name is not given, then we will append .hzip at the end of the input filename to create the output filename.

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91171

Have any Question? 


Related Questions in Computer Engineering

How does the excess-x representation for exponents of the

How does the excess-x representation for exponents of the scale factor in the floating point number representation of Figure 9.26a facilitate the comparison of the relative sizes of two floating-point numbers? (Hint: Ass ...

Explain the relationships among the following words system

Explain the relationships among the following words: system, environment, boundary, and interface.

Questionconsider the following normalized relations from a

Question: Consider the following normalized relations from a database in a large retail chain: STORE  ( Store ID,   Region, ManagerID, Square Feet) EMPLOYEE ( EmployeelD .  WhereWork, EmployeeName, EmployeeAddress) DEPAR ...

Design an efficient method that performs effective natiive

Design an efficient method that performs effective natiive Bayesian classification over an infinite data stream (i.e., you can scan the data stream only once). If we wanted to discover the evolution of such classificatio ...

Refer to example 440 an urn contains nine red balls nine

Refer to Example 4.40. An urn contains nine red balls, nine white balls, and nine blue balls, and sample of seven balls is drawn at random without replacement. Compute the probability that all of the balls in the sample ...

Suppose we have a stream of tuples with the schemaassume

Suppose we have a stream of tuples with the schema Assume universities are unique, but a course ID is unique only within a university (i.e., different universities may have different courses with the same ID, e.g., "CS10 ...

What are the big-o performance estimates for those

What are the Big-O performance estimates for those algorithms? When is a linked list a good representation of a stack or a queue?

Michael davis defines professions as follows a profession

Michael Davis defines professions as follows: "A profession is a number of individuals In the same occupation voluntarily organized to earn a living by openly serving a certain moral ideal in a morally permissible way be ...

Derive the transfer function h z of the lti-dt system

Derive the transfer function H (z) of the LTI-DT system described by the circuit given in Figure 2.9. Obtain the difference equation relating the input x(n) to the output y(n).

Anderson your textbook lists a number of considerations in

Anderson (your textbook) lists a number of considerations in choosing the right intervention strategy. In your view, how might you prioritize these? What do you think are the most important considerations in choosing whi ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

A cola-dispensing machine is set to dispense 9 ounces of

A cola-dispensing machine is set to dispense 9 ounces of cola per cup, with a standard deviation of 1.0 ounce. The manuf

What is marketingbullwhat is marketing think back to your

What is Marketing? • "What is marketing"? Think back to your impressions before you started this class versus how you

Question -your client david smith runs a small it

QUESTION - Your client, David Smith runs a small IT consulting business specialising in computer software and techno

Inspection of a random sample of 22 aircraft showed that 15

Inspection of a random sample of 22 aircraft showed that 15 needed repairs to fix a wiring problem that might compromise

Effective hrmquestionhow can an effective hrm system help

Effective HRM Question How can an effective HRM system help facilitate the achievement of an organization's strate