Ask Question, Ask an Expert

+61-413 786 465

info@mywordsolution.com

Ask Computer Engineering Expert

Huffman Coding based Compression Library
Background

Huffman code is used to compress data file, where the data is represented as a sequence of characters. Huffman's greedy algorithm uses a table giving how often each character occurs; it then uses this table to build up an optimal way of representing each character as a binary string. We call the binary string the codeword for that character. A property of Huffman code is that it is a prefix code, i.e., in Huffman coding, no codeword is a prefix of some other codeword. The advantage of prefix code is that it makes decoding easier, as we do not need to use delimiter between two successive codewords. Given the frequency of each of the character, we can devise a greedy algorithm for finding the optimal Huffman codeword of each of the characters. For details of the greedy algorithm, INTRODUCTION TO ALGORITHMS 3rd edition by CORMEN please read Section 16.3 of the textbook.

In this assignment, we will build a compression library that compress text files using Huffman coding scheme. This library will have two programs: compress, and decompress; compress accepts a text file and produces a compressed representation of that text file; decompress accepts a file that was compressed with the compress program, and recovers the original file.

Implementation

Input to the compress is a text le with arbitrary size, but for this assignment we will sippose that data structure of the file fits in the main memory of a computer. Output of the program is a compressed representation of the original file. You will have to save the codetable in the header of the compressed file, so that you could use the codetable for decompressing the compressed file. Input to the decompress is a compressed file, from that the program recovers the original file. For sanity check, you must have a specific magic word at some position in the header of the compressed file, so that decompress can recognize whether the given file is a valid Huffman compressed file.

You must pay attention to the following issues:

The file that we would use for testing can be very large, having size in Gigabytes, so make sure that your program is bug-free and it works for large input file.

prepare efficient algorithm, we will take off as much as 20 points if we feel that the program is taking unusually long time.

You must make sure that your program runs on a Linux Machine, and identically follows the formatting instructions. For formatting error, as much as 15 points can be taken off .

You should provide a Make file to compile your programs. Also, a README.txt file must be provided that will have the instruction to compile and run the programs.

Command-Line options

Compression:

C++:  ./compress  -f  myfile.txt  [-o  myfile.hzip  -s

Java:  sh  compress.sh  -f  myfile.txt  [-o  myfile.hzip  -s]

Decompression:

C++:  ./decompress  -f  myfile.hzip[-r  -v]

Java:  sh  decompress.sh  -f  myfile.hzip  [-o  myfile.txt  -s]

The command-line options that are within the square bracket are optional. The option \-f" precedes the input le name, which always has a .txt extension. The \-s" option prints statistics, such as for compression it prints, how many distinct characters are there, what is the compression ratio, and the wall clock time that it took for performing the compression task. For decompression, it prints how many character were written, and the wall clock time it took for performing the decompression task. The \-o" option precedes the name of an output le. If the output file name is not given, then we will append .hzip at the end of the input filename to create the output filename.

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91171

Have any Question?


Related Questions in Computer Engineering

Answer the following question research management in a

Answer the following Question : Research management in a field and compare it with the field in software. Compare the management techniques with that field you chose to software project management. The response must be t ...

San juan sailboat charters sjsbc is an agency that leases

San Juan Sailboat Charters (SJSBC) is an agency that leases (charters) sailboats. SJSBC does not own the boats. Instead, SJSBC leases boats on behalf of boat owners who want to earn income from their boats when they are ...

You are the senior consultant at abacus consulting tasked

You are the Senior Consultant at Abacus Consulting, tasked with the database project for Amadeus Real Estate client. The company employs real estate agents who work with customers to buy and sell properties (both residen ...

Question what is a ipsec ssl vpn dtls dmarc pki pem ssh

Question : What is a( IPSEC, SSL , VPN, DTLS , DMARC, PKI, PEM, SSH, Kerberos, DKIM) ?. Brifley and answer the following brief. Identify the security problems How the security protocol was used to solve the problems OR e ...

The contracts manager at a company needs to make a large

The contracts manager at a company needs to make a large legal document available to an overseas customer. However, she has some challenges: The document contains sensitive information; it is too large to send via e-mail ...

A study was conducted of long beach school district schools

A study was conducted of Long Beach School District schools regarding how many require school uniforms. In 2006, of the 296 schools questioned, 184 said they required s school uniforms. (Gentile & Imberman, 2009) Find th ...

Start your c development tool and view the swatthebugs16cpp

Start your C++ development tool and view the SwatTheBugs16.cpp file. The file is contained in either the Cpp7\Chap05\Swat TheBugs16 Project folder or the Cpp7\Chap05 folder. (Depending on your C++ development tool, you m ...

Assignmentsuppose you are asked to write a program to count

Assignment Suppose you are asked to write a program to count the frequency of occurrence of each word in a document. Describe how you would implement your program using: 1. A hash table to store words and their frequenci ...

What are some breakthrough events in the evolution of

What are some breakthrough events in the evolution of elearning?

Assume a normal distribution for n 300 how many cases

Assume a normal distribution for N = 300. How many cases would one expect to find between +1 and -1 standard deviations around the mean?

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As