Ask Question, Ask an Expert

+1-415-315-9853

info@mywordsolution.com

Ask Computer Engineering Expert

Huffman Coding based Compression Library
Background

Huffman code is used to compress data file, where the data is represented as a sequence of characters. Huffman's greedy algorithm uses a table giving how often each character occurs; it then uses this table to build up an optimal way of representing each character as a binary string. We call the binary string the codeword for that character. A property of Huffman code is that it is a prefix code, i.e., in Huffman coding, no codeword is a prefix of some other codeword. The advantage of prefix code is that it makes decoding easier, as we do not need to use delimiter between two successive codewords. Given the frequency of each of the character, we can devise a greedy algorithm for finding the optimal Huffman codeword of each of the characters. For details of the greedy algorithm, INTRODUCTION TO ALGORITHMS 3rd edition by CORMEN please read Section 16.3 of the textbook.

In this assignment, we will build a compression library that compress text files using Huffman coding scheme. This library will have two programs: compress, and decompress; compress accepts a text file and produces a compressed representation of that text file; decompress accepts a file that was compressed with the compress program, and recovers the original file.

Implementation

Input to the compress is a text le with arbitrary size, but for this assignment we will sippose that data structure of the file fits in the main memory of a computer. Output of the program is a compressed representation of the original file. You will have to save the codetable in the header of the compressed file, so that you could use the codetable for decompressing the compressed file. Input to the decompress is a compressed file, from that the program recovers the original file. For sanity check, you must have a specific magic word at some position in the header of the compressed file, so that decompress can recognize whether the given file is a valid Huffman compressed file.

You must pay attention to the following issues:

The file that we would use for testing can be very large, having size in Gigabytes, so make sure that your program is bug-free and it works for large input file.

prepare efficient algorithm, we will take off as much as 20 points if we feel that the program is taking unusually long time.

You must make sure that your program runs on a Linux Machine, and identically follows the formatting instructions. For formatting error, as much as 15 points can be taken off .

You should provide a Make file to compile your programs. Also, a README.txt file must be provided that will have the instruction to compile and run the programs.

Command-Line options

Compression:

C++:  ./compress  -f  myfile.txt  [-o  myfile.hzip  -s

Java:  sh  compress.sh  -f  myfile.txt  [-o  myfile.hzip  -s]

Decompression:

C++:  ./decompress  -f  myfile.hzip[-r  -v]

Java:  sh  decompress.sh  -f  myfile.hzip  [-o  myfile.txt  -s]

The command-line options that are within the square bracket are optional. The option \-f" precedes the input le name, which always has a .txt extension. The \-s" option prints statistics, such as for compression it prints, how many distinct characters are there, what is the compression ratio, and the wall clock time that it took for performing the compression task. For decompression, it prints how many character were written, and the wall clock time it took for performing the decompression task. The \-o" option precedes the name of an output le. If the output file name is not given, then we will append .hzip at the end of the input filename to create the output filename.

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M91171

Have any Question? 


Related Questions in Computer Engineering

Design an algorithm that compares a random and sorted array

Design an algorithm that compares a random and sorted array and establishes the average distance that elements must travel in moving from random to sorted order.

Equation 1374 tells us on average how many vertices are a

Equation (13.74) tells us on average how many vertices are a distance d away from a given vertex. a) Assuming that this expression works for all values of d (which is only a rough approximation to the truth), at what val ...

Identify the different types of problem insurance claims

Identify the different types of problem insurance claims. Tell why you think a particular claim can be a "headache".Discuss follow-up techniques for tracking unpaid insurance claims

Another form of comb filter is given by the transfer

Another form of comb filter is given by the transfer function H(z)=1+ z -L For this problem, let L = 6. (a) Find the filter's impulse response h[n]. (b) Determine a canonical realization of this filter. (c) Find and plot ...

Write and test a program that extracts postfix expressions

Write and test a program that extracts postfix expressions from the user, evaluates the expression, and prints the results. You may require that the user enter numeric values along with the operators and that each compon ...

1 what is a policy how is it different from a law2 what are

1. What is a policy? How is it different from a law? 2. What are the three general categories of unethical and illegal behavior? 3. What is the best method for preventing an illegal or unethical activity?

Assume that the variables f g h i j are assigned to

Assume that the variables f, g, h, i, j are assigned to registers $s0 through $s4. Assume that the base address of the A and B are in the registers $s6 and $s7. What would the MIPS assembly code be for the following C st ...

Repeat prob 91-8 using the complex signalprob 91-8consider

Repeat Prob. 9.1-8 using the complex signal Prob. 9.1-8 Consider a complex signal composed of a dc term and two complex exponentials Plot each N-point DFT as a function of frequency fk = k/N. (a) Compute and plot the DFT ...

In the fourth lab we investigate smtp protocol in action we

In the fourth lab, we investigate SMTP protocol in action. We send an e-mail and, using Wireshark, we investigate the contents and the format of the SMTP packet exchanged between the client and the server. We check that ...

1 what is the relationship between d-amps and amps2 what is

1. What is the relationship between D-AMPS and AMPS? 2. What is GSM? 3. What is the function of the CDMA in IS-95? 4. What are the three types of orbits? 5. Which type of orbit does a GEO satellite have? Explain your ans ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

WalMart Identification of theory and critical discussion

Drawing on the prescribed text and/or relevant academic literature, produce a paper which discusses the nature of group

Section onea in an atwood machine suppose two objects of

SECTION ONE (a) In an Atwood Machine, suppose two objects of unequal mass are hung vertically over a frictionless

Part 1you work in hr for a company that operates a factory

Part 1: You work in HR for a company that operates a factory manufacturing fiberglass. There are several hundred empl

Details on advanced accounting paperthis paper is intended

DETAILS ON ADVANCED ACCOUNTING PAPER This paper is intended for students to apply the theoretical knowledge around ac

Create a provider database and related reports and queries

Create a provider database and related reports and queries to capture contact information for potential PC component pro