Ask Computer Engineering Expert

Programming Project Assignment: Data Compression using Huffman Codes

Introduction

This assignment focuses on building a lossless data compression algorithm using Huffman Codes. In 1952, David Huffman developed a way to find a set of optimal prefix codes for a given set of input symbols. Essentially, Huffman Codes are variable-length bit strings which uniquely represent a set of input symbols. The main benefit offered by this scheme is that the more common the input symbol, the fewer the number of bits that are required to represent it. Clearly, this is advantageous when compressing data.

Compressing ASCII Text using Huffman Codes

The algorithm always begins with the source set of symbols to be compressed. For this assignment, we will focus only on using ASCII text but the algorithm is applicable to any set of symbols. Below is an example source string to be compressed:

to be or not to be

The first step is find the frequency of each letter in the source text. These symbol frequencies are a fundamental aspect of the algorithm. Table 1 gives a summary of these frequencies for this example. One approach to compressing this data is to use a block encoding representation that uses the minimum number of bits that permit all the letters to be encoded, i.e. 3 bits per letter for this example. Since there are 18 characters in this string, 54bits (3 x 18) would be required to represent this sentence using block encoding. Using the Huffman Codes shown in Table 2, the same sentence can be represented with just 47bits - a 13% reduction in size.

Letter

Freq

Space

5

o

4

t

3

b

2

e

2

n

1

r

1

Table 1 - Letter Frequencies

The second step of the algorithm is to construct a binary tree using the data in Table 1. The pseudo-code below describes a way to build a binary tree. It makes use of a priority queue to construct the tree, i.e. in this kind of data structure items with higher priority are placed at the front of the queue. In this case the letter frequency is used as the priority indicator with lower letter frequencies having higher priority.

CREATE leaf node for each letter

ADD all leaf nodes to priority queue

LOOP WHILE more than one node in priority queue REMOVE two nodes from priority queue*

CREATE new node with two removed nodes as children

SET new node's priority to sum of two child node priorities ADD new node to priority queue

END LOOP

REMOVE remaining node from priority queue SET node as root

*when more than two items have the same priority, any node pair is permitted

The third step of the algorithm is to traverse the constructed binary tree to build Huffman Codes for each letter. To do this all links within the tree must be labelled 0 or 1. By convention, each left child link is a 0 and each right child link is a 1. Figure 1 gives an example binary tree for the example string.

1887_Figure-1–Example-Binary-Tree.jpg
Figure 1 - Example Binary Tree

Each Huffman Code is found by traversing the tree from the root node to the letter in question. Table 2 shows the result of traversing the tree for each letter in the example string.

Letter

Huffman Code

Space

00

o

11

t

011

b

010

e

100

n

1010

r

1011

Table 2 - Huffman Codes

Assignment Tasks

There are 8 tasks to complete for this assignment worth a total of 100 marks. Each task needs to be completed before the next one can be attempted (except the first task which can be completed at any time). You have been provided with a sign-off sheet to be completed during tutorial sessions by your tutor. Your tutor may give you partial marks for any incomplete tasks and so please try to complete as many as possible. It is therefore essential that you attend the remaining tutorial sessions. After you have completed a task you must ask a tutor to sign it off on your sheet.

NOTE: You should take extra care to keep your sign-off sheet safe because you will need to provide a scan or photograph of it when you submit your assignment to Blackboard.

Task 1

Huffman Coding is one example of a lossless compression algorithm. Write a short 500-750 word academic report describing the similarities and differences between lossless and lossy compression algorithms and under what circumstances you might favour one kind of compression algorithm over the other. Your report should include brief descriptions of one example algorithm that is used for each kind of compression (excluding Huffman Coding). You should try to use diagrams and illustrations where applicable to help your explanations of your chosen compression algorithms. Marks will be given for good UWE Harvard referencing, good grammar and spelling, and a good report structure.

Task 2

Write a function that loads the contents of a given text file into memory. Once loaded into memory you should output the text to the Console Window. You are expected to use the ‘ToCompress.txt' file for this part of the assignment (which is provided for you on Blackboard).

Task 3

The following code represents a letter-frequency pair structure:

struct LetterFrequencyPair
{
char character; int frequency;
};

Using this structure and the lecture slides to help you, develop a Linked List data structure that can be used to store a list of letters-frequency pairs. You should use the following test data to demonstrate to your tutor that you can store letter-frequency pairs in your Linked List:

Character

Frequency

A

2

a

3

{a space character}

5

;

1

1

2

Task 4

For this task, you need to write a function to find the frequency of each letter in the text supplied in the ‘ToCompress.txt' file. Remember this is a lossless algorithm and so your implementation must be case- sensitive and include all punctuation. You will need to make use of the code you produced for Task 2 and Task 3 to complete this task, i.e. a Linked List of letter-frequency pairs. Finally, you should print the letter- frequency pairs you have discovered to the Console Window.

Task 5

The following code represents a binary tree node that can be used to build a Binary Tree:

struct BinaryTreeNode
{
struct LetterFrequencyPair* letter_frequency_pair; struct BinaryTreeNode* leftChild;
struct BinaryTreeNode* rightChild;
};

Write a function that takes two BinaryTreeNode parameters, merges them according to the Huffman Coding Algorithm (i.e. it creates a new node, points the left and right child to the supplied BinaryTreeNode parameters, sets the frequency of the new node to be the sum of the parameter's frequencies and returns the new node). You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

a

2

b

3

NOTE: you will need to initialise the new node objects leftChild and rightChild equal to NULL to indicate that they do not point to anything

Task 6

Write a function that takes a Linked List of BinaryTreeNode objects as a parameter and returns the BinaryTreeNode with the lowest frequency. This BinaryTreeNode should also be removed from the Linked List. You should demonstrate to your tutor that this function performs correctly by using the following test data:

Character

Frequency

x

4

y

2

z

5

w

3

NOTE: you will need to create a Linked List of BinaryTreeNode objects using the four letter-frequency pairs given above

Task 7

Write a function that builds a Huffman Tree making full use of the code produced in the previous 5 tasks (Tasks 2-6). You should make use of the algorithm given in the explanation above when building the Huffman Tree for the supplied ‘ToCompress.txt' file. You should demonstrate to your tutor that your program has correctly built a Huffman Tree for this file.

Task 8

The final task is to create a recursive function that prints out the Huffman Codes using the Huffman Tree created in Task 7. Your function needs to recursively traverse the Huffman Tree whilst building the Huffman Codes as it goes. All Huffman Codes should be output to the Console Window together with the letter they represent, e.g. A: 001

NOTE: you will need to use Dynamic Memory Allocation for your Linked Lists and Huffman Tree. To avoid Memory Leak, you must ensure that all allocated memory is deallocated before the program exits. Extra credit will be given if this is correctly done, i.e. there should be no examples of Memory Leak.

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M92755801

Have any Question?


Related Questions in Computer Engineering

Does bmw have a guided missile corporate culture and

Does BMW have a guided missile corporate culture, and incubator corporate culture, a family corporate culture, or an Eiffel tower corporate culture?

Rebecca borrows 10000 at 18 compounded annually she pays

Rebecca borrows $10,000 at 18% compounded annually. She pays off the loan over a 5-year period with annual payments, starting at year 1. Each successive payment is $700 greater than the previous payment. (a) How much was ...

Jeff decides to start saving some money from this upcoming

Jeff decides to start saving some money from this upcoming month onwards. He decides to save only $500 at first, but each month he will increase the amount invested by $100. He will do it for 60 months (including the fir ...

Suppose you make 30 annual investments in a fund that pays

Suppose you make 30 annual investments in a fund that pays 6% compounded annually. If your first deposit is $7,500 and each successive deposit is 6% greater than the preceding deposit, how much will be in the fund immedi ...

Question -under what circumstances is it ethical if ever to

Question :- Under what circumstances is it ethical, if ever, to use consumer information in marketing research? Explain why you consider it ethical or unethical.

What are the differences between four types of economics

What are the differences between four types of economics evaluations and their differences with other two (budget impact analysis (BIA) and cost of illness (COI) studies)?

What type of economic system does norway have explain some

What type of economic system does Norway have? Explain some of the benefits of this system to the country and some of the drawbacks,

Among the who imf and wto which of these governmental

Among the WHO, IMF, and WTO, which of these governmental institutions do you feel has most profoundly shaped healthcare outcomes in low-income countries and why? Please support your reasons with examples and research/doc ...

A real estate developer will build two different types of

A real estate developer will build two different types of apartments in a residential area: one- bedroom apartments and two-bedroom apartments. In addition, the developer will build either a swimming pool or a tennis cou ...

Question what some of the reasons that evolutionary models

Question : What some of the reasons that evolutionary models are considered by many to be the best approach to software development. The response must be typed, single spaced, must be in times new roman font (size 12) an ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As