#### Question Details

##### [solution] » could you help me implement the required methods of HuffmanTree

**Brief item decscription**

Step-by-step solution file

**Item details:**

could you help me implement the required methods of HuffmanTree**More:**

**could you help me implement the required methods of HuffmanTree with the given algorithm in java:1**

** **

**CSI 2110 Computer Science Fall 2016 University of Ottawa Assignment #2 Programming (75 points, weight 7.5%); Due: Sat, October 15, 11:59PM**

** **

**The assignment must be uploaded on blackboard; follow the submission instructions.**

** **

**Late submission is only accepted from 1 min - 24 hs late for 30%off (i.e. your mark *0.7). Huffman Compression**

** **

**Lossless compression is a type of data compression that allows the original data to be**

** **

**recovered from the compressed data. Huffman encoding is a compression technique that**

** **

**reduces the number of bits needed to store a message based on the idea that more frequent**

** **

**letters should have a shorter bit representation and less frequent ones should have a longer bit**

** **

**representation. Text compression is useful when we wish to reduce bandwidth for digital**

** **

**communications, so as to minimize the time required to transmit a text. Text compression is**

** **

**also used to store large documents more efficiently, for instance allowing your hard drive to**

** **

**contain as many documents as possible.**

** **

**In this assignment, you will implement the Huffman compression and decompression**

** **

**algorithm.**

** **

**Objectives: Practice implementing tree-based data structures. Practice analyzing different**

** **

**alternative implementations for auxiliary data structures (priority queues). Appreciation of**

** **

**how data structures can be used for practical purposes. Having fun while learning. 1 How the Huffman code works**

** **

**Suppose we have 4 possible letters that appear on a message, string or file: A,C,T,G. We**

** **

**could encode these letters with a fixed length code using 2 bits for each letter:**

** **

**letter**

** **

**fixed length encoding A C G T**

** **

**00 01 10 11 In this way, message: AACGTAAATAATGAAC that has 16 letters can be encoded with 32**

** **

**bits as 00000110110000001100001110000001**

** **

**Huffman encoding would instead choose a variable length code to take advantage of**

** **

**difference in letter frequencies:**

** **

**letter**

** **

**Frequency in text**

** **

**Huffman code A**

** **

**9**

** **

**1 C**

** **

**G**

** **

**T**

** **

**2**

** **

**2**

** **

**3**

** **

**010 011 00 With Huffman encoding the same message can be encoded with 27 bits (saving 15% space)**

** **

**110100110011100110001111010**

** **

**Now imagine a more drastic scenario where a text message uses a standard 8-byte encoding**

** **

**for each character (like in UTF-8 character encoding) able to support 256 different characters.**

** **

**However, our specific message is the same as in the example above, so that for the 4**

** **

**characters we have the same frequencies as shown in the table above but the other 252 other 2**

** **

**characters have frequency 0. Now the message above would be encoded in UTF-8 using**

** **

**16*8=128 bits or 16 bytes. Now comparing our Huffman encoding above that has 27 bits,**

** **

**which is a bit less than 4 bytes we can now save 75% space.**

** **

**Now if you are not impressed about saving 12 bytes of storage, suppose you want to store the**

** **

**entire human genome composed of a sequence of 3 billion nucleotide base pairs. This can be**

** **

**represented by a sequence of 3 billion characters each being one of A (adenine), T (thymine),**

** **

**G (guanine), C (cytosine), which could be stored using 3 billion bytes using UTF-8 encoding**

** **

**which is about 3GB. Huffman encoding for such a file would depend on the relative**

** **

**frequency between these four letters, but assuming about 25% frequency of each of these 4**

** **

**letters (and of course absence of any other of the 252 character), Huffman would encode each**

** **

**letter with 2 bits yielding a compressed file of 750MB.**

** **

**How to build the Huffman code from the letters in a text and their frequencies?**

** **

**The Huffman code is always a prefix code, that is, no codeword is a prefix of any other**

** **

**codeword. This generates no ambiguity. When decoding our example**

** **

**110100110011100110001111010**

** **

**we have no doubt that the first word is an A since no other codeword starts with a 1; similarly**

** **

**we decode the next A. Then the next bits will be read one by one until we conclude**

** **

**unequivocally that the next codeword can only possibly be 010 which is a C. Try continuing**

** **

**this example to see how the original text can be decoded using the Huffman code given in the**

** **

**table in the previous page.**

** **

**Huffman?s algorithm produces an optimal variable-length prefix code for a given string X. It**

** **

**is based on the construction of a binary tree T that represents the code.**

** **

**Here we quote the textbook by Goodrich et al. 6th edition page 595:**

** **

**Each edge in T represents a bit in a codeword, with an edge to the left child representing a 0**

** **

**and an edge to the right child representing a 1. Each leaf v is associated with a specific**

** **

**character and the codeword for that character is defined by the sequence of bits associated**

** **

**with the edges in the path from the root of T to v. Each leaf v has a frequency, f(v), which is**

** **

**simply the frequency in the string X of the character associated with v. In addition, we give**

** **

**each internal node v in T a frequency, f(v), that is the sum of the frequencies of all the leaves**

** **

**in the subtree rooted at v.**

** **

**For the previous example, where X=?AACGTAAATAATGAAC?, and the frequencies and**

** **

**codewords given in the previous table, we have the following Huffman tree:**

** **

**16**

** **

**/**

** **

**7 **

** **

**A(9) / **

** **

**T(3)**

** **

**4**

** **

**/ **

** **

**C(2) G(2)**

** **

**In order to understand how the tree can be used for decoding, use the tree to decode the**

** **

**encoded string 110100110011100110001111010 to obtain back X. You can find another**

** **

**example in page 596 of the textbook. Inspect that to better understand Huffman trees. 3**

** **

**Now, we just need to explain the algorithm to build the Huffman tree: the Huffman coding**

** **

**algorithm begins with each of the characters of the string X being the root node of a singlenode binary tree. Each iteration of the algorithm takes the two binary trees with the smallest**

** **

**frequencies and merges them into a single binary tree creating a root of the new binary tree**

** **

**having the sum of he frequencies of the two subtrees. This process is repeated until only one**

** **

**tree is left.**

** **

**The construction of the tree in the example above is illustrated next:**

** **

**Start: A(9), T(3) , C(2), G(2) Iteration 1: A(9), T(3) , 4**

** **

**/ **

** **

**C(2) G(2) Iteration 2: ) A(9), 7**

** **

**/**

** **

**T(3) **

** **

**4 / **

** **

**C(2) G(2)**

** **

**Iteration 3: 16**

** **

**/**

** **

**7 **

** **

**A(9) /**

** **

**T(3) **

** **

**4**

** **

**/ **

** **

**C(2) G(2) At each iteration, we need to remove 2 trees with minimum frequency and insert 1 combined**

** **

**tree with the sum of the subtree?s frequencies in the list of trees. These two operations**

** **

**indicate we need to use a priority queue. The algorithm to build a tree is given next:**

** **

**Algorithm Huffman(X):**

** **

**Input: String X of length n with d distinct characters**

** **

**Output: Huffman tree for X**

** **

**1. Compute the frequency f(c) of each character c of X.**

** **

**2. Initialize a priority queue Q**

** **

**3. For each character c in X do {**

** **

**4.**

** **

**Create a single-node binary tree T storing c**

** **

**5.**

** **

**Insert T with key f(c) into Q }**

** **

**6. While Q.size() > 1 do {**

** **

**7.**

** **

**e1= Q.removeMin()**

** **

**8.**

** **

**e2= Q.removeMin()**

** **

**9.**

** **

**Create new binary tree newT with left subtree e1.tree and right subtree e2.tree**

** **

**10.**

** **

**newFreq= e1.freq+e2.freq**

** **

**11.**

** **

**Insert newT with key newFreq into Q**

** **

**}**

** **

**12. e= Q.removeMin()**

** **

**13. return e.tree 4 Proving that the Huffman tree algorithm produces an optimal prefix code is beyond the scope**

** **

**of this assignment, but this can be found in other more advanced textbooks. This is an**

** **

**example where the ``greedy method? is successful in finding the optimal solution for a**

** **

**problem. 2 Implementation details and what you must do**

** **

**Compression Algorithm (Huffman encoding)**

** **

**When using Huffman encoding for compressing a file, the compressed file must include**

** **

**information in order to reconstruct the Huffman tree necessary in the decoding; that is, we**

** **

**need somehow to inform the character frequencies so that the decoder constructs the tree in**

** **

**exactly the same way as the encoder (breaking ties for equal frequencies in a consistent**

** **

**manner). In this assignment this information is stored in class LetterFrequencies. If we were**

** **

**generating a file, we would need to write this information in the header of the file (for**

** **

**example, the first bytes of the file would contain a sequence of pairs (letter, frequency).**

** **

**Another issue when creating a stream of bits to write to a file is that these bits must be written**

** **

**as bytes, and as it can be seen in our first example, the number of bits might not be a multiple**

** **

**of 8. Thus the last byte will need to be filled with extra bits that are not part of the sequence.**

** **

**So that this does not produce an incorrect decoding of these extra bits, we will introduce a**

** **

**special symbol that we call EndOfText for us to know when the sequence ends ignoring any**

** **

**trailing bits used to complete a byte.**

** **

**While the compression of files is a motivating application, in this assignment we will simply**

** **

**focus on the Huffman encoding and decoding parts. The bits generated will be stored as**

** **

**characters of a String for easiness of visualization and we will not be concerned with the part**

** **

**of packing the bits generated into bytes to be written to a file. (The interested student could**

** **

**take on this task of extending the current program to do file manipulation for their own**

** **

**interest).**

** **

**Huffman Compression Algorithm (steps are in TestHuffmanWithStrings.testStringEncoding()) 1) Read characters of the input text recording their frequencies by using the constructor of**

** **

**class LetterFrequencies:**

** **

**LetterFrequencies lf = new LetterFrequencies(inputText);**

** **

**2) Build the Huffman tree using the frequencies of letters of the input text. This must be**

** **

**implemented by you in the following method that is called by the constructor of the**

** **

**HuffmanTree class:**

** **

**private HuffmanNode BuildTree(int frequencies,char letter)**

** **

**This method must build the Huffman tree and return the root of such tree. The**

** **

**HuffmanTree is a binary tree with nodes as members of the class HuffmanNode. The**

** **

**pseudocode of this algorithm has been given in the previous page under the name Algorithm**

** **

**Huffman(X). As seen there, this will use a priority queue; methods to build a priority queue**

** **

**have been provided in class HeapPriorityQueue that implements interface**

** **

**PriorityQueue. 5**

** **

**In addition to the tree, class HuffmanTree will also keep track of the leaves where each**

** **

**letter has been placed by using the leafWhereLetterIs array indexed by the character**

** **

**unicode (16 bits) which is a number between 0 and 216 -1, plus the extra symbol we created**

** **

**?EndOfText? which will be coded by our convention as character 216. This array will be**

** **

**useful when encoding characters.**

** **

**You have been given the initial portion of method HuffmanTree.BuildTree (Steps 1-5**

** **

**have been implemented). You need to complete its implementation with steps 6-13 of**

** **

**Algorithm Huffman(X).**

** **

**Note that printCodeTable() has been implemented for you, and will do an inorder traversal**

** **

**of the Huffman tree and print the codeword for each letter. This will help to verify correctness**

** **

**of your method HuffmanTree.BuildTree.**

** **

**3)**

** **

**The encoding of the input text is done by traversing the input text again and encoding**

** **

**character by character doing individual calls to method:**

** **

**HuffmanTree.encodeCharacter(char c );**

** **

**This method must be implemented by you. Given a character you must determine the bits of**

** **

**its encoding and return them as a String. In the example, the result of**

** **

**encodeCharacter(?G?)**

** **

**will be ?011?.**

** **

**The running time of this method should be O(L) where L is the length of the output code.**

** **

**Note that the calls for appropriate methods to do the 3 tasks have been implemented in**

** **

**method testStringEncode(inputText) in class TestHuffmanWithStrings, which**

** **

**should be used as given. What you need to implement for encoding are methods BuildTree**

** **

**and encodeCharacter in class HuffmanTree. Decompression Algorithm (Huffman decoding)**

** **

**Huffman Decompression Algorithm: 1) Based on the letter frequency used in encoding, build the Huffman tree again. In our case,**

** **

**we have saved the letter frequency and we will use again. (If one was using this for file**

** **

**compression, we would read the character frequencies that would have been written in the**

** **

**header of the encoded file; this is not the case in this assignment).**

** **

**2) Decode character by character from the stream of bits of the encoded text. This is done by**

** **

**several calls to method HuffmanTree.decodeCharacter, as it has been provided in**

** **

**by method TestHuffmanWithStrings.testStringDecode, which is shown next: 6**

** **

**public static String testStringDecode(String codedText, LetterFrequencies lf) {**

** **

**HuffmanTree huffTree= new HuffmanTree(lf);**

** **

**BitFeedInForString seq=new BitFeedInForString(codedText);**

** **

**StringBuilder decodedText=new StringBuilder();**

** **

**while (seq.hasNext()) {**

** **

**int symbol=huffTree.decodeCharacter(seq);**

** **

**if (symbol == Integer.MAX_VALUE)**

** **

**break;**

** **

**if (symbol!=HuffmanTree.EndOfText)**

** **

**decodedText.append((char) symbol);**

** **

**else break;**

** **

**}**

** **

**return decodedText.toString(); // return the decoded string**

** **

**}**

** **

**What you need to implement is the following method of class HuffmanTree:**

** **

**public int decodeCharacter(Iterator<Byte> bit)**

** **

**Successive calls to bit.getNext() will give you the bits of the codedText. Method**

** **

**decodeCharacter must use the Huffman tree with root stored in root and traverse it by**

** **

**using the bits until it reaches a leaf which corresponding to a letter; the integer code for such**

** **

**letter must be returned. Invoking getLetter() for that leave will give the integer code for**

** **

**the letter.**

** **

**Any error encountered in this process must be indicated by returning Integer.MAX_VALUE,**

** **

**which is a value used by no letter or special symbol Huffman.EndOfText.**

** **

**The running time of this method should be O(L) where L is the length of the code for the**

** **

**character found. 3 Assignment Tasks and Mark Breakdown (75 marks)**

** **

**Examine the classes provided. In particular, you need to read the main program**

** **

**and other methods of class TestHuffmanWithStrings which are implementing the main**

** **

**steps of the encoding and decoding algorithms described above.**

** **

**This class is not to be altered; additional texts can be used to test your program by yourself or**

** **

**your TAs by adding extra strings to array String textsToTest in the main program.**

** **

**Part 1) Implement the required methods of HuffmanTree described in section 2 above,**

** **

**namely:**

** **

**A. (25 marks) private HuffmanNode BuildTree(int frequencies,char letters)**

** **

**B. (20 marks) private String encodeCharacter(int c)**

** **

**C. (20 marks) public int decodeCharacter(BitFeedIn bit)**

** **

**Part 2) (2 marks) Test your method with strings given. The TA may add more Strings when**

** **

**testing your code, so be sure to do extra tests in addition to the ones provided.**

** **

**You should show the output with TestHuffmanWithStrings obtained in your program (file**

** **

**output.txt) 7 Part 3) (6 Marks) What is the running time of Algorithm Huffman(X) in terms of big-Oh as a**

** **

**function of the number of letters ? Answer this question for the following 3 implementations**

** **

**of priority queue Q. Briefly justify in each case.**

** **

**A. Priority queue Q implemented using a Sorted List.**

** **

**B. Priority queue Q implemented using a Unsorted List.**

** **

**C. Priority queue Q implemented using a Heap.**

** **

**Part 4) (2 marks) Submit strictly following the instructions below. 4 What to handin and submission instructions**

** **

**1) Create one directory with name being the letter ?s? followed by your student number.**

** **

**If your student number is 564789 your directory will be called s564789**

** **

**2) Inside the directory include the following: the source code of your program, i.e. all the *.java classes for your program.**

** **

**(these includes all files that we provided and your modified HuffmanTree.java) a file called output.txt showing the output obtained with the main program**

** **

**TestHuffmanWithStrings A file called README.txt containing:**

** **

**o Your answer to the questions in Part 3)**

** **

**o Optional: Any information relevant for the TA to mark your program. For**

** **

**example, if not all parts of the code are working you may inform the TA here**

** **

**which parts are working, explain known bugs, etc.**

** **

**3) zip the directory containing all the files above and submit (in the example of the student**

** **

**number above the file submitted will be called s564789.zip) When the TA unzips the file, it**

** **

**should see the directory s564789 containing all files described above.**

** **

**About this question:****STATUS**

Answered

QUALITY

Approved

ANSWER RATING

Pay using PayPal (No PayPal account Required) or your credit card. All your purchases are securely protected by PayPal.

Answered

QUALITY

Approved

ANSWER RATING

This question was answered on: * Feb 21, 2020 *

* * Solution~00065704191.zip (18.37 KB)

This attachment is locked

We have a ready expert answer for this paper which you can use for in-depth understanding, research editing or paraphrasing. You can buy it or order for a fresh, original and plagiarism-free copy (Deadline assured. Flexible pricing. TurnItIn Report provided)

### Need a similar solution fast, written anew from scratch? Place your own custom order

We have top-notch tutors who can help you with your essay at a reasonable cost and then you can simply use that essay as a template to build your own arguments. This we believe is a better way of understanding a problem and makes use of the efficiency of time of the student. New solution orders are original solutions and precise to your writing instruction requirements. Place a New Order using the button below.