Huffman Coding - Greedy Algorithm

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi Kareem I'm as new here and today we're going to talk about the Huffman coding algorithm the Huffman coding algorithm is a compression algorithm that uses the greedy choice property to compress data so that it doesn't lose any information so let's dive in thank you for joining us today we're going to talk about the Huffman coding greedy algorithm now the Huffman coding algorithm is a data compression algorithm that strives to compress data in a lossless form a definition of lossless is without loss of information so it's based on the lengths of assigned codes which are based on frequencies themselves variable length codes are known as prefix codes and we're going to go over those in more detail as we walk through an example so here we have a chart featuring the characters a II I s T P and the newline character and we've assigned codes for them that are all three digits in length in the code column as well as their frequencies which is how often they appear in a certain block of text the last column is the total bits column now to get this value we take the length of the code so in all of these cases it's equal to three and we multiply it by the frequency so ten times three is equal to 30 and so on so when we add up the total bits column we have a total bits used of 174 we want to try to do better and that's what we're going to use the Huffman coding algorithm for to try to reduce the number of bits used without losing any of our information so first we're going to go over what a code tree is so here I have a tree and along the bottom I have the leaf nodes AEI STP and new line which represents the characters that we had in the original chart a traversal down a left branch of a tree has a value of zero whereas a traversal down a right branch of a tree has a value of one so I'm just going to go ahead and fill in those values in the tree so now if I want to get the value of a we Traverse down left branch of the root and we add a zero to the prefix code then we Traverse down one more time adding another zero and then we Traverse down one last time adding the last zero giving us the code zero zero zero for the character a now if I want to find the character code for P I Traverse down a right branch adding the one to the code then I Traverse once down the left branch from that node adding the zero and then I Traverse again down the right branch getting to P so now my code for P is 1 0 1 so the goal is to reduce the traversal through the code tree to the nodes that have the highest frequency values so let's get started on the algorithm and we'll see how this works so along the right side I included a chart which features the characters and their frequencies so step one is to take the two characters with the lowest frequency in this case it's the newline character and the character s so now I need to create a two leaf node tree for these two characters so I create one node which is newline another node which is s and then they have a root node now this root node holds a value and the value is the sum of the frequencies of the nodes beneath so 1 plus 3 is equal to 4 we can treat this little subtree as an individual node with a frequency value of 4 so now that we're done using s and newline we can scratch them off of our character list and now we can move on and find the next lowest frequency character and add it to the tree so in this case the character is T with a frequency of 4 so I create a node for T I put its frequency value beneath and now we create a parent node which has a value which is equal to t4 plus the value of frequency of the entire right subtree which is 4 so 4 plus 4 is equal to 8 now that we're done using T we can scratch it off our character list and now we can move on to the next lowest frequency character in the list which is a which has a frequency of 10 so I create a node for a and I put its value down for ten now we create a parent node which is equal to 18 which is equal to the frequency of a which is 10 plus the frequency value for the right subtree which is 8 so now we connect those trees and I've created this new subtree so now that we're done a we can scratch it off the list and now we can move on but wait a minute 18 is really high in fact it's higher than all three of the remaining characters in the list so we can no longer add to this subtree so now we need to create a new sub tree using the next to lowest frequency characters so the next lowest frequency characters I and P so I with a value of 12 and P with a value of 12 I make two leaf nodes out of that and then I connect them to a parent which has a value of 25 which is equal to eyes frequency of 12 plus peas frequency of 13 now that we're done using InP we can scratch them off the list so now I have one node left and it's e so now I can actually attach e to one of the two sub trees now I want to attach it to the smallest valued sub tree so in this case we're going to attach it to the tree with the value of 18 so I create a leaf node E with its frequency value of 15 beneath and now I create a new parent node with the frequency value of 33 and I connect the root with the tree so now I have these two sub trees so now I've done with E so I can scratch it off the list and we were no longer adding any leaf node characters to our trees but I'm still left with two sub trees that I need to connect together to form one larger tree to do so I connect the node add 25 with the node at 33 to a parent node which is now the main root node for the entire tree which has a value of 58 so that concludes our tree and a left traversal has a value of 0 and a right traversal has a value of 1 so I'm going to go ahead and fill in these values here so now we want to see what this does to the codes for each individual leaf character so I now have that chart that we saw earlier but now the codes are blanked out and the total bits are blanked out so let's fill those in so if I want to find out what the code for a is I Traverse down the true the right branch once I Traverse down the right branch again and I do one left note traversal so my code is 1 1 0 the code length is 3 times the frequency which is 10 using us a total bit value of 30 now what I want to do the character code for II I Traverse down a right branch once and then I Traverse left once to get to e so I've only Traverse two branches which is a shorter distance to e which is a high frequency character so the code for this is 10 which is of length 2 and it's frequency is 15 giving us total bits of 30 so you continue on and you fill out the rest of the codes and you fill in the rest of the table now you can see for the newline character in the character s they actually have a longer code of 5 characters this time of 5 bits this is because they are seldom used so we can afford to have a longer code for these particular characters but now we have a new total bit column and when I sum them up we get a total bit value of 146 this is a big improvement over the total bits of 174 that we had without the Huffman coding algorithm where all the codes were of equal length so that concludes our slides on Huffman coding thanks for joining us and have a great day for watching go to see us break down calm for more subscribe to our videos and like this bye
Info
Channel: CSBreakdown
Views: 574,784
Rating: undefined out of 5
Keywords: Greedy Algorithm, Algorithm (Literature Subject), Huffman Coding, Computer Science (Field Of Study)
Id: dM6us854Jk0
Channel Id: undefined
Length: 8min 27sec (507 seconds)
Published: Sat May 16 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.