3.4 Huffman Coding - Greedy Method

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi the topic is Huffman coding Huffman coding is a compression technique it is used for reducing the size of data or message suppose you want to store the data in a file so you can store the data in compressed form to reduce the size of a file but the data is sent over a network data can be compressed and sent to reduce the cost of transmission in this video I am going to explain you if a normal message is sent what will be the cost and what is the fixed size encoding then what is variable size encoding that is Huffman coding now let us study the problem I have a message here now this message I want to store it in a file or I want to send it over a network then let us understand what will be the court cost of a message so the cost of this message or the size of the message will be measured in terms of bits let us check first of all there are 20 letters in this one so the length of this message is to indie next how this message has to be sent it has to be sent by using ASCII code electronic devices or computers all use ASCII codes for characters or English alphabets so these are all character ASCII codes are 8 bits I have used alphabets A to Z here so that is easy to understand for us so if I say a its ASCII code is 65 and the binary form is this one so total eight bits these are eight bits that are you presenting decimal 65 and that is representing a similarly if I say B then it is 66 this will be 4 6 2 6 likewise C D E for Everyone we have there is a six 67 and 68 and 69 and here the binary code will be there so eight bits are there for each alphabet now if normally this message is sent 8-bit for each alphabet that total number of bits will be how much as there are 20 alphabets in this message so this will be 8 into 20 that is 160 bits size of the message will be 160 bits now this is the simple message sent without encoding the size will be 160 bits 8 bit for each alphabet now let us look at can we use our own codes instead of ASCII codes how it is possible see I am not using all English alphabets or all possible symbols on the keyboard I am not using every symbol from the character set I am using only 5 alphabets that 2 capital letters then do I need 8 codes 8-bit codes for representing just 5 alphabets no few bits are sufficient so can I use my own codes with less amount of bits yes so nuts we will see fixed size codes here I have taken alphabets in this message and how many times each alphabet is appearing I have written the count here or you can also call it as frequency a is appearing 3 times 1 2 3 likewise you can check all how many times they are appearing I have written them here total there are 20 alphabets if you want to see the frequency then you can say this is 3 by 20 that is 3 by 20 or this is 5 by 20 because total number are 20 so this is 6 by 20 and you can take the ratios I prefer taking counts that is easy for working even what should be the code see I have 5 characters I don't need 8 bits for just 5 characters 8-bit ASCII codes are for twenty-eight characters or 128 symbols but for only five let us see if I have just one bit one bit then either you can write zero or one so how many symbols I can represent how many characters only two characters if I have two bits then how many I can represent I can write 0 0 0 1 1 0 1 1 so I can have a 4 different combination and each combination can represent one symbol but I have five so for that I can take three bits so in this how many combinations are possible 0 to 7 that is 8 combinations are possible so I will take 5 out of that so let us say in 3 bits I will say is 0 0 0 B is 0 0 1 C is 0 1 0 D is 0 1 1 and E is 1 0 0 I have given my own 3 bit codes I will use this for encoding this message and I will read this will reduce the size of the message so in me it means here instead of writing ASCII code I will be writing 0 0 1 for B and 0 1 0 for C likewise go on when the message is encoded using this one then what will be the size of the message as we know well that there are 20 characters and each takes 3 bits so this will be 60 bits so size of the message will be 60 bits and one more important thing we cannot neglect what is that when I am sending a message suppose encoded in this one now how the receiver will know what are the codes even if I store it in a file and then later I want to retrieve how do I remember what are the codes use 0 0 means a only is it C for making it easy I have taken the alphabets A to B here the alphabets can be anything this can be this can be M can be all this can be cute then how I can know which code is for what so I should also have the codes that is the stable also along with the message so this is encoded message and this is the table use this table to decode the message yes so message is 16-bit then you need the table also the chart of codes that will help to decode this message so let us see you have to send this one also how much size this will take see I have five alphabets this alphabet should be an original form now now standard is what ASCII code so I must write ASCII code of all these as he could of all these so for this 8 bits so total how many 5 so 5 into 8 bits this is for what characters or symbols then for each 3 bit code for a this is the code B this is the code for everything I should also have the code so total how many 5 and how many bits for each code 3 so this is 4 codes so this makes how much this is 40 plus 15 this is 55 and what is the size of this original message so message is 60 bits right and table or a chart of codes is 55 bits so total how much this will be 115 bits yes so total size of the message along with the chart is 115 bit is not just 16 bits it is 115 be no doubt the size of the message is reduced but table is also important so it was 169 became 115 so around 35 to 40 percent of reduction in size from 160 to 115 right so this is one method with fixed size course now we'll go to variable size code and let us follow Huffman coding we'll still reduce the size of the message of man encoding Huffman says that we don't have to say take fixed size codes for the alphabets some characters or alphabets may be appearing few times less number of times you may be appearing more number of times so if you give small size code for the more appearing characters then the size of the entire message will be definitely that used so let us see what is his idea already we have this table all the alphabets and how many times they are appearing in a message so that gives the frequency of those alphabets now Huffman code follows optimal Marsh pattern now we have to generate our own code so how what should be the method for the code so Huffman has given an approach for getting our own variable size course so he says that all those alphabets should be arranged in the increasing order of their count so this is the least one and this is the next one and exponents fall so they are arranged in the increasing order of their count or frequencies so the first the smallest one then the highest one so this optimal Morse pattern should be followed on this one so what I should do now is to smallest one and make one node so this is 2 plus 3 this is 5 now next select to smallest and form a node so 5 4 5 6 these are remaining out of this which are two smaller either I can take this 5 4 or that 5 4 so I will take this one so I will combine this fun with this so this is 9 now what are the lists I have or the nodes I have 9 5 6 these are the two minimum so combine these two this is LAN then combine these two two are remaining now so this one and this so this gives what 20 so here we got our optimal motion pattern tree so actually for the optimal merge pattern that is greedy approach always select two minimum one and we got this fun now this tree will help us define the quotes how on the left-hand side edges mark them as eros and on the right-hand side it just mark them as one that's it now what are the codes for each alphabet follow a path from root onwards so so for a what is the code 0 0 1 so is 0 0 1 then B is 1 0 1 0 C 1 1 1 1 D 0 1 0 1 and E 0 0 0 0 0 0 that's it these are the new course we have now you can see the Sun code is of three bits and one code is of two birds like that so even we may have the codes with one bit also not here in some examples so variable size codes are there now I can use these codes for encoding my message like B is 1 0 C is how much 1 1 C is 1 1 and a is 0 0 1 B is 1 0 B is 1 0 and D is 0 1 0 1 gone if I use these course what will be the size of my message so for that how many bits are there for a 3 bits and how many times a is appearing this is 3 into 3 9 and 2 bits and disappearing 4 5 times 5 into 2 10 2 bits 2 bits 1 1 appearing for 6 times so 6 into 2 12 2 bits appearing for 4 times so 4 into 2 eight and three bits appearing for two times so 2 into 3 is 6 so for entire message I have how many times each character is appearing and their sizes of codes I have taken and I got this one now total number of bits are 45 bits size of them as say ages 45 bits means if i encode all I just I just need 45 bits so the message size is 45 bits now one more thing we should consider along with the message I should also send this chart or a table or a tray then only decoding can be done means along with the encoded message what is the codes what are the codes used that should also be there then only decoding can be done so somehow this straight should be preserved or the table should be preserved or you see what will be the size of the table all these alphabets I should send their ASCII codes the standard code so 8-bit of ASCII code how many alphabets are there five so this makes how many 40 bits then a course also I should send so this course when they are stored so how many codes are there five course but what are the number of bits 1 2 3 4 5 6 7 8 9 10 11 12 so this is forming 12 bits so together this 40 plus that 12 so 52 bits are required for tree or a table tree or table so what is the total size message is 45 bits and the table or a tree is 52 bits and this will be 97 bits yes now you can see that it is still reduce from fixed size codes so that's all about Huffman coding now if you want to know the size of the mesh saij that 45 you can know it from the tree also how see this is the distance how much is the distance 1 2 3 and frequency is what 2 so 3 into 2 so distance into frequency of each or count of each you can take Sigma sum of all these so this will be distance is 3 and frequency is 2 plus distance is 3 and frequency is 3 plus distance is 2 and the frequency is for distances to frequency 5 and distances to and frequency is 6 so this is nothing but this calculation that we got and this will be 45 it's only so if you want to know what will be the size of the entire message from the tray also you can find out by using this formula if you refer a video on optimal merge pattern you find that foreign lies define let us see how decoding is done this is the message now in them completely in the binary bits form for the message for this you must have a tree for decoding so start from root 0 left side 0 left side 1 right side this is B be 0 0 0 1 completed then again start from the root 1 right side again 1 right side so here is the leaf let us see so these 2 are 1 then again once you have read the leaf again start from the root 1 right side 1 right side so this is C the next 0 left side 1 right side it has a D so these two are completed in this way this message she can be decoded and from binary bits we can get back the original ask cool so the receiver can convert the message into ascii code and use it in its machine so inside the machine it while using it will not be using the code so it has to convert into ASCII codes like when you have any zip file or compressed file you have to decompress for using it same way it should be decoded and used so that's all about Huffman coding
Info
Channel: Abdul Bari
Views: 710,624
Rating: 4.9186859 out of 5
Keywords: algorithm, algorithms, greedy method, huffman coding, optimal merge pattern, decode huffman, abdul bari, bari, gate
Id: co4_ahEDCho
Channel Id: undefined
Length: 17min 33sec (1053 seconds)
Published: Thu Feb 08 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.