9.1 Huffman coding example -Greedy Method |Data Structures

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi guys welcome back today's topic is Huffman coding Huffman coding what you can say Huffman algorithm is what it is a data compression technique or compression or you said it or you can say data compression algorithm or it can also be known as variable length coding algorithm in which data the size of data is to be reduced or the messages to be produced see sometimes when you store something in your computer the sizes size of that file is very big then what you do is you compress that file and then you store that file in your computer or computer so that it will take less time ok so that's what the compression means reducing the size of the message ok secondly when you want to transfer something on network when if then if the you know you want to transfer a file from sender to receiver then if that file or that message is compressed then obviously it will take less ghosts or you can say the bits which are used to represent for that message will be less that is what the compression means so in this video I am going to discuss and discuss with you fixed length coding and also the variable length coding technique see simple funda is what when something is to be transferred on a network then what is the what is the funda that suppose you want to transfer this message and each character will be coded in a sky form okay and then ask suppose a is to be transferred first of all then SK folding for a is what that is estee code is 65 and then this SK code will be converted into this binary form 65 Comcast looking is 0 1 1 ok this is the binary form for this a to transfer this a first a will be this sky coding will be used and that s Kai could make out the 8 bit will be used to represent a character ok so this data is to be transferred for a single character how many bits is to be used 8 bits fine in case of a normal message okay now see in this message suppose we want to transfer this message on a network then how many bits is to be used see I was saying the coast when data is to be compressed then for transferring that data or that message the coast will be reduced Coast means number of bits to be used ok here goes to means the number of bits to be used now you want to transfer this message without any without applying any compression technique then there's simple fun dies by default what is the technique every character would be transferred into eight bits binary form okay how many characters are there twenty characters are there these are twenty characters okay then how many bits will be required twenty characters are there and for each character how many bits will be used to represent that character because computer you know computer only understand the language of 0 and 1 ok 20 into 8 that is 160 bits will be used to transfer this message normally without using any compression technique 160 bits is to be used ok now see this is also known as what fixed length coding why this is known as fixed length coding because each character see each character is simply a string of zeros and ones okay and each character is having eight bits fixed length we have see is is represented by this one B is what B is a sky code is what 66 and 66 would be converted into this binary form and binary form and this would be Z 1 0 1 something like that this is the binary code for 66 same with CC is what 67 see in this message I am taking only a b c d the only five characters I am taking just for a simple example a B C D and E 68 and this would be 69 same 67 would be converted into binary but does it okay binary form and each character is represented with 8 bits so this is what fixed now each character is representing 8-bit representation that is why it is known as fixed length coding fine and in fixed length coding or to transfer a simple message without any compressing any compression technique the coast is what it will take 160 bits now after applying some compression technique we can reduce the length of this message so you can say the cost of this message to some visits may be 160 cm Hokie 120 IJ 100 as a it depends okay yeah we will we will see that technique also now see when we are using 8 bits I am just telling you a simple funda before discussing this one suppose you have two bits take a 1 m2 now with the help of these two bits how many combinations are possible 0 0 0 1 1 0 and 1 1 for different combinations are possible yes or no now with this four different combination you can represent four different characters let us suppose with 0 0 I am representing a b c and 4 1 1 i am taking this d ok now if you have 3 bits 1 2 & 3 how many different combinations are possible with these 3 bits 8 combinations I hope do you know this concept ok to raise to power 3 that is it something like this 0 0 0 okay so with the help of three mates if you are taking 3 rather than 8bits then we can represent how many characters eight characters are different characters fine similarly if you are taking four bits then how many different characters you can represent to raise to power 4 that is 16 here we are having 8 bits then we can represent how many different characters to raise to power 8 I think that is 256 characters fine but we are taking one simple example in this example how many different characters we have only five characters we have because is repeating these also repeating series repeating these repeating and is also repeating okay different characters are only five a B C D and E so with the help of two bits can you represent these five characters no we cannot represent because with the help of two bits only four different characters can be represented that is why we will take how many bits three bits with the help of three bits how many different combinations are possible eight combinations we have only five characters so this one is ideal case for us okay so rather than taking eight bits we will take what three bits to represent each characters suppose for this one I am writing a b c b and e these are five characters we have these are still unused or you can say left okay now suppose we are taking this technique this is our own technique okay now how many to transfer this message how the strong message will be transferred for a we will write what 0 0 and 0 next is B for B what is return 0 0 1 for B again 0 0 1 and something like like this we can write the coding now how many total bit are required to transfer this message total bits we have what 20 into for each character how many bits are to be used 3 bits for each character 20 into 3 that is 60 bits fine hair bits was 120 160 bit sorry now in this case one more thing is what the main funda is the problem will be there when decoding that message suppose we have transferred we have encoded this message and we have transferred now how the receiver will know that 0 0 0 means a how that receiver will decode that message message receiver ko kya mallika 0 0 0 0 0 1 and this something like this now how the receiver will know that 0 0 0 means is 0 0 1 means B that is the problem now in the case of decoding so whose problem for so alpha Nikoli what a Agha we will also send this table with this message fine now when the receiver gets this message as well as this table then easily that is your will see the message and then will verify with the that table and then he can easily say that 0 0 0 means what a 0 0 1 means what B because in this table we have already written 0 0 0 means a 0 0 1 means B si UD n e something like this so this table will also be sent now for sending this table obviously some bits will be required see the 60 bits are just for this message now you are we are sending this table also in case of normal message of the transfer correct a no need of transferring the table because by default 8 bits required who me your sub cop at the higher value of a is what 65 sky coding will be used simply 160-bit ho jayega no need to send any table but in this case we have to send this table okay now for sending this table how many bits is to be required see a VCU D&E these are characters obviously these five characters is to be sent then characters cases and only characters will be transfer will be converted into that a sky core then into the binary form and binary form a kowai each character would take what how many bits eight bits now we have got five characters into for each character how many bits would require eight take a four Tibbets plus C this these bits are also be to be transferred now because the a bit uptick is scaling only for this 5 characters now corresponding to this 5 characters we are also sending 0 0 0 corresponding to this B we will also send 0 0 1 so a B the OP kibbutz Amina fine now how many bits are there these are how many bits for five characters how many bits will be required three bits 5 into a Kaleb Athene BK limits which will basically be 3d caleb is Yuri Kelly b3 5 into 3 plus 15 make it now how they up got 55 I guess total number of bits will be transferred 60 plus 55 means 5 1 1 5 bits okay when you use this technique this is also fixed length coding because we have used 3 bits for each characters 3 bits are fixed now but we haven't used 8 bits here we have using 3 bits this is customized code on code ok in this case 1 1 5 bits has to be used now what is there in Huffman coding we will discuss now Huffman coding this is what variable length coding variable length coding this is what we have discussed was fixed length coding now what is variable length coding Huffman coding make yoga may be a would a would be represented with 3 bits and B would be represented with two bits maybe C would be represented with two bits they would be represented with three bits e maybe with four bits so this is variable now because in this case every character is represented with three bits that is fixed but in Huffman coding maybe a would be represented with three to then be with to be with two bits and C with two bits like something like that so this is variable length coding that is not fixed okay now how we will compress this message using Huffman coding rather than a sky code will not use the sky coding value to visually we'll use what Huffman coding now we have drawn a table and understable we have written the frequency or you can say count of each characters see different how many different characters we have five characters ABCD and E and a is ribbiting four times B is repeating seven times C 3 times D 2 times and E 4 times in this message so this is what frequency or you can say count okay now the first step is you are supposed to design a Huffman tree and how that tree is to be designed using Huffman algorithm first of all arrange all the characters according to their increasing order of their count ok very first the least frequency or the count is what - then we'll write D first then C then e a and B fine and the count of D is what 2 4 CH 3 4 E it's 4 4 8 4 & 4 beats 7 you can either write down URI it's up to you because we have 4 & 4 both are same now first of all take 2 characters which are having least frequency or least count which are two characters D and C because two out of two three four five and seven which are least two or three or while writing a program or something like that you can say we arrange these characters in priority queue and priority is what depends on the frequency count if frequency counter is 2 then it will have more priority then C then a then a and B so this one you can say priority queue or this can also be implemented using min-heap tree okay I am just going to tell you in the layman language okay what is the funda take B and C like this C we are taking B and C for D we have count is 2 for C it's 3 now merge these 2 and create a parent node and the count will be 2 plus 3 that is 5 now we are done with this B and with this C okay now in our priority queue what we have we have four four seven and this five C one knew that the parent of 20 that is 5 now arrange 5 4 4 7 in increasing order of their frequency fine so first of all what you will write e a after a and a what is there this one this is complete node now then it would be 5 and then 7 okay now out of 4 4 7 & 5 take the two characters having least frequency we are having a and a 4 4 ok now see e a and these are having but count is four and four and after merging what is the count that is 8 now we are done with this e and this a now we are left with three frequency count 1 is the 7 1 is 5 and 1 is 8 now arrange this 7 5 & 8 in increasing order of their frequency counter first of all what we will write this node 5 then 7 and then 8 now out of these 3 take 2 characters having minimum a frequency we will take this wife and this 7 minimum see up 100 now always the minimum count frequency count would be left side okay and you know that greater than that minimum would be on right side then we'll take five seven eight out of these what we'll take five and seven okay and when you are designing a tree this five would be left left side and this seven would be right side okay now you know we don't have so much space for may seem a scope date Kareem we have this node 5 and we are merging the 7 with this 5 now then we will write this 7 at this level only this level only will not write 7 here okay you'll write 7 here 7 Kaylee what we have B and Moore's these tubes and the total would be what - well now we have done this with this B now only left frequencies are this node 1 is the snowed having frequency 8 frequency count and 1 is to n now these two are against these two in increasing order of their frequency the first node would be this one 8 per la I got 12 ki Kowski bad ayah or just we are left with only two nodes so I made oath on old lady had Oh correct terminal any Heather will take a 1012 but will not write like this well up car you're happy here Cusco at coomer's ki Orakei on 20 um yeah no this one is wrong always the rule is what the minimum frequency will be on the left side and the greater than that would be on the right side then 3 would be will take a tenth 12 now tree would be something like this here we have e a 4 4 and this one was 8 and 8 this side what we have this 12 we are merging these 2 or 12 impressive hemara wanna have with a 5 you happy top now 7 5 can you check your head to or reschedule any J that is 3 or you can say here what we have D here you out we have C or 7 k epic you have any past B and after merging of these two this one what we have 20 okay so this is a final drink this one the characters will always be in the at the leaf node EA d CN B now we have drawn this tree now second step is what you have to traverse this tree and you have to assign 0 & 1 now 0 would be assigned to the left side and 1 would be assigned to the right child so 2 n take a left child here this one 0 would be assigned here to the left side is 0 would be a sign right side 1 right side 1 now see this one this is right of it then one would be assigned now 12 here it is what this is left then 0 left 0 right side here then 1 now we have sine 0 & 1 to each this edge okay now you have to find out the coding Huffman coding for each characters now how to find out the Huffman coding okay now this is a Huffman tree okay find out the coding for each character for a how to find out C character would be at the leaf node a how to traverse the tree always we cannot reverse the tree from the leaf node we will traverse it from the root 1 traverse it from the root end traverse the tree till you reach the node a now 0 & 1 here is what a the coding for a would be what 0 1 0 1 for D it's what 1 1 1 1 for C what is the this coding Huffman coding here we have 1 0 & 1 1 0 & 1 for D 1 0 0 1 0 0 for e 0 0 okay now this is the Huffman coding see every character is having not having the fixed code having variable code is represented with two bits C and D are represented with three bits so that is why it is known as what variable length coding and one more point you have to take care of it see a poiiceman thickness a capital rather frequency of what you consider account of B is what highest so the letter or the character having the highest or the maximum pound the character which is generating you know many times which that character will have the Huffman code having less number of bits Huffman code of this B is worth having two bits and for to see D is occurring two times and Huffman code for this two is what having three bits maybe some character is there which is occurring one times and the Huffman code of that character would be of four bits so Jo character subset Java time I can't escape frequencies of Sujatha who use Kaufmann code code code my bits can we come homie okay that is the rule and that's half man coding will also satisfy the prefix rule I will discuss that role in later first of all let us find out the size of what you can say the you know number of total number of bits in transferring while you are transferring this message how many total number of in bits will be transferred for this message okay now you cannot say keep 20 characters are there 20 into polemic I had a twenty into eight when we were taking eight bits for representing a character those two time honey katha twenty into three at that time you are taking three bits for representing of the character but we happier by solution except a because we are not having the fixed bits we are having variable number of bits now how to calculate the total number of bits or you can say the length of this message how we will calculate C frequency of a is what for four into number of bits today use to represent to represent this a is what one and two 4 into 2 plus for b7 into 2 + 4 C 3 into 3 + 4 D 2 into 3 plus 4 E 4 into number of frequency into the number of bits 1 2 the total would be the number of bits would be 45 bits for only this message now how the message will be decoded at that place of a what you can write 0 1 at the place of B what you will write 1 1 at the place of being able to write 1 1 at the place of C what you would write 1 0 1 in like this this message or the message in binary form will be sent from sender to receiver when the receiver gets this message then how the receiver will decode this message how the receiver will get to know that 0 1 means a 1 1 means B how skinny can operate either you have you will send this table or you will send this 3 okay 3 in serialized form because obviously you can't send this tree something like this is serialized for me I put recuperation operator okay now either this table would be sent or this tree would be sent so that by taking by traversing or by looking at this tree or the stable this receiver will get to know that the 4 0 1 is forward a the code 1 1 is 4 what be something like this so 45 bits of kaga for only this the length of this message this is the length of this message only okay plus we will send this table suppose okay now how many bits would be required to send the stream in this table will send the characters and corresponding code Huffman code for these characters now for sending these characters how these characters will be sent C characters will be saying what the characters would be converted into a sky code than a sky to go code second characteristic converting into 8-bit binary form okay now we have many characters five characters into each character would be represented with Gator bits in a sky code then 5 into 8 that is 40 now along with these characters these Huffman code will also be sent four characters for Tibbets would be required for sending these codes how many bits would be required 40 plus 1 2 3 4 5 6 7 8 9 10 11 12 how many bits are there 12 bits won't be required the total would be what 52 now total bets to be transferred from sender to receiver would be 45 + 52 that is 97 bits so in Huffman coding when you compress this message any message using Huffman coding or I'm taking example of this message then 97 bits would be required 97 bits would be used from Sen sent from sender to receiver value 1/16 use givethem place created now 1 1 5 and using Huffman coding it becomes what 97 bits okay now see how the that you know receiver will decode this message suppose receiver is having this tree in the serialized phone ok now how that decoder will decode this message when the you know when receiver will see the message 0 then traverse corollary you 0 0 safe copy I go to it did you get any character no so 0 cm kiss boba character - Alyssa things check out the next next bit that is 1 0 1 traverse this tree 0 1 of course we have reached to a character a then the receiver will decode this 0 1 and will write what hey now again this one is done character Malkowski bar next bit is what 1 traverse this tree 1 we have reached to this 12 but is there any do you reach - have you reached to the leaf node because characters are at the leaf node only no we're still in the middle of the tree so we cannot write any character here now next one one one who you find a character that is B then receiver will decode it like B again one you see one again one one then again big fine like this he'll the receiver will check out this tree and then decode this message for to see how this would be done next bit is what one one because because here a here is no character the next bit is 0 1 0 here also we have no character the next bit is 1 1 0 1 yes we have reached to a character C like this the message will be decoded with the help of this Huffman tree so along with the data along with the message either the tree or the table would be also be sent okay obviously it way would be sending serialized form only ok so these are the number of bits would be required when you compress this message or even send this message using Huffman coding now the time complexity of this would be theta n log n now see I have told you 1 funda that is perfect school that Huffman coding works on the technique that is prefix code what is the prefix code algorithm or what is the proof what is the prefix code rule you can say it says no code is prefix of another code let us take a simple example then you will get it file or in example we have only 3 characters and the code for these characters are for a it's 0 for B it's 1 for seeds 0 1 I am taking the simple example 0 1 and code for C is 0 now suppose from sender to receiver when he message has been transmitted be there in the encoded form and that message is what 0 0 1 now the receiver will decode this message with the help of this table because this table will also be sent with this table sorry with this okay now there will be ambiguity in recording this message how that is healer will decode this message first bit is what zero receiver will check in this table zero is for what a now this zero first a zero is for what a now next bit receiver will check the next bit 0 again this zero is forward a now one one is forward B and the second second option would be what four zero it's a and again 4 0 1 4 0 1 that is Hever will decode it see now this one message will be decoded may be decoded in two forms a B or a C so this is ambiguity because we have sent only one message and the receiver kun-hee Petacci legacy which one is right this one is right or this one is right and why the sign bigotry is there because this zero code of this a is prefix of this code this code 0 okay for C code is what 0 1 and a kala 0 case Kelly hey Kelly so it's a zero for moles at then this one is what prefix and this code of this is what prefix of another code so this should not be happened no code should be prefix of another code okay only if this condition is satisfied then only there would be no ambiguity no ambiguity in that the receiver side while decoding the message and the Huffman coding the Huffman coding will always ensure the prefix rule the prefix code or prefix rule you can easily check kuiba code TCB another code chi have a prefix in a ho okay so if my coding Kereopa about beach and rock near Half Men code will always Huffman coding will always follow this prefix rule and what is the prefix ole many a boys example get through Battaglia now in a sunrise to me you I tell you the huffman coding then these are the four points it was discovered or developed by David Huffman in 1951 next is it in the encoding follows the prefix rule I have already discussed what is the prefix rule no code can be a prefix of another code next is the most generated character we'll get the small code and the least generated character will get the larger code we have already you discussed in that example okay now the time complexity of this Huffman coding is or theta n log and in next video I will discuss you an example of Huffman coding with probabilities sometimes in exam they can give you the characters and they will give you not the frequency or count they'll give you the probabilities of those characters and then they will ask you what is the average length of the message so in next video I'll discuss with that question okay till then take and take care bye bye
Info
Channel: Jenny's lectures CS/IT NET&JRF
Views: 137,067
Rating: 4.900681 out of 5
Keywords: NET & JRF Coaching Videos, GATE Coaching Videos, huffman coding, huffman tree, huffman coding algorithm, huffman coding example, ugc net, net exam, ugc net syllabus, data structures and algorithms, data structure, ugc net computer science, net exam papers, jenny's lecture, jayanti khatri lamba, best book for data structures, binary tree in data structure, ugc net preparation, GATE computer science, best youtube channel for ugc net, algorithms, it, cse
Id: saofdNsZiYY
Channel Id: undefined
Length: 34min 4sec (2044 seconds)
Published: Thu Feb 28 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.