9.1 Knuth-Morris-Pratt KMP String Matching Algorithm

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi the topic is K&B algorithm KMP algorithm is used for pattern matching if a string is given and a pattern is given then the problem is to find out whether this pattern is existing inside this string or not if it is there then find out its index now for this problem there are various algorithms where kmb is also a one of D algorithm there is also a basic algorithm nave algorithm so let us look at how nave algorithm works then we can understand the problem for the drawback of naval Gorem then we'll see how KMP is better than basic or knife algorithm let us see the basic algorithm see if there is a string G 1 and we have to find out this pattern then we will check if it is there or not so we'll start from the first letter of the string so I'll read on this pattern here d e f compare the first letter of a string with the pattern it's not matching if it is not matching then shift the pattern to the right by one step d e f now compare this one with this it's not matching then shift the pattern by one step this is not matching so shift the pattern by one step this is matching if there is a match no need to shift check the next that is also matching and check the next this is also matching so yes a pattern is found inside the string at the displace at this index that is 4 to 6 index this example was very simple I'll take one more example and show you now one more example here this is a string and the pattern that I have to search is this one so I have to find out whether there is a match for this pattern in this string or not so I'll know draw a string and shift it but instead of that I will take index I and J here so first letter I and J this is matching there is a match so link J as well as I so i'll just write ing they are matching so move j as well as i they are matching so move j as well as i again they are matching now move j as well as i there is a mismatch see we started this pattern comparing from this position onwards as I just now said that in the previous example I have shown you that if there is a mismatch this pattern will move by one step so there is a mismatch so again J will come in the beginning but what about I I will start from this place now already we have started from the beginning a now we'll start from B that is index 2 so this is the problem in basic algorithm already I have each till 5 and it is again shifted back so I is shifting in the string this is the waste of time this is the extra work in basic algorithm so again we'll start from here so this means that we have started the pattern from here now shifted the pattern to the right by one step there is the meaning so I came here again there is a mismatch so go to the next so we will start appearing from here so it means we have moved the pattern from here onwards now from here we'll be checking mismatch there is a mismatch move I there is a match so lose J as well as I there's a match means J as well as I there's a match so means J as well as I now this is mismatch this match means move J again in the beginning and bring again on this part see just now we have started from here now we have to shift this one so we'll start from here so I will come at this place so already it has shed three alphabet so it should come back we alphabet so J should start from here again see just now already we have checked these alphabets we are checking them again so this is the drawback of basic algorithm we are repeatedly checking the characters in a main string let me finish it quickly there is a mismatch I is moved there is mismatch I mean no there is a match with a a so in the in this way now if I continue then I can find this pattern here available here every character will be matched till here till here so we learn one thing here that the characters in the string are compared repeatedly whenever there is a mismatch it is shifting I backtracking I are moving back I so this backtracking of I we want to avoid let us see a worst case of basic or knave algorithm worst case of basic algorithm I have a string and a pattern let us run basically got among this one I starting from here J is studying from here there is a match if there is a match move I as well as chain see you started from index 1 of strength remember this there's a match so move I as well as J so first I will move J and also I will move I there's a match so you J and also I there is a mismatch when there is a mismatch move J to the beginning and move I to the second position this is the starting point just now we started from here that I should start from here now start from here again again start this is matching noise J move i matching move J move I again a a matching so move J and move Y a B is not matching so again move J in the beginning and a move I on the third place in this way continue see the problem in this one is e is repeating why I should move I should move just J one place and see that this is a this is a both are same just check that next alphabet is B or not you don't have to move back and start again from the beginning because all these are same only that's it that's it the basic algorithm doesn't study the pattern doesn't observe what is there in the pattern as blindly it is a checking every alphabet and just shifting this pattern by one step in case of mismatch as this is backtracking I what is the maximum time it is taking if the size of the string is N and the size of the pattern is Amma then almost this pattern is repeatedly comparing the string and this will be M into n order of MN time so this is the time taken by basic algorithm or name algorithm basic algorithm takes sort of MN time now can we study the pattern and reduce some time let us see what is the idea of kmb there is some terminology used in KMP algorithm first let us understand that terminology see if there is a pattern given here then we will can find prefix or suffix of a pattern so what does it mean by prefix of a pattern we have to take the subset of this alphabets in a pattern but we must start from the beginning of the pattern only how many subsets how many characters that should take in a subset as many as you want so I just take a okay yes it's a prefix no I will take a and D also yes this is also prefix I want to take ABC yes this is also prefix I will take ABCD yes this is also prefix all these are valid prefix then what about suffix I have to take the subset of this string but I should start from the right hand side only right and as many I want I can take so let us start can he takes he just serious can I take B and C yes BC can I take ABC is ABC can I take D ABC is B a B C and more but let us stop here now the idea of KMP is that inside this string is there any perfect same as suffix let us see a do you find an even know a be a be a be knows not there ABC yes it's there C this is matching so this is the prefix that is also there in suffix also there in suffix so this is the observation what kmpl go Adam will do on a pattern to avoid the number of comparisons instead of confusing with the perfect sense of X I'll make it simple is the beginning part of a pattern is appearing again anywhere else in the pattern or not that's what we have to observe I'll take few examples and show you let us look at the pattern and in KMP algorithm for a pattern we generate a pie table that pie is also called as longest prefix which is same as some suffix that is LPS we name it as pie or LPS whatever the name you like so how that table is prepared I will also show you that just observe it carefully let us take first pattern is it appearing in the middle sheis o being is also appearing a B so this a B is appearing here also then again a B again a is appearing here you see a as well as B is also appearing okay both are appearing again so what we do is we'll prepare a table we will write 0 0 0 0 this a B is matching that first and second as I said the indices we will start from 1 2 onwards so this is matching with 1 & 2 this is not matching this is matching it 1 & 2 this is not matching this is the pie table now there is an algorithm I don't want to show you I'm not tracing algorithm simply I'm showing you how to prepare it if you trace the algorithm you get the same results next let us look at this pattern is it appearing anywhere yes it's appearing OB is also appearing so this is matching so this is one and two then a anywhere yes it's appearing here so BC also bearing so these three are matching one two three rest of them are all zeros there's a pie table now let us check is it appearing even here itself so mark 1 is it appearing yes it's appearing then next alphabet nodes not same it is appearing here the next cells are bit also matching too then rest all of them are zeros what about this is appearing here 1 and this are you also appearing so from here if you take this is matching with this 2 so 2 and these three are matching with this 3 so take 3 then anywhere else he has a is appearing here the next day is also same so 2 and rest of all the values are zeros so this is the base of kmpl garden to prepare the table once you prepare the table the algorithm is simple then I will take one example and show you how algorithm let us look at the working of kmpl Gorda my explanation will be based on the tracing of an algorithm so one thing you have to observe that here in the string will not backtrack in the main string so let us start the working so this is a pattern so first of all prepare a pie table or NP r--'s so prefix table a this is appearing here so this is one the B is also there so two then D is not matching here so not just just two alphabets are matching so take this as 0 this and this all are zeros so let me put it in the table like this and also till indices 1 2 3 4 5 and also I will take 1 index 0 that will act as a starting point of a pattern now let us start first alphabet I is there then various J now it's it is not on the first alphabet we will take it on zero this is a special thing you can start from here also but the working will change little bit I am starting from zero this is easy let us watch J is on zero eyes on one first alphabet now compare this string of I with the pattern of J plus one being here check the next alphabet they are matching if they match then move J as well as I know again compare stringify with a pattern of J plus 1 this alphabet so wherever J is pointing you try to check the next alphabet that's B G is matching here so take J on the next and I also on the next now this J plus 1 yes this is matching this is matching J plus 1 so move J as well as I now J plus 1 is B and is B they are matching so move J as well as I now J plus 1 is not matching with I so now there is a mismatch what to do observe it carefully when there is a mismatch with the J plus 1 and I then mu J to this index to bring J here here move it here this means that this a B a B is repeating so he became 2 times so already we have checked one time so maybe we have to start from here so let us observe now J plus 1 but of this one is it matching more still there is a mismatch so again you j to displace J is here now is it a match with J plus 1 no still there is a mismatch but J's on 0 so we cannot leave jail further so move so in case of mismatch you observe that only J is moving I is not backtracking I have simply moved to the next alphabet let us continue J plus 1 is a mash with this I so move J move I J plus 1 B and I is B match so move J and move I J plus 1 and I see a mismatch so move J to displays J and still compare J plus 1 a and I miss match now there is a mismatch and even J is at 0 so we cannot move J further on the left so just move I now his this party observe carefully J plus 1 a I this is matching mu J mu I decided Mac here now J plus 1 is B and I they are matching so move J and move i J plus 1 is a and I is a matching so means J and move I J plus 1 that is B and I they are matching so means J and move i J plus 1 is a D but this is a they are not matching they are not matching then we're to move J at 2 2 then again compare J plus 1 a a match match C a B a B so a B already this portion is a B and this is also there here so we need not compare that we can continue from this portion here this was a mismatch you can again start from the beginning means we reach till here means this a we have came two times that came two times so one time you can skip on the second time on what you can continue checking so J plus 1 is matched with that one so Miss J move I J +1 match with that one so move J and move i J plus one is a D and I is also D so move J and i's incremented notice the end of a string actually this we have reached the end of a pattern so there is a pattern available inside the string that's all this is all kmpl for the work or the basic idea behind the kmpl go out almost that if the beginning part of a string is appearing again somewhere else in the string then don't again compare the characters right avoid the comparison and don't move i back again like it was happening in basic algorithm you might have observed that we were moving i only in forward direction we never back to at i then what is the time complexity of this k mb see the size of the string is and the size of the pattern if I say Emma then how many times we have gone through this only one time how many alphabets and so n is the time taken for parsing throw this string then for preparing a table we have parts for this one so how many alphabets here Emma so for preparing table M and for searching a pattern and this is for a table that is PI table and this is for search so the total time is n plus n so this is one of the faster algorithm for pattern matching
Info
Channel: Abdul Bari
Views: 692,854
Rating: undefined out of 5
Keywords: algorithm, algorithms, kmp algorithm, kmp string matching algorithm, knuth-morris-pratt algorithm, string matching, prefix of string, abdul bari, bari, gate
Id: V5-7GzOfADQ
Channel Id: undefined
Length: 18min 55sec (1135 seconds)
Published: Sun Mar 25 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.