Hello friends, my name is Tushar and
today I'm going to talk about KMP Substring Search so what is a substring search? Suppose I
have a text and i have a pattern. I have to tell that does this pattern exist in this
text or not? So, here my text is "abcbcglx" and my pattern is "bcgl" so this
pattern does exist in this text at this point here so in this case my substring search should
return 3, the index of the point from which this pattern exists. so what is our usual Algorithm
to find the substring search what we usually do is we start from this
from the 0th index of the text and 0th index of pattern and we compare 'b'
with 'a'. Since they are not a match, we go to the next index of text which
is 'b' so we compare 'b' with 'b', since it's a match we compare 'c' with 'c', again it is a
match so we compare 'b' with 'g', at this point it's not a match so what we do is we start from the next letter to our start point. So then we start from 'c'
and then we compare 'c' with 'b' again it's not a match so again we start from the next point. then
we compare 'b' with 'b' so this is a match and then 'c' is a
match, 'g' is a match and 'l' is a match so in this case we would return 3 which
is the index of the point at which text exist and if our pattern was something
like 'bcgll', in that case this pattern would not be found in the text
and we would return an exception or -1 or something like that. so how much time would this
algorithm take in the worst case, this algorithm would take O(mn) time where m is the length of the
text and n is the length of the pattern so this is where KMP Search comes into
the picture. the KMP Search can do the substring
search in O(m+n) time and let's see how KMP
Search works let's understand KMP algorithm using
this text and this pattern. So we start from here. 'a' and 'a' is a match, 'b' and 'b' is a match 'c' and 'c' is a match. 'd' and 'x' is not a match so this is the point where we find no
match so our aim is to not go backwards in
this text so what we are checking is right before
'd' in this substring, is there a suffix which is also a prefix of this
substring. Since all the characters are unique there is no way there can be a suffix
which is also a prefix so what that means is our next
comparison will start from 'x' and 'a' again we'll understand it little bit more in
this example so we compare 'a' with 'x' it's not a match so we move to the next
character this 'a' and this 'a' is a match this 'b' and this 'b' is a match this 'c' is a match, 'd' is a match, 'a' is
a match, 'b' is a match 'x' is not a match so again we check in
this substring before 'c' from 'a' to 'b' Is there a suffix which is also a prefix so if you look here, this "ab" is both
the suffix and the prefix of this pattern what that means is, since we have matched
till 'x' till this point . The character right before 'x' must be "ab" which is "ab"
so what that also means is that since this is also a prefix we don't have to match "ab" again and
the next match can start from 'x' and 'c' so what happened here is we didn't have
to look backwards in the string to see where to start the, where to start the next match from. we can
start from right here and here since this suffix is this prefix and since this suffix is this
prefix and since this "ab" exist here, we've already matched them and so we start
comparing 'c' with 'x'. So 'c' and 'x' is not same so we chevck, in here,
is their a suffix which is also a prefix? it's not, so finally we start
matching 'x' with 'a', so 'x' and 'a' is not same so we go to the next character, 'a' and 'a' is same,
'b' is same, 'c' is same, 'd' is same, 'a' is same,
'b' is same, 'c' is same and then, here we have a 'd' and here we
have 'y', so again we check in this substring before 'y'. Is there a suffix
which is also a prefix? so this suffix is also a prefix what that means is, since we've matched till
'y' this "abc" must be right there that's "abc". so what also means is that, since the suffix
is also a prefix there is no reason to compare "abc" again
and the next check can start from right here so this 'd' has to be compared with
this one So this 'd' and this 'd' is same then
'a' is same, 'b' is same, 'c' is same and 'y' is same so this pattern exists in this text from
this point onwards. Hopefully this helps you understand how the KMP Search
works using suffix and prefix information. So the only thing is how do
we do this efficiently? so in the next section, let's look at that
so let's see how we'll efficiently compute if suffix is same as prefix and what point
to start the check if there is a mismatch in the character between text and
pattern so here we have a pattern "abcdabca", so first
is always zero so this is my i and this is my j. if there is a mismatch if i is not same as j so we put zero here
and we increment our i, again i is not same as j, so we put zero here and we
increment i.i is not same as j, so we put zero here and increment i now i is same as j so what we do is, the value we put here
is whatever is the value of j which is zero plus one so
this becomes one and then we increment both i and both j so what this one means? So let's suppose
our text was "abcdax" and so on and our pattern is "abcdab" so there
will be a match till this point and this will be a mismatch right here between 'x'
and 'b'. so point is where do we start the next comparison in the pattern so since the value here at the index of
'a' is stored one it means that there is already, there is
a prefix which is same length as the suffix one, so we have already
compared this 'a' and this 'a' so the next point to check will be this 'x' and this 'b' so this is where this one will come into
the picture so again 'b' is same as 'b' so the value stored here will be the j which
is one plus one that is two and then we increment i and we increment j. 'c' is same as 'c'
so we store value two plus one three and then i goes here and j goes here so now 'a' is not a match with 'd' so this is little tricky so what you
do is you go to the value on the previous character so that's zero so j will become zero and then you compare, is
this guy same as this guy so those guys are same so the value here
will be whatever is the value at j which is zero plus one so this is
our temporary array and this array will help us doing the substring search
efficiently. The Time complexity to build this array is O(n) and the space
complexity is also O(n) next let's look at another example of how to
build this array so here i have a pattern "aabaabaaa"
so first is always zero, so i is here, j is here, since this 'i' is same as 'j' the value
here will be the value of j plus one so this is one and then we increment both i and j
so i comes here j comes here this value is not same as this value so
the next point of j will be the value in the previous character which is zero so j
becomes zero this is not same as this so this value becomes zero and we
increment i. this is same as this, so value here will be j plus one, so this is one and then
we increment both i and j. Again this is same as this, so this becomes two and we
increment both j and i. again this is same as this, so the value here will be
two plus one three and we increment both i and j again they are same so value here
will be three plus one equals 4 and we increment i and then j. Again they are same, so value
here will be five and we increment i and then j at this point 'b' is not same as 'a' so the next point j will be the value in
the previous character which is two so j jumps directly to two, again this 'a'
is not same as this 'a' so j will jump to the value in here so that's one
so j comes to this point. At this point this 'a' is same as this 'a' so the value stored here will be
whatever at j is one plus one, so this is two so this is how we build a temporary array
for this pattern so again the time complexity is O(n)
and the space complexity is also O(n) In the next section let's use this array
to actually do the substring search so finally let's apply the substring
search on this text for this pattern so i've prebuilt my array
using the logic which we discussed in the previous section of this video so
now we compare 'a' with 'a', so they are same then 'b' is same, so this 'c' is not
same as this 'x' so then what's the next point, so since we are
not going to look back in the original text we check what is the next point we compare this
pattern and this text so the value, we go back to 'b' and
check the value, the value here is zero so the next point of comparison will be
this 'a' and this 'x' because the index because we're start looking from index zero
which is 'a', so this 'a' is not same as this 'x' so we move forward this 'a' is same as this 'a', then this
'b' is same as 'b','c' is same as 'c', again 'a' is a match, 'b' is a match and then at this point,
it's not a match so again we go back here and then two so what this two is saying is that there
is a prefix of length two which is also a suffix which is clear, this "ab"
and this "ab" are prefix and suffix. what that also means is that there must be an "ab" in the
original text, this "ab", so we can safely ignore, we can safely ignore this two "ab"
and start comparing this 'c' with this 'c' onwards which is how this two
is useful so we start comparing from the position
two in the substring. So 'c' is same as 'c' so we move forward, 'a' is same as 'a',
'b' is same as 'b' and 'y' is same as 'y' so this pattern exists in this text the run time complexity for this algorithm
will be O(m) and where m is the length of this text the time to build this array,
the temporary array is O(n) and so the total time for this
algorithm will be O(m+n) and the space complexity will be O(n)
the code is simple and it's in the description section of the video so this
is all I have to talk about KMP Search I would ask my viewers to like this
video, share this video and then comment on this video, also check out my Facebook
page and my Github link github.com/mission-peace/interview/wiki Thanks again for watching this video.