Longest Common Subsequence (2 Strings) - Dynamic Programming & Competing Subproblems

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
okay so welcome back to the channel and today we have another dynamic programming question the longest common subsequence and this is going to be dynamic programming this is a pretty famous problem so I'm not afraid to say it off the bat because you probably already know that that's the approach for this okay so how does this problem work so we're looking for the longest common subsequence when you see subsequence that means that what we are looking for does not have to be contiguous what does contiguous look like what I just underlined is a contiguous sequence that is all together there's no gaps what can I do to make it non contiguous that sequence right there is not contiguous so this is a subsequence a F is a subsequence a subsequence does not have to be contiguous but it can be contiguous a sub sequence can be contiguous and might be broken up like this it could be both of them all right so our job here is to return to our caller the length of the longest common subsequence between two strings so what is the longest common subsequence here so we see that they share ace and we also see that these strings share these and we also see that these strings share ages so the longest common subsequence the longest subsequence that these strings have in common is ADH ADH how long is that three so what we return to our caller is three so looking at this example what is the longest common subsequence so the longest common subsequence is we see they share G's but we're gonna underline this G we see that they share T's we see that they share ace and we see that they share B's so what is the length of the longest common subsequence for these two strings it's going to be four okay so you see here that this is contiguous but again this guy right here is not contiguous these are sub sequences they don't have to be contiguous so this is our job and as I said what we're uses dynamic programming for this problem we could try to brute-force this problem we could try to generate all sub sequences this would be exponential in time so what we're going to do is we're going to use dynamic programming and I'll explain why or how we would get to this in an interview because most of the time you probably wouldn't and these are difficult problems to do but we'll see how we can wrap our mind around this question so I created a little example here and I want to help you wrap your mind around this problem and gain a deep understanding of what we're gonna do later we'll get into our DP table we'll do all of that stuff but what am I really doing what are the subproblems whenever I do dynamic programming I care about subproblems remembering the answer to subproblems and things like that so what we have here is two strings so my job is my job is to find the longest common subsequence of these two strings so what I'm going to do is I'm gonna break this down into a call recursion tree so what I'm gonna do here is I'm gonna break this down into what each call looks like and what decisions we make at each point based on what we see so stay with me so watch what happens okay so we have a longest common subsequence called on these two strings this is just pseudocode or like this isn't even a real function but let's just imagine this is our function so what we have is string 1 and string 2 so what I'm gonna do is I cut this up into subproblems this is the hardest part about solving dynamic programming problems so I'm gonna look at the last character that is all I care about okay I have my eyes on the last character and what do you see are these characters the same or are they different so we see they're the same so what does that mean that means I have a lengthening of my longest common subsequence by one value so what happens is this is what happens so do you see what happened I broke it down into subproblems my eyes were on the last character and what I'm doing now is I said what's the answer to him the answer to him is one plus the answer to this guy right here and you see we broke it down we cut off the last character we cut off the last character so the answer to this is we chop off the B's and now what is the longest common subsequence problem what is the answer to the sub problem for these guys and where do my eyes go my eyes go to the last character are these characters the same they are not the same so my job now is is to see if I delete either of these characters if I delete either of those characters which which sub problem is going to yield me a better longest common subsequence so stay with me stay with me it's it's confusing let me draw this out so that you can see okay so can you see this now so you can see how I broke this sub problem down into if I remove this a and I take the longest common subsequence between these two strings and then I remove the Z and then I take the longest common subsequence between these two strings whichever of these yield a bigger longest common subsequence whichever of these calls yield a larger answer I'm going to take that as the longest common subsequence given this sub-problem and then I'm gonna add one to this in the stack frame but the key here is the best answer to this sub problem when we have mismatching characters is I try removing this character I solve it I try removing this character and I solve it so this is confusing and we'll see it's less confusing when we get to the DP table but what we're trying to do is we're trying to find the best sort of path to take and decomposing the string so that we can yield to the longest common subsequence and we're chopping off character by character by character by character we're cutting off characters and we want to see which combinations of chopping off characters is going to yield us the longest common subsequence so how do these problems decompose so this problem we look at the last character are those a match and we see that these guys are not a match so this decomposes into the max of ripping this character off and ripping this character off so this is what it looks like okay so that's how that decomposes and then let's decompose him let's look at the last characters and then we see that they're a match so what this means is from this subproblem this is a problem alone this subproblem has no idea what's going on up here what we can do is we can do the same thing one plus removing this last character because these guys were a match there's no competition that needs to happen we don't need to compete about whether we need to remove this guy or this guy we can just remove both because it's a match okay so this is where our subproblems now stand and you can see that this is kind of recursive in nature so what we're gonna do is what we're gonna do is evaluate this guy if I have the empty string and I'll have another string if either of my strings are empty what does that mean that means that there's not going to be a longest common subsequence I can't have anything in common with an empty string if that makes sense I can't have anything in common between empty string and AZ so the answer the length of the longest common subsequence is zero this is there's no magic here it just makes common sense that nothing versus something we'll have nothing in common so we're going to put zero here we've found a base case so we can evaluate this to zero the longest common subsequence between these two guys is zero okay and then we can decompose this guy and now what we can do is we can go downwards and again what do we do we look at the last character and it's a match so we do 1 plus removing both of these guys remember no competition needs to happen I don't need to do a max competition between removing him and him I know that I found a match so these characters can both be removed so I can do more work and I can evaluate against without these characters so what happens is I turn this into this ok so we found a match we added one and now we're evaluating against without either character we evaluate against empty string and empty string well empty empty string will have a longest common subsequence of zero because the empty string verses an empty string there's nothing to have in common if either of them are empty we immediately have a longest common subsequence of zero because an empty string cannot relate to anything it cannot have a shared subsequence it has nothing so what we do is that becomes zero so again we're not tracing the recursion exactly because this can be modeled recursively but we have one plus zero and we can evaluate this guy this guy becomes one and notice the longest common subsequence between a and a is 1 the length is 1 and notice that's true a and a are a shared common subsequence it's 1 so that makes sense so what is the max of 0 & 1 it's 1 so do you see here do you see how we're drilling down sub-problem until we get to solid answers so our solid answer here is 1 so do you see this subproblem a versus a Z the answer to the subproblem is 1 is that true and it is true a and a are matching they match and that is the longest common subsequence a and a the answer to the subproblem is 1 so let's materialize that so what's happening is this sub-problem is still being worked on down here we've materialized this guy so let's work over here so what is the longest common subsequence between an empty string and a character remember if I have an empty string I can't relate to anything so what happens is this becomes 0 1 plus 0 is 1 and is this true the longest common subsequence we've seen a a and a is it 1 and yes it is 1 do you see we either can match those days or we can either match a to a we were limited by 1 a so it can only be 1 so what we see here is what we do is we evaluate that to 1 that is the answer to the subproblem right here so we have our competition our competition is finished we tried removing him we tried removing him the yielded answer from removing him was one the remote the answer yielded from removing him was one or I may have swapped that but do you see we're competing and the competition is won by whichever sub-problem is going to yield a longer common subsequence based on who we chopped off because these guys didn't match if they did match we would chop both of them off but because they did not match we had to do a competition so what we see here is 1 & 1 what's the max of 1 & 1 it's just going to be 1 so what is the length of the longest common subsequence between a a and a Z do you see how it's 1 do you see how we can match this a to that a or we can match the final a in the first string to this a all we have is this aid to play with over here so the answer to this subproblem is 1 the subproblems answer is 1 and what is 1 plus 1 2 and this is the magical thing about this our answers - is that true am i right so look at these strings a a B and a ZB so what we see here is a a B so we can see we can match the A's and then we see the B's match and what is our output our output is right there the answer to this overarching subproblem is - and i want you to know that when I did this walkthrough I did not look at any code I did not premeditate this I literally just wrote this on the board and from my understanding of how this problem fundamentally decomposes I just wrote the recursion how the recursion to pan out and we got our answer so this is what I want to communicate to you yes we can memorize problems we can memorize how to do certain things but if you have a real deep understanding of how these subproblems caught up into each other then you can do this problem without needing to see the code and then it becomes easier to go into implementation with understanding so what we're going to do now is we're going to look at the dynamic programming table to really reinforce what this idea of subproblem decomposition is all about so let's look at that table right now okay so what I'm going to you here is we're gonna walk through the dynamic programming table and how we would do this to cache subproblems so this problem is very similar to the levenshtein distance I have a video on that and I really feel bad because I was not completely clear about what each of these cells meant it is very very simple what does this cell mean that cell right there means this so do you see that right there do you see how it is a function call all each of these cells means is a different substring chopping that we call this function on their subproblems what I'm doing here is not different at all from what I just did in the walkthrough that I showed you it's just we're formalizing this because our code will have to iterate and store stuff in a table so this is what that cell right there means what does this cell mean what does that cell right there mean it means this it means the sub-problem a G a G and just the G just the G so that is what it means when I'm breaking these into subproblems they're separate function calls they're separate function calls were breaking things down into so the key here is to understand our subproblems so let's define our base cases here this is not how the code would do it but code is in the description if you want to see the concrete implementation I have put it there the code is in the description but I'm going to walk you through this just as but you know you would normally think about it so let's see how that goes so we have a versus an empty string what is the longest common subsequence remember what I said if either is an empty string they cannot have anything in common an empty string cannot share anything it does not matter what string you give me if I have an empty string I cannot find any common subsequence between them because it's an empty string so this is going to be 0 and then a G a G versus an empty string again 0 and then a G G versus an empty string zero and then a GG T versus an empty string that's going to be zero and then again we're going to compare against an empty string we're gonna call that LCS function on an empty string zero and zero okay I just realized I forgot a column right there my bad so yeah empty string versus an empty string that's going to be zero okay so now we're back to normal so G versus an empty string anytime I'm comparing against an empty string what is the answer to the longest common subsequence problem zero g x versus the empty string zero GX t versus the empty string zero G X T X versus the empty string nothing in common so the rest of these guys compared against an empty string are zero okay so now the work begins so what we're going to do is what I'm gonna do is I'm gonna see are these characters equivalent I'm gonna do a normal iteration row by row column by column like that and what I'm gonna see is at this cell are these characters the same a ng remember how we cared about the last character we're caring about the last character here these characters are not the same so what I do is I compete I remove the a if I cut the a off I'm going this way if I cut the G off I'm going upwards if that makes sense if you really understand what's happening in terms of our substring matching here we see that we're comparing A to G if I want to remove G I just go up and cut it out if I want to remove a I just go this way and cut it out so what I do is what is the max of 0 & 0 so the answer is 0 so the answer to the subproblem of a compared to G is 0 so what we do is we continue our iteration G versus G we have a winner so G is the same thing as G so we do 1 plus removing both of these characters how do I remove this G go up how do I remove that G go to the left and now our sub problem is the g-string and a so we do one plus the answer to this subproblem and that is the answer to this subproblem remember what we're doing right now is not different at all from what we did in the beginning the beginning was the understanding now this is just how we would implement it and do comparisons in a bottom-up manner so what we do is we compare g and g it's a match we go here 1 plus 0 is 1 G and T that's a mismatch what is the maximum item 1 and the reason I'm speeding up here is because the understandings were nailed down in that initial portion where we talked about the recursion and I talked about the subproblem e now we're just worried about implementing this and getting things going in terms of filling this table out so now a versus g mismatch i want to compete subproblems what's the max of these two guys one ok so now i have b and g they're not the same thing we compete subproblems the max is 1 ok so we're just gonna continue like this and our answer to the grand subproblem will be down there so it's very straightforward X and a mismatch we put 0 here the max of these two guys is zero X and G these are mismatches the max of these two guys is 1g and X mismatch the max is 1 T and X mismatch the max is 1 and X mismatch the max is 1 between these two cells and then B and X the max between these two cells is 1 and again we did that because that was a mismatch so on to the next row we see that mismatch mismatches happen all the way up to the T so we're gonna take a max here the max between 0 & 0 is 0 the max between 1 & 0 is 1 again G is a mismatch against T 1 and 1 the maximum is 1 and then T and T we have a match we can extend the longest common subsequence between the T's removed if we remove the T's so to remove this T I go up to remove that T I go to the left and I have a subproblem answer of 1 1 plus this problem answer here with the t's removed is 1 plus 1 which is 2 so the longest common subsequence between AGG T and G X T what is the length of the longest common subsequence it's 2 and as you can see for a fact it is 2 right there you see there's the longest common subsequence is that match up there so the tiene it's a mismatch we need to do a competition who's the winner the max of 2 in 1 is 2 so T and B is a mismatch the max of 2 in 1 is 2 on to the next row a and X mismatch so what we need to do is max of 0 and 0 is 0 G + X mismatch the max of 0 + 1 is 1 G + X again mismatch 1 in 1 the maximum is 1 T and X it's a mismatch the max is 2 between 2 and 1 the max is to a and X a mismatch the max between 2 & 2 is 2 so B and X it's a mismatch so 2 into the max is 2 a and a we have a match what we do is remove the a remove the a the answer to the subproblem with the A's removed is 0 0 + 1 because I just got a match and I can extend the longest common subsequence at the subproblem right here so it becomes 0 plus 1 is 1 a and G mismatch the max of 1 in 1 is 1 a and G mismatch the max of 1 + 1 is 1 T and a mismatch the max of 2 in 1 is 2 so a and a is a match so what we do is 2 plus 1 is 3 so B and a is a mismatch so what we do is compete 3 vs 2 3 is the max so y so what we see here is why does not match any of these guys so y is just gonna have max operations happen the max of 0 + 1 is 1 the max of 1 + 1 is 1 the max of 1 + 1 is 1 the max of 2 in 1 is going to be to the max of 2 + 3 is 3 the max of 3 M 3 is 3 and why did I just rush through that and do these maxes why does not match any of them so we would never do an operation where we do a plus we don't extend any longest common subsequence subproblems so what we're gonna do is be and we see that we do have a match be at the final position so every position up to that we're going to be doing max comparisons so max of 0 & 1 is 1 max of 1 & 1 is 1 the max of 1 in 1 is 1 the max of 2 in 1 is 2 the max of 3 & 2 is 3 and we see that the B's match here B's match so what I do is I come over here and what I do is I do 3 plus 1 I remove both of the B's 3 plus 1 it becomes 4 and the final answer for this problem is 4 and is that true am i right am I wrong I don't know whether I did this right so let's see what these strings actually look like okay so do you see them so what is the common subsequence between them what's the longest common subsequence and you see that our answer is correct the answer of 4 is correct and what we have here is the concrete answer of what this looks like so here's our dynamic programming table I don't want to make it about the table I don't want to make it about memorizing this I want to make it about understanding what is really happening and okay this is not intuitive at all I'm going to be outright and honest this how are you going to get this in an interview the thing is the more you practice problems like this if you know how to do the levenshtein distance then you're going to have a much easier time tackling a problem like this that deals with substring comparisons and decompositions like this so the more problems you do the higher the chance that you have that you can tackle a problem like this and use your algorithm knowledge to solve other problems but it's very difficult and and that's one of those things I have to be honest this is a difficult class of problems so anyway let's continue on and let's talk about time and space complexity very briefly ok so the time and space are pretty straightforward for this one we had M times n time at M times n space so n is the length of string 1 M is the length of string 2 so the time we're going to spend is we're to spend time solving all of these subproblems how many subproblems are there we can upper-bound that amount by this factor M times n so the space can be upper bounded by the amount of subproblems that we're cashing and the amount of subproblems we're cashing is M times n we're upper bounding that asymptotic aliy and again I have a video on asymptotic analysis if you want a deeper understanding of what I mean by that word so that's all for this video if you liked this video hit the like button and subscribe to this channel if you want to see more videos like this so yeah [Music]
Info
Channel: Back To Back SWE
Views: 133,995
Rating: undefined out of 5
Keywords: longest common subsequence, dynamic programming, longest common subsequence 2 strings, common subsequence, Back To Back SWE, Software Engineering, computer science, programming, programming interviews, coding interviews, coding questions, algorithms, algorithmic thinking, Google internship, Facebook internship, software engineering internship, swe internship, big n companies
Id: ASoaQq66foQ
Channel Id: undefined
Length: 25min 31sec (1531 seconds)
Published: Thu Feb 14 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.