Find the k'th Largest or Smallest Element: From Sorting To Heaps To Partitioning

Video Statistics and Information

Captions
Okay, so what we're going to talk about today is a very interesting problem that challenges our thinking and our ability to remember what we know about the fundamental sorting algorithms: when we hear key words like "the largest of something" or "the smallest of something," how do we think about it? It tests whether we've really internalized this stuff. But first, as always: if you haven't subscribed to the channel, subscribe and like this video. (I'm wearing a Virginia shirt — I don't go to Virginia; my brother went there and graduated. I go to Maryland to study computer science.)

So what this problem asks of us is to find the k'th largest item in an array. The array might be sorted; it might not be. Here's an example: we have an array and k is 2, which means we want the second largest item. The first largest item is just the largest item in the array, and the n'th largest item is the smallest item in the array. Let me write n for clarity: the sixth largest element is 1, because 1 is the smallest element.

What I want to say is that we already know every single thing we need for this problem. On this channel we've covered every fundamental sorting algorithm, we've covered how binary heaps work, and we've covered quicksort, which is going to be useful here because of the partitioning scheme quicksort uses. We've also covered how to analyze recursive time complexities and how to set up recurrence relations and solve them using the tree method. What we're going to do in this video is tie all of that together, and I think this is a very appropriate question to do it with.

First off: if we're looking for the largest or smallest of something, immediately we think of sorting. Remember, if we have a sequence of items and we know nothing about those items, then comparison-based sorting has a lower bound of n log n. So let's put our time complexities up — I've done this several times, and I hope we've really internalized these key classes of time complexities by now. A really good resource to check out is the Big O Cheat Sheet; I'll link it below. If I do any general-purpose sort, I land right at n log n — that is where I start. So immediately: if I want the k'th largest item, I sort, then iterate backwards from the end of the array. Let me demonstrate. We have the sorted array, and k is 2, so we start at the end. If k were 1, we'd just pick that last item, the largest. Since k is 2, we move back one and land on the 5. So sorting gets us the answer, but we hit n log n.
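The sorting approach just described can be sketched in a few lines. (Python is used here for illustration; the actual code in the description may differ.)

```python
def kth_largest_by_sorting(nums, k):
    """Sort ascending, then read the k'th item from the end: index n - k."""
    return sorted(nums)[len(nums) - k]

# The example from the video: k = 2, second largest is 5.
print(kth_largest_by_sorting([3, 2, 1, 5, 6, 4], 2))  # 5
```

This costs O(n log n) for the sort; the "backwards iteration from the end" collapses into a single index lookup.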
So what's the next best thing we can do? We're going to get to linear time eventually, but first there's a heap-based approach. Remember: largest, smallest — we've seen this pattern several times. Maybe it won't lead to the most optimal solution, but it's something to bring up in an interview as soon as you hear those words. So what kind of heap do we use? I need you to invert this in your brain: if we want the largest items, we want a min-heap; if we want the smallest items, we want a max-heap. Why does that make sense? Because if we want the largest items, what do we not care about? The smallest items — and a min-heap gives us cheap access to its smallest item so we can eject it. If we want the smallest items, we don't care about the largest items, and a max-heap helps us eject those.

We want the largest, so we go with a min-heap. Let's make one. We iterate through the array, adding each item, and whenever we go over capacity — more than k items — we eject the smallest item, the one we don't want. Start at the first item, 3, and add it to the heap; add the 2; we still haven't hit capacity, which is k = 2. Then we reach the 1: add it to the heap, and now we're over capacity, so eject the smallest item — the 1 itself — and advance the pointer. I know what you're thinking: we could have just peeked at the top of the heap and seen that 1 is less than the smallest item there, 2, so it has no chance to survive — it would immediately be the item that gets ejected. That peek is a constant-time operation, whereas insertion and removal take logarithmic time, so checking first is the smarter way to do it. But let's finish the walkthrough the naive way. (Notice that this isn't really scripted — you're seeing the thought process the way it would actually unfold: "I'd do it this way... but wait, that has this complexity... we can just peek the smallest item and see if the new one even has a chance to stay.")

Add the 5; we went over capacity, so eject the smallest item, the 2. Add the 6 and eject the smallest, the 3. Then the last item, the 4: add it and eject the smallest, which is the 4 itself — again, a peek would have told us it had no chance. Now we've reached the end, and the item we want sits at the top of the min-heap (well, the theoretical top, depending on what implementation is underneath): the 5. All along, we were ejecting the smallest items so that only the two largest would stay, which means the item sitting at the top is the k'th largest.

That's the heap approach. First we thought sorting; then, hearing "largest/smallest," we thought heap. Can we do better? We always get asked this, and it's kind of a cool little challenge to beat the algorithm. If you change that n log n to n log k, the next jump down is linear time, and we want to see if we can make that jump. Let's go back to the drawing board and see how, right now.
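The min-heap approach, including the constant-time peek optimization discussed above, might look like this in Python — a sketch, not necessarily the description's code. The standard-library `heapq` module implements a binary min-heap.

```python
import heapq

def kth_largest_by_heap(nums, k):
    """Keep a min-heap of the k largest items seen so far.

    The heap's root is always the smallest of the current top k, so any
    incoming item that is not larger than the root has no chance to
    survive and can be skipped with a constant-time peek.
    """
    heap = nums[:k]
    heapq.heapify(heap)                 # O(k)
    for x in nums[k:]:
        if x > heap[0]:                 # peek: only pay log k when needed
            heapq.heapreplace(heap, x)  # eject the smallest, insert x
    return heap[0]                      # root = k'th largest

print(kth_largest_by_heap([3, 2, 1, 5, 6, 4], 2))  # 5
```

This runs in O(n log k) time and O(k) extra space, which beats O(n log n) whenever k is small relative to n.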
Whenever we're trying to jump down between time complexities, we need to understand what's wrong with where we stand right now. Nothing is inherently wrong, but we're doing more work than we need. In the first approach, sorting finds the final position of every single item — I can answer "what is the first largest, second largest, third largest" for any rank. But our original question asked for a single thing: the k'th largest item. So sorting was too much. Then we did a little better with the min-heap: only give me the k largest items, not every item in its proper position. That improved the time complexity, because k is often going to be less than n, and that improves the asymptotic behavior. But we're still doing more work than we need, because the original question was just "what is the k'th largest element?"

What we really need is something we already know how to do, covered in our quicksort video: partitioning. Partitioning is the key thing we pull from quicksort here. I would highly recommend watching the quicksort video (it's somewhere on the channel) before this one, because this will be very confusing unless you deeply internalize what it means to pick a pivot, partition around that pivot, and return the index of that pivot.

First, let's look at the array and see what the final positions of these items really are — specifically, the relationship between n, k, and where the item we want sits. If k is 2 and we have 6 items, where will the second largest element sit when everything is said and done, if the array were sorted? Look at the indexes: 0, 1, 2, 3, 4, 5. The second largest item — the 5, the value we desire — sits at index 4. So getting the index of what we want is straightforward: subtract k from n. That makes sense, because in a sorted array the k'th largest element sits k positions behind the end of the array, and the end of the array is represented by n. So n − k gives us the final index of the k'th largest item. This is important because we're going to use quicksort's partitioning scheme — well, it's not owned by quicksort, it's just the idea of partitioning around a chosen pivot element — and it's going to bring this solution down to linear time. We'll analyze the recurrences later; again, I highly suggest the quicksort video, since I analyze a similar (though slightly different) recurrence there, and the code is in the description so you can follow along. So now we know: n − k is the index where the k'th largest item sits.
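A quick sanity check of the n − k relationship on the example array (Python, illustrative only):

```python
nums = sorted([3, 2, 1, 5, 6, 4])   # [1, 2, 3, 4, 5, 6]
n, k = len(nums), 2
print(nums[n - k])  # index 4 holds the 2nd largest element: 5
```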
So let's go through an example of finding this k'th largest item and see how our decision space changes as we partition and evaluate. Just follow me — this is a little confusing. We start with a left bound and a right bound, and we pick a pivot value. The pivot can be anything — the last item, the first item — but we don't want a bad pivot; a bad pivot is the largest or smallest item in the range, for reasons I explained in the quicksort video, so we choose the pivot randomly (the code in the description does it randomly). Say we choose the 3 as the pivot.

Now we partition the space around the pivot. First, swap the pivot out to the right boundary so we don't mess with that value — it sits out of the way while the partitioning happens from the left bound up to right minus one. We keep a pointer, call it i, marking the tail of the section of items less than the pivot, and an iteration variable j that scans. For each j we compare: if the item at j is less than the pivot, we want to move it to the tail of the less-than section, because we're sectioning things off.

So here's roughly how it goes (again, code in the description — I might mess this up, as I often do). Is the item at j less than the pivot? It's not, so we just advance j; there's nothing to move to i. Is the 2 less than the pivot, 3? Yes, so swap it into the less-than section at i, then advance both i and j. Next: j is scanning for items it can throw back to i, which keeps the segment of items less than the pivot. Is 1 less than 3? Yes — it deserves to be in that section, so swap i and j and advance both. Is 5 less than 3? No; advance j. Is 6 less than 3? No; advance j. Now j hits the boundary and has no more items, and notice that i kept the tail of the less-than section. That's important, because the final step is to swap the right boundary — where we kept the pivot safe — into position i. Watch what happens: this is the deep, deep takeaway we covered in the quicksort video. Everything left of i is less than the pivot, and everything right of it is greater by default, because those items got pushed away as we threw the lesser items back. We have sandwiched the pivot between the two sections.

And notice where the 3 is sitting: it is in its final position in the sorted arrangement. If the array were sorted — 1, 2, 3, 4, 5, 6 — the 3 would sit exactly there, so I put a box around it. This is important because of what we established: with k = 2, the k'th largest element must be at index 4, and our pivot ended up at index 2. So where can the k'th largest item be — the left partition or the right partition with the greater items? We immediately see it is impossible for it to be in the lesser partition: we have effectively eliminated that part of the search space. On average, if we pick a truly random pivot, we eliminate about half of the search space each time — sometimes more, sometimes less, but half on average. Our new search space starts one past the pivot — the pivot is placed, and since its index isn't the target it can't be the k'th largest, so we throw it away — and keeps the same right boundary. Again, for the stricter rules, look at the code in the description; this is the overarching principle behind the concept.

It's almost like binary search. In binary search, if the item is greater than the target you go left, because you overshot in value; if it's less, you undershot. This is exactly the same thing, except we overshoot or undershoot in the index where the item must sit, and that lets us cut the search space roughly in half, on average, each round. This is more effective — we'll analyze it — because we get to exactly what we want with much less wasted work.
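Putting the pieces together, the partition-based approach (quickselect) might be sketched like this — again Python for illustration, with a random pivot as discussed; the code in the description is the authoritative version:

```python
import random

def quickselect_kth_largest(nums, k):
    """Find the k'th largest via repeated partitioning (quickselect).

    Target: the element that would sit at index n - k if nums were sorted.
    Each partition places one pivot in its final sorted position and lets
    us discard the side that cannot contain the target index.
    """
    nums = list(nums)                 # work on a copy
    target = len(nums) - k
    left, right = 0, len(nums) - 1
    while True:
        p = partition(nums, left, right)
        if p == target:
            return nums[p]
        elif p < target:              # undershot: target is to the right
            left = p + 1
        else:                         # overshot: target is to the left
            right = p - 1

def partition(nums, left, right):
    """Lomuto-style partition around a random pivot; returns its final index."""
    pivot_idx = random.randint(left, right)
    nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]  # park pivot
    pivot = nums[right]
    i = left                          # tail of the "less than pivot" section
    for j in range(left, right):
        if nums[j] < pivot:
            nums[i], nums[j] = nums[j], nums[i]
            i += 1
    nums[i], nums[right] = nums[right], nums[i]  # sandwich pivot into place
    return i

print(quickselect_kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

Each pass places one pivot in its final sorted position; if that position is n − k we're done, otherwise we loop into the only side that can contain index n − k.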
And yes, if we do partitioning like this, we have an n-squared runtime in the worst case — but that only happens when we keep choosing the largest or smallest item in the partitioning range as the pivot, and with random pivots on large inputs it's very unlikely the splitting goes that badly. So on average we get the runtime we're about to calculate. Let's calculate it right now — and by the way, I don't have notes for this, I'm literally doing it on the fly, so if I make a mistake there'll be a correction in the description.

We'll write the recurrence and analyze it with the tree method we always use. Think back: for an input of size n, the call does its partitioning pass, which costs n − 1 comparisons — why n − 1? Because the for loop in the partition runs over one less than the size of the range being partitioned (if the length is 6, we iterate from index 0 to index 4; again, the code is in the description). Then we continue searching on a space that, on average, is cut in half. So the overarching recurrence is:

T(n) = T(n/2) + (n − 1), with base case T(1) = 0.

This is admittedly a pretty loose analysis — if you were in an interview, on the fly, I don't know if you'd want to go this rigorous, but you could. Now draw the tree. The top level has input size n, the next n/2, the next n/4, and we keep halving until we reach a single item, which costs 0 comparisons. Unlike merge sort (we've done videos analyzing these trees — the merge sort one is the best introduction, and there's the quicksort one too), there is only one subproblem at every level, because we recurse into just one side. The work per level is n − 1 comparisons at the top, then n/2 − 1, then n/4 − 1, and so on — the input size changed, and that changes the comparison count (input size 4 halves to 2, and 2 − 1 = 1 comparison). Generalizing, level i does n/2^i − 1 comparisons. How many levels? We can only cut the input in half log₂ n times before reaching one item, so the depth of the tree is about log n. Summing the work over all levels and splitting the summation:

Σ from i = 0 to (log₂ n) − 1 of (n/2^i − 1)
  = n · Σ (1/2)^i − log₂ n
  = n · (2 − 2/n) − log₂ n        [geometric series formula]
  = 2(n − 1) − log₂ n.

So on average the number of comparisons is about 2(n − 1) − log₂ n. It's 10:25 p.m. and this place closes at 10:30, so I'm rolling with it — if something's off, it'll be in the teacher's notes, because I rushed this and none of the work is checked. But what we see is that the n term dominates, so we can upper bound this by linear time. That's why the complexity is linear when we solve the recurrence: the result might be off by some constant factors, but the dominating term is what tells us this runs in linear time asymptotically.

And I have to leave before they kick me out and get really mad, because I told them I'd leave in 20 minutes, 30 minutes ago. So if you haven't subscribed to the channel, subscribe and like this video — my goal is to make more videos like this and make this channel really big and really helpful, so people can study, get the job of their dreams, and have a happy, good life. That's what it's all about here, and that is this video. Oh, and it's constant space, because everything is done in place. Alright, peace.
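As a numeric sanity check on the rushed math above (my own sketch, not from the video): the average-case recurrence T(n) = T(n/2) + (n − 1) with T(1) = 0 matches the closed form 2(n − 1) − log₂ n exactly for powers of two, which is linear in n.

```python
import math

def T(n):
    """Average-case comparison count: T(n) = T(n/2) + (n - 1), T(1) = 0."""
    return 0 if n == 1 else T(n // 2) + (n - 1)

# Closed form 2(n - 1) - log2(n) holds exactly for powers of two -> O(n).
for n in [2, 4, 8, 1024, 2**20]:
    assert T(n) == 2 * (n - 1) - int(math.log2(n))
```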
Info
Channel: Back To Back SWE
Views: 167,284
Keywords: sorting, searching, sorting algorithms, kth largest element, kth smallest element, min heaps, max heaps, heap data structure, Back To Back SWE, Software Engineering, computer science, programming, programming interviews, coding interviews, coding questions, algorithms, algorithmic thinking, Google internship, Facebook internship, software engineering internship, swe internship, big n companies
Id: hGK_5n81drs
Length: 29min 12sec (1752 seconds)
Published: Fri Apr 19 2019