Finding the Closest Pair of Points on the Plane: Divide and Conquer

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
we were looking at the sort of the most basic the most common recurrence relation that comes up where you're dividing into two different pieces subproblems that's the two and then the division is is half each these two could vary and other kinds of problems and then you're combining the answers in something which is in time proportional to n the number of these so we've seen merge sort and last time we saw the problem of computing the minimum number of we computing at the minimum but the computing the number of inversions in a given permutation and then you have the problem of a related problem where the permutations are generated on a tree all right and all you have available to you is the flipping at each node each internal node of the tree to generate those permutations okay so today I'm going to show you another start off by showing you another divide and conquer algorithm that has the same recurrence relation so the analysis will be a Time analysis will be very similar the issue is always here this tends to be the the problem how do you actually do the combining or the merging of the two subproblems in time that's actually proportional to n that's that's usually the trick and this is a particularly application closest point problem and so here the problem is that you have points on the plane okay so on like on the blackboard so endpoints and between any pair of them there's some distance so whatever distance straight line distance between these two this would this straight line distance is shorter obviously you can just see that by looking at it this looks even shorter and just by looking at it that looks like unless this is a contender one of these two looks like the shortest distance among all pairs of points that are on the plane there so we want to find the pair or in case there's a tie I could say a pair with the smallest distance between them Euclidean distance straight-line distance okay well obviously you could just compute this directly for every pair as their entries two pairs roughly N squared for tional 2n squared and you could just grab a pair at a time each point is specified by an X and a y-coordinate also nothing too surprising there how we're representing the points and you could certainly just if I grabbed a point here and a point there then I use a green whatever you can yeah you all know how to compute the direct the Euclidean distance between a pair of points in terms of their x and y coordinates okay so you could do that so certainly order I would say theta N squared operations suffice okay for each pair of points the amount of arithmetic that you're doing to figure out what their distance is is some finite number independent of n just depends on knowing their two coordinates and doing a little rithmetic squaring taking a square root whatever but so that's some fixed number of operations per pair but if you did it that way you would have something on purport proportional to N squared here's to look at and our goal is obviously going to be to do this faster I should mention something that the book mentions divide and conquer often and then not always but often is a method that speeds up algorithms or speeds up gives you a faster solution to problems that already have reasonably fast solutions so a solution that runs an N squared time is reasonably fast it's polynomial time that's the distinction that's made this is a polynomial in N and being the size of the input and the divide-and-conquer is going to improve this from N squared down to n times log n ok now it's not always true there are some examples where divide and conquer is a technique that's used that gets us from something which is exponential time down to something that's polynomial so the clay the comet they were making isn't always true but it's it's often true still a very valuable it's a valuable improvement to go from N squared down to n times log N and the book has some gee-whiz numbers at the beginning in most books like this do where they tell you they show you how fast n squared grows as compared to n times log n and they're definitely applications we're having an algorithm whose running time is proportional to n log n made the problem practical to solve where as in N squared it was not practical of course it all depends on the size of n that come up in the in the real applications I'm just emphasizing this because in fact I didn't have a good intuition about the difference between N squared Nimmo again gut feeling appreciation for that until I actually ran some ruts and programs and ran them intentionally one running an N squared time the other one the algorithm with n log n time and it's you don't have to push up in those applications didn't have to I didn't have to push up and rail are before I really started to see the difference one I was sitting there waiting for the output or the program to finish and it was really annoying and the other one came back in the blink of an eye and and that was as n got bigger I ended up waiting more an hour two hours and the other one was just finishing instantaneously so moving from N squared and n to n log n can really be important depending on how big n is alright so we're going to find this in n sin n log n time I make another comment about these things why another reason why we go from N squared down to n log n is because we can and and because we can it it informs us about the nature of computation so I once went to a lecture given by a statistician that was essentially about this kind of a problem they needed they needed to find the closest pair of points these points represented something had some meaning and he and they were big you know these are like all the stars in the sky that's a hundred billion or so that we can see and that makes n very large and he was saying it's really unfortunate that to solve this problem you you have to look at all pairs of points you know it was just absolutely obvious to him that you had to this is just what nature required there may never even occurred to him that that might not be true and and so again I think it's useful therefore to have this kind of thing that you can pull out and say well you know your intuition about computation isn't that well-developed it doesn't necessarily require N squared operations you don't always have to look at every at every one okay so having said all that how do we solve this problem well I said it was divide and conquer so what's an obvious kind of division yeah left or right or up or down or some diagonal that would be a little more clumsy but yeah left or right tonight we'll do so how are we going to do that efficiently how to provide efficiently by left and right I mean obviously we want about half to be on the left and not have to be on the right and how do we figure that one out quickly how do we grab about half of them to be on the left and half them to be on the right now if we take the leftmost point in the rightmost point and divide that by two that won't necessarily capture half on the left and half on the right because their spacing is uneven yeah well okay the word is that's the right word is the median but what do we how do we find the median of do a median of medians you what do a median of medians do a median of medians too complicated yeah you want to take that line but again we wouldn't necessarily have half on the side and half on that side we don't know yes yeah a voice from the wilderness no way right we could take the x coordinates of all the points and sort them I heard the I heard the expression real quick what does that mean okay we know we can sort an n log n time we saw we saw a merge sort and and everybody should know that anyway before we from a previous course so we can sort endpoints in n log n time and we can take advantage of that so n numbers in n log n time so we can go through the points that represented each point went x and y coordinate and so as a first step we could sort the points by their x coordinates okay so let's call that into a list I call X okay so they're sorted now by their x-coordinate and we can certainly do that in order n log n time and log n time and then certainly we can just find the middle entry now we're not we're not talking about the middle in terms of its position but the median entry in the sense that half are to the left and half are to the right that's certainly doable it's going to turn out it's also useful it's going to be also useful to have a list of the points sorted by their y-coordinate may not be obvious well it shouldn't be obvious yet why we want that but we can certainly do that at this point though so we're sorting the original set not not this one we're sort the original points well I could have set this one too because we're going to resort by by their y-coordinate by their y-coordinates into a list let's call it Y and again you can do that in order n log n time and two n log ends are still order in log n so so far we're within the time bound that we're that we've advertised and we're shooting for okay so this is going to be reduced to n log n okay having done that well we just said what we want to do is divide the points into a left side and a right side roughly half on the left and half on the right so divide the points into sub lists P and Q sub lists with half well all points in P to the left of all points in q2 equal size roughly equal size okay so just looking at this this is going to be P this is going to be Q got about 1/2 over there 1/2 over there and well this this is drawn so it looks like it's also the middle of the X range that wasn't intended it's the middle it's the median point of the points sorted by their x coordinates okay and then divide and conquer what's what's a name for this problem closest point okay well a closest point problem I'm not following I'm following the logic that's in the book if you read the section on this algorithm you'll you'll see it's the same but I'm not using their notation that then so I'm just calling this the closest point so we want to find closest point among all n of those let's see what's the original set called I don't know capital n of size n all right n points set an of points and so over here once we've divided them divide and conquer means we're going to find the closest point problem in P which has n over 2 points and we're going to solve the closest point problem in Q which has n over two points so solve sub problems like that and then we're going to somehow use the solutions that we got here plus some additional work so that in order n time we're going to find the closest pair of points over all of all of these points in the set n so use the to salute two sub solutions and something on the order of n new operations to find order to solve closest point problem and n ok so as in all these divide and conquer kinds of things this is where all the real work is being done where the interesting aspects interesting details of the algorithm we're going to be done and just as in the counting of the of the crossing lines we looked at we know what the closest pair is among points where both points are in P that was solved as a sub problem we know what the closest pair is among pairs of points that are in Q so what's left to consider is just pairs where one point is in P and one point and cue okay well how many pairs of points are there how many pairs in P and how many I mean how many pairs of points where one is in P and one is in Q it's N squared over four so that doesn't really help because we're trying to do something which is proportional to n and so you know yes we've we have fewer points pairs than before because before we had something like N squared over two but now we have something that looks like N squared over four but that still doesn't that isn't going to be helpful enough any other ideas well you have two sets of shortest closest pairs that you can use that to limit the points you have to consider okay yeah and others you have another comment yeah I sort of intuitively you don't want to look too far to the left and too far to the right I mean somewhere along the border here is probably the where the candidate pairs are going to lie of course it might it may be that all the N over two points in left side really are very close to the border and so on so they are on in terms of the right side so you can't necessarily that that comment alone will not limit the number of pairs below n squared and we could still have in worst case some kind of N squared behavior so we're going to have to look a little more closely at the geometry and make some comments make some observations all right well let's say that the closest pair of points in P had a distance Delta P and the closest pair of points in Q we found had a distance Delta Q those are the two closest distances that we found in the two sub-problems and let me just assume that Delta P is less than Delta Q so and I'll just call that Delta so remember the recursive structure you have to convince yourself that we really know this at the time when we're trying to do step 5 so we've divided recursively we're making these recursive top-down calls divide divide divide divide you finally get down to something a base case which I haven't written up but you know it's clear enough if you get down to a side that only has two points you definitely know what the shortest distance is and so then you cut you're coming back the recursion is finishing some level and starting to come back and so when you come back and you get to here you have in your hands a an actual Delta that was the smallest that you've seen so far so you can use that at this step okay there is a delta that you that you're aware of all right so let's just start looking at some geometry and what we can what we can say about the candidate pairs with respect to Delta so here we are here's P here's Q and in very quickly looking at the book before the lecture I couldn't figure out what they intended Q to be on this side and P to be on that side but so again when you look at the actual writing the way they've written it it won't it may not exactly correspond to the way we're doing it here but certainly the ideas are the same okay so this was the this was the median dividing line and there's no loss in assuming that one of the points here is actually on that line half of this side and half of that side let's just say that there is a point in P that's actually on that line okay this line is called L and the point has x-coordinate x star it has some like word it also but we don't need to pay attention to that right now all right so the first observation that they take is that there's pee let's assume this is distance Delta and this is distance Delta okay so X star is obviously the rightmost point in P because it's on that dividing line there's really nobody else beyond there and P and this is a distance Delta that we've gone back to this is distance Delta okay so here the first claim is if X I guess there's a point X Y is a point where X is less than L minus this Delta then X Y can't possibly be in the closest pair crossing L remember we already know what the closest pair is strictly in P and we know what the closest pair is strictly in Q so we're looking for those pairs that are the closest among that that cross L so 1 1 is going to be in P and 1 is going to be in Q and if we have a point here let me to make it the same same L is simple why is that well let's take a point here which is to the left of L minus Delta okay so there's no way that this is can this can be among the oops I didn't mean to say that can't be in the closest pair crossing ah you might be actually I can't be in a pair crossing L that has a distance smaller than Delta that's the way I wanted to put it okay so this point is to the left of capital L by more than Delta and therefore what's the shortest distance from this point to any point in Q it's got to at least be Delta there's no way for this point to get to L and beyond to the distance less than Delta okay it could go off at a different angle but that only makes things worse makes it longer so the distance from X from this point over to anything in Q is at least Delta and therefore that pair will not be the overall winner okay we're looking for the overall pair that's shortest among all the pairs of points and so you don't have to think about such a point that's useless well same thing over here any point where the where the x coordinate is bigger than L plus Delta so X is bigger than L plus Delta that's a loser also okay all right so we know we when when people said and somebody said earlier you just look close to the border this quantifies what close to the border means at least in terms of the Delta that came from the subproblems okay but again that by itself is not so it's not enough because you might have N squared of these pairs might have all the points in P here and all the points in Q there so until we were until we look at that possibility we don't know that that's the case all right and at a point of implementation how would we find all of the points that are within Delta of L okay well remember we're trying to do this merging or coming back from the recursion in a time bound that I just erased this is our goal T of n number two T and over two plus some constant of n so we're in step five over there and we're trying to implement that in something that's proportional to n and we've just realized that we don't have to look farther than Delta to the left and we don't have to look further than Delta to the right how do we find the points that are in there within this time bound well we just look through all the points in P we have P in some lists there's time to look at them all and we just check with their x-coordinate is we know what L is and we just which ones are close enough and they're plenty this time to do that because this is our time bound and there's only n points and similarly on the right we can do that so restricting the points to the ones that are in this interval is not that's not an issue and not difficult okay so now you have let's say a point here X Y it's in this interval and so it we can't exclude it yet as being in a pair that actually has a difference smaller than but it looks like that angle here maybe this distance is less than Delta so we can't we can't exclude this one yet and the the obvious thing we could do of course is just take a look at X Y and every other point that's in this slice right but again there may be too many of them we don't we still haven't rested how many there are over here okay so that and we won't we don't restrict the total number that is over here but we will restrict how many of the points over there we actually have to consider together with this point and we'll also have to do look at the implementation of how we find those quickly alright so the book says that for any point here there's at most 15 points on the other side that you have to look at okay now the 15 is a little mysterious I think it's easy to see 30 you can be even more clever and get down to a bank or something anyway it doesn't matter is a fixed finite number that's independent of N and so that's what I want to show you at this point and then we'll have the implementation question of how do you find those quickly but what you can see though is that for every candidate on here in this slice if it only has to look at some fixed finite number of points over here whether it's 30 whether it's 15 whether it's eight whether it's whatever it is it's a fixed finite number independent of n then for it it looks at those fixed finite number figures out those distances and for every other one that's in here it does that for its list of 15 or 30 and therefore the total amount of work that would be done in in this phase five would be proportional to n there's n points over here and they're only looking at let's say 15 neighbor 15 other points in Q then we definitely can achieve this bound and the solution of this recurrence is M log N and we've done what we claimed we would do okay so where does this 15 come from so I'm going to basically draw this again but without all the clutter so here's this is this is the dividing line L and now I'm going to take another line here which is Delta over two okay and another line which is Delta over two okay and and then I'm going to divide up horizontally in increments of Delta over two all so okay so let's just look at just two let's look at some sort of in the middle a little bit easier to look at Delta over to Delta over two so we've divided up we divided up this slice over here a little more finally both in terms of the x-coordinate and the y-coordinate okay so now you can ask how many points oh actually I think I think they give a name to the set of points over here so s equals the subset of P in the Delta slice so s is a is the set of points and P of course itself was a subset of capital n capital n was the whole set of points all right now here's the question one question how many points and s can be in one of the one of these boxes Delta over two by Delta over two boxes any answer yeah just one is that ways that what you said and why is that yeah if you have two in here well let's just make them as far as apart as possible let's say they're in the corner okay so what's the distance between them this this is Delta over two this is Delta over two so that distance two points in a box have distance yeah less than or equal to Delta over two so Delta over two squared plus Delta over two squared square root okay which is equal to Delta over two right - a square bit - anyway it doesn't matter less than Delta okay but we said the Delta was the smallest distance among any pair of points in P or Q in fact so you can't have two I picked the two that are farthest away obviously any two they're not on the corners or even closer so if we had two points inside this box or even at the at the corners of the box we would contradict that Delta was the smallest distance that's found so far among the points in P or the points in Q alright so now let's consider a candidate there's some point which is in this Delta slice and arrange the boxes so it's on the same x-coordinate does that point all right so I don't think we anyway here's a point XY okay we wanted to argue that oh I should make one of the point the same kind of conclusion holds on the on the right side okay so if we look at these boxes over here this is Delta over 2 2 l 2 over 2 and we similarly know don't / - we similarly know that we can't have I think I want one more I want to go down this way this way to getting kind of cluttered here but I hope everybody is seeing what we're doing ok so here's a point a candidate point in this Delta slice and we have to consider some points over in the other Delta slice in queue that might be together with this point to have a distance less than Delta it might be a winner in comparison to the best we found so far all right we just argued that over here you can have at most one point per these boxes right and I've drawn these out so that there's 1 2 3 4 5 6 7 8 I mentioned an 8 earlier right a 15 and 8 let's see if this works so there's at most one point in these boxes if you look beyond those boxes well we know if we look beyond that way the distance is too big it's bigger than Delta I mean the point the other point together with this one has to be within this Delta slice there's no point in looking that way if we've gone up here two of these boxes so this distance is Delta right so I put this I put this these boxes so that they're aligned with this point XY all right is there any Yi to go up in this direction a point that's over it's in that's above I've written four boxes here is there any reason to look above two boxes that are above this one well that's this is Delta over two Delta over two so this is Delta I guess by diagonal yeah okay so by diagonal you could certainly possibly get to some point over here that's distance less than Delta all right well but if I go up so that's this is the eight and if I do another two levels okay now this distance is two Delta okay that distance is two Delta and if you go by a diagonal and I think that's enough anyway there's some finite number of these boxes where even along a diagonal well yeah the diagonal is going to be bigger than the than that coordinate anyway so certainly going up four boxes is of no use right and going down four boxes is of no use and if there's one at most one node 1 point per box now we're talking about 16 where did the 15 come from I don't know by looking at this a little more closely but if you look at more closely you can get down I think to 8 but it doesn't matter in terms of our business here we're trying to get this CN so certainly now 16 boxes is enough to argue that any point outside of these boxes is there's no point outside of these boxes in queue it could be the paired point with XY to have a distance less than Delta it's just by simple geometry not possible okay and as I said you could be a little more precise and look more closely and go from 16 down to 15 which is what they do the book but I think if you can get to 15 you can also get down lower so good you know enjoy yourself if that's what you want but anyway for our purposes we have shown a finite number like 16 which is sufficient so we're going to get the idea I think I'm really blabbering this just because the geometry yeah or he'll do the range of Y the number of boxes came from Delta and Delta was it was the value that gets inherited as your issue going up the recursion or you're coming back from the recursion but the point is that what we're seeing is for each candidate point in this Delta slice there's at most 16 points on the other side that could possibly be paired with it to have a distance less than Delta and therefore be a winner but your question was what Delta depends on oh okay maybe you're anticipating the next point all right the next issue yeah okay so what we've proven is that for every point there's at most 16 other candidate points on in queue for every point in P this is most 16 in Q that could possibly be paired with the point in order to be have a distance less than Delta now how do we get our hands on those okay yeah all right so so it was a little premature in saying that this we can achieve this time bound without worrying about that that issue okay so how does the algorithm get its hands on it knowing the fact that's there all right so here's the question so we let me just rewrite currently we know that for every point XY in P there are at most I'll use the word values 16 16 points in Q that might be paired with XY to get a distance less than Delta and any other pairs are relevant because they have distance bigger than Delta or bigger yeah you're saying the maximum numbers xr5 well as I said you could look more closely at these things and and get a number smaller than than 16 that's for sure I don't know I've stopped my head whether that number is 5 or not it's a good thing to think about but yeah so remember that certainly 8 is I remember it that 8 was a bound that you could put all right but this is a finite number independent of M ok but the issue is but how do we find we being the algorithm find those 16 for each now you have to do it for each one of the points in the slice for each actually this is more precisely in capital S for each point XY that's in capital S and within this time bound well at the very beginning I said it would be useful to have the points sorted by y coordinate as well as by x coordinate and in fact when we do this splitting into P and Q what we're going to do is also that the time it took to split that could be proportional to N and we therefore have enough time to look through all of the points once we've identified what's in P for example but we have time to look through the sort of Y list and pull out in order the ones that are in P and similarly pull out while at the same time pull out in order the ones that are in Q so you have at every level of the recursion you have the the elements in P and the elements in Q in X sorted order and also in what's ordered order so we'll want that at every level of the recursion now we're looking at coming back from the recursive calls and so we have the list of points in Q that are sorted by Y by y coordinate okay so how do we get let's say now we're going to look at the each of the candidate points in this P slice and we want to find the points over in Q that are among its 16 candidates well we we also have we have the points in in the slice yeah we have the points in P sorted by y coordinate right that's what I just so saying so let's in in order to implement this step let's just look through them in order when we look at a point we can also determine is that in this slice or not okay so let's say we found a point that's in the slice that's over in this picture X Y it's in this slice okay and we've been working our way up in the sorted list the Y coordinates were here okay and we've been finding for each point that's in here its candidates over in Q remember those points are also we have a sorted list by their y-coordinates so we're working our way up in that list too okay so when we get to when we get to this point the candidates oh okay we also get yeah we also have to a filtered for sure Q to be in its slice all right which we have time enough to do so imagine we have the sorted list of guys in queue that we've looked through to find which are in in this in here okay so we have this guy's y-coordinate and we have the y sorted list of the elements in queue that are over here all right with respect to this one's y-coordinate this tells us that in terms of the y sorted list over here we only have to go down 16 places and go up 16 places at most okay again if we filtered so that we only have in our hands and we have them in sorted order we only have in our hands those points over and Q that are in this Delta slice and we know that in that in terms of the that there can only be 16 actually there can only be 8 going down and 8 going up points that are potential that are in these boxes ok that's what we argued earlier so if we know the y coordinate of this point and we can look inside this list over here at that position if we go below that by actually just eight positions will have captured all the ones that are in here and if we go up by eight will have captured all the ones that are in there and even more perhaps but and then if we do the the distance calculation between this point and every other of those 16 points then we will have definitely considered all the pairs that were necessary yeah so you're running down the P list at the same time as you're running down the Q list as if you were going to merge them so you're at the same so the point you're at in P is that near the same value that the point you're adding Q so you could just go up keep a queue and back down cute look yeah so I think the comment is that the way I described this was a little clumsy and that one could even be a little more efficient and implement it more efficiently but but the way I did it it still works too you're working your way up and so for the next one you you know you only have to look that many positions down of that many positions up or if they're more than those that haven't been removed here you can remove them up until there's just eight left here and then you have to look forward at most eight as well okay so those are implementation details that are necessary in order to get the to get this result but quite simple ones all right everybody get the big picture of how this works so there's that implementation to tail at the end of keeping the Y lists sorted and as you do the recursive subdividing to keep those as well so again we have now we've really considered all the details of the algorithm and we're in the CN and the recurrence relation solves to cut n log n ok so I'll be back here at 6:00 anybody wishes to join you that's fine thank you
Info
Channel: UC Davis
Views: 48,516
Rating: 4.8210292 out of 5
Keywords: Closest, points, divide, and, conquer
Id: xi-WF07rAQw
Channel Id: undefined
Length: 49min 28sec (2968 seconds)
Published: Wed Oct 23 2013
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.