Find the Skyline Problem with C++ Solution Explained

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and welcome i'm james murphy in this video we're tackling a hard interview question the infamous skyline problem i'll explain my c plus solution which beat 98.24 of other submissions in speed and 100 in memory in case you've never looked at a beautiful city view and thought i wonder what mathematical function describes this skyline don't worry the more practical view of the problem is finding the pointwise max of a bunch of piecewise functions but thinking about it like buildings is just easier to visualize we're given a bunch of buildings represented by their left right and height and we're supposed to produce the skyline formed by those buildings which are represented by xy points that description is a bit dense so let's look at it visually first we have the buildings defined by their left and right x coordinates and their height their outline together with the implicit ground at height zero forms a skyline because the buildings are perfectly level and two-dimensional the skyline itself is piecewise rectangular described by its corners since the rectangles are perfectly level we don't need all the corners to describe the skyline since whenever there's a change in height the x-coordinate doesn't change that means we can fully specify the skyline using just the points where the height changed which is what the problem statement for this problem expects so we take in the buildings as a list of left right height coordinates and we output the skyline defined by those points where the height changes okay let's get to the code the first thing i want to point out is this horrible decision that leap code has made according to lead code buildings and points are represented by standard vectors of ins even though both of these things have fixed sizes where structs like these or at the very least arrays would be the fastest choice we're forced to use a vector i'll use these using declarations for now but at the end i'll show you how we can go way faster if we just use fixed size types i'll start off defining these getters for the building and point attributes don't risk using the wrong index by accident these are all going to be optimized away anyway again if buildings and points were just strucks we wouldn't need any of these functions okay let's get to computing the actual skyline first let's make a vector of points to hold the skyline to return we're going to reserve two times as many points as there are buildings because a building can contribute at most two points to the skyline the algorithm i'm going to describe is a single pass algorithm which means that we're just going to look at each building once and by the end of it we'll have the entire skyline the idea is as follows we loop through the buildings one by one when we read in a building the goal of that iteration is to compute all the points of the skyline up to and including the left endpoint of the current building as you can see in the animation sometimes that means adding a point at the current building sometimes it means filling in points from right edges of previous buildings and sometimes no new points are added first let's focus on how do we tell if we need to add a point for the current building's left endpoint well we only add a new point to the skyline if the tallest building so far changes so in particular we need to be keeping track of what's the tallest building so far we'll use a priority queue that compares buildings by their heights in order to keep track of which building is the tallest that we haven't accounted for yet here's how you define a custom compare function and here's the priority queue that we're using we're just using a standard vector as the underlying container for the queue we're also doing a little trick here in order to reserve space for the priority queue we create the vector that we want to use as the underlying storage for the priority queue and reserve the number of elements that we want in this case we reserve enough space for the total number of buildings since at most all of the buildings can be active one of the constructors of the priority queue then allows you to steal that data we chose a priority queue because we want constant time lookup of the tallest thing and it only costs us login time to insert and remove okay back to the loop whenever we see a new building we go ahead and push it on to our active list then we go ahead and try to update the skyline with the current x value and whatever the new tallest height is getting the maximum height is pretty straightforward just look at the top thing on the priority queue or zero if the queue is empty here's where we're using that the skyline is 0 in the absence of any buildings updating the skyline just involves pushing the xy coordinate onto the back of the vector of course we only add a point to the skyline if it changes the height so if we try to push something that has the same height as whatever is on the back of the skyline already we just do nothing if all of the buildings had unique left endpoints then this would actually be fine but it is possible the two buildings both have the same left endpoint if multiple buildings all have the same left endpoint i'm still going to add just one point to the skyline for that reason i just push all the buildings that have the same left endpoint in the same iteration of the loop so this works for finding the skyline points that are left endpoints at the beginning of buildings but what about ones that are defined by the right endpoints of buildings when we go to the next iteration of the loop it's possible that we missed many right endpoints whenever our current x exceeds the right endpoint of the tallest building that means that the tallest building is about to drop off and that's a candidate for when we need to add a new skyline point the logic for this one is a little bit complicated so let's put it in its own function so here's the function the purpose of this function is to handle the case when the tallest active building is about to drop off i know it's just a few lines but there's a lot of thought that goes into making sure that they do the correct thing the idea is simple just pop off whatever the tallest thing is and then try to add a new skyline point at whatever the new tallest thing is the subtle issue here is that for speed we don't want to go through the entire priority queue and get rid of all of the things that should be deactivated that means that whenever we pop something off the queue the next tallest thing might be something that should have been removed a long time ago we keep popping things off the top until we see something whose right endpoint is strictly further than the tallest building that we started with any tallest building whose right endpoint is left of our original tallest building's right endpoint cannot be part of the skyline it was either a shorter building that had the same right endpoint or it was something that should have been removed from a previous iteration that we didn't remove in order to avoid traversing the entire priority queue that's why we're removing a bunch of things but only update the skyline once that also explains the name of the function we keep popping off the tallest thing until the right endpoint that we find is past the current tallest right endpoint returning back to the main loop we keep popping off the tallest building and stale buildings updating the skyline until we find that the tallest thing remaining has a right endpoint further than our current x or until nothing active is left this takes care of everything up until the end but there's a little bit more work to do after we get to the end of the buildings we basically just pretend that there's a building at infinity and then pop off all of the remaining active buildings then of course don't forget to return the skyline i wrote a bunch of tests and i'm timing both how long it takes in total and how long it takes just to run the actual get skyline function running it a bunch of times we see that it passes all of my tests and does decent in terms of timing but i think we can do better than that i looked back at the code and once we put something into the active queue we never actually use anything except for its height and its right endpoint we never use the left endpoint additionally since this active q is something that's internal to our algorithm we can choose a better data type than a vector here we define a fixed size struct that's just the right and the height of the building and we give it an explicit constructor to convert a building into just the right and height then we change our compare function and our priority queue to take in these fixed size structures instead of vectors this allows us to get rid of two of our getters then everywhere that we used get right or get height we change to dot write or dot height also instead of pushing building objects onto our active queue we're now pushing these fixed size structs onto the active queue making this change took the total test time about 5 milliseconds down both in the total test time and time spent in the get skyline function okay remember the comment that i made at the beginning where i said that using a vector events here was a mistake watch what happens just by switching to fixed size arrays making no other changes our times are now down to 6 milliseconds and 1 millisecond we went from 40 and 15 down to about 6 and 1 doing nothing but changing to fixed size arrays and this i think is one of the biggest problems with leak code this was supposed to be a competition for how fast you can make this algorithm go but by telling you that a building and a point are supposed to be vectors of ins they've already slowed your program down by a factor of five this just goes to show that designing a good interface and choosing your problem statement as carefully and as correctly as possible can really make a difference if you do something like choose an inappropriate type for your function or return type you can end up really slowing down the execution of your function and then you also end up slowing down other things that now have to pass data around reformat data into vectors in order to call the function anyway i hope you enjoyed this solution if you did don't forget to subscribe as always thank you to my patrons and donors i really appreciate your support if you especially like the video please consider becoming a patron thanks and see you next time
Info
Channel: mCoding
Views: 22,581
Rating: 4.9358072 out of 5
Keywords: C++, Cpp, skyline
Id: XhzHXj7wrwo
Channel Id: undefined
Length: 10min 49sec (649 seconds)
Published: Sun Aug 01 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.