Ross Kippenbrock - Finding Lane Lines for Self Driving Cars

Video Statistics and Information

Captions
As you said, I'm Ross Kippenbrock, and we're going to talk about finding lane lines for self-driving cars. A little bit about me: despite my German-sounding last name, I'm from the United States. I grew up in the middle of the country, which just happens to host one of the largest motorsports events, the Indianapolis 500. So from an early age I got interested in motorsports, went to college for mechanical engineering, got involved in motorsports there, and worked in NASCAR for four years doing simulation software development, among many other things. I traveled to the racetracks with the team and did race strategy. I met the guys from Yhat while I was working in NASCAR and decided to go up to New York and work for them, and we were just recently acquired by Alteryx.

So, talking about self-driving cars: there's a lot of news these days about this company suing that company over self-driving car technology, or this company teaming up with that one. There are a lot of new jobs in the space, so it's a really hot topic, but today I just want to show you that it's not as scary as it seems to do a simple task that the big guys do.

At the core, autonomous cars need to carry out three tasks: they need to perceive the world, they need to make a decision based on those perceptions, and then ultimately carry out some action. For perception, these cars are equipped with many sensors. One of them is a lidar unit on top of the car — that's the big ball you'll see when one drives by you. That's a laser range-finding system scanning 360 degrees around the vehicle and measuring distances to objects. For closer-in detection there are radar sensors measuring distances to objects around the car, and then there are also cameras all around it. We're going to be looking at the forward-facing camera today and analyzing those images.

That's the perception piece, which feeds a number of algorithms to make decisions. Here are a few samples from the Google self-driving car making decisions: stopping at a red light and maintaining that stop position while the light remains red; the upper picture is it deciding to make a lane change when there's an obstruction; and in the lower one it's actually switching lanes when there's a cyclist in the right lane. So it's making decisions about where it's safe to drive and whether we need to come to a stop.

The final piece is carrying out some action: actually steering the car, stopping the car, applying more or less throttle. Here's an example of the Tesla Autopilot system coming to a stop when there's an accident in front of it, and the top one is the Google car stopping when it looks like there's a wheelchair, possibly chasing a dog — I think they're pretty excited that their car came to a stop there, even though that's not data they had trained with.

Finding lane lines is part perception — we're going to be using that forward-facing camera — and then we make a decision about where those lane lines are and where it's safe to drive. Here's an example of the kind of thing we're going to be building: it processes the video as a series of images and produces this nice green area that's the safe space to drive in.

To carry this out, there are five main steps we'll walk through. First, we undistort the raw camera images, which have a little bit of distortion. Second, we warp the image so we have a bird's-eye-view perspective looking down on the road. Third, we isolate the lane-line pixels from the surrounding image. Fourth, we fit a curve through those pixels. And finally we take all that information, overlay it on the original image, and we get that nice video.
So, the first step: removing distortion. Cameras use lenses that distort the image so you can fit more of the surroundings into a smaller frame. This is the typical kind of distortion you'll see with these cameras — it's called barrel distortion, and it makes lines that should be straight look bowed toward the outside of the image. If we took this image and did the warping transformation to get a bird's-eye view, the lines that should be straight would keep that bow, so we need to do some distortion correction first.

Luckily, OpenCV has a number of functions to help with this. You take some pictures of chessboards from various perspectives — they suggest at least ten photos; I used twenty-two to calibrate this camera — and let OpenCV find the corners in each image with the findChessboardCorners function. If it reports that it found the points, you add them to a list of corner points and pass that into the calibrateCamera function, which takes the image points you found on the chessboards and returns a set of distortion coefficients — describing the image's transformation as a series of different distortions — along with a camera matrix. There's some linear algebra behind the scenes; I'd suggest reading the OpenCV documentation if you really want to get to know and love these functions, because it does a really good job of explaining what's going on under the hood. Now that the camera is calibrated, we can use another function called undistort: apply the two matrices we just found and it returns the undistorted image. You can see that the outside lines that were bowed are now pretty straight, so we should be able to get that bird's-eye view we were talking about.

We really want the bird's-eye view for fitting curves and finding lane lines: it's going to be easier to find the lines looking down on the subject instead of out at it. To do that, we need a set of source points from the original image — they're going to look like lanes going off to infinity — and we want to warp those into straight lines. This is a straight road, so the lines should end up straight after the actual image warping. We call the getPerspectiveTransform function, which returns another matrix — more linear algebra, who knows what's going on — and then apply warpPerspective to the original image with that matrix, which returns this warped perspective. As you get toward the top of the image it gets a little more pixelated — obviously we can't create resolution that didn't exist in the original image — but there should be enough fidelity to isolate the lane pixels from the surrounding image.
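As a rough sketch of what those two steps might look like in code — the chessboard corner counts, file paths, and the warp source/destination points below are hypothetical placeholders, not the values used in the talk:

    import glob
    import cv2
    import numpy as np

    nx, ny = 9, 6  # inner chessboard corners per row/column (an assumption)

    # 3D "object points" for one board: (0,0,0), (1,0,0), (2,0,0), ...
    objp = np.zeros((nx * ny, 3), np.float32)
    objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)

    objpoints, imgpoints = [], []
    for fname in glob.glob("camera_cal/*.jpg"):  # hypothetical path
        gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, (nx, ny), None)
        if found:
            objpoints.append(objp)
            imgpoints.append(corners)

    # Camera matrix and distortion coefficients from the corner lists.
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
        objpoints, imgpoints, gray.shape[::-1], None, None)

    img = cv2.imread("test_road.jpg")  # hypothetical road image
    undist = cv2.undistort(img, mtx, dist, None, mtx)

    # Hand-tuned source points on a straight road, mapped to a rectangle.
    h, w = undist.shape[:2]
    src = np.float32([[595, 450], [685, 450], [1100, h], [200, h]])
    dst = np.float32([[300, 0], [980, 0], [980, h], [300, h]])
    M = cv2.getPerspectiveTransform(src, dst)
    birds_eye = cv2.warpPerspective(undist, M, (w, h))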
So how do we go about isolating those pixels? We'll take two approaches: one is color selection, and the other is edge detection. For color selection, let's first look at what an image is inside a computer. Taking the first ten pixels of an image and zooming in, you can see it's just a series of little squares, and the value that prints out for each square is a list of three numbers. Those numbers mean something: one is a red value, one a green value, and one a blue value, and each goes from 0 to 255 in most images, because they're 8-bit images — 2 to the 8th is 256, so 0 to 255. If all three are 255 the pixel is white; if all are 0 it's black; and the full value in a single channel gives that pure color, so (255, 0, 0) is all red. If anyone has messed around with one of those color sliders, that's really what you're doing. Looking at the first 2-by-2 pixels of the image, you can see it's really just a set of columns and rows, and knowing that makes it easier to understand how we can isolate different color channels.

There are more color spaces than red-green-blue, and some of them might help us separate the lane-line pixels from the surrounding image. A couple of others are the hue/saturation/lightness and hue/saturation/value color spaces; they take RGB colors and transform them into another representation of the same image. And there are a number of others — this one is called Lab; it uses lightness plus two axes for the color, and it's going to prove pretty helpful. Luckily, OpenCV makes it really easy to convert between these: you use the cvtColor function and pass in an RGB image — or any other, as long as you have the correct conversion flag, which here is COLOR_RGB2HLS — and it outputs a hue/lightness/saturation image. It looks a little different, but it's really the same image in a different representation.

From here we can isolate each layer of the image: the red layer, the green layer, the blue layer. If we look at them individually — using the notation that takes all the rows, all the columns, and just the first value of each RGB triple — and plot them, you can see that the yellow left lane line shows up in the red and green channels but not the blue one. Going back to the Venn diagram of how colors combine, red plus green produces yellow, so those two channels are good indicators for a yellow lane line.

If we do this for a whole host of color spaces on the warped perspective, we start to see that some of them might help us find the lane lines — the bottom two produce a white value right where the left lane line is — while some won't be so useful, like the hue channels in the upper left. Here's another example with a harder image. Pulling out the best of those channels, some look pretty useful here: the saturation channel in the upper right is isolating the white lane lines even though the original image is kind of a white roadway, typically like on bridges in the US. But in this image, because there's a shadow across the road, the saturation channel doesn't look quite as good. So there's going to be some trade-off between which channels we use and how we use them.
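A minimal sketch of that channel-isolation step, continuing from the birds_eye image above (channel indices follow OpenCV's HLS and Lab ordering; the variable names are my own):

    import cv2

    rgb = cv2.cvtColor(birds_eye, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR

    # All rows, all columns, one channel at a time.
    r = rgb[:, :, 0]  # red and green together carry the yellow line
    g = rgb[:, :, 1]
    b = rgb[:, :, 2]  # yellow paint barely registers in blue

    # Other color spaces that can separate paint from pavement.
    hls = cv2.cvtColor(rgb, cv2.COLOR_RGB2HLS)  # hue / lightness / saturation
    lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB)  # lightness + two color axes
    s_channel = hls[:, :, 2]  # saturation often isolates white lines
    b_channel = lab[:, :, 2]  # Lab's "b" axis responds strongly to yellow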
Getting into edge detection: OpenCV has a couple of functions that will detect edges for us, and all the Sobel operator is really doing is taking the derivative of one of those color channels. So if we pass in the red channel and there's, say, red meeting white, it will detect a line where those two meet — it's just taking the derivative across that boundary, and the value of that derivative is what this plot shows. It does that in the x direction and in the y direction, and then you can calculate the magnitude as the square root of the sum of the squares of the two. This would be pretty useful on occasions where our color detection might not work so well, so it's a good substitute. Then there's another one called Canny edge detection, which uses that same Sobel system but applies a couple of nice little tricks: it throws out pixels that aren't part of an edge, and it thresholds the image to try to connect edges that weren't connected before. We're going to use both of these to produce our final guess at where the lane lines are.

To do that, we use binary thresholding on the color channels: we query the red channel, say, on the theory that where there's a high value for the red channel, there's probably a lane line. If we apply a threshold to the red-channel image and write the output of that thresholding to a new image, we get an image where everything is black except the lane lines. Here the threshold is between 200 and 255, which picks out the whiter pixels of that grayscale image, but we can adjust these thresholds to capture more or less of the lane lines. Once we've done that for all the color channels, we add in the edge detection as a series — just a linear addition: anywhere there's a white pixel in one of these binary threshold images, we add it to the final image. So we do the red channel — this example shows just two channels, but you could have any number here — we add the Sobel edge detection, and we output a final binary image, which is our guess at where the lane-line pixels are.

I built this kind of interesting — maybe somewhat interesting — little slider widget here to show that you can tune these thresholds to add more or less content. If you flip these knobs far enough, eventually you're just guessing that the whole image is a line — see, it's upset because there's too much. There, the S threshold moved and we got a little bit more of that left lane line. You can mess around with these thresholds until you're capturing just the lane lines and none of the surrounding image, but it's a bit of a trade-off: some images look better than others, and some thresholds you have to tune for a specific image — like that shadow image — while other images would look a little better if you dialed those thresholds back a bit.
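A minimal sketch of that thresholding-and-combining step, reusing the r channel from above (the threshold values here are illustrative, not the tuned numbers from the talk's sliders):

    import cv2
    import numpy as np

    # Color threshold: red values between 200 and 255 read as lane paint.
    r_binary = np.zeros_like(r)
    r_binary[(r >= 200) & (r <= 255)] = 1

    # Sobel derivatives in x and y, and the gradient magnitude.
    sobelx = cv2.Sobel(r, cv2.CV_64F, 1, 0)
    sobely = cv2.Sobel(r, cv2.CV_64F, 0, 1)
    magnitude = np.sqrt(sobelx ** 2 + sobely ** 2)
    scaled = np.uint8(255 * magnitude / np.max(magnitude))
    sobel_binary = np.zeros_like(scaled)
    sobel_binary[(scaled >= 50) & (scaled <= 255)] = 1

    # Canny bundles Sobel with non-maximum suppression and hysteresis.
    canny_binary = cv2.Canny(r, 100, 200) // 255

    # Combine: a pixel survives if any detector flagged it.
    combined = np.zeros_like(r_binary)
    combined[(r_binary == 1) | (sobel_binary == 1) | (canny_binary == 1)] = 1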
Now that we have this binary image, we can run a little algorithm to find the left and right lines, and then apply a curve fit to those lines. Method one can be applied to any image; method two needs a previous fit, but it runs quite a bit faster. So we try to use method two where possible, and if we don't find a fit we fall back to the first method, which is a little slower but a little more robust.

For method one, we take a histogram of the bottom half of the image, and wherever that histogram peaks is where we think a line starts. The y-value here is just the sum of the pixels at that x location, and the two maximums are where the lines start. This is also a good indicator of how close it can be between a false value and a real value: here we're detecting the line of the lane to the right of ours too, so throwing out some of those extraneous points would probably be pretty useful for this algorithm. But anyway, we have our two: our left start and our right start. We draw a box around each starting point, and any white pixels from the binary image that fall inside that box go into a big list of x and y values. Then we add a box on top of that one and do the same thing, and as we move up the line we shift the boxes based on the average of the previous points — if the average starts moving to the left, we know the next box needs to go to the left. That's how the boxes follow the line. At the end we have a big list of all the y values and x values of the pixels in that line, and we just run a numpy polyfit — a second-order polyfit — through those pixels, which gives us the curve fit in pixel space.

You can work out the pixel-to-meter conversion by knowing that in the United States the lanes are 12 feet wide — around three and a half meters; I don't know exactly what that is in meters — so you can just do that conversion. For the y direction, you know that each of these dashed lines is 10 feet long and there's a set gap between them — the federal government likes to control all this on the highways — so that gives us a good conversion from pixel measurements to meter measurements.

Looking at method two: it just relies on a previous fit. We take that previous fit and move it to the left and the right by a hundred pixels, which creates this green boundary; any pixels that lie inside that boundary we throw in a list, and we run the same polyfit routine, producing these yellow lines, which seem to look pretty good.

The final piece is creating that nice image. We had a little bit of text on there; one value was the radius of curvature, which is going to help the car guess how tight the turn needs to be — how much steering wheel he (or she, or it — they're computers) needs to put in. There's a little bit of calculus here that we don't need to get into, but it just uses the fits we found previously and calculates the radius of curvature. Also useful is the distance to the center of the lane: if the car starts veering to the left and we keep track of that, a simple controller can add a little right steering-wheel angle to bring the car back to the center of the lane. It's fairly straightforward to get this value if you know the x locations of the right and left lanes and you assume the camera is on the centerline of the car, so that the centerline of the car is at the center of the image. That's how I got those values; obviously, if the camera isn't mounted on the center, there just needs to be an offset here.
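A minimal sketch of method one, assuming the combined binary from above (the window count, margin, and minimum-pixel values are my own guesses at reasonable defaults, not the talk's settings):

    import numpy as np

    def sliding_window_fit(binary, nwindows=9, margin=100, minpix=50):
        h, w = binary.shape
        # Histogram of the bottom half; its two peaks mark the line bases.
        histogram = np.sum(binary[h // 2:, :], axis=0)
        midpoint = w // 2
        leftx_base = np.argmax(histogram[:midpoint])
        rightx_base = np.argmax(histogram[midpoint:]) + midpoint

        nonzeroy, nonzerox = binary.nonzero()
        window_height = h // nwindows
        fits = []
        for x_current in (leftx_base, rightx_base):
            lane_inds = []
            for win in range(nwindows):
                y_low = h - (win + 1) * window_height
                y_high = h - win * window_height
                good = ((nonzeroy >= y_low) & (nonzeroy < y_high) &
                        (nonzerox >= x_current - margin) &
                        (nonzerox < x_current + margin)).nonzero()[0]
                lane_inds.append(good)
                # Re-center the next window on the pixels just found.
                if len(good) > minpix:
                    x_current = int(np.mean(nonzerox[good]))
            inds = np.concatenate(lane_inds)
            # Second-order polynomial x = A*y^2 + B*y + C in pixel space.
            fits.append(np.polyfit(nonzeroy[inds], nonzerox[inds], 2))
        return fits  # [left_fit, right_fit]

And a sketch of method two plus the curvature and lane-center math. The meters-per-pixel constants below are the common US-highway approximations (a 3.7 m lane spanning roughly 700 pixels, about 30 m of visible road), not measured values from the talk:

    import numpy as np

    YM_PER_PIX = 30 / 720   # meters per pixel vertically (assumption)
    XM_PER_PIX = 3.7 / 700  # meters per pixel horizontally (assumption)

    def search_around_fit(binary, prev_fit, margin=100):
        # Method two: keep only pixels within +/- margin of the old curve.
        nonzeroy, nonzerox = binary.nonzero()
        fit_x = prev_fit[0] * nonzeroy ** 2 + prev_fit[1] * nonzeroy + prev_fit[2]
        keep = (nonzerox > fit_x - margin) & (nonzerox < fit_x + margin)
        return np.polyfit(nonzeroy[keep], nonzerox[keep], 2)

    def radius_of_curvature(fit_px, y_eval_px):
        # Rescale pixel-space coefficients into meter space, then apply
        # R = (1 + (2*A*y + B)^2)^(3/2) / |2*A|.
        A = fit_px[0] * XM_PER_PIX / (YM_PER_PIX ** 2)
        B = fit_px[1] * XM_PER_PIX / YM_PER_PIX
        y = y_eval_px * YM_PER_PIX
        return (1 + (2 * A * y + B) ** 2) ** 1.5 / abs(2 * A)

    def center_offset(left_fit, right_fit, h, w):
        # Assumes the camera sits on the car's centerline (x = w / 2).
        y = h - 1  # evaluate at the bottom of the image, nearest the car
        lane_mid = (np.polyval(left_fit, y) + np.polyval(right_fit, y)) / 2
        return (lane_mid - w / 2) * XM_PER_PIX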
So, getting that nice green picture: you just take the two curve fits you found and plot them on the image using this polylines function, then fill in the space between the lines with the green fill — that's the fillPoly function. But this is in warped space — the bird's-eye view — which we can't really overlay on the final image; it would look a little goofy. It's pretty easy to go from the warped image back to the original image, though: you just swap the source and destination points we talked about, apply that same warpPerspective function, and it outputs that nice overlay, which you blend in with this addWeighted function — it adds a transparency-weighted layer onto the image. Finally, you can throw some text on there with the putText function, and that's how we got that nice picture. Then, to run this on a series of images, I used a couple of tools that basically apply this same pipeline to every frame in a video and save the result at the end. So that's how I did that.
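A minimal sketch of that overlay step, assuming an inverse matrix M_inv built by swapping the src and dst points from earlier (M_inv = cv2.getPerspectiveTransform(dst, src)); the colors, weights, and text position are placeholders:

    import cv2
    import numpy as np

    def draw_overlay(undist, left_fit, right_fit, M_inv, radius_m, offset_m):
        h, w = undist.shape[:2]
        ploty = np.linspace(0, h - 1, h)
        left_x = np.polyval(left_fit, ploty)
        right_x = np.polyval(right_fit, ploty)

        # Lane polygon: up the left line, back down the right line.
        pts_left = np.transpose(np.vstack([left_x, ploty]))
        pts_right = np.flipud(np.transpose(np.vstack([right_x, ploty])))
        lane_poly = np.int32([np.vstack([pts_left, pts_right])])

        warped_overlay = np.zeros_like(undist)
        cv2.fillPoly(warped_overlay, lane_poly, (0, 255, 0))
        cv2.polylines(warped_overlay, np.int32([pts_left]), False, (255, 0, 0), 15)
        cv2.polylines(warped_overlay, np.int32([pts_right]), False, (255, 0, 0), 15)

        # Same warp call, inverse matrix: bird's-eye back to camera view.
        unwarped = cv2.warpPerspective(warped_overlay, M_inv, (w, h))
        out = cv2.addWeighted(undist, 1.0, unwarped, 0.3, 0)
        cv2.putText(out, "Radius: %.0f m  Offset: %.2f m" % (radius_m, offset_m),
                    (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (255, 255, 255), 2)
        return out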
Now, on making this a little more robust: it doesn't work super well on off-highway driving — when the corners are really tight, the curve fits sometimes wander around, and a passing car can block our view of the line. A couple of ideas for improving it. The image warping is assumed to be constant across all the images — we just use those same source and destination points — but in reality, as the car moves to the left in the lane, the starting points of the lines move too, so we get a little bit of distortion when we go to the bird's-eye view. A dynamic system that picks out the start of the lines and uses those for the image warping would be useful. Also, being able to dynamically control those thresholds based on how much signal is in the image: if something is creating a lot of white in our binary threshold image, we could tune the thresholds down to scale back its content. Then there's the speed: this obviously needs to happen in real time on a self-driving car, but it took me about three and a half minutes to process that video, which was maybe 20 seconds long, so we need to speed this up. And the smoothing system: I just keep track of the last five fits and average them, and that's my best prediction of the current line. But sometimes in the United States they'll divert the lanes around construction, and as those lanes shift and curve really fast, the system is going to be slow to respond — so a weighted average where you weight the most recent line fit the highest would be useful here, or maybe some exponential smoothing. Other than that, that's the little system that I built. Are there any questions? Also, the IPython notebook is available in this GitHub repo if you want to play around with it yourself; I should probably add some install instructions, because you need OpenCV and a bunch of other tools.

If you go back to the warping, the first step — you need to find the four edge points, and I didn't quite understand how you find those four. Can you say something about that?

So the question — I think everybody heard it — was about the image warping, this one here. This is really just guess-and-test. I fiddled with these points on a straight piece of road, so I knew the final warped image should have straight lines, and I just fiddled until I got them straight. The height of that image is based on the y-value of those points, so if you move them further out you get more perspective. I could have looked further out on the road, but obviously there's a little less signal out there. This is where dynamically choosing these points would be useful, because they're not always in the same position.

Thanks. You seem to rely a lot on the kind of empty roads that one only sees in car commercials, and this wouldn't really be realistic for the heavily trafficked roads we have in cities, because cars obstruct the view of the lanes. I was wondering if there's any attempt to turn that noise into actual extra information.

Right, so yeah, this is basically a system that's only going to work in the United States on highways — but it turns out that's probably one of the most useful and easiest-to-solve problems in the self-driving-car space. Just driving around Germany here, this system would break down really quickly; sometimes there's no left lane line. It seems like in the United States there's a bit more regulation: there's always a double yellow, which indicates I can't cross the middle, and there's always a right lane line. So you definitely need different models for different countries, and in-city driving would definitely be a challenge — I live in New York, and it's just mayhem out there, so this would definitely not work in the city. But the techniques they use to solve the city piece are really interesting; maybe next time I'll do a talk about that.

How did you test your algorithm? Do you have any ground-truth data set, or how do you collect your data to test it? Or do you just look at the pictures manually and say, okay, the algorithm performs better or worse than the previous version?

Yes, this is all pretty manual at the moment. I should note that all this data comes from Udacity — they have an open-source self-driving-car system they're trying to build, so they've got a couple hundred gigabytes of video and images that anybody can play around with. But yeah, all the testing was pretty manual: moving those sliders around by hand and looking at the output images. I was trying to think about how to automate some of that, and I think it'd be pretty challenging, because you need to know what a good lane-line fit is, and a computer doesn't really know that until you show it — so some part of the pipeline is going to have to be a little bit manual. But once you have this system figured out, you could train a model on the output of the system and kind of create your own training data.

A quick question: you were using projections of your environment. My naive idea would have been to build a three-dimensional model to do this kind of task. Is it common to use projections, or is it even possible to do 3D modeling in real time?

Yeah, so I don't think they use cameras to do 3D modeling, because they have that lidar system. Lidar is just a beam of laser light: if I shine it around this room, it gives me a set of 3D points of the room, and that's continually running on the car. They overlay that with — Google has all of their map data — and that's one way they localize. If you have a set of two cameras you can create a pretty rough 3D image, so that is possible, but with one camera it's more challenging; as you can see, you lose a decent amount of image fidelity as you start warping around. So it'd be a little challenging, I think, but it's a neat idea.
I can't formulate this in several ways, but let's put it like this: do you think that in some future TensorFlow will replace OpenCV in doing this? If deep learning takes over this field, how would you see the next steps?

Yeah, I think that kind of gets back to the previous question. For sure, running a series of these images through some deep-learning network could give you some idea of where these edges are — it's going to start to develop a model that way. But I think the training data is going to have to come from some manual process of picking out where the lane is, or whatever. So maybe you could use this pipeline to generate your training data, and then run the model on billions of images versus just a couple hundred by hand. So yeah, I think for sure the future is going to be more in that direction — this pipeline is really slow, so it'd be hard to run in real time.

Thank you for the presentation. The way I understood it, you're processing each image separately and then applying the smoothing after the fact. But given the way the data is structured, wouldn't it be natural to use the last few predictions — the last few data points — in the prediction for the next data point? You have a very strong prior on where the lanes are from the previous image, because you're taking a picture every 50 milliseconds or so, right? That would make it much more robust, maybe, compared to processing each image separately.

Yeah — so I didn't really want to talk about classes and annoying Python things, but I have a Lane class where I keep track of previous fits at any given point in time. So I use that: you know roughly where the line should be, you just search in that area, and if you don't find it, that's when you fall back on the more computationally intense process. But yeah, the smoothing is happening in real time, in that space, and for sure I should be weighting the latest fit a little more than the previous four, or whatever.

When detecting lanes in a camera image you always have problems with shadows and lighting conditions. Have you thought of doing lane detection using multiple sensors mounted on the car, for example lidar sensors?

Yeah — so when there's rain or snow on the camera lens, it's going to be really hard to detect the lanes. This is just one system that these companies use; really, this is all about finding exactly where I am on the roadway. GPS is a sensor they use, but that's only accurate to about 10 feet, so you use that to get you in the ballpark. You can use this system to know where I am in the roadway, and there's also that lidar system — because Google has such good maps, they just overlay the lidar data with the map data, and that's pretty good for saying you're here plus or minus a few inches. There are a number of other sensors they can use too, so this is just one piece of the localization function as a whole — of finding where you are in the world.

Yeah, thanks, guys.
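As a minimal sketch of the kind of Lane class described in that answer — keeping the last few fits and blending them, with the heavier weight on the newest fit that the speaker suggests (the history length and the linear weighting scheme here are arbitrary choices, not his):

    from collections import deque
    import numpy as np

    class Lane:
        def __init__(self, history=5):
            self.fits = deque(maxlen=history)  # most recent fits only

        def update(self, new_fit):
            self.fits.append(new_fit)

        def smoothed_fit(self):
            # Linearly increasing weights: oldest fit = 1, newest = N.
            weights = np.arange(1, len(self.fits) + 1)
            return np.average(np.array(self.fits), axis=0, weights=weights)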
Info
Channel: PyData
Views: 23,526
Rating: 4.9736409 out of 5
Id: VyLihutdsPk
Length: 38min 4sec (2284 seconds)
Published: Wed Jul 26 2017