OpenCV Python Tutorial - Find Lanes for Self-Driving Cars (Computer Vision Basics Tutorial)

Captions
Hello everyone. In this tutorial I will teach you to use effective computer vision techniques with OpenCV and Python, ultimately to detect lane lines for a simulated self-driving car. This video was done in collaboration with the ProgrammingKnowledge YouTube channel, and by the time you finish this tutorial, if you're interested in more self-driving car content, feel free to check out the link in the description below. But without further ado, let's start this tutorial.

Welcome to your first lesson. This lesson is quite simple, as all we're going to be doing is installing the Anaconda distribution. We'll start this off by going over to anaconda.com/download. The Anaconda distribution conveniently installs Python, the Jupyter Notebook app, which we're going to use quite often throughout this course, and over a hundred and fifty other scientific packages. Since we're installing Python for Mac, make sure to navigate to the Mac section, and we're going to install Python 3, not Python 2. It's very likely that you're seeing different versions of Python than the ones I'm seeing right now, but regardless, since Python 2 is no longer being updated, it's imperative that you download the latest version, Python 3, to keep compatibility with future Python improvements and to also follow along with this course. Click on the download button. No thanks. Once your download is finished, we'll open this package. Continue, keep pressing continue, continue, agree to the terms and conditions, that you have read the agreement, and simply enough, press install. Once your installation is complete, press continue again, close, and move to trash. That is all, very easy and intuitive. To ensure that this works, whatever terminal window you had previously open, make sure to close it. I already had one open, so I'll close mine. If you don't have one open, then you should be fine. What we'll do is open up a new terminal window by performing a Spotlight search, which you can do by pressing Command + Space. Write "terminal" and press Enter.
Alternatively, you could have just accessed your terminal by pressing F4 and performing the search here. But regardless, to ensure a successful installation, write the command python3 --version, and I get a version of 3.6.4. Make sure you also get a Python 3 version. That is all. Hope you were able to follow along. If you have any issues with the installation, feel free to ask me in the Q&A section.

Welcome back. We'll be making use of the Atom text editor in the computer vision section. You can feel free to use any text editor you want, like Sublime or Vim, in which case feel free to skip this lesson. Otherwise, if you don't have any text editor installed, let's get to it. Downloading it is quite simple for both Mac and Windows. I'll be proceeding with the Mac installation by going over to atom.io and, inside of atom.io, just clicking download. This is also downloadable from TechSpot, by going over to Google, searching "TechSpot Atom download" and going into the first link, where you just simply click on the appropriate link and then wait for it to finish setting up the download, whether you're on Mac or Windows. All right, once your installation is complete, it should be inside of your Downloads folder. Let us just open up Atom and see what it's like. It's verifying this new application. We're going to open it. All right, we're going to close the following, and then what we'll do is click on Packages, and inside of Packages we're going to go to Settings View and open the settings. If you're using a PC, I imagine the process to get to your settings should be quite similar. And now, inside of Editor, we can modify some settings, for example the font family. I'm quite satisfied with the current font, and font size 16 seems pretty reasonable. And two other boxes I like to have checked are showing the cursor on selection and showing indentation indicators. Indentation is fundamental to Pythonic code, as it distinguishes between different blocks of code, so it
will definitely be useful to always have indentation indicators. And finally, one more thing that's pretty important is the tab length. The default tab length should be 2, I believe, but I personally prefer using 4 spaces per tab. And just one more thing I want to do before moving on is enabling autosave, so we'll go back to Packages. This will be very convenient in our code, since it will keep us from having to save our code every time we need to run it. So what we'll do is, inside of Core Packages, scroll down until you find autosave, right over here, and make sure that you enable it. I already have it enabled. All right, and that concludes the installation section.

The purpose of this section is to build a program that can identify lane lines in a picture or a video. When you and I drive a car, we can see where the lane lines are using our eyes. A car doesn't have any eyes, and that's where computer vision comes in, which through complex algorithms helps the computer see the world as we do. In our case, we'll be using it to see the road and identify lane lines in a series of camera images. This lesson in particular will be quite simple, as all we're going to do is set up the initial stages of our project and display an image onto which we'll be identifying lane lines. You'll start by opening up your terminal or command prompt and navigating to the Desktop directory with the command cd Desktop, change directory Desktop. Inside of Desktop, we'll make a new folder with the command mkdir, make directory, finding-lanes. This folder you just made inside of Desktop you're going to open up with Atom by going to File, Open, Desktop, finding-lanes. Inside of finding-lanes, make a new Python file called lanes.py. Inside of the lanes file, we're going to start by writing a program that can identify lanes in a JPEG image. I posted the image on my GitHub. To access it, make sure to go to the following link. The link is also available in the description below. And so once you get to this GitHub page, click on test
image JPEG, and what we're going to do is actually download the image, or better yet, just save the image, and make sure to save it as a JPEG image. It doesn't matter where you save it, and make sure this says test_image, such that there are no extra ones or twos in there, so that your naming remains consistent with what I have in the videos. All right, save your image. Once you download it, wherever you have it downloaded, make sure to drag it into your project folder, like so. And now, to display the image, we're going to use OpenCV, an open source computer vision library. So inside of your terminal we're going to write the command pip install, we'll make use of the package manager pip to install, opencv-contrib-python. Once it is done installing, back in Visual Studio we're going to import the library cv2, and from this library, for now, we will access two functions, imread and imshow. To load our image, we'll first make use of the imread function by setting image equal to cv2.imread, and the argument is where you specify the image file name as a string. Ours is test_image.jpg. What this function will do is read the image from our file and return it as a multi-dimensional NumPy array containing the relative intensities of each pixel in the image. We now have our image data in a NumPy array. The next step is to actually render it with the imshow function, so we'll write cv2.imshow. This takes in two arguments. The first one is the name of the window that we're going to open up, we'll just call it "result", and the second argument is the image that we want to show itself. If I run the code now, go to my terminal, navigate to my project folder by writing cd, our project is named finding-lanes, and run the Python file, python lanes.py, notice that nothing is going to happen. That's because this function should be followed by the waitKey function, cv2.waitKey, and what this function does is display the image for a specified amount of milliseconds. We'll set a time of 0.
A time of 0 displays our "result" window indefinitely, until we press any key on our keyboard. If we rerun the code, python lanes.py, the image is displayed, and notice our window name, result. We'll keep this lesson short and stop here. You learned how to load and display images using the OpenCV library. In the next lesson, we'll start discussing Canny edge detection, a technique that we'll use to write a program that can detect edges in an image and thereby single out the lane lines.

Welcome to lesson number two. The goal of the next few videos will be to make use of an edge detection algorithm, the Canny edge detection technique. The goal of edge detection is to identify the boundaries of objects within images. In essence, we'll be using edge detection to try and find regions in an image where there is a sharp change in intensity, a sharp change in color. Before diving into this, it's important to recognize that an image can be read as a matrix, an array of pixels. A pixel contains the light intensity at some location in the image, each pixel's intensity denoted by a numeric value that ranges from 0 to 255. An intensity value of 0 indicates no intensity, something completely black, whereas 255 represents maximum intensity, something being completely white. That being said, a gradient is the change in brightness over a series of pixels. A strong gradient indicates a steep change, whereas a small gradient represents a shallow change. On the right-hand side you're looking at the gradient of the soccer ball. The outline of white pixels corresponds to the discontinuity in brightness at those points, the strength of the gradient. This helps us identify edges in our image, since an edge is defined by the difference in intensity values in adjacent pixels, and wherever there's a sharp change in intensity, a rapid change in brightness, wherever there's a strong gradient, there is a corresponding bright pixel in the gradient image. By tracing out all of these pixels, we obtain the edges. We're
going to use this intuition to detect the edges in our image. This is a multi-step process, step one being to convert our image to grayscale. Why convert it to grayscale? Well, as we discussed earlier, images are made up of pixels. A three-channel color image would have red, green and blue channels, each pixel a combination of three intensity values, whereas a grayscale image only has one channel, each pixel with only one intensity value ranging from 0 to 255. The point being, processing a single channel is faster than processing a three-channel color image, and less computationally intensive. Let's start implementing this inside of Atom. We've already loaded and read our image into an array. Now what we'll do is import numpy under the alias np. We're going to work with a copy of this array by setting lane_image equal to np.copy(image), thus copying our array into a new variable. It's imperative that you actually make a copy of this array instead of just setting lane_image equal to image. If we set them equal directly, any changes we make to lane_image will also be reflected in the original mutable array. Always ensure that you make a copy whenever working with arrays, instead of just setting them equal directly. So what we'll do now is create a grayscale from the color image. We'll do that by setting a variable gray equal to cv2, and from our OpenCV library we'll call the function cvtColor, which converts an image from one color space to another. We'll be converting lane_image, and in the second argument, for an RGB to grayscale conversion, we can use the flag cv2.COLOR_RGB2GRAY. Very intuitive. And now, instead of showing the color image, we'll show the grayscale image. If we go to our terminal, python lanes.py, everything works out accordingly. This was step number one. Step number two of edge detection will be to next apply a Gaussian blur on this image. Let's talk about that in the next video.

Welcome to lesson number three. In the last lesson we applied
step number one, which was to convert our image to grayscale. Step two is to now reduce noise and smoothen our image. When detecting edges, while it's important to accurately catch as many edges in the image as possible, we must filter out any image noise. Image noise can create false edges and ultimately affect edge detection. That's why it's imperative to filter it out and thus smoothen the image. Filtering out image noise and smoothening will be done with a Gaussian filter. To understand the concept of a Gaussian filter, recall that an image is stored as a collection of discrete pixels. Each of the pixels for a grayscale image is represented by a single number that describes the brightness of the pixel. For the sake of example, how do we smoothen the following image? The typical answer would be to modify the value of a pixel with the average value of the pixel intensities around it. Averaging out the pixels in the image to reduce noise will be done with a kernel. Essentially, this kernel of normally distributed numbers is run across our entire image and sets each pixel value equal to the weighted average of its neighboring pixels, thus smoothening our image. We're not going to go over kernel convolution and how it does it. Just know that when we write this line of code inside of our editor, blur is equal to cv2.GaussianBlur, what we're doing is applying a Gaussian blur on a grayscale image with a 5x5 kernel. The size of the kernel is dependent on specific situations. A 5x5 kernel is a good size for most cases, but ultimately what that will do is return a new image that we simply called blur. Applying the Gaussian blur by convolving our image with a kernel of Gaussian values reduces noise in our image. Back to our project: set blur equal to cv2.GaussianBlur. We'll apply this blur on our grayscale image with our 5x5 kernel, and we'll just leave the deviation as zero. The main thing you should take away from this is that we're using a Gaussian blur to reduce noise in our grayscale image.
And now we'll simply show the blurred image. If we run this code, python lanes.py, there is our blurred grayscale image. Later on, when we apply the Canny method, it should be noted that this step was actually optional, since the Canny function is going to internally apply a 5x5 Gaussian when we call it. Regardless, now we know the theory. We've obtained our grayscale, we smoothened it and reduced noise with a Gaussian blur. Now it's time to apply the Canny function. We'll do that in the next video.

In the last lesson we smoothened our image and reduced noise. Now it's time to apply the Canny method to identify edges in our image. Recall that an edge corresponds to a region in an image where there is a sharp change in intensity, or a sharp change in color, between adjacent pixels in the image. The change in brightness over a series of pixels is the gradient. A strong gradient indicates a steep change, whereas a small gradient a shallow change. We first established that an image, as it's composed of pixels, can therefore be read as a matrix, an array of pixel intensities. To compute the gradients in an image, one must recognize that we can also represent an image in a two-dimensional coordinate space, x and y. The x axis traverses the image's width and the y axis goes along the image's height, width representing the number of columns in the image and height the number of rows, such that the product of both width and height gives you the total number of pixels in your image. The point being, not only can we look at our image as an array, but also as a continuous function of x and y. Since it's a mathematical function, we can perform mathematical operations, which begs the question: what operator can we use to determine a rapid change in brightness in our image? What the Canny function will do for us is perform a derivative on our function in both x and y directions, thereby measuring the change in intensity with respect to adjacent pixels. A small derivative is a small change in intensity, whereas a big
derivative is a big change. By computing the derivative in all directions of the image, we're computing the gradients, since, recall, the gradient is the change in brightness over a series of pixels. So when we call the Canny function, it does all of that for us. It computes the gradient in all directions of our blurred image and is then going to trace our strongest gradients as a series of white pixels. But notice these two arguments, low_threshold and high_threshold. Well, this actually allows us to isolate the adjacent pixels that follow the strongest gradients. If the gradient is larger than the upper threshold, then it is accepted as an edge pixel. If it is below the lower threshold, it is rejected. If the gradient is between the thresholds, then it will be accepted only if it is connected to a strong edge. The documentation itself recommends using a ratio of 1 to 2 or 1 to 3. As such, we'll use a low to high threshold ratio of 1 to 3, 50 to 150. Now that we know what goes on under the hood, we can call this function inside of our project by writing canny is equal to cv2.Canny. We'll apply the Canny method on the blurred image with low and high thresholds of 50 and 150. And now we'll show the image gradient instead of the blurred image, canny. If we go ahead and run the code, python lanes.py, there's the gradient image, which clearly traces an outline of the edges that correspond to the sharpest changes in intensity. Gradients that exceed the high threshold are traced as bright pixels, identifying adjacent pixels in the image with the most rapid changes in brightness. Small changes in brightness are not traced at all, and accordingly they are black, as they fall below the lower threshold. That's it for the Canny method. We used it to outline the strongest gradients in our image. Now, this was a lot of explanation for just three lines of code, but it's good to have a grasp of what's going on under the hood before we proceed any further. And now that we've computed the
strongest gradients, in the next few videos we'll apply the Hough transform method to detect our lanes.

In the last lesson we used the Canny function to outline the strongest gradients in our image. Now we'll focus on how we can identify lane lines in the image. Before doing that, what we'll do now is specify a region of interest in our image that we're going to use to detect our lane lines, as currently shown. Before proceeding any further, what we'll do first is actually wrap our code inside of a function, by defining a function canny which takes in an image, and what we'll do is copy the Canny algorithm code inside of the function and specify canny as the return value, return canny. And what we can do now is simply set canny equal to the return value of our canny function, passing in the initial RGB color image, and rerunning this code, back in our terminal, python lanes.py, everything is still intact. What we'll do now is specify this area as a region of interest. Before doing so, instead of showing our image gradient with OpenCV, we'll use the matplotlib library to better clarify how we're going to isolate this region. You should already have matplotlib installed, courtesy of the Anaconda distribution. So what we can do now is import matplotlib, and we're going to need the subpackage from matplotlib called pyplot, under the alias plt. Conveniently, pyplot contains the function imshow, so we can just replace cv2 with plt, and in this case there's no need to specify a window name, just the image. The equivalent of OpenCV's waitKey function would simply be plt.show. So if we go back and run the code in our terminal, python lanes.py, we get the same image along with x and y axes. Notice how the y axis starts from the first row of pixels and then goes downwards. With our axes, we're going to limit the extent of our field of view based on the region of interest, which ultimately traces a triangle with vertices of 200 along the x and 700 pixels along the y, which would simply
be the bottom of the image; 1100 pixels along the x and, once again, the bottom of the image, 700 pixels down the y, the very bottom; and the last vertex will simply be 550 pixels along the x and 250 traveling down the y, ultimately tracing a triangle that isolates the region where we want to identify the lane lines. So the goal of this video will be to create an image that's completely black, a mask with the same dimensions as our road image, and fill part of its area with a triangular polygon. First and foremost, we'll revert to showing the image with OpenCV rather than matplotlib, and what we'll do now is define a function, def region_of_interest, which also takes in an image. What this function will do is pretty self-explanatory: it will return the enclosed region of our field of view. Recall that the enclosed region was triangular in shape, so we'll set a variable named triangle, and this polygon, this triangle, we'll declare as a NumPy array, np.array. Inside of this array is where you specify its vertices. Recall that while limiting the extent of our field of view, we traced a triangle with vertices that go 200 along the x and vertically until the extent of our image, until the bottom, which in this case is the height. We can get the height of our image by setting height equal to image.shape at index 0. Recall from the NumPy crash course that the shape of an array is denoted by a tuple of integers. Since we're dealing with a two-dimensional array, the first integer corresponds to the number of rows. The y axis traverses the image's height, and height accordingly is the number of rows, so you can already assume that this value will be something very close to 700, since that's what we saw in matplotlib. So we'll simply set the height to complete our first vertex, the second one being 1100 pixels along the x and vertically, once again, right up until the extent of our image, being the height, the vertical extent. And lastly, the last vertex was 550 pixels
along the x and 250 pixels along the y. This polygon we're going to apply onto a black mask with the same dimensions as our road image, so we'll set mask equal to np.zeros_like(image). Recall that an image can be read as an array of pixels. zeros_like creates an array of zeros with the same shape as the image's corresponding array. Both arrays will therefore have the same number of rows and columns, which means that the mask will have the same amount of pixels, and thus the same dimensions, as our canny image that we're going to pass in in a bit, although its pixels will be completely black, as all of them will have zero intensity. What we have to do now is fill this mask, this black image, with our polygon using OpenCV's fillPoly function, that is cv2.fillPoly. We will fill our mask with our triangle. The third argument specifies the color of our polygon, which we're going to have be completely white. So what we're going to do is take a triangle, whose boundaries we defined over here, and apply it on the mask such that the area bounded by the polygonal contour will be completely white. What we'll do now is return our modified mask, return mask, and instead of showing the canny image we'll be showing the return value of our function region_of_interest, and the image that we're going to pass in is simply going to be the canny image. Let's run the code, python lanes.py, and it throws an exception. That's because the fillPoly function fills an area bounded by several polygons, not just one. Even though you and I both know that we're dealing with only a single polygon, we'll rename this variable from triangle to polygons for consistency, and we'll set it equal to an array of polygons, in our case an array of simply one polygon. Change this from triangle to polygons. If we rerun this code, there is our mask, and inside of the mask there is the enclosed region, the polygon with the specified vertices. Now you might be asking yourself, why
did we go through all of this? Well, this video has gone on long enough, so let's talk about that in the next lesson.

Previously, we created a mask with the same dimensions as our road image. We then identified a region of interest in our road image, with very specific vertices along the x and y axes, that we then used to fill our mask, the image on the right. Why is it important? Well, we're going to use it to only show a specific portion of the image. Everything else we want to mask. So to understand how we're going to use this image to mask our canny image to only show the region of interest traced by the triangular polygon, you'll require a basic understanding of binary numbers. If you're already familiar with binary numbers, feel free to skip the next two minutes or so of this video. Otherwise, I'll introduce it now, and I'll do it very quickly. Commonly, when one thinks of binary representations, they think of zeros and ones. Well, more specifically, binary numbers are expressed in the base-2 numeral system, which uses only two symbols, typically zeros and ones. What does that mean? For example, the number 23: its binary representation is 1 0 1 1 1. How did I obtain that number? Well, let's imagine 8 placeholders, 8 boxes. As we're dealing with a base-2 numeral system, each box represents a power of 2. Each numerical place will correspond to an increasing power of 2: the first box 2 to the power of 0, which simply equals 1, then 2 to the power of 1, which equals 2, all the way up until 2 to the power of 7, up until 128. We'll come back to this. Now, each box can only accept one of two values, 0 or 1. So we want to put in binary the number 23, we wish to represent this number in binary format. What we do is start with the highest value, 128, and ask ourselves, is this value in the number 23? 128 is clearly not in the number 23, it's way too big, so we leave that as 0. 64 is not in 23, we also leave that as 0. 32 is certainly greater than 23, we leave that as 0. 16, this goes into 23, so it's assigned the number
1, and so far we've used up 16. 23 minus 16 equals 7, there are 7 left we have to account for. Does 8 go into 7? Nope, we leave that as 0. Does 4 go into 7? Of course, that takes a value of 1. So now we've used the 4. 7 minus 4 equals 3, and there's 3 left that we have to account for. Does 2 go into 3? Yep. Now there's only 1 left. Does 1 go into 1? Indeed. And there is the binary representation of 23. We can cut off the zeros in the beginning, and it's just as we said earlier, 1 0 1 1 1. All right, so why did I just randomly start talking about binary numbers? Well, for the image on the right, I went ahead and printed out its pixel representation. I resized the array simply because it was too large, but never mind that. Notice how the triangular polygon translates to pixel intensities of 255 and the black surrounding region translates to pixel intensities of 0. What's the binary representation of 0? Well, none of these numbers go into 0, so we leave 0 for each placeholder, leaving us with a binary representation of 0 0 0 0. What about 255? Well, for that we'll need 8 placeholders, 2 to the 0 up until 2 to the 7, and so if you do the math we just talked about, you'll realize that its binary representation is all ones, 8 ones, as all of these numbers add up exactly to 255. As a side note, if we think of this in terms of bits, where each bit holds a single binary value and 8 bits form 1 byte, 255 is actually the maximum value representable by an 8-bit byte. So we can conclude, since the surrounding region is completely black, each pixel with a value of 0, then the binary representation of every pixel intensity in that region would be all zeros, 0 0 0 0. As for the polygonal contour, whose region is completely white, the binary representation of every pixel intensity in that region would be all ones. Why is this important? Well, we're going to apply this mask onto our canny image to ultimately only show the region of interest, the region traced by the polygonal contour. We do
this by applying the bitwise AND operation between the two images. The bitwise AND operation occurs element-wise between the two images, between the two arrays of pixels. Now, both of these images have the same array shape and therefore the same dimensions and the same amount of pixels. By applying the bitwise AND, since it occurs element-wise, we're taking the bitwise AND of each homologous pixel in both arrays. And the way bitwise AND works is, let's imagine two binary numbers: 0 1 1 0 0 1, which, if you do the math you saw earlier, you'll realize is the number 25, and 1 1 0 0 1 0, which would be the number 50. This is a pretty standard example, but regardless, let's take their bitwise AND. What AND will do is put a 0 unless both bits in a pair are ones. In the first pair, since one of these is 0, we put a 0. In the second one, both of them are ones, so we put a 1. In this one, one of them is 0, so we put a 0, and we keep doing this until eventually the resultant AND operation yields 0 1 0 0 0 0. What if we took the bitwise AND of all zeros with any other value? Well, no matter what, during the operation you're always going to have at least one 0 in each pair, which means the result of the AND operation will yield all zeros, no matter what value we choose to operate against it. So going back to our two images: the black region, whose pixels have intensity values which correspond to the binary number we just talked about, 0 0 0 0, by taking the bitwise AND, by operating it against the pixel values in the corresponding region of the other array, the result is always going to be a binary value of 0 0 0 0. This translates to the number 0, which means all pixel intensities in that region will have a value of 0. That's what the result is going to be: they will be completely black, thereby masking the entire region. We know the operation occurs element-wise, so all the white pixels in this region of the array will be operated against the corresponding region of the other array. Well, this region will remain
unaffected. Why, you might ask? Well, we already concluded that since the polygonal contour is completely white, the binary representation of each pixel intensity in that region would be all ones. If you take the bitwise AND of the ones with any other binary value, it's not going to have an effect. We could try this out as we take the bitwise AND of these two values: two ones, so we put a 1 here, same case here, here and here, and we keep doing this to obtain the following result, which, notice, is the same as one of our values, meaning that taking its bitwise AND with the ones didn't have an effect. And so in our image, taking the bitwise AND of these two regions would also have zero effect, which means we've successfully masked our canny image to ultimately only show the region of interest, the region traced by the polygonal contour. We can implement this by setting masked_image equal to cv2.bitwise_and, and we'll compute the bitwise AND of both the canny and mask arrays. This universal function implements the Python operator &. Finally, we'll return masked_image, and we'll just set cropped_image equal to the return value of region_of_interest. We'll pass in the canny image as we did in the previous video, and we'll show the cropped image instead. Back to our terminal, python lanes.py, and everything worked out accordingly. We isolated the region of interest and masked everything else. The final step of lane detection will be to use the Hough transform technique to detect straight lines in our region of interest and thus identify the lane lines.

So far we've identified the edges in our image and isolated the region of interest. Now we'll make use of a technique that will detect straight lines in the image and thus identify the lane lines. This technique is known as the Hough transform. We'll start by drawing a 2D coordinate space of x and y, and inside of it a straight line. We know that a straight line is represented by the equation y = mx + b. Nothing new so far, just simple
math our straight line has two parameters m and b we're currently plotting it as a function of x and y but we can also represent this line in parametric space which we will call Hough space as b versus m we know the y-intercept of this line is 2 and the slope of the line is simply rise over run the change in y over the change in x which evaluates to 3 given the y-intercept and slope this entire line can be plotted as a single point in Hough space now imagine that instead of a line we had a single dot located at x equals 2 and y equals 12 there are many possible lines that can pass through this dot each line with different values for m and b you could have a line that crosses that dot with m and b values of 2 and 8 3 and 6 4 and 4 5 and 2 6 and 0 so on and so forth notice that a single point in x and y space is represented by a line in Hough space in other words by plotting the family of lines that goes through our point each line with its own distinct m and b value pair this produces an entire line of m and b value pairs in Hough space what if we also had a point at x equals 1 and y equals 8 once again there are many lines that can cross this point each line with different values for m and b all of these different values for m and b represented by a line in parametric space the point being whenever you see a series of points and we're told that these points are connected by some line ask yourself this question what is that line as previously mentioned there are many possible lines that can cross each point individually each line with different slope and y-intercept values however there is one line that is consistent with both points we can determine that by looking at the point of intersection in Hough space because that point of intersection represents the m and b values of a line consistent with crossing both of our points which in this case has slope and y-intercept of 4 suppose there is one more point in our image space at x equals 3 and y equals 16 this point is also represented by a line in
parametric space each point in that line denotes different values for m and b which once again correspond to different lines that can pass through this point but notice that there is another intersection at the same point which means that the line with the following slope and y-intercept four and four crosses all three of our dots why is this relevant well this idea of identifying possible lines from a series of points is how we're going to find lines in our gradient image recall that the gradient image is just a series of white points which represent edges in our image space you and I can look at the various series of points in our image and automatically assume these points belong to a line this series of points belongs to a line so on and so forth but what are the lines what are their parameters how do we identify them well take these four points for example in our image space which correspond to the following Hough space what we're gonna do is first split our Hough space into a grid each bin inside of our grid corresponding to the slope and y-intercept value of a candidate line for example what if I told you these points belong to a line what is that line well I can see that there's points of intersection here there's some here and some here as well well all of these points of intersection are inside of a single bin for every point of intersection we're going to cast a vote inside of the bin that it belongs to the bin with the maximum number of votes that's gonna be your line whatever m and b value that this bin belongs to that's the line that we're going to draw since it was voted as the line of best fit in describing our data now that we know the theory of how we're going to identify lines in our gradient image you would think to yourself alright enough talking time to code well not so fast there's just one tiny problem we still haven't taken into account vertical lines obviously if you try to compute the slope of a vertical line the change in x is
zero so it will always evaluate to a slope of infinity which is not something that we can represent in Hough space infinity is not really something we can work with anyway we need a more robust representation of lines so that we don't encounter any numeric problems because clearly this form y is equal to mx plus b cannot represent vertical lines that being said instead of expressing our line with Cartesian coordinate system parameters m and b we'll instead express it in the polar coordinate system rho and theta such that our line equation can be written as rho is equal to x cos theta plus y sin theta if you're really interested in how this equation is derived feel free to ask me in the Q&A but the main idea is that this is still the equation of a line but in polar coordinates if I am to draw some line in Cartesian space the variable rho is the perpendicular distance from the origin to that line and theta indicates the angle of inclination of the normal line from the x-axis which is measured in radians clockwise with respect to the positive x-axis let's look at some examples suppose I had a point with x position 5 and y is equal to 2 as you know this is a point and many lines can pass through this point including a vertical line we used to define lines passing through our points by their slope and y-intercept m and b but now they will be defined based on rho and theta if we want to measure the perpendicular distance from the origin to the vertical line the angle of inclination of the normal line from the x-axis is simply 0 which works out to a distance of 5 another possible line that could pass through our point is a horizontal line the perpendicular distance from the origin to our line would correspond to an angle of 90 degrees which in radians is pi over 2 which ultimately works out to a distance of 2 this line therefore is characterized by an angle theta of pi over 2 and a distance rho of 2 just to strengthen our knowledge another possible line is the
following whose perpendicular distance from origin to our line corresponds to an angle of 45 degrees from the positive x-axis that is pi over 4 radians which works out to a distance rho of about 4.9 the point of all this being is that previously a point in image space represented a line in Hough space whereas now with polar coordinates for a given point by plotting the family of lines that go through it each line with a distinct value for theta and rho we get a sinusoidal curve this curve represents all of the different values for rho and theta of lines that pass through our point this might look a bit intimidating but the concept is the exact same because imagine instead of one point we had ten points which in turn results in ten sinusoidal curves as previously noted if the curves of different points intersect in Hough space then these points belong to the same line characterized by some rho and theta value so just like before a line can be detected by finding the number of intersections between curves the more curves intersecting means that the line represented by that intersection crosses more points in our case all ten of our curves intersect at a single point which means that there is a single line with some rho and theta value that crosses all ten of our dots we can look at a more specific example with three dots in our Cartesian space which represent the following sinusoidal curves all three curves intersect at the same point characterized by a theta value of 0.92 radians and perpendicular distance of about 9.66 so that's the idea finding which line best fits our data in our case it's the one with the following parameters we can also apply the concept of voting that we discussed earlier such that our Hough space is still in the form of a grid and obviously this bin would have the maximum number of votes and just like before the bin with the maximum number of votes that's going to be your line whatever theta and rho value that this bin belongs
to that's the line that we draw since it was voted as the line of best fit in describing our data later on when we start implementing this we'll talk about the concept of thresholds but for now that is all for Hough transform we're just trying to find the lines that best describe our points and that's what we're going to use to find the lines that best define the edge points in our gradient image let's start implementing that in the next video previously we looked at the theory behind identifying possible lines from a series of points by looking at the bin with the maximum number of votes that is the maximum number of intersections inside the bin we discussed that the bin with the maximum number of votes is the line we draw through a series of points whatever theta and rho value that this bin belongs to that's the line we draw since it was voted as the line of best fit in describing our data we'll start implementing this by detecting lines in the cropped gradient image by setting lines is equal to cv2.HoughLinesP the first argument is the image where you want to detect lines which would simply be our cropped image the second and third arguments specify the resolution of the Hough accumulator array the Hough accumulator array previously I described as a grid for simplicity but it's actually a two-dimensional array of rows and columns which contain the bins that we're going to use to collect votes where each bin represents a distinct value of rho and theta the second and third arguments are really important as they specify the size of the bins rho is the distance resolution of the accumulator in pixels and theta is the angle resolution of the accumulator in radians the larger the bins the less precision with which lines are going to be detected for example imagine every bin in our array was so large this is way too coarse in the sense that too many intersections are going to occur inside of a single bin we need our bins to be sufficiently small the smaller the
rho and degree intervals we specify for each bin the smaller the bins and the more precision with which we can detect our lines yet you don't want to make your bins too small since that can also result in inaccuracies and takes a longer time to run so what we'll do is we'll specify a precision of two pixels accompanied by a 1 degree precision that needs to be in radians 180 degrees is equal to pi radians so 1 degree will simply be pi over 180 that is np.pi divided by 180 radians to demonstrate the effect of this early on here is a sneak peek of the end result when we finally detect our lines in this picture the bin resolution was a rho of 20 pixels and 5 degrees whereas here it's 2 pixels with a single degree precision clearly this one is much more precise in its output so that's it for resolution the fourth argument is very simple it's the threshold to find and display the lines from a series of dots we need to find the bins with the highest number of votes right once you find that bin you take its theta and rho value and plot the line however how do we know which bins to choose what's the optimal number of votes where we can say okay draw the line that corresponds to this bin well that's where the threshold comes in threshold is the minimum number of intersections needed to detect a line as previously mentioned in a series of points the points of intersection in Hough space represent the theta and rho values of lines that are common between a series of points for every point of intersection a vote is cast inside of the bin that it belongs to which represents a line with some value for rho and theta there's five intersections here so five votes and if we assign a threshold of three the number of votes in this case exceeds the threshold and that is therefore accepted as a line that describes our series of points if we assign a threshold of some large number let's say twelve in this case we don't have sufficient intersections in our bin to say that the line belonging to this
bin describes our data and is therefore rejected in the case of our gradient image we'll have a threshold of 100 which I found to be an optimal value such that the minimum number of intersections in Hough space for a bin needs to be 100 for it to be accepted as a relevant line in describing our data the fifth argument is just a placeholder array which we need to pass in so just declare an empty array not much to it the sixth argument is the length of a line in pixels that we will accept into the output which we'll declare as a keyword argument minLineLength is equal to 40 so basically any detected lines traced by less than 40 pixels are rejected and lastly there is the maxLineGap keyword argument which we're going to set equal to 5 note the capitalization of the L and the G this indicates the maximum distance in pixels between segmented lines which we will allow to be connected into a single line instead of them being broken up that's all guys we just set up an algorithm that can detect lines in our cropped gradient image now comes the fun part which is to actually display these lines onto our real image what we'll do is we'll define a function def display_lines which takes in an image onto which we'll display the lines as well as the lines themselves and before giving this any logic we're going to go right back here and set the line image equal to the return value of our function which we're going to specify momentarily but we'll just set it equal to the return value for now display_lines pass in our lane image as well as the detected lines and back to our function similar to what we have in region_of_interest we'll declare an array of zeros line image is equal to np.zeros_like with the same shape as the lane image's corresponding array so it will have the same dimensions as our image although its pixels will be completely black as all of them will have zero intensity and now all of the lines that we detected in our gradient image we'll display them onto this black image
the detected lines are returned as a three dimensional array so to check if it even detected any lines we have to check if the array is not empty that is if lines is not None if it's not empty we'll loop through it for line in lines if you print each line that we iterate through print line back to our terminal run python lanes.py if you print each line notice each line is a two-dimensional array with one row and four columns what we're going to do is reshape every line into a one dimensional array such that line is going to equal line.reshape and we're going to reshape it into a one dimensional array with four elements and you know what instead of setting line is equal to line.reshape(4) we can simply unpack the array elements into four different variables x1 y1 x2 y2 and now what we'll do is take each line that we're iterating through and draw it onto our blank image thanks to OpenCV we can write cv2.line this function draws a line segment connecting two points we'll draw our lines on the line image the black image we just created the second and third arguments specify in which coordinates of the image space we want to draw the lines so the second argument will be the first point of the line segment which is simply x1 y1 and the third argument is the second point of the line segment x2 y2 all right so we've specified the coordinates in which we want our lines to be drawn with respect to the image space the next argument is what color we want the lines to be so we'll specify a BGR color of 255 0 and 0 this should result in a blue color since the red and green channels will have zero intensity and finally the line thickness which is going to have a value of 10 obviously the higher the value the thicker the lines and that is all all of the lines we detected in the gradient image our cropped image we just drew them onto a black image which has the same dimensions as our road image now let's just go ahead and return the line image return line image and over here we're
going to show the line image instead of cropped image and back to our terminal we'll rerun our code python lanes.py and as expected it shows the lines that we detected using Hough transform and it displayed them on a black image the final step is to blend this image with our original color image that way the lines show up on the lanes instead of some black screen so what we'll do is we'll go back to our code and we'll set combo image I'm sure I could have come up with a better name than that but combo image is equal to cv2.addWeighted and so what we're going to do with addWeighted is take the sum of our color image with our line image which should now make sense to you as to why the background of line image is completely black since that would signify pixel intensities of 0 and by adding 0 with whatever pixel intensities are inside of this image inside of lane image the pixel intensities for that image would just stay the same it wouldn't change because 0 plus anything doesn't really make a difference it's only when we blend the pixel intensities of our lines into the original that we'll see a difference since they actually have a nonzero pixel intensity value anyway first argument is going to be the lane image and we're taking the weighted sum between the arrays of these two images we'll give our lane image a weight of 0.8 basically that's going to multiply all elements in this array by 0.8 decreasing their pixel intensities which makes it a bit darker it will be more apparent why we're doing this momentarily the third argument is the second input array of the same size and we know it's the same size because we gave it the same shape as our lane image array but it's all zeros instead but anyway let's put our line image array and we'll give that a weight of 1 multiplying all elements in this array by 1 and now when we add these two arrays up this one will have more weight 1 against 0.8 which means that the lines will be more clearly defined
when we blend the two and this image is going to be a bit darker thereby better defining the lines that we're blending into it finally there's the gamma argument where we can choose some value that we'll add to our sum we'll just put a scalar value of one it won't really make a substantial difference and that is all we detected lines in the gradient image placed these lines on a black image and then we blended that image with our original color image so if we replace this with combo image run the code python lanes.py it indeed blended both of our images such that the lines are displayed right on top of our lanes that is all for identifying lane lines you learned how to identify lines inside of a gradient image with the Hough transform technique and then we took these lines and placed them on a black image which has the exact same dimensions as our original road image thereby by blending the two we were able to ultimately place our detected lines back onto our original image in the next video we'll further optimize how these lines are displayed in the last lesson we detected lines from a series of points in the gradient image using the Hough transform detection algorithm we then took these lines and placed them on a blank image which we then merged with our color image ultimately displaying the lines onto our lanes what we'll do now is further optimize how these lines are displayed it's important to first recognize that the lines currently displayed correspond to bins which exceeded the voting threshold they were voted as the lines which best described our data what we'll do now is instead of having multiple lines we can average out their slope and y-intercept into a single line that traces out each of our lanes before getting into it it seems that there is some inconsistency in the code this should be image so as to properly reference the argument and not lane image the same thing here make sure to reference the argument image not the global variable canny it
shouldn't have made a difference since they both correspond to the same value the same case for this one but it's always good to be consistent so as to avoid bugs also it's not a good habit to reuse variable names so we'll rename this to canny image and change it over here accordingly all right we'll start this lesson off by going over here and setting averaged lines is equal to the return value of some function that we'll declare later on average_slope_intercept that will be our function name and we'll pass into it our colored lane image as well as the lines that we detected and now we'll simply define the function right on top def average_slope_intercept with arguments image and lines and what we'll do first is we'll declare two empty lists left fit is equal to an empty list right fit is also equal to an empty list left fit will contain the slopes and intercepts of the lines on the left and intuitively right fit will contain the slopes and intercepts of the lines on the right what we can do now is loop through every line as we did previously for line in lines and reshape each line into a one dimensional array with four elements line.reshape(4) and now we'll unpack the elements of the array into four variables where x1 y1 x2 y2 will equal the four values in the array respectively nothing new so far these are the points of a line when you're given the points of a line it's very easy to compute the slope by calculating the change in y over the change in x and subbing that into our equation to then determine the y-intercept well to determine these parameters in code what we can do is set parameters is equal to np.polyfit what polyfit will do for us is it will fit a first degree polynomial which would simply be a linear function of y is equal to mx plus b it's going to fit this polynomial to our x and y points and return a vector of coefficients which describe the slope and y-intercept the first argument is where you will place the x coordinates of your two points
x1 x2 the second argument is where you will place the y coordinates of your two points y1 and y2 and we'll fit a polynomial of degree 1 to our x and y points that way we get the parameters of a linear function if you go ahead and print parameters into our terminal python lanes.py for each line that we iterate through it prints the slope and the y-intercept the slope is the first element in the array the y-intercept is the second element so what we can do is set slope is equal to parameters at index 0 and we'll set intercept is equal to parameters at index 1 and now for each line that we iterate through we need to check if the slope of that line corresponds to a line on the left side or a line on the right side to determine this it's important to note that our lines depending on which side they're on are all roughly more or less going in the same direction all the lines here are slanted a bit to the left and all the lines here are slanted a bit to the right here's our image displayed with x and y axes notice that in the image the y axis goes downwards along the rows and if you remember from basic high school math a line has a positive slope when y always increases as x increases so these lines would have a positive slope as their change in y over the change in x would result in a positive value although for these lines since as x increases y decreases their slope value would therefore be negative since their change in y over the change in x would result in a negative value final conclusion being lines on the left will have a negative slope lines on the right will have a positive slope so what we can do is back to our code we can write if slope is smaller than 0 if the line we're iterating through has a negative slope value we'll append it into the left list left fit dot append and we'll append each slope and y-intercept as a tuple slope intercept otherwise else we'll append it into the right list right fit dot append slope and the y-intercept if we print
the results outside of the for loop print left fit print right fit back to our terminal python lanes.py now we have a list that contains all of the slopes and y-intercepts of the lines on the left side and another list that contains all the slopes and y-intercepts of the lines on the right side what we want to do now is average out all of these values into a single slope and y-intercept back to our code we'll do that for both sides but we'll start with the left side we'll set left fit average is equal to np.average and we're going to average out all the values of our left fit and it's really important that you specify an axis is equal to 0 imagine this was an array of multiple rows and two columns what we want to do is operate vertically along the rows to get the average slope and the average y-intercept respectively do the same thing on the right side right fit average is equal to np.average we'll do that for right fit with an axis is equal to 0 all right we'll go ahead and print left fit average print right fit average and we'll go ahead and just label them for clarity left and right back to our terminal we get back two arrays this array represents the average slope and y-intercept of a single line through the left side and this array the average slope and y-intercept of a single line through the right side we're not out of the woods yet we have the slopes and y-intercepts of the lines that we'll eventually draw but we can't actually draw them unless we also specify their coordinates to actually specify where we want our lines to be placed the x1 y1 x2 y2 for each line so what we'll do is we'll define a function def make_coordinates with arguments image and line parameters with some return value this return value is going to denote the x and y coordinates of the line so back here we'll set left line equal to the return value of make_coordinates we'll pass in the respective arguments the image as well as the slope and y-intercept of our left
line left fit average same thing for the right line right line is equal to make_coordinates image right fit average back to our function we're going to unpack this list of two elements whichever one that's being passed in the slope and intercept into two variables slope intercept is equal to line parameters we'll start with y1 the initial vertical position of our lines this one is pretty obvious since we want our lines to start at the bottom of the image before we do this if you go ahead and print the shape of the image print image.shape into your terminal we'll rerun this code it prints twice since clearly we're calling the function twice but regardless recall that the shape corresponds to your array's dimensions in this case this represents the image's height width and number of channels we're only interested in the height which also corresponds to the bottom coordinate of our image as demonstrated by matplotlib a little counterintuitive that the highest value is on the bottom of the y-axis but another way to interpret this is to have the y-axis going vertically downwards from 0 to 700 which makes sense since images are simply arrays of pixels and array indices are read from the top down we'll set y1 is equal to image.shape at index 0 since that represents the height and now y2 is going to equal y1 times 3 over 5 we'll make this into an integer type and so essentially this will be 704 times 3 over 5 which is going to evaluate to 422 which means that both of our lines will start from the bottom at 704 and go three-fifths of the way upwards up until the coordinate 422 all right now x1 can simply be determined algebraically we know that y is equal to mx plus b so x is equal to y minus b divided by m if you rearrange the variables so we can simply set x1 is equal to y1 minus the intercept divided by the slope we'll make this into an integer as well and the same thing for x2 except that we have to replace this with x2 and y2 now that we have all of our
coordinates we'll return them as a numpy array np.array x1 y1 x2 y2 so both of our lines are going to have the same vertical coordinates they're both going to start at the very bottom and go upwards three-fifths of the way up until the coordinate 422 but their horizontal coordinates are obviously dependent on their slope and y-intercept which we calculated right here we finally have our lines we can return them as an array return np.array left line and right line and now comes the fun part which is instead of our line image being populated by the Hough detected lines we're going to pass in the averaged lines if we show the line image instead run the code this looks a lot smoother instead of many lines they were all averaged out into a single line on each side back to our code obviously we're still blending the line image with the color image so let's show that instead combo image back to our terminal I'll rerun this and it displays our two lines on our two lanes we took the average of our lines and displayed one line on each side instead this looks a lot smoother than earlier one more thing before we end this lesson is that previously we were passing in a three-dimensional array the Hough lines into our display_lines function but now we're passing in the averaged lines that we created and we know that when we iterate over this lines array each line is already a one dimensional array so there's no need to reshape it into one it already is one dimensional so feel free to remove the reshape(4) it's just extra code it doesn't really make a difference whether you have it or not and better yet we can simply unpack each line into four variables over here and delete that and that is all let's rerun it to make sure we didn't make any mistakes and everything still works out accordingly in the next lesson we'll use the code that we currently have and take it up a notch by identifying lane lines in a video welcome to your last
lesson of this section in the last lesson we finally finished our line detection algorithm and identified lane lines in our image what we'll do now is use that same algorithm to identify lines in a video this is the video and we'll use the algorithm we currently have to detect lane lines in every single frame this video you can access from the following github link and now this github link you'll find in the description below labeled as video link or you can feel free to just type out the link it's up to you now anyway once you're on this page click on test2.mp4 and download it alright and upon downloading it make sure it says test2.mp4 so as to stay consistent with the videos and once you download it to wherever you have it downloaded drag it into your project finding lanes if you are using the Atom text editor don't try and open the video with Atom otherwise it will freeze alternatively we could have placed this video on the desktop or in some other directory and just referenced its path but we'll go with this for now just to keep things quick regardless to capture this video in our workspace we need to create a video capture object by setting a variable named cap is equal to cv2.VideoCapture and we'll capture the video test2.mp4 and while cap.isOpened this returns true if video capturing has been initialized we'll enter into a loop where we will first use the read function cap.read to decode every video frame and what this returns is two values which we can unpack the first value is just a boolean that we're not currently interested in so leave that as blank the second value is the image the frame that's currently being projected in our video and it's the current frame of our video where we'll be detecting lines so what we'll do is actually copy and paste the code the algorithm we already have for detecting lines we're not going to do it all over again and wherever it says lane image we're gonna replace that with the current video frame
over here here and here as well make sure to get all of them otherwise your code will not make any sense pretty easy all we did was apply the algorithm we already implemented but instead of a static image we're doing it on a video since we're not working with images anymore you can go ahead and comment out this code or delete it it's up to you I myself will comment it out and now with imshow we're still showing the current image that's being processed as for waitKey if we leave this at waitKey 0 you're going to be waiting infinitely between each frame of the video so your video would just freeze up instead we want to wait one millisecond in between frames to then display the next one now before we add any logic to break out of this loop let's just run the code by going to our terminal and running python lanes.py and alright it identifies the lines in every single frame of our video that's pretty cool and this is nothing new it follows the exact same process as how we detected lines in the image we're still applying the canny algorithm to get the gradients detecting lines with Hough transform and then averaging them out except now it's being done in every single frame of the video repeatedly inside of a while loop that is all for this section now just to finish this lesson off if I rerun the code and try to close the video it doesn't work we need a way to actually break out of this while loop and not just wait until the video completes for it to dismiss so we'll go back here upon pressing a keyboard key we want the video to close so we'll put this inside of an if statement such that we're still invoking the waitKey function we mentioned that it waits one millisecond in between frames but what it also does is it returns a 32-bit integer value which we can compare to the numeric encoding of the keyboard character that we're going to press the numeric encoding we can obtain from the built-in function ord and the keyboard character that we're going to press will
be q, and we'll just set a comparison operation between the two. Ultimately, when we press the keyboard button Q, the comparison of these numbers will evaluate to true, and once they are equal we'll break out of the loop. Once we break out of the loop, we'll close the video file by calling cap.release(), and we'll also call cv2.destroyAllWindows() to destroy the window that we're currently on, to close it. Rerun the code, python main.py, press Q, and everything works fine.

If the keyboard action did not work out for you, a common trick is to apply a bitwise AND operation with the hexadecimal constant 0xFF. Just know that this operation masks the integer value we got from waitKey to eight bits, which ultimately just ensures cross-platform safety when making our comparison. It's still the same concept as earlier, since when we press Q this comparison still evaluates to true, breaking us out of the loop. Rerunning the code, everything should still work as expected.

That is all for this section. We used Canny to convert our color image into a gradient image, and from the gradient image we detected the most relevant lines, averaged out the lines, and then displayed them on the image that's currently being processed. In the near future we'll be using a more advanced technique to detect lanes, but for now this is pretty awesome. Great job on making it this far, and that is all. To access the complete self-driving car course, where you'll learn to build a fully functional self-driving car with deep learning alongside computer vision techniques, it's accessible within the link in the description below. I'll see you in there.
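The video loop described in this lesson can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the tutorial's exact code: the names is_quit_key, play_video, and process_frame are hypothetical, with process_frame standing in for the line detection steps built in the earlier lessons. Unlike the narration, the sketch also checks the boolean returned by cap.read() so that the loop ends cleanly when the video runs out of frames.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True if the value from cv2.waitKey matches quit_char.

    The bitwise AND with 0xFF keeps only the lowest eight bits, so the
    comparison with ord(quit_char) works even if higher bits are set.
    """
    return (key_code & 0xFF) == ord(quit_char)


def play_video(path, process_frame=None, delay_ms=1):
    """Show every frame of the video at `path` until it ends or Q is pressed.

    process_frame is a hypothetical hook: a function that takes a frame and
    returns the image to display (for example, the frame with lane lines
    drawn on it). If it is None, the raw frame is shown.
    """
    # Imported here so is_quit_key can be used without OpenCV installed.
    import cv2

    cap = cv2.VideoCapture(path)
    try:
        while cap.isOpened():
            ok, frame = cap.read()  # ok is False once no frame can be read
            if not ok:
                break
            output = process_frame(frame) if process_frame else frame
            cv2.imshow("result", output)
            # Wait delay_ms milliseconds, then stop if Q was pressed.
            if is_quit_key(cv2.waitKey(delay_ms)):
                break
    finally:
        cap.release()            # close the video file
        cv2.destroyAllWindows()  # close the display window
```

Calling play_video("test2.mp4") from main.py would play the raw video; passing the lane detection code as process_frame would show the detected lines on each frame instead.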
Info
Channel: ProgrammingKnowledge
Views: 453,883
Rating: 4.9444447 out of 5
Keywords: OpenCV Tutorial, Python (Programming Language), Python 3.6, Python, Python 3.x.x, Python Tutorial, Python Tutorial for Beginners, Python for Beginners, Python course, python scripting tutorial, Online Course, Python Guru, Learn Python, PyCharm IDE, OpenCV, OpenCV Tutorial for Beginners, Computer Vision, Computer Vision Basics, Computer Vision Tutorial, Windows, Linux, Mac, Image Processing, Self-Driving Cars, OpenCV Python Tutorial
Id: eLTLtUVuuy4
Length: 86min 22sec (5182 seconds)
Published: Sat Oct 06 2018