3.2: Linear Regression with Ordinary Least Squares Part 1 - Intelligence and Learning

Video Statistics and Information

Captions
Hello, welcome to another video in my machine learning series. In this video I am going to talk about something called linear regression, so I'm actually just going to move straight over to the whiteboard along here.

So why are we talking about linear regression? What I'm leaning towards, and what I may get to if you keep watching these videos (they don't exist yet, but I'm going to keep making them, so eventually you might keep watching them), is neural networks. Neural networks are useful and powerful in the case of large data sets with many, many variables, many, many inputs, parameters that we almost can't figure out mathematically how to make sense of. Maybe a neural network can do that in some almost magical way. We're going to get into all the details of that, but there are machine learning scenarios where we can actually just calculate precisely, using a statistical method, the relationship between inputs and outputs.

So if we were to review, we have this idea of a machine learning recipe. Previously I looked at k-nearest neighbors as a possible algorithm to make sense of input data and predict some sort of output, whether it's classifying or predicting a price, that type of thing. So we have some sort of input and we get some sort of output. Let's take the simplest scenario of inputs related to outputs, and a simple scenario for this would be something like a two-dimensional data set.

Okay, so we could graph a data set using something called a scatter plot, and we're going to make up the data set, bear with me here. The x-axis I want to think of as temperature. I'm really sorry, I'm going to do this in Fahrenheit, and I apologize for that, I'll happily [unclear] anytime you want. So zero degrees to 100 degrees; I guess it could be negative. So that's in Fahrenheit, and then the y-axis will be ice cream sales. This was suggested in the chat, but I'd like to add to this "sorbet" in quotes, for no reason really. About the frame, about the frame, I'm back. Okay, uh,
but you can see this, right? Yes, sir. I just thought I've got my head in front of it. Okay, so, yeah, just because, you know, dairy doesn't really agree with me, just in case you were wondering. Okay, so I like sorbet.

Okay, so we can say that when it's 24 degrees there are only three ice creams sold per day, and then on another day it was 90 degrees and there were 18, and on another day it was 90 degrees and there were this many. You could imagine, if you were the owner, the purveyor, of an ice cream shop, that you could keep track of your sales as they relate to temperature. Then you have all this data, and now somebody comes into your place of business and says, "Okay, tomorrow the weather is going to be 50 degrees. Could you make a guess as to how many ice creams you're going to sell?" So we could look and say, here's 50 degrees; well, there was some other day where I sold this much when it was 50, and some other day, and some other day. How could we make a prediction?

Well, this is a scenario where it appears there is a linear relationship between temperature and sales: the higher the temperature, the more sales; the lower the temperature, the fewer the sales. So the idea of linear regression is to figure out how we can best fit a line to this data. And I could look at this and... hold on, I'm going to be back in a second for some magic.

[unclear] like a historic moment. Not really, because that'd be ridiculous, but this is the first time I've ever used a marker with a different color on this YouTube channel. Maybe suddenly I'll just get so many subscribers, because you're like, "I heard there's this channel with tutorials on a whiteboard with colors."

Okay, so we could make a guess, and I could say, look, that looks like a line that kind of fits the data. That's me, just as a human being, kind of eyeballing it. So now if I wanted to say, you know, when
the temperature is 95 degrees, I could just look at 95 degrees, find this, and find the corresponding, you know, 200 ice creams or whatever sold. So this is the idea of linear regression: looking at a data set and fitting a line to that data set.

Now, how do you do this? There are many different methods, and I'm going to look at multiple methods in different videos. In this video I would like to discuss the method called ordinary least squares. I'm going to write this down: ordinary (oh boy, I got a little dizzy, everything's going to be okay) least squares. What does that mean?

Okay, so this is the line that we've fit to the data, you know, as a human being eyeballing it. One thing that I can do is say, look, how different is each one of these data points from the line? I could essentially look at its distance from the line, and the idea of the ordinary least squares method is that we want to find the line that minimizes all of these distances. So if we think of all of these as data points, like x0, x1, x2, x3, x4, we could think of all of these distances as d0, d1, d2, d3, d4. If we took all of these distances and squared them, d0 squared plus d1 squared plus d2 squared, et cetera, and added them all up together, that's the sum of the squares of all the differences. We want to minimize this value. So how do we find a line that minimizes all of those?

You might be asking, well, why are you squaring the values? This is a common technique. You'll notice that some points are below the line and some points are above the line, so the difference could be positive or negative; squaring it gets rid of that difference in sign.

Okay, so how do we do this? Like I said, there are a variety of methods, and what I'm going to do is show you a formula for this, which I have written down. Another
historic moment: I prepared for today's video. This was my preparation, I wrote down the formula. I could pretend that I memorized it by editing this out. If this is still in the video, then I did not pretend.

Okay, so first of all, how do we represent this pinkish-reddish line mathematically? The formula for a line is typically written as y = mx + b. I will point out, however, that if you look in a statistics textbook you might see something like y = b0 + b1 times x. This is the exact same formula: m, or b1 here, is the slope, and b0, or b here, is the quote-unquote y-intercept, which is the value where the line intersects the y-axis. So the slope, this m value, determines which way the line points, and b, the y-intercept, is how high or low, you know, where that line is relative to the x-axis. So all we need to do is calculate both m and b.

Here's the thing while we're looking at this: most data sets that you might work with aren't just simple 2D data sets. There's temperature, there's the population of the city that the store is in, maybe there's the hours that it's open, I don't know. You could think of all sorts of other data inputs that might relate to the sale of ice cream. And this can actually be generalized: this could be y = b0 + b1 times x1 + b2 times x2, and so on. That is referred to as multiple linear regression, and generally the same math that I'm going to show you applies to that scenario, but it typically involves matrix-based calculations. Maybe I'll do that in a different video; it's simpler to look at in just this context with just one input. But we can extrapolate from that, and you could think about it as, instead of a line, fitting a plane to the data, if you had two input
pieces of data.

Okay, how are we doing so far? Still here? Okay, so now let's look at this formula. Look, you can see I wrote it down on this piece of paper. Maybe I can auction this off on eBay, or maybe nobody will want it. Okay, I'm going to need that piece of paper.

So here's how we calculate m, the slope. We calculate it as the sum (I'm going to use this Greek letter sigma; it looks like an E but it's not an E, it's a sigma, which means sum) of x minus x with a line over it (I'll talk about what that means) times y minus y with a line over it (you could call that y bar, I suppose), divided by the sum of (x minus x bar) squared.

Okay, so let's think about what this means. First of all, x bar or y bar means the mean, or the average. So what this really is, is all of the x values added up together, divided by how many there are. You could think of x bar as being the sum (and sigma, by the way, means sum, so I've got to unpack that a little bit) of every single x, so x index i, so x0, x1, x2, where i goes from 0 to n, n being the total. This is mathematical notation to say: add up all the x's. You can think of it as an array, right, an array of data points. Add them all up and then divide by the total number there is, divide by n. So this is really what x bar is: it's just the average of all the x's. Y bar is the average of all the y's.

So this means: get that average, and then take each x minus the average, times each y minus the average. And so, really, this sigma should also have i going from 0 to n, and this is x index i, this is y index i. We have this from 0 to n as well, and this is x index i as well.

I'm not going to derive or prove this formula in this video, although if I can find some supplemental information I'll link to it in this video's description, or maybe you can offer a suggestion in the comments. But you can get an intuitive kind of sense of why this formula works. First of all, imagine if x equals y, if the formula for
the line were just y = x. That would mean the slope would be 1, right? The slope would be 1 if y equals x. Well, look at this: if y equals x, then the numerator, x minus x bar times x minus x bar, would be exactly that squared, the same as the denominator. So you can see how m would equal 1 if y equals x.

And then you can sort of see the numerator is essentially the correlation. Thank you, [unclear] in the chat, for typing it out like that, because I think that's a good way of thinking about it. You could ask: is y growing more as x grows, or is y growing less as x grows? You can sort of see how this relationship between the numerator and the denominator is going to give you a fraction that describes the slope of this line. So think about that; hopefully it makes some intuitive sense, and I'm sure people in the comments will write some nice explanations that help with that understanding.

So this is really it. What I want to do now, and I'll do this in the next video, is program this. I think I'll program it in such a way that a user can click and add data points, and each time the user clicks I will apply this formula and draw the line of best fit according to the ordinary least squares method, in canvas in the browser. After we do that, I will talk about some reasons why linear regression might not make sense for your data.

But this is the idea. Just to recap for a second, the reason we're doing this is that this is a model, right? The idea of a model is that you try to fit it to the data. We have known data, training data: temperature with actual sales. We want to fit our model, and our model just has two parameters, the slope of the line and the y-intercept. Once we solve for those parameters, we can make new predictions. Even though it's kind of overly simplistic here, this is the exact same process that I'll employ again and again once we look at a simple
perceptron, then a multi-layer perceptron, and then things like convolutional networks or recurrent networks. All of this is laying the foundation for more sophisticated, robust machine-learning-based systems.

Oops, I forgot. For the formula of a line, y = mx + b, the slope is the more complex calculation. Once we have that slope, it's actually pretty easy to calculate the y-intercept, which is b. The formula for that is b = y bar minus m times x bar. You can kind of see why this is the case, right? Remember y = mx + b, so all I need to do is say b = y - mx, and we can just use the average of all the x's and all the y's to figure out where that line should be shifted. So this is the formula to calculate that y-intercept.

So we can do this with a statistical method: we run through all the data, we calculate the slope, we calculate the y-intercept, and we have the formula for the line, with which you can make predictions for new data.

Okay, so I hope this was helpful and made some sense to you, and that you have an idea of what linear regression is and what the least squares method is. If you're inclined to continue, just keep following to the next video, and I will program this particular algorithm from scratch. Okay, thanks very much, see you there, maybe. [Music]
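The slope and intercept formulas described in the transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the follow-up video (which is described as running in canvas in the browser): the function names `fit_line`, `predict`, and `sum_of_squared_distances` are hypothetical, the temperature and sales numbers are made up, and the n data points are indexed 0 to n - 1.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (m, b) for the ordinary least squares line y = m * x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = fmean(xs)  # x bar: the average of all the x values
    y_bar = fmean(ys)  # y bar: the average of all the y values
    # Numerator: sum of (x_i - x bar) * (y_i - y bar), how x and y vary together.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of (x_i - x bar) squared.
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = numerator / denominator
    b = y_bar - m * x_bar  # b = y bar - m * x bar
    return m, b


def predict(m, b, x):
    """Evaluate the fitted line at a new x value."""
    return m * x + b


def sum_of_squared_distances(xs, ys, m, b):
    """The quantity least squares minimizes: d0^2 + d1^2 + d2^2 + ..."""
    return sum((y - predict(m, b, x)) ** 2 for x, y in zip(xs, ys))


# Made-up (temperature in Fahrenheit, ice creams sold) data, for illustration only.
temperatures = [24, 50, 65, 80, 90]
sales = [3, 9, 11, 15, 18]

m, b = fit_line(temperatures, sales)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
print(f"predicted sales at 50 degrees: {predict(m, b, 50):.1f}")
print(f"sum of squared distances: {sum_of_squared_distances(temperatures, sales, m, b):.3f}")
```

Calling `fit_line` on data where y equals x, such as `fit_line([1, 2, 3], [1, 2, 3])`, gives a slope of 1 (up to floating-point rounding), which matches the intuition given in the transcript. Comparing `sum_of_squared_distances` for the fitted line against a hand-picked, eyeballed line is one way to see that the fitted line has the smaller total.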
Info
Channel: The Coding Train
Views: 127,504
Rating: 4.8773351 out of 5
Keywords: live, programming, daniel shiffman, creative coding, coding challenge, tutorial, coding, challenges, coding train, the coding train, nature of code, artificial intelligence, itp nyu, neural network, intelligence creative coding, neural network artist, intelligence and learning, machine learning, machine learning art, linear regression, ordinary least squares, least squares, linear regression javascript, regression algorithm, linear regression math, least squares criterion
Id: szXbuO3bVRk
Length: 16min 43sec (1003 seconds)
Published: Mon May 29 2017