Gradient Boost Machine Learning|How Gradient boost work in Machine Learning

Video Statistics and Information

  • Original Title: Gradient Boost Machine Learning|How Gradient boost work in Machine Learning
  • Author: Unfold Data Science
  • Description: Gradient Boost Machine Learning|How Gradient boost work in Machine Learning #GradientBoost #GradientBoostMachineLearning #UnfoldDataScience Hello, ...
  • Youtube URL: https://www.youtube.com/watch?v=j034-r3O2Cg
Posted by u/aivideos on Apr 08 2020 (1 upvote)
Captions
Welcome to Unfold Data Science, friends. This is Aman here, and I'm a data scientist. In this video we'll try to understand another boosting technique, known as gradient boost. In my last video, the link for which you can see here, I was talking about AdaBoost. AdaBoost is one boosting algorithm, and gradient boost is the next boosting algorithm that I am going to discuss, so basically I want you to understand the differences between AdaBoost and gradient boost.

If you have seen my last video on AdaBoost, you would know that in the AdaBoost algorithm a lot of machine learning models are trained one by one, and then the final model is created. How that happens is that in AdaBoost the weights of the records get updated: all the records that a particular model misclassified, or could not predict well, have their weights increased, meaning more weight is given to those records for the next model. So if model one misclassifies some records, or its error on some values is high, those records are given more weight in the next model's iteration, and this continues. The final model is model 1 plus model 2 plus model 3, and so on. That is how AdaBoost works.

All boosting algorithms work by combining multiple models, and in gradient boost, too, multiple models are combined. So what is the difference? In AdaBoost, as I told you, the learning happens by adjusting the weights, but in gradient boost the learning happens by optimizing a loss function. What is this loss? I'll talk about it in a moment. Before that, the other difference between AdaBoost and gradient boost is that in AdaBoost the trees are normally stumps, which means the trees are not grown fully, or
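To make the contrast concrete, here is a minimal sketch using scikit-learn; this code is my own illustration (the video shows no code), and the toy numbers are the age/BMI/height values used later in the video: AdaBoost built from stumps versus gradient boosting built from deeper trees fit to residuals.

```python
# Illustration only; assumes scikit-learn is installed.
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

X = [[21, 24], [22, 26], [23, 30]]   # age, BMI (toy values from the video)
y = [174, 176, 169]                  # height (target)

# AdaBoost: weak learners (here stumps: depth-1 trees), reweighting hard records.
ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=1),
                        n_estimators=50, random_state=0).fit(X, y)

# Gradient boosting: successive trees fit to residuals, scaled by a learning rate.
gbm = GradientBoostingRegressor(n_estimators=50, learning_rate=0.21,
                                random_state=0).fit(X, y)

print(ada.predict(X))
print(gbm.predict(X))
```

Both ensembles are additive models; the difference the video describes is only in *how* each new member is chosen.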
trees are not grown beyond a small depth. A stump looks like one root node and two leaves. In gradient boost, on the other hand, the trees are bigger: the number of leaf nodes normally ranges between 8 and 32.

So there are two differences between AdaBoost and gradient boost. Number one, in AdaBoost the learning, or boosting, happens by adjusting the weights of the misclassified records; in gradient boost, the learning, or ensembling, happens by optimizing a loss.

Now let me talk a bit about what loss is. If you remember how linear regression works using the ordinary least squares (OLS) method, you will be able to recollect the following: if this is my y axis and this is my x axis, these are my data points, and this is my regression line, then the errors of the regression model are the distances between the actual values and the predicted values. We call these errors e1, e2, and so on. The way the ordinary least squares method works is by optimizing these errors. What does that mean? OLS tries to minimize e1 squared plus e2 squared, and so on up to en squared. Why are we squaring? Because some of these errors can be negative and some can be positive, so squaring stops them from cancelling out. We can make the equation generic with a summation: the sum over i equals 1 to n of ei squared. OLS will try to bring this value as close to zero as possible, and this process is called optimization of the loss function. What is the loss function? This sum of squared errors is the loss function for this particular model. Why is this important? Because the same concept is used in gradient boost as well. Let's try to understand how. I will take a
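The squared-error loss described here can be written out directly. A minimal sketch (the function name and numbers are mine, chosen to match the example that follows in the video):

```python
# Ordinary least squares loss: sum of squared residuals e_i = y_i - y_hat_i.
def squared_error_loss(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Squaring keeps negative and positive errors from cancelling each other.
actual    = [174, 176, 169]
predicted = [171, 171, 171]                  # e.g. a constant prediction
print(squared_error_loss(actual, predicted)) # 3^2 + 5^2 + (-2)^2 = 38
```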
very simple dataset here and try to make you understand how gradient boost works. Just pay attention, because gradient boost is something that is not very well explained on the internet.

Let's say I take the age of a person and the BMI of a person, and from age and BMI I want to predict the height of the person; I want to learn this pattern. Let me put some numbers here: ages might be 21, 22, 23; BMI might be 24, 26, 30; and heights might be 174, 176, 169, something like this. In this data, age and BMI are my independent variables, and height is my target variable, which means this is a regression problem: we want to learn the pattern of how height depends on age and BMI.

Now, if we take this data and submit it to a boosting algorithm, let me walk you through step by step how the boosting algorithm will work. The fundamental thing is that a boosting algorithm needs two things: one is a loss function, and the other is an additive model. I'll tell you what these are in a moment.

The very first thing the boosting algorithm does is try to compute the first residuals. What is a residual? As I just discussed for linear regression, a residual is the difference between the actual value and the predicted value. In this context we do not have any predicted values yet, so where will the residuals come from? In step one (you can also call it step zero), the boosting algorithm simply computes the average of the target column. This is what happens in a regression use case; in a classification use case the concept of odds and probabilities comes in, but to keep it simple here, let's ask: what is the average height? We can say 174 plus 176 plus 169, divided by 3. Let's say I'm just putting a dummy number
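Step zero is just the column mean. For the record, the exact mean of 174, 176 and 169 is 173; the transcript goes on with a rough dummy value of 171, which is close enough for illustration:

```python
# Step 0 of gradient boosting for regression: the base prediction
# is simply the mean of the target column.
heights = [174, 176, 169]
base_prediction = sum(heights) / len(heights)
print(base_prediction)   # 173.0 (the video uses a dummy value of 171)
```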
here: if it is 171, I'm not sure that's the exact answer, but let's go with it. So 171 is the average height of this column.

Now, what happens when there is no model, no learning in place? In step one the boosting algorithm assumes exactly that: it starts with a base model that is no model at all. Then what will the predicted values be? If I add a predicted column here, all the predicted values will be 171, the average we just computed. That is step number one.

Step number two: now we have actuals and we have predicted values, so we can compute the residuals. Residual equals actual minus predicted. So this residual will be 3, this residual will be 5, and this residual will be minus 2. Now we have the residuals in hand. (I'm erasing the board now; you can go back and see the data if you want.) These are the actuals, these are the predicted values, and these are the residuals.

Having said that, at the next step gradient boost will fit a model on these residuals. The residual becomes the target column, and the independent columns of the data that I erased, age and BMI, remain the independent columns. A model, say a decision tree, is fit on this residual; let's call it residual model 1. As the next step, this model is used to predict the residuals, so we get new, predicted values of the residuals. Let us say the model is trained, and the value predicted for the residual 3 is, say, 3.5; for the residual 5 it is, say, 4.2; and for minus
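Sticking with the transcript's dummy base value of 171, the residual computation and the first residual model might look like this; the use of scikit-learn's tree and the `max_leaf_nodes` setting are my own choices, echoing the 8-to-32-leaves range mentioned earlier:

```python
from sklearn.tree import DecisionTreeRegressor

X = [[21, 24], [22, 26], [23, 30]]       # age, BMI (independent columns)
actual    = [174, 176, 169]
predicted = [171, 171, 171]              # base prediction from step 0

# Residual = actual - predicted.
residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)                         # [3, 5, -2]

# Residual model 1: a tree fit with the residuals as target and the
# original features as inputs (gradient boost trees have 8-32 leaves).
rm1 = DecisionTreeRegressor(max_leaf_nodes=8).fit(X, residuals)
predicted_residuals = rm1.predict(X)
print(predicted_residuals)
```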
2 it is, say, minus 1. I am just putting dummy numbers here; the point is that this model predicts the values of the residuals.

What happens next is that the prediction gets updated. Remember, the predicted column was just the average of the target column from step zero; now this 171 will change. How? The new prediction after the first iteration will be 171 plus something called the learning rate times the predicted residual. The learning rate controls at what speed, or by what shift, you want to change the predicted value; I am keeping it at 0.21 for simplicity. So the update is 0.21 times the predicted residual: for the first record the predicted residual is 3.5, and whatever this output is becomes the new predicted value.

If you have some confusion, let me repeat. Step one: compute the average of the target column; that average is our prediction. Step two: with the actuals and the predicted values we get residuals. Then fit a model with the residuals as target and the columns of the data as independent columns; this model predicts the residuals. Now what do we do? Update the default prediction: the default prediction of 171 gets updated to 171 plus learning rate times 3.5. That completes iteration one.

Now we have new predictions, and once we have new predictions the residuals will change, because we have the actual values and the new predicted values. Then the same process repeats: RM2, residual model 2, is fit, and residual model 2 updates the prediction again. If you look carefully at what is happening here, we are trying to come closer and closer to the actual values: we made the assumption that the prediction is just the average, then
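The update rule from this step, written out with the transcript's dummy numbers (base 171, learning rate 0.21, predicted residuals 3.5, 4.2 and -1):

```python
learning_rate = 0.21
predicted = [171, 171, 171]              # step-0 predictions
predicted_residuals = [3.5, 4.2, -1]     # dummy outputs of residual model 1

# New prediction = old prediction + learning_rate * predicted residual.
new_predicted = [round(p + learning_rate * r, 3)
                 for p, r in zip(predicted, predicted_residuals)]
print(new_predicted)                     # [171.735, 171.882, 170.79]
```

Each prediction shifts only a fraction of the way toward the actual value; the small learning rate is what makes many trees necessary, and is what keeps the model from overfitting to any single tree.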
we keep updating that assumption with new residual predictions, and then updating it further with one more round of residual predictions.

Let me show you how the final model will look. The final prediction will be: the base value, which in our case is the average, so let's say 171, plus learning rate times the first residual prediction from residual model 1 (for record 1 that was 3.5, and the learning rate is 0.21, so 0.21 times 3.5), plus learning rate times the second residual prediction from residual model 2 (if residual model 2 gives a prediction of, say, 3.2, then 0.21 times 3.2), and this keeps continuing for however many trees we want to grow in the boosting model. If we grow a hundred trees, then a hundred of these terms will come, and that is how the final prediction happens.

I hope you have understood this boosting algorithm. Gradient boost, as I told you in the beginning, is a topic that is not very well explained on the web, so if you have any doubts or questions, don't hesitate to drop me a comment; I will definitely respond. Understanding this is very important, guys, because if you put these algorithms on your resume and go for an interview, people will grill you on these details: what is the learning rate, what is the base value, what is the first residual model, how does the boosting happen. These are the kinds of things they'll ask, so I want you to be very confident and clear on these fundamentals. Let me know how you liked this video through likes and comments. I'll see you all in the next video with a practical demo of this. Till then, take care.
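Putting all the steps above together, here is a from-scratch sketch of the whole loop; this is my own minimal implementation (the video shows no code), using scikit-learn trees and the video's learning rate of 0.21:

```python
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.21, max_leaf_nodes=8):
    """Minimal gradient boosting for regression with squared-error loss."""
    base = sum(y) / len(y)                    # step 0: mean of the target
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        # Residual = actual - current prediction; fit the next tree on it.
        residuals = [a - p for a, p in zip(y, pred)]
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X, residuals)
        trees.append(tree)
        # Shift each prediction a small step toward the actual value.
        pred = [p + learning_rate * r for p, r in zip(pred, tree.predict(X))]
    return base, trees

def gradient_boost_predict(X, base, trees, learning_rate=0.21):
    """Final prediction = base + lr * tree1(x) + lr * tree2(x) + ..."""
    pred = [base] * len(X)
    for tree in trees:
        pred = [p + learning_rate * r for p, r in zip(pred, tree.predict(X))]
    return pred

X = [[21, 24], [22, 26], [23, 30]]
y = [174, 176, 169]
base, trees = gradient_boost_fit(X, y)
print(gradient_boost_predict(X, base, trees))   # close to [174, 176, 169]
```

After a hundred iterations the predictions have crept very close to the actual heights, which is exactly the "come closer to the actual values" behaviour described above.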
Info
Channel: Unfold Data Science
Views: 23,758
Keywords: Gradient Boost Machine Learning, how gradient boosting works, How gradient boost works in machine learning, adaboost vs gradient boosting vs xgboost, adaboost vs xgboost, adaboost vs gradient boosting, gradient boosting, gradient boost loss function, gradient boost error optimization, unfold data science
Id: j034-r3O2Cg
Length: 14min 10sec (850 seconds)
Published: Tue Feb 18 2020