Machine Learning - Bias And Variance In Depth Intuition | Overfitting Underfitting

Video Statistics and Information

Captions
[Music] Hello all, my name is Krish Naik, and welcome to my YouTube channel. Today we are going to discuss a very important topic called bias and variance, and along with it overfitting and underfitting. You have probably also heard terminology like "high bias, low variance" or "low bias, high variance"; we will try to understand all of these properly, taking examples of both a regression and a classification problem statement.

Let us take an example. I have a problem statement with points in x and y, and our aim is to create a best-fit line with the help of linear regression. There are various kinds of linear regression, such as multiple linear regression and polynomial regression; here I have specifically used polynomial regression. When my degree of polynomial equals 1, the polynomial regression just acts like simple linear regression, so it creates a straight best-fit line. But a straight line is not suitable for points that are spread non-linearly, and here my points are spread in the shape of a curve, so with degree 1 I just get a straight line through them. Now when I compute the squared error, it will definitely be on the higher side, because if I take the difference between each predicted point and the corresponding actual point and sum up all those errors, the total will be high. Now suppose I increase the degree of the polynomial to 2.
With degree 2, the best fit becomes a gentle curve, and in this scenario you can see it satisfies most of the training points, so the error is very small. Let us go one step further, to degree of polynomial equal to 4: now each and every training point is fitted exactly by the curve.

Let us go back to the first case, where the error was very high. We created a model on a training dataset, and even for that training dataset it gives a very high error. This scenario we call underfitting: for whatever data I have trained my model on, the error is quite high. I am not even talking about the test data or new data here; even on the training data itself the error is very high.

Now look at the last diagram, where each and every point is satisfied by the best-fit curve. This is the scenario we call overfitting, and I'll tell you why. With respect to the training data, this curve fits all the points perfectly. But suppose we have some new points; say my test points land here and there around the curve. When we evaluate this best-fit curve on that test data, the error will again be high. So in the overfitting condition, even though the accuracy on the training data is quite high, the accuracy on the test data goes down.
So what I am saying is this: in overfitting, for the training data the accuracy is very high, but for the test data the accuracy goes down. In underfitting, the accuracy on the training data is low and the accuracy on the test data is also low. I hope you have understood: when training accuracy is very high but test accuracy is very low, we call it overfitting; when training accuracy is low and test accuracy is also low, we call it underfitting. Our main aim should be that the accuracy is high on the training data and also high on the test data or new data, and that is exactly what the degree-2 polynomial achieves. So out of these three models, that is the one I should select to solve my problem statement. This most suitable model is the one we describe as having low bias and low variance.

Now let's discuss what bias and variance actually are; I have told you about overfitting and underfitting, but not yet about these terms. In the underfitting scenario, I have high bias and high variance. Always remember this: in underfitting we have high bias and high variance. Think of it this way: bias relates to the error on the training data, and variance relates to the error on the test data. So underfitting means high bias and high variance.
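These three regimes can be sketched numerically in a few lines of NumPy. Everything below (the dataset, noise level, train/test split, and degrees) is invented for illustration; the video's drawing uses degree 4 for the overfit case, but on a tiny dataset a higher degree shows the effect more clearly:

```python
import numpy as np

# Toy 1-D dataset whose true relationship is a curve (quadratic plus noise).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Interleave the points into "training" and "test" halves.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    return (np.mean((poly(x_train) - y_train) ** 2),
            np.mean((poly(x_test) - y_test) ** 2))

for d in (1, 2, 10):   # underfit, good fit, overfit
    train_err, test_err = mse(d)
    print(f"degree={d:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Degree 1 gives a high error on both halves (high bias and high variance in the video's terminology), degree 2 keeps both errors low, and degree 10 drives the training error toward zero while the held-out error typically grows.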
Obviously, in that case the error is high for the training data and high for the test data; that is the underfitting condition. Now let's go back to overfitting. Here I will say we have low bias and high variance. Why low bias? Because for the training data the error is small, and when the training error is small we denote it as low bias. But for the test data we are getting a huge error, so we call it high variance. And for the good, degree-2 scenario we have low bias and low variance, because we get a small error on the training data and also a small error on the test data. Pretty simple; I hope you have understood. If not, just rewind the video and go through it again. To summarize: for the underfit model the error on new test data is high, for the overfit model it is also high, and compared to both of them the degree-2 model gives us a low error.

That was with respect to a regression problem statement; now let us move to a classification problem statement. Suppose I have trained three models, with some hyperparameter optimization done for each. A classification problem statement basically means I am trying to predict a binary outcome like yes or no, and usually we use a confusion matrix to evaluate it. Suppose model 1 gives a training error of 1% and a test error of 20%. What kind of scenario is this? A 1% training error basically means low bias, and a high test error means high variance. So what is the scenario when we have low bias and high variance?
The scenario of low bias and high variance is called overfitting. Now let us go to the second model. Model 2 has a training error of 25% and a test error of 26%. What do you think this should be? If your model is only about 75% accurate even on its own training data, you should try to improve it; it also depends on the domain you are working in, but here I am considering this an underfitting problem. Underfitting means high bias and high variance: since the training error is high I call it high bias, and since the test error is also high I call it high variance. Next, model 3: the training error is less than 10% and the test error is also less than 10%. This is the scenario we are aiming for, which is nothing but low bias and low variance, so this becomes the most generalized model. I hope you have understood these things; this is pretty important.

Now I am going to erase this diagram and show you how bias and variance are generally represented graphically. I'll take the same example: on the x-axis is the degree of the polynomial, and on the y-axis is the error. In the underfitting condition the error rate will be high: the error for the training data will be high and the error for the test data will also be high. Now let us try to understand the overfitting condition on this graph.
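The train/test rule of thumb applied to the three models can be written as a tiny diagnostic function. This is just a sketch of the reasoning above; the 10% and 5% thresholds are arbitrary illustrative choices, not values stated in the video:

```python
def diagnose(train_err, test_err, bias_thresh=0.10, var_thresh=0.05):
    """Label a model from its training/test error, using the video's
    intuition: bias ~ training error, variance ~ the train/test gap."""
    if train_err > bias_thresh:            # can't even fit the training data
        return "underfitting (high bias, high variance)"
    if test_err - train_err > var_thresh:  # big gap between train and test
        return "overfitting (low bias, high variance)"
    return "generalized (low bias, low variance)"

# The three hypothetical models from the discussion:
print(diagnose(0.01, 0.20))  # model 1: 1% train error, 20% test error
print(diagnose(0.25, 0.26))  # model 2: 25% train error, 26% test error
print(diagnose(0.08, 0.09))  # model 3: both errors below 10%
```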
Take this particular example: say the red curve is my training error and the blue curve is my cross-validation error, which you can also call the test error. For the overfitting condition, you know we have low bias and high variance, so as the degree of the polynomial increases, the error for the training data keeps going down and becomes very small. For the test data, however, we have high variance: when the degree of the polynomial is high, the cross-validation error will be high. So the cross-validation error curve reduces up to a certain point and after that it increases again; that rising region corresponds to high variance at high degree, because once you start overfitting, the error rate on held-out data starts increasing. Our aim is to find the model around the point where the cross-validation error is at its minimum, because that is where the generalized model can be created, and that scenario is nothing but low bias and low variance. From this graph you can see that this particular point has low bias and low variance.
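The curve just described (training error falling as the degree grows, cross-validation error falling and then rising) can be traced with a quick sweep. As before, the data, split, and degree range are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)
x_tr, y_tr = x[::2], y[::2]      # training half
x_te, y_te = x[1::2], y[1::2]    # held-out (cross-validation) half

def errs(degree):
    """(train MSE, held-out MSE) for a polynomial fit of the given degree."""
    p = np.poly1d(np.polyfit(x_tr, y_tr, degree))
    return (np.mean((p(x_tr) - y_tr) ** 2), np.mean((p(x_te) - y_te) ** 2))

results = {d: errs(d) for d in range(1, 9)}
for d, (tr, te) in results.items():
    print(f"degree={d}  train={tr:.3f}  held-out={te:.3f}")

# The "generalized" model sits at the minimum of the held-out error curve.
best = min(results, key=lambda d: results[d][1])
print("degree with lowest held-out error:", best)
```

Because each higher-degree polynomial family contains the lower-degree ones, the training error can only go down as the degree grows; the held-out error is what eventually turns back up.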
Pretty simple; I hope you have understood this general representation of bias and variance. Let us now take some examples with respect to decision trees and random forests, and try to understand whether each is an overfitting or underfitting condition.

By default, a decision tree grows to its complete depth: it takes all the features and keeps splitting until it reaches full depth. When it does this, the scenario is just like an overfitting condition. If you split the decision tree to its complete depth, then for the training data it may give you a very good result, with a small error rate, but for the test data this will not work well. So a fully grown decision tree has the property of low bias and high variance: because we train it to complete depth on the training data alone, it is not going to give a good result on the test data. Because of this, we use techniques like pruning: we only grow the decision tree up to some depth, and after that it will not split further. That is one way of converting high variance into low variance, and there are also a lot of hyperparameter tuning techniques that you should go and explore for decision trees.

Now let us take the example of a random forest. In a random forest we use multiple decision trees in parallel, and because of that we basically get a scenario of low bias and low variance.
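The depth-versus-pruning trade-off can be imitated without any ML library by a piecewise-constant fit: many segments behave like a tree grown to full depth (it nearly memorizes the training points), while a few segments behave like a pruned, depth-limited tree. All data and numbers here are invented for illustration:

```python
import numpy as np

# Toy data: a simple linear trend plus noise.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 40))
y = 1.5 * x + rng.normal(scale=0.3, size=x.size)
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

def fit_predict(n_bins, x_new):
    """Predict the mean training target of each of n_bins segments
    (a stand-in for a tree whose depth allows n_bins leaves)."""
    edges = np.linspace(0, 1, n_bins + 1)
    idx_tr = np.clip(np.digitize(x_tr, edges) - 1, 0, n_bins - 1)
    means = np.array([y_tr[idx_tr == b].mean() if np.any(idx_tr == b) else 0.0
                      for b in range(n_bins)])
    idx_new = np.clip(np.digitize(x_new, edges) - 1, 0, n_bins - 1)
    return means[idx_new]

errors = {}
for n_bins in (4, 20):   # "pruned" vs "full depth"
    tr = np.mean((fit_predict(n_bins, x_tr) - y_tr) ** 2)
    te = np.mean((fit_predict(n_bins, x_te) - y_te) ** 2)
    errors[n_bins] = (tr, te)
    print(f"leaves={n_bins:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

Limiting `n_bins` plays the role of limiting `max_depth` or pruning: the 20-leaf version fits its own training points very closely but shows a much larger gap between training and test error than the 4-leaf version.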
Since we are using decision trees, each individual tree initially has the property of low bias and high variance, but in a random forest we combine those trees in parallel. What a random forest does is called bootstrap aggregation, or bagging. In bootstrap aggregation we take a dataset and give it to multiple models, but we don't give every model the whole set of records; we give a sample of n records to each of the different decision trees. Finally we aggregate the outputs: if most of the models give the value 1, we take the output as 1, and if most of them give 0, we take the output as 0. Since each decision tree on its own has low bias and high variance, combining them in parallel converts the high variance into low variance. How does that happen? Remember, this is my dataset and these are my models M1, M2, M3, and M4; not all of the data goes to each model, only a sample. Suppose my dataset has 80 records in total; then, say, 20 records go to each of the four trees, each tree is trained on its own sample, and the outputs are aggregated. That is how the high variance present in a single decision tree, once we combine all the trees in parallel, gets converted into low variance. So this was one example each with respect to a decision tree and a random forest.
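The majority-vote effect can be shown with a toy simulation in pure Python. Each "weak model" below is just a biased coin that predicts the true label with 70% probability, standing in for one noisy decision tree; this illustrates only the voting idea, not a real trained random forest, and it assumes the voters are independent (real trees are correlated, so the improvement in practice is smaller):

```python
import random

random.seed(0)

def weak_model():
    """Stand-in for one high-variance tree: returns the true label (1)
    with probability 0.7, the wrong label (0) otherwise."""
    return 1 if random.random() < 0.7 else 0

def bagged_predict(n_trees=25):
    """Majority vote over n_trees independent weak models."""
    votes = sum(weak_model() for _ in range(n_trees))
    return 1 if votes > n_trees / 2 else 0

trials = 2000
single_acc = sum(weak_model() for _ in range(trials)) / trials
bagged_acc = sum(bagged_predict() for _ in range(trials)) / trials
print(f"single model accuracy    ~ {single_acc:.2f}")
print(f"bagged ensemble accuracy ~ {bagged_acc:.2f}")
```

This is also why a random forest randomizes both the rows and the features each tree sees: the less correlated the trees, the more the vote reduces variance.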
Now one question for you: what kind of property does XGBoost have? Does it have high bias and low variance, or does it have low bias and low variance? Please do answer in the comment box of this particular video. I hope you have got the idea of bias and variance, and that you now know what underfitting and overfitting are; if somebody says "low bias and high variance", you should know whether that is an overfitting or an underfitting scenario. So yes, this was all about this particular video. Please do let me know if you have any other questions, and I'll see you all in the next video. Have a great day, thank you one and all, bye-bye.
Info
Channel: Krish Naik
Views: 115,107
Rating: 4.9377432 out of 5
Keywords: bias-variance tradeoff example, bias-variance tradeoff deep learning, bias-variance tradeoff python, bias and variance in machine learning - geeksforgeeks, bias-variance overfitting underfitting, bias-variance tradeoff analytics vidhya, bias-variance tradeoff medium, bias variance tradeoff andrew ng
Id: BqzgUnrNhFM
Length: 16min 53sec (1013 seconds)
Published: Mon May 04 2020