Machine Learning: Testing and Error Metrics

Captions
Hi, and welcome to this tutorial on machine learning testing and error metrics. My name is Luis Serrano, I work at Udacity teaching machine learning, and that's a picture of me. Today we will focus on two questions. The first one is: how well is my model doing? Let's say you've already trained a machine learning model; is it good or not? We're going to learn metrics that tell us whether it is. The second question is: once we figure that out, how do we improve the model based on those metrics?

So let's dive right in and look at some data. We have some data here in the form of blue points and red points, and we want to train a model on it; the model will be something that splits the data. Our simplest model is a linear model: a line that cuts the data into a blue side and a red side. It makes some mistakes, since there are some red points in the blue area and some blue points in the red area, but in general it's a pretty good model. Now let's look at another, more complex model: a higher degree polynomial. That one seems to do much better with the points; it gets all the blue points on one side and all the red points on the other. So here's a question: which of these two models is better? I'll let you ponder that while we look at some methods for finding out.

What we're going to do is called testing. We're not going to use every single one of our points for training; instead, we split them in two. We take a small portion of the points and call them the testing set. Here the colored points are the training set, which we use to train the model, and the points that are white inside are the testing set; the model never sees them during training, but we look at them later to check how the model did. So again, our simple linear model is the line that splits the points in two, and with the same training/testing split we also train the higher degree polynomial model. The polynomial only looks at the training points and does very well on them; it actually gets them all correct. Now, to test the models, we forget about the training set and evaluate on the testing set. How does the model on the left do on the testing set? It makes only one mistake: a red point, highlighted in yellow, that gets classified as blue. The model on the right makes two mistakes: it classifies a red point as blue and a blue point as red. So when you look at the testing set, it turns out the model on the left is better than the one on the right, and the reason is that the model on the left generalizes better, whereas the one on the right sort of memorizes the data without really finding an interesting rule. That's testing in a nutshell.

Now, there are some very important rules about testing; I call them the golden rules. Golden rule number one: thou shalt never use your testing data for training. When you take your testing data, you put it in the corner of the room and never look at it again until the very, very end. Never use your testing data to train your model or to make decisions about your model; you'll see later how easy it is to mess this up accidentally.
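As a quick illustration of this train/test split, here is a minimal sketch in Python, assuming scikit-learn is available; the synthetic dataset, the logistic regression model, and the variable names are my own placeholders, not the code from the video.

```python
# Minimal train/test split sketch (assumes scikit-learn; data and model are illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the blue/red points.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Hold out a small portion of the points for testing; never train on them.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()      # the simple linear model
model.fit(X_train, y_train)       # training sees only the training points

# Evaluation uses only the held-out testing points (golden rule!).
print("test accuracy:", model.score(X_test, y_test))
```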
Golden rule number two says: friends don't let friends use their testing data for training. So again, don't use your testing data for training. And golden rule number three says: think not what your country can do for you... I'm kidding. Don't ever use your testing data for training. Never, ever, ever use your testing data for training.

Now, this looks a little sad, because it seems like we're taking some points, throwing them away, and not using that information very well. There is a way to get around this. Our training points are these green points and our testing points are these yellow points; again, green is training, yellow is testing. Let's take our data and split it into k equal sets; here k is 4. We use one of the sets, the yellow one, for testing and the other three for training. Then we train a model again, using the second little set for testing and the others for training, and we do this again and again, and at the end we average our results. Basically, each portion of the data gets used for training and for testing, and that works pretty well. Normally you would want to randomize the points, so you don't just take the first quarter, then the second, and so on; you want to do the splitting randomly. But notice that every point is used for training three times and for testing once. Anyway, that's it for testing sets.
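Here is a minimal sketch of that splitting scheme (commonly called k-fold cross-validation), again assuming scikit-learn; with four folds, each point is used for training in three rounds and for testing in one, and the scores are averaged at the end. The dataset and model are placeholders of my own.

```python
# Minimal k-fold sketch with k = 4 (assumes scikit-learn; data and model are placeholders).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# shuffle=True randomizes the points before splitting, as suggested in the video.
folds = KFold(n_splits=4, shuffle=True, random_state=0)

# Train 4 models; each uses one fold for testing and the other three for training.
scores = cross_val_score(LogisticRegression(), X, y, cv=folds)
print("scores per fold:", scores)
print("average score:  ", scores.mean())
```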
Now we still have the question: how well is the model doing? Let's look at some metrics that tell us how our models are doing, and this can be a tricky question. What is a way to tell if a model is doing well? The first thing at the top of my head would be: if a model is correct most of the time, then it must be pretty good. But this can be misleading. Let's look at an example called credit card fraud. We have a bunch of data in the form of credit card transactions; some of them are good, like the ones on the left, and some are fraudulent, like the ones on the right. Looking at the numbers, there are 284,335 good transactions and 472 fraudulent transactions; this is actually real data. Now let's try to come up with a model that's really good, meaning correct most of the time. Can you think of a model that is correct over 99 percent of the time? Here's one: the model that says all transactions are good. How often is it correct? It's correct 284,335 times out of 284,807, which is 99.83 percent. So this model must be pretty good, right? Well, not really: it doesn't catch a single fraudulent transaction, and the whole point of the model is to catch fraudulent transactions. So let's try the opposite. Can we get a model that catches all the bad transactions? Here's one; call it "all transactions are fraudulent." That's great, right? Now I'm catching all the bad transactions. Is that a good model? No, it's a terrible model, because I'm also accidentally flagging all the good ones. So it's tricky to reason only about how many times we're correct and how many times we're wrong. Let's look at two more examples that illustrate this.

The first is a medical model: basically a doctor that tells you if you're healthy or sick. The second is a spam classifier: it looks at emails, and for example on the left we have an email from grandma saying "I baked cookies," which is obviously not spam, while on the right we have an email from a sketchy address saying "earn lots of cash," which is spam.

Let's look at what's called a confusion matrix. This is a table where four things can happen. The rows say whether you are sick or healthy, and the columns say whether you are diagnosed as sick or as healthy. If you are sick and diagnosed as sick, the model did well; that's called a true positive. If you are healthy and diagnosed as healthy, the model was also right; that's a true negative. If you are sick but diagnosed as healthy, that's a false negative. And if you are healthy but diagnosed as sick, that's a false positive. Those are the four names we're going to use a lot: true positive, false negative, false positive, and true negative. Given some data, say 10,000 patients, the confusion matrix tells us how the model is doing: there are 1,000 true positives, sick patients correctly diagnosed as sick; 200 false negatives, sick patients diagnosed as healthy; 800 false positives, healthy patients diagnosed as sick; and 8,000 true negatives, healthy patients diagnosed as healthy.

Now the second model. On the rows we have spam and not spam, and on the columns we have sent to the spam folder or sent to the inbox, since the messages flagged as not spam get sent to the inbox. So again we have true positives, false negatives, false positives, and true negatives. Looking at 1,000 emails: 100 of the spam emails are correctly sent to the spam folder (true positives), 170 of the spam emails are accidentally sent to the inbox (false negatives), 30 of the non-spam emails are accidentally sent to the spam folder (false positives), and 700 of the non-spam emails are sent to the inbox (true negatives).

Let's also do a more graphical example: the model from the beginning, where blue is positive and red is negative. Maybe you can help me fill in this table; feel free to pause the video to think about it. The true positives are the points that are positive and classified as positive; there are six of them. The true negatives are the five points that are negative and classified as negative. The false positives are the points that are negative (red) but that the model thought were positive; that's these two over here. And finally, the false negatives are the points that are positive but that the model thought were negative; that's this one over here, so one. That's the confusion matrix for the linear model we saw at the beginning.
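To make the four counts concrete, here is a minimal sketch that rebuilds the 14-point linear-model confusion matrix with scikit-learn; the hand-written label arrays are my own, arranged to reproduce the counts from the video (6 true positives, 1 false negative, 2 false positives, 5 true negatives).

```python
# Minimal confusion-matrix sketch (assumes scikit-learn).
# Labels are hand-built to match the video's linear model: TP=6, FN=1, FP=2, TN=5.
import numpy as np
from sklearn.metrics import confusion_matrix

#                  6 TP     1 FN     2 FP     5 TN
y_true = np.array([1]*6 + [1]*1 + [0]*2 + [0]*5)   # 1 = positive (blue), 0 = negative (red)
y_pred = np.array([1]*6 + [0]*1 + [1]*2 + [0]*5)

# With labels=[1, 0], rows are the true class and columns the predicted class,
# so the matrix reads [[TP, FN], [FP, TN]].
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[6 1]
#  [2 5]]
```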
Let's look at the first of our metrics: accuracy. Accuracy answers the following question: out of all the patients, how many did we classify correctly? Here the accuracy is the number we classified correctly, the ones in these two boxes, 1,000 plus 8,000, so 9,000, divided by the total number, which is 10,000: 90 percent. The accuracy of the email model is, out of all the emails, how many did we classify correctly: the ones in these boxes, 100 plus 700, divided by 1,000, the total number of emails, so 80 percent. And finally, for our linear model, the accuracy is again the number of correctly classified points divided by all the points. How many points are correctly classified? The positive points the model classifies as positive plus the negative points the model correctly classifies as negative; that's 11 points, divided by the total of 14, and 11 divided by 14 is 78.57 percent.
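Here is a small helper (my own, not the video's code) that computes accuracy straight from the four confusion-matrix counts and reproduces the three numbers above.

```python
# Accuracy = correctly classified / all points, from confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=1000, tn=8000, fp=800, fn=200))  # 0.9      -> 90%    medical model
print(accuracy(tp=100,  tn=700,  fp=30,  fn=170))  # 0.8      -> 80%    spam model
print(accuracy(tp=6,    tn=5,    fp=2,   fn=1))    # 0.7857.. -> 78.57% linear model
```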
But as we saw in the credit card example, accuracy is not always the best metric, so let's study the models a little more to see which metrics to use. Let's look only at the mistakes, and forget about the cases where the model diagnosed a sick person as sick or a healthy person as healthy. There are two mistakes we can make: we can diagnose a sick person as healthy, which gives a false negative, or we can diagnose a healthy person as sick, which gives a false positive. Which of these is worse? I'll let you think about it for a minute. The problem with a false positive is that we take a healthy person, diagnose them as sick, and send them for more tests; that's annoying. But a false negative means there's a sick person and we're sending them home with no treatment; that's much worse. So for this model, a false negative is a lot worse than a false positive.

Now let's look at the spam detector model. Again, forget the times the model got it right and focus on the mistakes. The two possible mistakes are a false negative, which is taking a spam message and accidentally sending it to your inbox, and a false positive, which is when an email from your grandma comes in and the model accidentally sends it to the spam folder. Take a few seconds: which of these two mistakes is worse? If you said false positives, you're correct. The false negative is just kind of annoying; it means you got a spam email in your inbox and have to delete it manually. But the false positive would be terrible: poor grandma learned to type an email just to let you know she baked cookies, and you never see it because it went to spam. So here, false positives are much worse than false negatives.

In this way the medical model and the spam detector model are fundamentally different. The medical model is OK with false positives but not OK with false negatives, whereas the spam detector is not OK with false positives but is OK with false negatives. In other words, the medical model is trying to find all the sick people, and it's fine if it flags some extra people along the way, whereas the spam detector says: I don't necessarily need to find all the spam, but whatever I do flag had better be spam. We call the medical model a high recall model and the spam detector a high precision model, and those are the two metrics we're going to measure: precision and recall.

Let's define precision. I put a red X on the false negatives because that's the number we really care about avoiding for the medical model: sick people diagnosed as healthy. Precision answers the following question: out of all the patients we diagnosed as sick, how many did we classify correctly? So precision looks at the ones we diagnosed as sick: 1,000 correct out of 1,800 diagnosed as sick, which is 55.7 percent. That's not a very precise model, but it's avoiding that red X, so that's OK. What is the precision of the spam detector model? Here the red X sits on the false positives, the 30 good emails sent to spam, because those are the mistakes we really want to avoid. Precision asks: out of all the emails sent to the spam folder, how many were actually spam? That's 100 correct divided by 130, so this model has 76.9 percent precision, and since this model needs high precision, that number had better be high. What about our linear model? Out of all the points predicted to be positive, how many are correct? Take a few seconds to think about it. The points predicted to be positive are only these ones, so precision is the number of true positives divided by the number of true positives plus false positives: the six that are correct divided by all eight of them, so the precision here is 75 percent.

Now the second metric: recall. For the medical model, recall answers the following question: out of the sick patients, how many did we correctly diagnose as sick? Remember, precision was "out of the patients diagnosed as sick, how many were actually sick"; recall is "out of the patients that are sick, how many did we correctly diagnose as sick." That's this row, and this row contains the red X, so we can see this is an important metric for this model. Out of the 1,200 sick patients, how many did we diagnose correctly? A thousand of them, and 1,000 divided by 1,200 is 83.3 percent. This model had better have good recall, because we want to catch all the sick people. And the recall of the email model? It asks: out of all the spam emails, how many were correctly sent to the spam folder? That's this row over here: 100 correctly sent to the spam folder divided by 270 spam emails, a pretty low recall, but since what we're worried about is the red X at the bottom, this model doesn't need high recall. And again for the linear model: out of all the points labeled positive, how many did we correctly predict? The points labeled positive are the blue points, so recall is the true positives divided by all the positives: 6 divided by 6 plus 1, and 6 divided by 7 is 85.7 percent.

So that is precision and recall. As a summary: for the medical model the precision is low, 55.7 percent, and the recall is high, 83.3 percent, and this is a model that's supposed to have high recall. For the spam detector model the precision is 76.9 percent and the recall is 37 percent, and the spam detector model is supposed to have high precision.
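The same counts give precision and recall directly; the helper functions below are my own and simply restate the two definitions on the video's numbers.

```python
# Precision = TP / (TP + FP): of everything we flagged positive, how much really was positive.
# Recall    = TP / (TP + FN): of everything that really was positive, how much we flagged.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(precision(1000, 800), recall(1000, 200))  # ~0.556, ~0.833  medical model
print(precision(100, 30),   recall(100, 170))   # ~0.769, ~0.370  spam model
print(precision(6, 2),      recall(6, 1))       #  0.75,  ~0.857  linear model
```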
Now, the question is: do we want to be carrying two numbers around, precision in one pocket and recall in the other, always looking at both? We'd rather have just one score. So how do we combine these two scores into one? Can you think of a way? A pretty simple way is to take the average. If we take the average of precision and recall, on the left (the medical model) we get 69.5 and on the right (the spam model) we get 56.95. That's an OK metric, but you're probably thinking it's not much different from accuracy; it's not telling us much. The way to see what this average really does is to try it on an extreme example, and that's the credit card fraud example. Again we have some good credit card transactions and some fraudulent ones, in the numbers we saw before, and let's pick our terrible model number one: all transactions are good. What's its precision? Precision asks, out of the ones we classified as bad, how many are bad, and that comes out as 100 percent, because we never classify anything as bad in the first place. What about the recall? Recall asks how many of the bad ones we caught, and that's zero, because we didn't catch any: 0 percent. So the average of precision and recall is the average of 100 and 0, which is 50 percent. Now, do I want to give that horrendous model a score of 50 percent? That seems like a very high score for such a bad model; I'd rather give it a zero. Let's try the opposite, the model that says all transactions are fraudulent. Its precision: I correctly flagged 472 out of all 284,807 transactions, so it's 0.16 percent. Its recall: I caught all of the fraudulent ones, so actually a great recall, 100 percent. The average of precision and recall is again slightly over 50 percent; once more, a terrible model with roughly a 50 percent score. I'd want to give that one a zero too. So the plain average doesn't seem like the greatest thing.

But there's another kind of average we can use, called the harmonic mean. Say we have two numbers, x and y. The arithmetic mean is (x + y) / 2, and the harmonic mean is defined as 2xy / (x + y). It's a kind of average too, in the sense that if the two numbers are equal we just get x (or y) back, but it's a mathematical fact that it's always less than or equal to the arithmetic mean, so it sits closer to the smaller number than to the larger one. For example: if my precision is 1 and my recall is 0, the average is 0.5, but if you plug them into the harmonic mean formula you actually get zero. If my precision is 0.2 and my recall is 0.8, the average is again 0.5, but the harmonic mean is 0.32. So we're not going to use the arithmetic mean; we're going to use the harmonic mean, and that's what we call the F1 score. As we said, the F1 score is closer to the smaller of the two numbers, so if either precision or recall is small, the F1 score raises a flag, whereas the plain average kind of says "one is bad but the other is good, so I'm so-so." That's why we're going to use the F1 score.
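Here is a tiny sketch contrasting the two means on the extreme examples from the video; the helper functions are my own.

```python
# Arithmetic mean vs harmonic mean of precision and recall.
def arithmetic_mean(x, y):
    return (x + y) / 2

def harmonic_mean(x, y):           # this is what the F1 score uses
    return 2 * x * y / (x + y) if (x + y) > 0 else 0.0

print(arithmetic_mean(1.0, 0.0), harmonic_mean(1.0, 0.0))  # 0.5  0.0  -> flags the bad model
print(arithmetic_mean(0.2, 0.8), harmonic_mean(0.2, 0.8))  # 0.5  0.32 -> closer to the smaller number
```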
So let's see what the F1 score is for our models. For the medical model, precision is 55.7 and recall is 83.3; the average is 69.5, and the F1 score from the formula comes out to 66.76 percent. For the spam detector model, precision is 76.9 and recall is 37; the average is 56.95, and the F1 score is 49.96 percent. And for the linear model, precision is 75 and recall is 85.7; the average is about 80, and the F1 score is also about 80 percent. That one is a little anticlimactic, because precision and recall are very close to each other, so the average and the F1 score end up very close too. Now look at precision and recall for the credit card fraud example: the bad model that says all transactions are good gets 100 percent precision and 0 percent recall, and its F1 score, if you plug it into the formula, is actually zero.

There is actually something more general than the F1 score, called the F-beta score. Beta is any number greater than zero. If beta is one, you're taking the harmonic mean of precision and recall, which is the F1 score. If beta is small, you weight precision more, and if beta is large, you weight recall more. So if precision is over here, toward the spam model, and recall is over here, toward the medical model, the F1 score sits in between, the F0.5 score sits closer to precision, and the F2 score sits closer to recall. Question: where would the credit card fraud model be; what would be a good metric for it, closer to precision or closer to recall? Think about it for a moment. The question is what we prefer: do we prefer to catch all the fraudulent transactions, or that everything we catch is fraudulent? What's worse: occasionally getting a text asking whether a transaction of yours is fraudulent when it isn't, or a fraudulent transaction happening and the model never catching it? In my opinion the second one is worse. I don't mind the occasional text about a possibly fraudulent transaction where I have to make a call and approve it, but I really do mind a fraudulent transaction on my credit card that's never caught. So a good score for this model is going to be over here, pretty close to recall; let's say maybe an F10 score. In general, that's a good way to analyze our models: decide whether we want more precision or more recall, and with that in mind pick a number beta for the F-beta score that reflects it.
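The F-beta score can be written directly from precision and recall with the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); the sketch below is my own and uses the spam detector's numbers to show how beta shifts the score between precision and recall.

```python
# F-beta from precision and recall: beta < 1 leans toward precision, beta > 1 toward recall,
# and beta = 1 is the harmonic mean (the F1 score).
def f_beta(p, r, beta):
    denom = beta**2 * p + r
    return (1 + beta**2) * p * r / denom if denom > 0 else 0.0

p, r = 0.769, 0.37                      # spam detector model
print(round(f_beta(p, r, 0.5), 3))      # 0.633 -> closer to precision
print(round(f_beta(p, r, 1.0), 3))      # 0.5   -> the F1 score (~49.96%)
print(round(f_beta(p, r, 2.0), 3))      # 0.413 -> closer to recall
```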
OK, now that we've seen some metrics, let's go to a different topic: the types of errors we can make when we train a model. As in life, there are two types of errors: one is trying to kill Godzilla with a fly swatter, and the other is trying to kill a fly with a bazooka. The one on the left, trying to kill Godzilla with a fly swatter, is oversimplifying the problem; it's underestimating the size of the problem, and that's called underfitting. The one on the right, trying to kill a fly with a bazooka, is the opposite: completely overestimating the problem, and that's called overfitting.

Let's look at a classification problem. What does the rule seem to be for separating the data on the right from the data on the left? Think about it for a second. It seems like a good rule is: on the right we have dogs, and on the left we have things that are not dogs. That's a pretty good model. But what if we said the following instead: on the right we have animals, and on the left we have things that are not animals. What's wrong with that model? It's too simple; we're underestimating the problem. We're not even correctly classifying the training set, because here's an error: there's a cat that we classify as a non-animal, but a cat is an actual animal. We're simplifying the problem too much, underestimating our data, and that's like trying to kill Godzilla with a fly swatter. That's underfitting, and we're also going to call it error due to bias.

Now what about the following model: on the right we have dogs that are wagging their tail, and on the left we have anything that is not a dog wagging its tail. What happens with this model? It does very well on the training set; everything on the right is correctly classified and everything on the left is correctly classified. But what's the problem? If we get a testing point, this dog, where does it fit? It belongs on the right with the other dogs, but the model puts it on the left because it's not wagging its tail, so the model is making a mistake. What's wrong with this model is that it's way too specific; it memorizes the data instead of really catching a good rule. That's overcomplicating the problem, which is like killing a fly with a bazooka, and it's called overfitting, or error due to variance.

So those are the two types of mistakes we can make: error due to bias, or underfitting, and error due to variance, or overfitting. Let's see how they look with some data. Look at this data: it seems like the answer is a quadratic curve that separates the red points from the blue points. What if I thought a line could do it? A line is a much simpler solution, but it doesn't seem like the right answer, and it makes a bunch of mistakes. The mistake we made here is that we oversimplified: we tried to solve a problem that is inherently quadratic with a linear equation. We underestimated the data; that's underfitting, like trying to kill Godzilla with a fly swatter. And what about this: we have the same data, and again the right model seems to be a quadratic curve separating the red points from the blue points, but instead we fit a super complicated, high-degree polynomial that matches our data extremely well. As we saw at the beginning of this video, that's not a good solution, because even though it fits the training set well, it won't generalize; it doesn't really learn a good model, it just memorizes the data. That's overcomplicating the solution, which is overfitting, like trying to kill a fly with a bazooka.

So let's look at the three little bears. On the left we have a very simplified solution that underfits; on the right we have a very complicated solution that overfits; and in the middle we have "just right": the rule "dogs versus not dogs" seems to be the right way to classify this data. What happens with the one on the left? It's bad on the training set, because it makes a mistake there: it gets that cat wrong. The complicated one, on the other hand, is great on the training set; it fits it really well.
And the one in the middle is good on the training set. Now, what happens if we bring in a testing set? The one on the left is bad on the testing set as well; if it's not good on the training set, it won't be good on the testing set either (here it happens to classify the dog correctly, but that doesn't mean much). The one on the right is bad on the testing set, because it incorrectly classifies that dog. And the good model, the one in the middle, is good on the testing set. So that's how we tell them apart: if you have a model that's bad on the training set and bad on the testing set, you have underfitting; if you have a model that's great on the training set but terrible on the testing set, you have overfitting; and you want something that's good on both the training and the testing set.

Let's take that last observation and make a graph out of it, called the model complexity graph. Again we have the three little bears: on the left a high bias solution, or underfitting, which is trying to fit a degree 1 polynomial to data that is clearly quadratic; on the right overfitting, or high variance, which is thinking the solution is degree 6 when it's really degree 2; and in the middle "just right," the degree 2 solution to a problem that is inherently degree 2. On the left we have underfitting, killing Godzilla with the fly swatter; on the right overfitting, killing a fly with a bazooka; and in the middle just right, so we're happy.

As promised, let's turn this into a graph. Again we have our three models of degree 1, degree 2, and degree 6, and recall from the beginning that we split the data into a training set and a testing set: here the colored points are the training set and the points that are white inside are the testing set. We build the three models and look at their training and testing errors. First the line: how does it do on the training set? It makes three mistakes, highlighted in green, so the training error is 3. On the testing set it also makes three mistakes, the yellow highlighted points, so the testing error is 3. Let's put that on the grid: training error 3, testing error 3; training is green, testing is yellow. Now the same for the other two models. The degree 2 polynomial fit makes one training mistake, this point over here, and one testing mistake, this point over here, so the training error is 1 and the testing error is 1. And the complicated degree 6 model? On the training set, by nature an overfitting model does very well, and indeed it makes no mistakes; on the testing set it makes two mistakes. So its training error is 0 and its testing error is 2. We can see the rule we stated before: underfitting models are bad on the training set and bad on the testing set; overfitting models are great on the training set but bad on the testing set; and the just-right one, in the middle, is good on both. Now let's graph the curves that join the green points and the yellow points, and there you go: we have what's called a model complexity graph.
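Here is a minimal sketch of how one could tabulate the training and testing errors behind a model complexity graph, assuming scikit-learn; the synthetic data and the polynomial-plus-logistic-regression pipeline are my own stand-ins, so the exact error counts will differ from the video's 3/3, 1/1, 0/2 example.

```python
# Sketch of a model complexity table: training vs testing error as the degree grows.
# (Assumes scikit-learn; data is synthetic, so numbers won't match the video exactly.)
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=100, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 2, 6):
    model = make_pipeline(PolynomialFeatures(degree), LogisticRegression(max_iter=5000))
    model.fit(X_train, y_train)
    train_errors = (model.predict(X_train) != y_train).sum()
    test_errors = (model.predict(X_test) != y_test).sum()
    print(f"degree {degree}: training errors = {train_errors}, testing errors = {test_errors}")

# Underfitting: many errors on both sets. Overfitting: few training errors, more testing errors.
# The "just right" degree keeps both low.
```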
The model complexity graph has a curve for the training error, the green one, which keeps decreasing, and a curve for the testing error, the yellow one, which goes down and then back up. The way to tell where the just-right model is, is to look at the point where the two curves are closest, right before they start to diverge. So here we have it in a nutshell: on the left is our linear model that underfits, on the right is our high degree polynomial that overfits, and the just-right model is at the point where the curves start diverging.

So it seems very simple: we try a bunch of models, we train them on the training set, we test them on the testing set, and we decide which model is better, linear, quadratic, cubic, quartic, based on the testing error. Sounds pretty good, right? Oh wait. Oh no. What's happening? Now we've done it: we broke the golden rule. Thou shalt never use your testing data for training. We've made a huge mistake; we can't make decisions based on the testing set. The testing set must be left until the very end; you can't touch it. So what do we do if we can't make decisions based on the testing set? The solution is cross-validation. Before, we had a training set in green and a testing set in yellow; now we cut one more set, this cross-validation set in between. We use the training set for training our models, the cross-validation set for making decisions such as the degree, and the testing set at the very, very end just for testing. Now this looks a lot better: we use the cross-validation error curve to decide which model is our best one. This is how it looks on the dog example: on the left we have underfitting, with very high training and testing errors; on the right we have overfitting, with low training error and high testing error; and the just-right point is here. In general you may get complicated, wiggly curves, but it looks roughly like this, with just right in the middle, underfitting on the left, and overfitting on the right.

Here's a small summary: we take our training data, we train a bunch of models, and we use the cross-validation data to pick the best one. For example, to train a logistic regression model, we try models of degree one, two, three, and four, and with the training data we train them all, finding things like the slope of the line or the coefficients of the polynomial. Then with the cross-validation data we pick a metric, say the F1 score, calculate the F1 score of all these models, and choose the model with the highest one. Then, as a final step, we use the testing data to make sure that our model is actually good.
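Here is a minimal sketch of that train / cross-validation / test workflow, assuming scikit-learn; the candidate degrees and the use of F1 as the selection score follow the video's description, but the dataset, splits, and pipeline are my own placeholders.

```python
# Sketch: pick the polynomial degree with a validation set, touch the test set only once.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Split off the test set first, then carve a cross-validation set out of the rest.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_degree, best_f1 = None, -1.0
for degree in (1, 2, 3, 4):
    model = make_pipeline(PolynomialFeatures(degree), LogisticRegression(max_iter=5000))
    model.fit(X_train, y_train)                      # training data: fit the coefficients
    score = f1_score(y_val, model.predict(X_val))    # cross-validation data: compare degrees
    if score > best_f1:
        best_degree, best_f1 = degree, score

final = make_pipeline(PolynomialFeatures(best_degree), LogisticRegression(max_iter=5000))
final.fit(X_train, y_train)
print("chosen degree:", best_degree)
print("test F1:", f1_score(y_test, final.predict(X_test)))  # test data: used once, at the end
```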
The parameters of the algorithm, in this case, were the coefficients of the polynomial, but the degree is a higher-level knob, so we call it a hyperparameter. Let's see another example: say we're training a decision tree. What are the hyperparameters? Things like depth: we have depth equals one, two, three, and four. We use the training data to train a bunch of trees of depth one, two, three, and four, and the parameters are then the values in the nodes, the thresholds, and so on. Then we take the F1 score, use the cross-validation set to compute the F1 score of each of these models, and pick the one that did best. Finally, with the testing set, we make sure that the chosen model is good.

Now, what happens if we have more than one hyperparameter? Here we only had one, the depth. What if we're training a support vector machine? In a support vector machine we have hyperparameters like the kernel, which can be linear or polynomial, for example, and we also have the gamma parameter: if it's small we get solutions like this, and if it's large we get solutions like that. So how do we pick the right combination of kernel and gamma? That's called grid search: we literally just make a table whose columns are the linear and polynomial kernels and whose rows are different values of the gamma parameter, for example 0.1, 1, and 10. It's normally good to pick values that increase exponentially, to sweep the whole range of values. So again: we use our training set to train a bunch of linear models and a bunch of polynomial models with different values of gamma, then we use the cross-validation set to calculate the F1 score of all of them, then we simply pick the one with the highest F1 score, and finally we use the testing set to make sure that what we did was good.
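Here is a minimal grid-search sketch for the SVM example, assuming scikit-learn; the data and the exact parameter values are placeholders, and GridSearchCV is used as a stand-in for the by-hand table described in the video, since it does the cross-validation bookkeeping for us.

```python
# Sketch: grid search over SVM kernel and gamma (assumes scikit-learn; values are placeholders).
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Columns of the table: the kernels. Rows: exponentially spaced gamma values.
param_grid = {"kernel": ["linear", "poly"],
              "gamma": [0.1, 1, 10]}

# Train one model per cell, score each with F1 via cross-validation on the training data.
search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=4)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))  # touched once, at the end
```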
In summary, we have parameters and hyperparameters, and for a few algorithms here is which is which. For random forests, the parameters are things like the features and thresholds in the trees, and the hyperparameters are, for example, the number of trees or the depth. For things like logistic and polynomial regression, the parameters are the coefficients of the polynomial, and the hyperparameters are things like the degree of the polynomial. For support vector machines, the parameters are the coefficients, and the hyperparameters are things like the kernel, the gamma parameter, the C parameter, and so on. And for neural networks, the parameters are the coefficients (the weights) of the network, and the hyperparameters would be things like the number of layers, the size of the layers, the activation function, and so on.

So, to summarize, this is how I see machine learning. You have a problem, which is fixing your car. You have a bunch of tools that may or may not be helpful for fixing your car, and you have a bunch of measurement tools, which is what we learned today. You use these measurement tools, or metrics, to measure each tool's performance, and then you pick the best tool to fix your car. In machine learning, your problem is some data that you need to classify; your tools are algorithms like neural networks, logistic regression, support vector machines, decision trees, random forests, and so on; and your metrics are things like the model complexity graph, accuracy, precision, recall, the F1 score, and also learning curves, which we didn't cover today but are pretty useful and similar to the model complexity graph. You use these metrics to test your models, see how they're doing, and pick the best one, and that's the model you use for your data. Then, when you have a point you want to predict, you predict it based on that model.

Anyway, that's it. Thank you very much for getting to the end of this video. If you liked it, please subscribe, hit like, share, and comment; I'm very happy to see all the comments, and thank you to everybody who has commented and suggested ideas for other videos. I'm working on a few other things. Feel free to message me: my email is this one, my LinkedIn is right there, feel free to connect, and my Twitter is "Luis likes math." And if you liked this video, I have a bunch more at Udacity, from the courses I teach there. Thank you very much, and see you next time.
Info
Channel: Luis Serrano
Views: 83,544
Rating: 4.9666829 out of 5
Keywords: machine learning, artificial intelligence, testing, training, data science, models, overfitting, underfitting, cross validation
Id: aDW44NPhNw0
Length: 44min 43sec (2683 seconds)
Published: Thu Mar 16 2017