How to evaluate ML models | Evaluation metrics for machine learning

Captions
Training your machine learning model with the data you have is not enough; you also have to evaluate it, to understand whether it needs to be improved and whether it will perform well in the real world. To do that, we use evaluation metrics, and depending on the type of problem you have, the evaluation metric you use will be different. So in this video, we will learn about evaluation metrics, what they mean, and in which cases to use them. This video is brought to you by AssemblyAI. AssemblyAI makes a state-of-the-art speech-to-text API; if you want to try it out for free, go grab your free API token using the link in the description. All right, let's get started.

Let's first take a look at the evaluation metrics we use for classification tasks. The most popular one is accuracy, and the reason it is used so much is that it is a very simple metric: it is easy to understand, and it makes comparing different models easy, because there is just one number to look at. Accuracy is the number of instances you got right divided by the total number of instances. But even though it is simple and popular, it might not always be the best metric to use, because accuracy simplifies things a little too much. That is why you might need to look into more detailed metrics, like precision and recall.

Precision and recall are classification metrics defined in terms of true positives, true negatives, false positives, and false negatives. That definition implies a binary classification problem, where every instance is tagged as zero or one, true or false. But you can use precision and recall for other classification tasks too; you are not limited to two classes, and we will look at what to do when we are not dealing with a binary task in a moment. For a binary problem, precision is the percentage of correctly labeled positive instances out of all the instances that were labeled positive, whereas recall is the percentage of correctly labeled positive instances out of all the instances that are actually positive. In other words, precision tells us: out of everything I labeled as positive, how many actually belong to that class? Recall tells us: out of everything that is actually positive, how many was I able to capture? Because they are slightly different, they give us different perspectives on the same model, which is why they are usually reported together. Looking at precision and recall side by side makes it easier to understand what is going wrong in a model, and how to improve it.

As I mentioned, your classification problem might have more than two classes. In that case, there are a few different ways to compute accuracy, precision, or recall. One way is to calculate the metric separately for each class and then take the average; you can also weight the classes by how important it is to get each one right, and take a weighted average. It is important to decide on the correct way to calculate accuracy, precision, and recall based on the kind of problem you have.
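To make these classification metrics concrete, here is a minimal sketch using scikit-learn (an assumed library choice; the video itself shows no code, and the labels below are made up for illustration):

```python
# Minimal sketch of accuracy, precision, and recall with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (illustrative)

# Accuracy: instances you got right / total instances.
print(accuracy_score(y_true, y_pred))        # 6/8 = 0.75

# Precision: of everything labeled positive, how many really were positive.
print(precision_score(y_true, y_pred))       # 3/4 = 0.75

# Recall: of everything actually positive, how many we captured.
print(recall_score(y_true, y_pred))          # 3/4 = 0.75

# With more than two classes, choose an averaging strategy explicitly:
# average="macro" treats every class equally, while average="weighted"
# weights each class by how often it appears in the data.
y_true_mc = [0, 1, 2, 2, 1, 0]
y_pred_mc = [0, 2, 2, 2, 1, 0]
print(precision_score(y_true_mc, y_pred_mc, average="macro"))
print(recall_score(y_true_mc, y_pred_mc, average="weighted"))
```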
So before you go forward, make sure you check the documentation of the framework or library you are using, and choose the averaging option that is correct for your problem.

Another metric you can use is the F1 score. It combines precision and recall into a single number, which is why some people prefer to look at it. The F1 score is the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). Remember, though, that the F1 score is usually best used in combination with other tools, such as the PR curve or the ROC curve, so let's talk about those now. (A code sketch of the F1 score and both curves follows at the end of this classification discussion.)

The PR curve is the precision-recall curve: a comparison of what the recall is at each precision value, and vice versa. In an example graph, we would want the curve to push toward the top right corner, because that is the region where both precision and recall are high. The ROC curve is similar, but this time you are comparing the true positive rate to the false positive rate. By looking at these two graphs, you can understand how well your model is performing. You will also hear people talk about AUC, the area under the curve; sometimes it is calculated from the PR curve and sometimes from the ROC curve, but either way you want the AUC to be as high as possible, because then you know your model is performing well.

Lastly for classification tasks, we have cross-entropy. Cross-entropy calculates the difference, or the distance, between two probability distributions. Depending on the kind of problem you have, you might need binary cross-entropy, categorical cross-entropy, or sparse categorical cross-entropy; these are different implementations that you can find easily, for example in the Keras library. Say you have a one-hot encoded label of [0, 1, 0], meaning your instance belongs to the second class, and your model's output is [0.05, 0.95, 0.0]. That means your model classified the instance as the second class with very high confidence, but not full confidence. Calculating the distance between these two distributions tells you how accurate your model is.
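Here is that one-hot example as a minimal sketch with Keras, which the video names (the exact calls below are standard Keras losses, but treat the snippet as illustrative):

```python
# The one-hot example from the text, computed with Keras cross-entropy losses.
import tensorflow as tf

y_true = [[0.0, 1.0, 0.0]]    # one-hot label: the instance is class 2
y_pred = [[0.05, 0.95, 0.0]]  # confident, but not perfect, prediction

# Categorical cross-entropy: the distance between the two distributions.
cce = tf.keras.losses.CategoricalCrossentropy()
print(float(cce(y_true, y_pred)))   # ≈ -log(0.95) ≈ 0.051, a small loss

# The sparse variant takes an integer class index instead of a one-hot vector.
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(float(scce([1], y_pred)))     # same value, different label format
```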
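And going back to the F1 score and the two curves discussed above, a minimal sketch with scikit-learn (again an assumed library choice; y_scores stands in for a model's predicted probabilities):

```python
# F1 score, PR curve, ROC curve, and AUC with scikit-learn.
from sklearn.metrics import (f1_score, precision_recall_curve,
                             roc_curve, roc_auc_score, auc)

y_true   = [1, 0, 1, 1, 0, 1, 0, 0]                  # actual labels
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]                  # hard predictions
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.1]  # predicted probabilities

# F1: the harmonic mean of precision and recall.
print(f1_score(y_true, y_pred))                      # 0.75 here

# PR curve: precision and recall at every score threshold, plus its AUC.
precision, recall, _ = precision_recall_curve(y_true, y_scores)
print(auc(recall, precision))

# ROC curve: true positive rate vs. false positive rate, plus its AUC.
fpr, tpr, _ = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))               # closer to 1.0 is better
```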
Now let's talk about regression evaluation metrics. The first and simplest is the mean absolute error. As the name suggests, it is the mean of the absolute values of all the errors. The absolute values matter: if you do not take them, negative errors can cancel out positive ones, and the mean error can look much smaller than it really is. So you first take the absolute value of every error, then sum them up and take their mean.

Another way to measure error for regression problems is the mean squared error: you square every error and then take the mean. By squaring, you let the larger errors impact the final value more, exaggerating their importance. But the mean squared error can sometimes be hard to interpret, which is why we often use the root mean squared error instead. It is calculated by squaring every error, taking the mean, and then taking the square root of that final value; in other words, it is the square root of the mean squared error. You are still exaggerating the importance of the larger errors, but you bring the result back down to the scale of the mean absolute error, so it is easier to compare the two metrics and understand what might be going wrong in your model.

R-squared, or the coefficient of determination, gives us a good measure of how well a model fits the data: it is a metric of how much the real values vary from the curve the model came up with. Picture a fitted line, with dots around it marking the real values of the instances in your data. If all the values lie perfectly on the curve, the R-squared value is one; if the model does not fit the data at all, the R-squared value is zero; and for anything in between, where the model fits a little better or a little worse, you get a value between zero and one. So for R-squared, between zero and one, the higher the value, the better.

Lastly, we have cosine similarity. Cosine similarity plays a role similar to the cross-entropy metric we saw for classification problems, but it is used for regression problems, because it can deal with real values. Cosine similarity tells us how similar two vectors are to each other, and in this way we can compare a vector of predictions to the vector of real values. (Short code sketches of these regression metrics follow right after the transcript.)

These are, of course, not all of the evaluation metrics you can use in your machine learning problems. If you visit the documentation of libraries like scikit-learn, Keras, or TensorFlow, you will see that they have many more evaluation metrics, different implementations of the ones we talked about here, and other metrics based on them. So it is always a good idea, before you start a project, to do some research and understand which evaluation metric you are going to work towards.

I hope this video was helpful as a quick review of the evaluation metrics that are out there, and of what you can consider using on your next project. If you liked this video, don't forget to give us a like, and maybe even subscribe, or leave a comment with your questions or ideas for videos we can make next; we would love to hear from you. And before you leave, don't forget to grab your free API token for AssemblyAI's speech-to-text API using the link in the description. Thanks for watching, and I will see you in the next video.
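As promised above, here is a minimal sketch of the regression error metrics and R-squared using scikit-learn and NumPy (assumed library choices; the numbers are illustrative):

```python
# Minimal sketch of MAE, MSE, RMSE, and R-squared with scikit-learn / NumPy.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # real values (illustrative)
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # model predictions (illustrative)

# MAE: mean of the absolute errors, so negatives cannot cancel positives.
print(mean_absolute_error(y_true, y_pred))           # 0.5

# MSE: mean of the squared errors; large errors are exaggerated.
print(mean_squared_error(y_true, y_pred))            # 0.375

# RMSE: square root of the MSE, back on the same scale as the MAE.
print(np.sqrt(mean_squared_error(y_true, y_pred)))   # ≈ 0.612

# R-squared: 1.0 for a perfect fit, near 0 when the model explains nothing.
print(r2_score(y_true, y_pred))                      # ≈ 0.949
```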
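And a small NumPy sketch of cosine similarity between a prediction vector and the real values (the helper function below is written for illustration, not taken from any particular library):

```python
# Cosine similarity between two vectors, computed directly with NumPy.
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: 1.0 means the same direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

print(cosine_similarity(y_true, y_pred))   # ≈ 0.99, very similar vectors
```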
Info
Channel: AssemblyAI
Views: 46,981
Id: LbX4X71-TFI
Length: 10min 4sec (604 seconds)
Published: Mon Jan 24 2022