Tutorial 41-Performance Metrics(ROC,AUC Curve) For Classification Problem In Machine Learning Part 2

Video Statistics and Information

Captions
Hello, my name is Krish Naik and welcome to my YouTube channel. So guys, we are going to discuss performance metrics part 2, and in this particular video I'm going to cover the ROC and AUC curve. Now guys, in my previous video about performance metrics I have already explained the confusion matrix, and I have explained recall, precision, true positive rate and false positive rate, and apart from that F1 score, F-beta score and many more things. Now in this particular video we will try to understand the ROC and AUC curve.

And remember guys, the ROC AUC curve is mostly used for binary classification problems, and it is pretty much important. Suppose you are implementing a logistic regression and your model has predicted some probabilities. What is the threshold that you should decide on? By default, if I consider logistic regression, anything greater than 0.5 it is going to consider as 1, and anything less than 0.5 it is going to consider as 0. Now understand one thing: in each and every use case that you have, this threshold can play a very, very important role.

Let me just take an example. Suppose you are working on a healthcare domain project and what the model is predicting is very, very critical; it may be whether a patient has a disease, it may be something else. At that time the threshold value should be kept based on the type of problem statement. Suppose you want a higher true positive rate and a lower false positive rate; based on that we can actually play with the threshold. And again, a domain expert will always be there and will be able to guide you. But remember guys, you need to show how your model is actually performing in the form of some graphs; only then will the domain expert be able to help you out. So let us go and try to understand how to construct the ROC and AUC curve and what exactly it is. We'll be discussing this in this specific video, so make sure that you watch this video till the end.

So let us take an example. Here I have the output value of my use case, which is y; this is my actual value. I have values like 1, 0, 1, 1, 0, 1, and my model has actually predicted probabilities like 0.8, 0.96, 0.4, 0.3, 0.2, 0.7; suppose it has predicted like this. Now for constructing the ROC and AUC curve we will be considering some threshold values. Suppose I start with 0, then 0.2, 0.4, 0.6, 0.8 and 1; these are my threshold values.

First of all, suppose I set my threshold value as 0; then what will be my y hat? We know that 0.8 is greater than 0, and if a value is greater than the threshold it is going to become 1, so all these values will actually become 1. Once we have this, we will calculate the true positive rate and the false positive rate. The true positive rate is nothing but TP / (TP + FN), pretty much simple. So how many true positives do I have here? See what a true positive is: my actual value is 1 and my predicted value is 1. So I have one true positive, two true positives, three true positives and four true positives. So the total is 4, divided by 4 plus FN. False negative basically means that I have 1 over here as the actual value and my predicted value is 0. In this scenario I don't have any such cases, so FN is 0, and 4 / (4 + 0) gives 1 as my output. Fine, I got my true positive rate, and remember, for ROC we require the true positive rate and the false positive rate, because for the construction of the graph we will actually be requiring them.

Now let us go ahead and calculate my false positive rate. The false positive rate is FP / (FP + TN). In this particular scenario, what is a false positive? When my actual value is 0 and my predicted value is 1, that is a false positive. So I have one over here and two over here, so I am going to write it as 2 divided by 2 plus the true negatives. True negative basically means that the actual value is 0 and the predicted value is also 0; in this scenario that count is 0. So 2 / (2 + 0), my total value, is 1.

Now understand one thing: I'm going to construct a graph on the right-hand side. On the x-axis I have the false positive rate and on the y-axis I have the true positive rate. This is my (0, 0), and remember, on the false positive rate axis this is 1, and on the true positive rate axis this is 1. When my threshold was 0, I got my true positive rate as 1 and my false positive rate as 1 for this specific model. So if I go and plot it, it will come somewhere here; this is basically (1, 1). My false positive rate is 1 and my true positive rate is 1 when my threshold value is 0. Note it down.

Now let me take the next threshold value, which is 0.2. Anything greater than 0.2 is going to become 1. In this case this will become 1, this will become 1, 1, 1, this will be 0, and this will be 1. That one is 0 because here the probability is 0.2 only, which is not greater than the threshold. Now if I go and calculate the true positive rate, how many true positives do I have? One, two, three, four, so my true positive rate will be 1 only. What about my false positive rate? Try to understand how many false positives there are: only one. So my false positive rate will become 1 divided by 1 plus the true negatives, and you can see there is one true negative, so this will become 1 / (1 + 1), and 1 divided by 2 is nothing but 0.5. So my false positive rate is 0.5 over here and my true positive rate is somewhere over here, so I am going to get this point and write it as (0.5, 1) for my threshold value 0.2.

Similarly, when we start doing these things for all the values, we will be getting some kind of graph. Suppose my third point will be somewhere here, somewhere like 0.5, 0.6, for the threshold of 0.4. We'll be constructing this graph. Then one more point will be coming over here; this will be for my threshold 0.6, and, I guess, 0.8. So this kind of graph will get created, and finally one point will be here. Now this whole thing is basically an ROC curve.

Now you need to understand what this AUC is. When I connect all these points, all these dots, together, the area that comes under this curve is basically my AUC, area under the curve. And remember guys, the more the area under the curve, the better your model is. So let me also draw one diagonal line through the center. Always remember, a good model's curve should always be above this particular line; its area should never be less than this 0.5. If it is less than this, it is basically a dumb model, a model which is just random guessing, and for that I can just write an if condition that sometimes says 1 and sometimes says 0. So this is basically my AUC, area under the curve.

Still, we did not decide what threshold value needs to be selected for this particular use case. Now understand one thing, guys: if I take this problem statement and just go and show this particular graph to a domain expert, the domain expert will focus on saying, "Krish, we will be requiring a higher true positive rate." Now suppose I want to get a higher true positive rate from this; I can select from two values. One is this particular value: if he says, "Krish, focus on true positives," then I can go and select this particular threshold, the threshold of 0.6, because in this scenario you can see that my FPR is zero; my false positive rate is completely zero. But suppose he says, "Krish, come on, I want a true positive rate higher than this, and I don't care about false positives." Then I go and select this particular value. At this value I am getting a true positive rate of 1, but there is some amount of false positive rate; he is saying that he does not care about that, so at that time I can select the threshold value 0.2. Now suppose he says, "Krish, I need to focus on both true positives and false positives; I need to see that particular result." Then I can also select this one.

So it depends on the domain expert, what they are looking at, how your model is actually performing, and what data you have presented in front of him. And remember guys, if you are able to show this particular diagram to him, then he will be able to say, "Okay, my model has to focus on true positives, and it should have a lower false positive rate." From here I can go and find out which threshold value that will be, select that particular threshold value, and based on that decide my whole model. This is how you can actually interpret an ROC and AUC curve, guys. AUC basically means area under the curve.

So this was all about this particular explanation. Many of you, I know, were waiting for this performance metrics part 2. Still there is one more part, which is part 3; I need to discuss some more things related to classification metrics, which will be pretty much important. So I hope you liked this particular video. Please do subscribe to the channel if you have not already subscribed. I'll see you in the next video. Have a great day. Thank you, one and all. Bye-bye.
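The threshold walkthrough in the captions can be checked with a few lines of Python. This is a minimal sketch, not code from the video. The function names `roc_point` and `trapezoid_auc` are made up for illustration. The probabilities are taken as 0.8, 0.96, 0.4, 0.3, 0.2, 0.7, which is one reading of the ambiguous captions. A probability counts as 1 only when it is strictly greater than the threshold, which is how the speaker treats the value 0.2 at threshold 0.2.

```python
# Labels and probabilities as read from the captions. This reading is an
# assumption: the auto-generated captions do not state the values cleanly.
y_true = [1, 0, 1, 1, 0, 1]
y_prob = [0.8, 0.96, 0.4, 0.3, 0.2, 0.7]
thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def roc_point(y_true, y_prob, threshold):
    """Return (FPR, TPR) when probabilities > threshold are predicted as 1.

    Assumes y_true contains at least one 1 and at least one 0, otherwise
    one of the denominators below would be zero.
    """
    y_hat = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    tpr = tp / (tp + fn)  # true positive rate = TP / (TP + FN)
    fpr = fp / (fp + tn)  # false positive rate = FP / (FP + TN)
    return fpr, tpr


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = [roc_point(y_true, y_prob, t) for t in thresholds]
for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")

auc = trapezoid_auc(points)
print(f"AUC (trapezoid over these points) = {auc:.2f}")
```

For thresholds 0 and 0.2 this reproduces the points (1, 1) and (0.5, 1) worked out in the captions. The points for the later thresholds are only sketched by hand in the video, so the computed values are not expected to match that drawing.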
Info
Channel: Krish Naik
Views: 77,896
Rating: 4.9528747 out of 5
Keywords: upgrad, appliediacourse, machine learning, deep learning, data science, appliedaicourse
Id: A_ZKMsZ3f3o
Length: 9min 48sec (588 seconds)
Published: Fri Mar 06 2020