Machine Learning Tutorial Python - 8 Logistic Regression (Multiclass Classification)

Video Statistics and Information

Captions
This is part two of the logistic regression tutorial. If you haven't watched the first part, you should watch that first. In the previous tutorial we discussed binary classification, where the output classes are binary in nature: they are either yes or no. In this one we are going to discuss multiclass classification. For example, when you are trying to predict which party a person is going to vote for, the possible outcome is one of these three.

The concrete problem that we are going to solve today is recognizing handwritten digits. For example, this image here maps to one of the output categories, which are the digits 0 to 9. Similarly, this four maps to this particular output category. So we will use a training set with a lot of handwritten digits, and then we'll build a model using logistic regression. At the end of the tutorial you will have an interesting exercise to work on. So let's jump straight into writing the code.

As usual, I am going to use my Jupyter notebook as an IDE. Here I have imported matplotlib and also scikit-learn's datasets module. sklearn.datasets has some ready-made datasets that you can use to learn machine learning, and from it I am using the load_digits dataset. If you read the documentation, all it is is 1,797 handwritten digits of size eight by eight. It looks something like this, and what we are going to do is, given these digits, identify which digit each one is. So let me just run it. This has run fine.

Now I am going to call the load_digits method to load my training set, and I want to explore what this training set contains. It contains a couple of things. It has data, which is your real data, so let's print a few elements. As the documentation says, there are 1,797 samples, so I'm just going to print the first one, and it's an array. It is an eight-by-eight image, but the image is represented as a one-dimensional array, so if you count these elements there are 64, which is eight by eight. If you want to see this particular element, you can use matplotlib. I'm going to call plt.gray(), and plt has a method called matshow that displays the corresponding image. So data has the numeric data and images has the actual images. You can see that data[0] and images[0] correspond to each other; the only difference is that you have the numeric data here versus an actual image there. If you want to display, let's say, the first five samples, you can do it like this, and you will see 0, 1, 2, 3, 4. The corresponding numbers are in the data array. That looks pretty straightforward.

Now what we're going to do is use this to train our model. Before we do that, let's take a look at target and target_names. If I print digits.target, let me print elements zero to five, you see they are literally in sequence: the first element is zero, then one, two, three, and that's what this is printing. It is saying that the first image is zero and the last image is four. So this is our complete training set, which has our images as well as the target variable, which says what each image is. We can use data and target to train our model.

Before training our model, the usual thing we do is import train_test_split from sklearn.model_selection and divide our dataset into training and test samples. The way you do it is you say X_train, X_test, and so on; I don't exactly remember the order of the return values, so let me do this. To train_test_split I pass digits.data, because that's the dataset, and then digits.target, because that's the target variable. If you hit Shift+Tab, it will show you all the nice documentation of that API, and here it says this is the order in which it returns the output.

What I just did by executing this command is take the input and output variables from my dataset and divide them into test and train sets. The reason we typically do this is that we don't want to overfit our model. We don't want to bias it toward the training data, and that's why the data the model is trained on should be different from the data the model is tested on. That's why we split the two. I have to supply the size, so I'm going to supply test_size: I want 20 percent of my samples to be the test set and 80 percent to be the training set. If I look at the length of X_train it is this, and if I look at the length of X_test it is this, so the training set is roughly 80 percent of all available samples.

All right, so I have a training and test split. Now I can create my logistic regression model. From sklearn.linear_model I import LogisticRegression and create a model object so that I can train it later. You all know the way you train it is by calling the fit method, and you call fit with X_train and y_train. When you run that, the model gets trained using the X_train and y_train datasets. To repeat, X_train has the handwritten characters and y_train has the corresponding outputs; it will say, for this image it is 4, and so on.

Now that my model is ready, the first thing I always do is calculate the score. The score tells you how accurate your model is, and to get it you supply X_test and y_test. Using X_test it calculates the predicted y values and compares them against the real values, which are in y_test. It turns out my model is doing pretty well: the accuracy is almost 96.67 percent, which is really good.

Now I'm going to make an actual prediction, and you know that you call the predict method for that. Before I call it, I want to pick a random sample, so I will say plt.matshow on digits.images[67]; I'm just picking a random sample. Hmm, this one is pretty hard; even I don't know what this number is. Let's see: the answer is digits.target[67], because you have to access the same index in your target. So this is six. Let's see what our model predicts for this one. I will call model.predict. What do I want to predict? I'm not going to supply images here, because images stores each sample as an eight-by-eight grid and my model wants the flattened numeric data, so I will use the same index, 67, but with data instead of images. This is the error you get when you don't supply a two-dimensional array, so I'm just going to supply a two-dimensional array, and you can see that it is predicting the target variable. Let me create a new cell and predict samples zero to five. You all know zero to five are literally the digits in order, so zero is 0, one is 1, and so on. When I execute it, see, my model is doing pretty well.

So my score is 0.96. How do I know where it didn't do well? All the samples I tried seem to be doing pretty well, so I want to know where exactly it failed and get an overall feeling for my model's accuracy. One of the ways of doing that is a confusion matrix, so I will show you what a confusion matrix is. For that I import confusion_matrix from sklearn.metrics. Before I use it, I need to get the predicted values, so I call predict on X_test, and when I run that I get all the predicted values for X_test. Then I create a confusion matrix. What you supply is y_test, which is the truth, and then y_predicted, which is what your model predicted, and you get the confusion matrix back.

When you run that you get this two-dimensional array, and you are wondering what the heck this is. It is better visualized in matplotlib or seaborn, so I will use a library for the visualization. I'm just going to copy and paste the code for the confusion matrix visualization. Here I am using the seaborn library, which is similar to matplotlib and is used for visualization, and I'm calling heatmap with the confusion matrix variable cm that we created. When you run that, this is the confusion matrix you get. The way this works is: you see the number 37 here. It means that 37 times the truth was zero and my model predicted zero. This 2 means that two times the truth was eight, meaning I fed my model an image of eight, but my model said it is one. These are the instances where it's not doing well. Anywhere in these areas off the diagonal, when you see a number other than zero, it means your model got those samples wrong. Here, for example, two times my images were of the digit four but my model predicted one. So a confusion matrix is just a nice way of visualizing how well your model is doing.

Now it's time for the exercise. Today's exercise uses the iris flower dataset from sklearn.datasets, which has the following four features. If you don't know about iris, it is a type of flower, and the flower has two types of leaves: one is called the sepal and the other is called the petal. They each have a length and a width, and based on those you can predict what kind of iris flower it is. Our dataset has three kinds of flowers; these are the names of the three different iris flowers. The features we have are these four: petal width and length, and sepal width and length. You will load all 150 samples of the iris dataset, divide them into test and training samples, build a logistic regression model, and tell me the accuracy that you can come up with. Then you can make a few predictions using that model.

That's all I had for this tutorial. I have the link to this Jupyter notebook down below, and you can find the exercise there as well, so make sure to refer to those useful links. And please, please do some practice yourself; just by watching this video you are not going to become an expert. All right, thanks for watching.
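The digits walkthrough in the captions (load the data, split it, fit the model, score it, predict) can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the random_state seed and the raised max_iter are assumptions added here so the run is repeatable and the solver has room to converge, and variable names such as single and first_five are made up for illustration. The exact accuracy will depend on the split.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
print(digits.data.shape)    # one flattened row of 64 values per sample
print(digits.images.shape)  # the same samples as 8x8 grids
print(digits.target[0:5])   # labels of the first five samples

# data[0] is images[0] flattened into a one-dimensional array
same_pixels = np.array_equal(digits.images[0].ravel(), digits.data[0])

# Hold out 20% of the samples for testing.
# random_state is an assumption added for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))

# max_iter is raised here (an assumption) to give the solver room to converge.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.4f}")

# predict expects a two-dimensional array, so a single sample is passed
# as a one-row slice rather than as digits.data[67]
single = model.predict(digits.data[67:68])
first_five = model.predict(digits.data[0:5])
print(single, first_five)
```

Passing digits.data[67] directly would hand predict a one-dimensional array, which is the error mentioned in the captions; the one-row slice keeps the sample two-dimensional.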
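To see where the model goes wrong, as the confusion matrix discussion in the captions describes, the matrix can also be read programmatically instead of by eye. The sketch below is one possible way to do that, under the same assumptions as before (the random_state seed and max_iter are not from the video, and the mistakes list is a hypothetical helper). The counts you get will differ from the 37 and 2 quoted in the captions because the split is different.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0  # seed is an assumption
)
model = LogisticRegression(max_iter=5000)  # max_iter is an assumption
model.fit(X_train, y_train)

y_predicted = model.predict(X_test)

# Rows are the true digits, columns are the predicted digits.
cm = confusion_matrix(y_test, y_predicted, labels=model.classes_)
print(cm)

# Every off-diagonal cell with a non-zero count is a group of mistakes.
mistakes = [
    (int(truth), int(pred), int(cm[truth, pred]))
    for truth, pred in np.argwhere(cm > 0)
    if truth != pred
]
for truth, pred, count in mistakes:
    print(f"true {truth} predicted as {pred}: {count} time(s)")
```

The diagonal holds the correct predictions, so the diagonal sum divided by the total equals the score. For the picture shown in the video, seaborn's heatmap(cm, annot=True) draws the same matrix as a colored grid.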
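For the iris exercise, a possible starting point is sketched below. It is only one way to set the problem up, not the solution from the linked notebook: the 20 percent test size mirrors the digits example, and the random_state seed, max_iter value, and names such as iris_accuracy and predicted_names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
print(iris.feature_names)  # the four sepal and petal measurements
print(iris.target_names)   # the three kinds of iris
print(iris.data.shape)     # 150 samples, 4 features each

# 80/20 split as in the digits example; the seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)  # max_iter is an assumption
model.fit(X_train, y_train)

iris_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {iris_accuracy:.4f}")

# Translate a few numeric predictions back into flower names.
predicted_names = iris.target_names[model.predict(X_test[:5])]
print(predicted_names)
```

Try different test sizes and seeds yourself and compare the accuracy you get.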
Info
Channel: codebasics
Views: 128,424
Keywords: sklearn logistic regression tutorial, logistic regression machine learning, logistic regression pandas, multiclass logistic regression, multinomial logistic regression, multiclass classification in machine learning, multiclass classification, logistic regression python, logistic regression, logistic regression in python, logistic regression tensorflow, multiclass svm, multiclass classification python, svm multiclass classification, keras multiclass classification
Id: J5bXOOmkopc
Length: 15min 43sec (943 seconds)
Published: Fri Sep 21 2018