Comparing Machine Learning Classification Algorithms Accuracy in Python | sklearn

Captions
Hello everyone, welcome back to my channel. I'm Aditya. In the last video we saw how to create a decision tree algorithm and connect it to our Next app using FastAPI. What I thought would be great is to compare some other machine learning algorithms, mostly the classification algorithms: decision tree, random forest, KNN, a neural network, and maybe also a naive Bayes classifier. I won't dwell on the details of each algorithm, so I'll assume you have some knowledge of them, though I might briefly describe what each one does. We will use this comparison to find out which algorithm is best for our scenario. There are various other good ways to pick the best algorithm; another one I came across is k-fold cross-validation. But here I will use the train/test split method: I'll split the data into 80 percent training and 20 percent test, and use that to measure the accuracy of each algorithm.

So without any further ado, let's begin. We have our CSV file, as we saw in the last video; if you haven't seen it, you'll find the link in the description below. In that video we use these input parameters, except the year, to predict in which country the car was made. So far it's all working great with the decision tree algorithm. Let's now rewrite this function. We'll create a new file so we don't disturb this one; let's call it something like classification_algo.py. We could also take a class-based approach, creating an algorithm class and calling each one individually, and that would be fine as well, but let's keep it simple.
So what we need now: we don't need this function anymore, of course, and we don't need these lines because we aren't doing anything with them, so let's remove them for now. We keep the cars CSV, we keep the line that builds the numerical brand target, and this target association is no longer required. Let me quickly explain what we are doing here. We import pandas and use it to read the CSV file. Then we encode the label, because the system understands only numeric values: US, Europe, and Japan are three classes, and the label encoder assigns each one a numeric value. We drop the year and brand columns, set up our inputs and our targets, and divide the data into training and testing sets: x_train, x_test, y_train, y_test. Then we use our models: we first create an instance of the model, train it, run the prediction, and print the result.

That's pretty simple. For the tree we have DecisionTreeClassifier, so let's rename this variable to decision_tree_model, call decision_tree_model.fit, and instead of the full input and output pass x_train and y_train. Then we predict on our test data, x_test, and store the result as y_decision_tree. Now let's print "Accuracy of decision tree algorithm is" followed by the accuracy itself. For that we can use the metrics module from scikit-learn.
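The pipeline described so far can be sketched like this. Note that the DataFrame contents and column names below are invented stand-ins for the video's cars CSV, which isn't reproduced here:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Hypothetical stand-in for the cars dataset from the previous video.
df = pd.DataFrame({
    "mpg":        [18, 30, 24, 15, 33, 26, 14, 31, 27, 20] * 10,
    "cylinders":  [8, 4, 4, 8, 4, 6, 8, 4, 4, 6] * 10,
    "horsepower": [130, 75, 95, 150, 70, 105, 160, 72, 90, 110] * 10,
    "brand":      ["US", "Japan", "Europe", "US", "Japan",
                   "Europe", "US", "Japan", "Europe", "US"] * 10,
})

# Encode the three country labels (US/Europe/Japan) as integers,
# since the model needs numeric values.
df["brand_numerical"] = LabelEncoder().fit_transform(df["brand"])

# Inputs are everything except the label columns; target is the encoded brand.
inputs = df.drop(columns=["brand", "brand_numerical"])
targets = df["brand_numerical"]

# 80/20 train/test split, as in the video.
x_train, x_test, y_train, y_test = train_test_split(
    inputs, targets, test_size=0.2)

decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(x_train, y_train)
y_decision_tree = decision_tree_model.predict(x_test)

print("Accuracy of decision tree algorithm is",
      metrics.accuracy_score(y_test, y_decision_tree) * 100)
```

Because the split is random (no fixed random_state, matching the video), the printed accuracy will change from run to run.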
scikit-learn is an amazing Python library that gives you access to just about every machine learning algorithm, and it makes everything pretty simple: the training, the testing, the scoring, the usage. So let's use the metrics module from sklearn: metrics.accuracy_score, scored against y_test and our actual prediction, y_decision_tree. Now let's see what we get. If I run this, we get an accuracy of 0.77, so let's multiply by 100 to get it as a percentage: 77 percent accuracy.

Now let's try random forest. From sklearn we import ensemble, and, keeping it together with the decision tree code, we write random_forest = ensemble.RandomForestClassifier(). We are dealing with classification algorithms and we'll keep the default settings for all of them. Then random_forest.fit(x_train, y_train), then y_random_forest = random_forest.predict(x_test), and then we copy the print line, replacing y_decision_tree with y_random_forest and "decision tree algorithm" with "random forest algorithm". Let's run this quickly; it takes a bit of time, and you'll see that random forest has a much higher accuracy than the decision tree. Of course, once you have more data these values will change, and with a train/test split they definitely differ from run to run: if I run it again you'll see different numbers, because the splitting is random, so the accuracy will vary a little from run to run.
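The random forest step looks like this; here a synthetic three-class dataset from make_classification stands in for the cars CSV, so it stays self-contained:

```python
from sklearn import ensemble, metrics, datasets
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the cars CSV.
X, y = datasets.make_classification(n_samples=200, n_classes=3,
                                    n_informative=4, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Default settings, as in the video.
random_forest = ensemble.RandomForestClassifier()
random_forest.fit(x_train, y_train)
y_random_forest = random_forest.predict(x_test)

print("Accuracy of random forest algorithm is",
      metrics.accuracy_score(y_test, y_random_forest) * 100)
```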
Still, the main point is that random forest is doing much better than the decision tree. Now let's quickly add some more algorithms: a neural network, naive Bayes, and KNN (k-nearest neighbors). From sklearn we import neural_network, naive_bayes, and neighbors; neighbors is for the k-nearest-neighbors algorithm. Let's copy these lines and modify them: one block for k-nearest neighbors, one for naive Bayes, and one for the neural network. Where we had random forest, for the neural network we want MLPClassifier; we fit it, predict with it, and store the result. After that, naive Bayes: we use the GaussianNB classifier, paste the same lines, and call the result nb. Last, for nearest neighbors, we use neighbors.KNeighborsClassifier. I could pass a value for n_neighbors; I believe it's 5 by default, so let's not pass any value for now, and later we'll see how the KNN accuracy changes with different values. So we predict and get y_knn. Now let's print all of these: this one is for our neural network, y_nn; this one for naive Bayes, y_nb; and this one for the result of KNN, y_knn. Now, which algorithm do you think has the highest accuracy?
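The three additional models can be sketched together like this (again with synthetic stand-in data instead of the CSV; the max_iter value for the neural network is my addition to avoid convergence warnings on such a small dataset):

```python
from sklearn import neural_network, naive_bayes, neighbors, metrics, datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_samples=200, n_classes=3,
                                    n_informative=4, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

models = {
    "neural network": neural_network.MLPClassifier(max_iter=1000),
    "naive Bayes":    naive_bayes.GaussianNB(),
    "KNN":            neighbors.KNeighborsClassifier(),  # n_neighbors defaults to 5
}

# Same fit/predict/score pattern for each model.
for name, model in models.items():
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    print(f"Accuracy of {name} algorithm is",
          metrics.accuracy_score(y_test, y_pred) * 100)
```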
Let's run it and see: the decision tree has almost 77 percent, random forest has 90 percent, the neural network has 62 percent, naive Bayes has 69, almost 70 percent I would say, and KNN has 64 percent. So random forest is doing much better here. It wouldn't be fair if I didn't explain what the random forest algorithm actually is. You can think of it as nothing but a collection of many decision trees; for classification, it makes its prediction based on the class that gets the most votes from the individual trees. Suppose there are five decision trees in a random forest and two classes, yes and no: if three trees say yes and two say no, the forest outputs yes as the majority vote. That's it in very simple words; mathematically it's a bit more complicated, but in essence the class with the highest score comes out as the winner. It's like planning a trip with several friends: the destination that gets the most votes is the one you choose. Random forest works the same way.

I guess that's pretty much it for this video... actually, wait, we haven't tried the SVM algorithm yet, so before we conclude let's try that as well. From sklearn we import svm. I'll copy the block and rename things (our naming conventions are going all over the place). From svm we use SVC, change the result variable to y_svm, and do the prediction and accuracy score for that as well.
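The SVM variant follows the exact same pattern; here is a minimal sketch with the same kind of synthetic stand-in data:

```python
from sklearn import svm, metrics, datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_samples=200, n_classes=3,
                                    n_informative=4, random_state=2)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2)

# SVC is the support vector classifier, used with its defaults.
svm_model = svm.SVC()
svm_model.fit(x_train, y_train)
y_svm = svm_model.predict(x_test)

print("Accuracy of SVM algorithm is",
      metrics.accuracy_score(y_test, y_svm) * 100)
```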
So let's run it and see how much accuracy this algorithm gives us. SVM here is giving around 62 percent. That's somewhat expected: when it comes to classification, random forest is what's most commonly used, so I wasn't surprised by that result; I knew SVM would give lower accuracy. But I am a bit surprised that the neural network is giving even lower accuracy than SVM. Let's try one more time: the neural network is at 58 percent while SVM is at 62. That's a bit surprising to me, so let me know in the comments which result was surprising for you.

Also, let's play with KNN and change n_neighbors. With two neighbors we get 71 percent; running again, still 71 percent. Let's change the neighbors to four... okay, so as we increase the value of k, the accuracy goes down. Let's do three: 71 again. So I guess the best value here is somewhere in the range of two to three.

And that's pretty much it for this video. I hope you enjoyed it. If you feel this video is worth sharing with your network, please do share it, and if you haven't subscribed to my channel, please make sure you subscribe. If you haven't liked the video yet, please hit the like button, and do check out the link in the description for the decision tree algorithm we built with FastAPI, so you'll know exactly what each step here is doing before we get to the training and testing. That's all from me; until the next video, goodbye!
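The n_neighbors sweep at the end can be written as a small loop rather than editing the value by hand each time (synthetic stand-in data again; the exact percentages will differ from the video's):

```python
from sklearn import neighbors, metrics, datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_samples=200, n_classes=3,
                                    n_informative=4, random_state=3)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3)

# Try a few values of k and see how KNN accuracy moves.
for k in (2, 3, 4, 5):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    acc = metrics.accuracy_score(y_test, knn.predict(x_test)) * 100
    print(f"k={k}: accuracy {acc:.1f}%")
```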
Info
Channel: WebDevWithArtisan
Views: 3,154
Keywords: machine learning, knn, decision tree, random forest, nn, naive bayes, python, sklearn
Id: X1CrNnPLZJ8
Length: 15min 47sec (947 seconds)
Published: Sun Aug 15 2021