Decision Tree Algorithm | Fast API | Nuxt.js

Captions

Hello everyone, welcome back to my channel and to this video. Today we are going to use the decision tree algorithm with FastAPI: we will build a FastAPI back end that holds a decision tree model, make a request to it from our Nuxt app, and get predictions back. So without any further ado, let's begin.

Before we dive into the code, let's briefly see what the decision tree algorithm is. I won't dwell much on the theory, because it can be boring, so I'll keep it as minimal as possible and summarize it in simple words. The decision tree algorithm is a predictive analytics algorithm. In machine learning there is descriptive analytics, predictive analytics and prescriptive analytics, and decision trees belong to predictive analytics. There are two types of decision trees: classification trees and regression trees. A classification tree, as the name suggests, is used when the predicted outcome is a class. For example, suppose I have a car without any logos, and based on certain parameters I want to predict which company it belongs to: Hyundai, Toyota, Honda and so on. Each company name is a class, so to predict the exact brand of that car we would use a classification tree. Now take the same car and suppose I want to predict its average (its mileage). In that case I would use a regression tree, because we are predicting a real number rather than a class. Another example of a regression tree is predicting house prices.

The decision tree algorithm stands on two core concepts: entropy and information gain. To be precise, in this case we'll be using Gini impurity, which is slightly different from entropy, but the main
purpose of Gini and entropy is the same: to measure randomness. Information gain, on the other hand, comes from information theory; it helps you choose the feature or attribute that gives you the most information about a class. So our aim is to maximize information gain and minimize entropy, and the reduction in randomness after a split is exactly the information gain.

An example will make this clearer, so let's take the example of playing tennis. Consider this a trained decision tree model. I play tennis when the weather is overcast. If the weather is sunny or rainy, we consider other factors, humidity and wind: whether the humidity is high or normal, and whether the wind is strong or not. When we choose weather as the root node, the entropy (the measure of randomness) becomes low and the information gain becomes high. So when you build a decision tree, the order in which you choose the nodes matters a lot: if I chose humidity or wind first instead of weather, my entropy would increase and my tree would be less accurate. The nodes have to be chosen by which one provides the maximum information gain, because that is what we want to maximize. The amber (yellow) nodes are known as leaf nodes or class nodes. Our classes here are yes and no, so we reach a decision: either we play tennis (yes) or we don't (no). That's the final outcome.
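
To make the two measures concrete, here is a minimal Python sketch of entropy, Gini impurity and information gain. The yes/no counts are made-up numbers in the spirit of the tennis example (9 yes and 5 no overall, split three ways by weather); they are not taken from the video. For reference, scikit-learn's `DecisionTreeClassifier` uses Gini (`criterion="gini"`) by default and accepts `criterion="entropy"` instead.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: probability that two random draws (with replacement) differ in label."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, groups, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the child groups."""
    n = len(parent)
    weighted = sum(len(g) / n * impurity(g) for g in groups)
    return impurity(parent) - weighted


# Hypothetical counts: 14 days, 9 "yes" and 5 "no", split by weather.
sunny = ["yes"] * 2 + ["no"] * 3
overcast = ["yes"] * 4
rainy = ["yes"] * 3 + ["no"] * 2
days = sunny + overcast + rainy

print(round(entropy(days), 3))                                     # 0.94
print(round(gini(days), 3))                                        # 0.459
print(round(information_gain(days, [sunny, overcast, rainy]), 3))  # 0.247
```

The overcast group is pure (entropy 0), which is a large part of why splitting on weather first yields a high information gain.
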
Or, in simple words, these are our two classes. Humidity, weather and wind, the gray boxes, are decision nodes, because they help us reach these classes. So that is the decision tree algorithm in a nutshell.

Now let's look at the code and see how we can do this in FastAPI. I will be using a data set from Kaggle. If you are new to Kaggle, it is a website where you can get data sets for machine learning, usually as CSV files. Most of them are freely available, and you can use them to test any machine learning algorithm: k-means, neural networks, decision trees, kNN, anything. There is also a code section where other people share their code, which is really useful for learning how they achieved higher accuracy or approached the problem differently. It's a very interesting website.

We will be using this cars data set, which you can interpret in several ways. Let me first show how it is originally set up. The inputs start at mpg and go all the way to year, seven columns in total, and based on them it predicts the country where the car was made: US, Japan or Europe. So we have three classes: Japan, Europe and US. I'm going to drop the year column and use the other parameters (mpg, cylinders, cubic inches, hp, weight in lbs and time to 60) to predict the country where the car should be manufactured.

Now let's see what we have on the front end. I have a simple
form here with six input fields. By the way, this is an extension of the code from my Nuxt tutorial series, so please watch that series to see how I set up this project. Getting back to our main topic: this input form makes a POST request to FastAPI, which gives us back the country or region where the car should be manufactured.

Now let's see the FastAPI side. If you are new to FastAPI, it is a Python web framework for building API endpoints: you can create REST APIs, GraphQL APIs, whatever you prefer. All you need is Python 3.6+ and pip. I have already installed Python, pip and FastAPI, so I won't install them again, but here are the steps: once you have Python and pip, run `pip install fastapi`, and once that is installed, make sure you also install Uvicorn (`pip install uvicorn`). In simple words, Uvicorn is the server that runs your FastAPI code.

Some see this as a pro and others as a con: FastAPI has no project structure. Frameworks like Laravel or Django give you a directory structure, but in FastAPI you create each file yourself. It's similar to Express, where you just have package.json, create app.js or server.js, write your Express code there and then grow the project directory. In FastAPI you start with an entry (main) file, so I'm going to create main.py; you can name this file anything. Let's go to VS Code, where I have the CSV file, and create a new file called main.py. Now, what we
need to do is import FastAPI with `from fastapi import FastAPI` and create an instance of it: `app = FastAPI()`. Then we use a decorator to create the routes, our GET and POST endpoints. A decorator in Python, in simple words, is a function that wraps another function to extend its behavior; here it registers our function as the handler for a route. So let's write `@app.get("/")`, where the forward slash is the root route, and right after the decorator we write the function. If I put `print("hello")` in it and `return "Hello World"`, this function is executed whenever we hit this URL. Each decorator applies to the single function defined right after it.

Let's run this code and see what we get. If I go to my browser and hit http://127.0.0.1:8000, nothing loads, of course, because we haven't started the server. To start it, I run `uvicorn main:app --reload`: `main` is the name of my file (main.py, written without the .py) and `app` is the FastAPI instance. With the reload option, I don't have to restart anything manually; the server reloads itself whenever I make changes. Now if I refresh, I see Hello World.

Another interesting thing about FastAPI is that if I go to /docs, it provides API documentation using Swagger UI. For the root route, it shows the expected parameters and the expected responses. Now let's create another route with `@app.post("/predict")`, our predict route, with a function to do the prediction. For now it just returns "predict"; we'll fill it in later. Also, once you download cars.csv (or any CSV file), make sure
you have removed all the spaces from the file. Sometimes you will have a comma followed by a space before brand or before other values, so remove those spaces too.

Once we have this, let's create a Pydantic model. What is a Pydantic model? In web frameworks you have database models tied to your SQL or other database. In FastAPI you don't just have database models, you also have Pydantic models, which you can consider the schema of your request or response. Let me show you: I create a new file, schemas.py, and write `from pydantic import BaseModel`. If I search the FastAPI documentation for Pydantic models, you'll see they are used for request and response validation: a class with key and type pairs. So I'll copy the input names from the form in our Nuxt app and paste them here. First we make a class. You can give it any name, but make sure the name says what the model is for, so I'll call it `PredictRequest` and have it extend `BaseModel`. If you haven't seen my Python object-oriented programming tutorial, please check it, because I explain inheritance there; the link is in the description below. Now we paste what we copied and clean it up, removing the extra markup, so that these become our keys, the inputs we want, each with the type `float`. This model makes sure that whatever input we get is in this format, that each field is a float
and that it is required. If I want to make a field optional, I use `from typing import Optional`, annotate the field as `Optional[float]` and give it a default of `None`; then mpg becomes optional and I can either leave it blank or send a value. For now we want everything required.

Now, how do we use this Pydantic model in our request? First I import it with `from schemas import PredictRequest`, and in the route function we add a parameter `predict` of type `PredictRequest`. If I save this and open the Swagger UI, you'll see that the request body is now required, with these fields. If I remove the parameter and refresh, it says no parameters are required; if I undo that and refresh again, it tells me these fields are required. Let's try it out: if I execute, I get "predict". Great.

Now let's make a request from the front end. If I simply hit Predict, I get a CORS error. What is CORS? It is a browser security mechanism: the browser won't let a page read responses from another origin unless the server explicitly allows that origin, so only the origins (URLs) you have allowed can use your back end from a browser. The FastAPI documentation has a CORS page that shows how to configure it, so let's do that in our project. I write `from fastapi.middleware.cors import CORSMiddleware`, and then define the origins. You could keep them in a separate file or an environment variable, but I'll keep them here for now: `origins` is simply the list of allowed origins. My URL is http://127.0.0.1:3000, and you can also
allow another origin, http://localhost:3000, so that we can make requests from both URLs. Next, we configure CORS: the documentation shows that we need to add the middleware to our app, so I'll copy that part of the code, because it's a lot to write, and paste it here. We need to move `app = FastAPI()` above it, because Python executes line by line and the app must exist before we add middleware to it. You can see we allow these origins, set credentials to true, and allow all methods and all headers for now.

Let's check it now. If I hit Predict again, the CORS error is gone, but now I get another error: 422 (Unprocessable Entity). It appears because we haven't filled anything in. If I go to the Swagger UI, remove one of the fields and execute, I get 422 again, because a required field is missing; FastAPI is validating the request against our model. To handle the 422 on the front end, in our axios call we add `.catch` after `.then`, take the error and log it with `console.log(error)`. Now you can see it says "Request failed with status code 422". So we can check for it: if `error.message.indexOf('422') > -1`, we alert "You need to fill in all the fields". Let's go back and try it: it says "You need to fill in all the fields". You can also handle errors on the back end with an HTTPException, so if something goes wrong
you can raise an HTTPException; there are several ways to handle this error. Now let's move forward. We will create a separate file for the decision tree code, where we build our model, and then use it from our predict function to get the result. We could write the code in main.py, but it would get a bit messy, so let's keep it separate.

In the new file we first write `import pandas as pd`; pandas is useful for reading our CSV file. So we write `file = pd.read_csv('cars.csv')` (`file` is just a variable name). If I print `file.head()`, it shows the first five rows of the file along with the headers. Next, I want to drop the year column; I'll explain afterwards what we have to do with the brand column. So I write `file = file.drop('year', axis='columns')`, which takes the column we want to drop, and if I print `file.head()` again, it shows all the data except the year column, because we have dropped it.

Now let's deal with the brand column. Its values are text, and machine learning algorithms only understand numeric values, so we need to convert these text values into numbers. For that we'll use another library, scikit-learn (sklearn), which is a great library if you are working with machine learning algorithms in Python: `from sklearn.preprocessing import LabelEncoder`. If you haven't installed it, or if you see errors
for sklearn or pandas, you can install them with pip (`pip install scikit-learn pandas`). Once we have the label encoder, let's encode the brand column, so that each label (US, Europe and Japan) gets a numeric value. Rather than assigning straight to the file, let's take a separate variable, `target_column = LabelEncoder()`, which is an instance of the label encoder. Then we add an extra column to our file, `brand_numeric`, and fill it using the label encoder's `fit_transform` function. We don't strictly need the extra variable, we could call `LabelEncoder()` inline, but it's fine to create the object first and then use it. `fit_transform` takes one parameter here: the column (array or list) we want to transform into numeric values, so we pass `file['brand']`. If I run this and print the result, you'll see the brand_numeric column, where US is 2, Europe is 0 and Japan is 1 (the label encoder numbers the classes in sorted order).

The next thing is to remove the brand column, but before that we'll create another variable, `target_association`. You'll see why near the end of the code, so please bear with me. Here we take from `file` a two-column table of brand and brand_numeric, so that each brand sits next to its numeric value. After that we need to drop the brand column as well. So what we will do, or rather, let's
bring it over here: we can drop multiple columns at the same time in one line, so we also drop brand, because we already have its numeric value. If I run this, everything works.

Now that our data is set up properly, we need to divide it into inputs and targets. My inputs are these six columns, everything except brand_numeric, so I write `inputs = file.drop('brand_numeric', axis='columns')`. If I print `inputs`, you'll see only the inputs, without brand_numeric. In classification algorithms we have inputs and targets, and the targets are the labels. So the targets are the brand_numeric column: `targets = file['brand_numeric']`. Bear in mind that earlier we dropped columns and assigned the result back to `file`, so from that point on `file` no longer has year and brand. Here, however, we assign the result of the drop to a different variable, so `file` still has brand_numeric even after we drop it for `inputs`. If I print `targets`, all the targets are here. Great.

Next we need to create the training data and the testing data: `X_train, X_test, y_train, y_test = train_test_split(...)`, using the `train_test_split` function, which we import from sklearn with `from sklearn.model_selection import train_test_split`. This function takes what we want to split and splits our data into training data and test
data. Make sure the order is as I have written it here: first X_train (the input training set), then X_test (the input test set), then y_train (the output training set) and finally y_test (the output test set). What are training and testing data? In simple words, we use the training data to train our model and the test data to test it. So I call `train_test_split` with `inputs`, which get split between X_train and X_test in the ratio I give, and with `targets`, which get split between y_train and y_test. The other parameter is `test_size`, a decimal that says how big the test set should be. A common split is 80/20, 80% training data and 20% test data; you can also do 70/30. I'll use 80/20, so `test_size=0.2`, and then `random_state=1`, so that the split is the same on every run. Now we have our training data, test data, training targets and test targets.

Now let's create our model. First we write `from sklearn import tree`, and then `model = tree.DecisionTreeClassifier()`. Once we have our decision tree object, we need to train it with `model.fit`. I could use X_train and y_train, but first I'm going to use inputs and targets; at this stage the model is trained. We'll check the accuracy with inputs and targets first, then with X_train and y_train, and compare. To make a prediction, I write `y_prediction = model.predict(...)` (y_prediction is just a variable) and pass it the test data. These are data frames, so this
file is also a data frame, and `predict` also takes a data frame, that is, 2-D data. X_test is already 2-D, so we don't need to wrap it in a list. Once we have this, let's print the accuracy: we concatenate the string "accuracy" with the score converted to a string, because the score is a number. For the score we use another module from scikit-learn, `metrics`: `metrics.accuracy_score(y_test, y_prediction)`, which compares our test targets with our predictions.

If I run this now, it throws an error: "Input contains NaN, infinity or a value too large for dtype('float32')". This is because some values in our data set are NaN or infinite, for example empty cells. To clean that we'll use NumPy: `import numpy as np`, and right after we assign `inputs`, we write `inputs = np.nan_to_num(inputs)`. `nan_to_num` replaces every NaN with 0 and every infinity with a very large finite number. Now if I run this, I get an accuracy of 1.0, which is 100%. That number is misleading, though: the model was trained on inputs and targets, which include every row of the test set, so it is being tested on data it has already seen. If we train on the training data only, `model.fit(X_train, y_train)`, the accuracy we get is 79%, which is the honest estimate for cars the model hasn't seen. You can see how much it differs. Now that we've checked the accuracy, let's change it back to inputs and targets, so the final model learns from all the data, and use the inputs we get from the request instead of the test set. So what I'm going to do
is wrap everything we have written here into a function, `def decision_maker(predict)`, paste the code inside, and pass it the `predict` object from our route. The other change is in `model.predict`: instead of the test set, we pass the values from the request, and we must follow the same column order as the training data, or we won't get accurate results. First is mpg, so `predict.mpg`, then `predict.cylinders`, then cubic inches, hp, weight in lbs and time to 60, checking each name against the schema. Once we have all this, we get our predicted result, so we remove the accuracy code and print `y_prediction` to see what exactly we get.

Now we need to use this function in main.py, so I import `decision_maker` from our decision tree file and call `decision_maker(predict)` inside the route. Let's run this from the Swagger UI with some values (mpg maybe 0 again). If I execute, we get an error, which is fine: the values need to be a 2-D array, because the model expects a list of rows, each with six values, so we wrap them in double brackets, `[[predict.mpg, ...]]`. Now if I click Execute, you'll see I'm getting 0, which, if I'm correct, refers to Europe. So now, as you can see, this 0 is a numeric value.
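
The steps so far (load, drop year, label-encode brand, clean NaNs, split 80/20, train, score) can be condensed into a small module. This is a sketch, not the video's exact file: the column names in `FEATURES` are assumed from the Kaggle cars.csv header (check them against your own), the function names are hypothetical, and instead of the separate lookup table the video builds next, it decodes the prediction with `LabelEncoder.inverse_transform`. It also keeps evaluation (train on the training split only) separate from the final model (train on every row).

```python
import numpy as np
import pandas as pd
from sklearn import metrics, tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Assumed input columns, in training order; compare with your cars.csv header.
FEATURES = ["mpg", "cylinders", "cubicinches", "hp", "weightlbs", "time-to-60"]


def prepare(df: pd.DataFrame):
    """Drop `year`, label-encode `brand`, and return (inputs, targets, encoder)."""
    df = df.drop(columns=["year"])
    encoder = LabelEncoder()
    targets = encoder.fit_transform(df["brand"])  # classes are numbered in sorted order
    # Empty cells become NaN; nan_to_num turns NaN into 0 and +/-inf into large finite values.
    inputs = np.nan_to_num(df[FEATURES].to_numpy(dtype=float))
    return inputs, targets, encoder


def evaluate(inputs, targets, test_size=0.2, random_state=1):
    """Accuracy on a held-out test split, training on the training split only."""
    X_train, X_test, y_train, y_test = train_test_split(
        inputs, targets, test_size=test_size, random_state=random_state
    )
    model = tree.DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    return metrics.accuracy_score(y_test, model.predict(X_test))


def train_final(inputs, targets):
    """Fit the model that will serve predictions, using all rows."""
    return tree.DecisionTreeClassifier(random_state=1).fit(inputs, targets)


def predict_region(model, encoder, features):
    """`features` is one car's six values in FEATURES order."""
    code = model.predict([features])[0]  # predict expects 2-D input: a list of rows
    return encoder.inverse_transform([code])[0]
```

In an app you would run `prepare(pd.read_csv("cars.csv"))` once at startup, print `evaluate(...)` to check the accuracy, and keep the model from `train_final(...)` for incoming requests, so a request does not retrain the tree.
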
Now, how can I associate it with the text value? That's why we created `target_association`. It holds two lists, `target_association['brand']` and `target_association['brand_numeric']`, and each position in one corresponds to the same position in the other: the brand at index 0 goes with the brand_numeric value at index 0. So how do we deal with this? It's pretty simple. We comment out the print, and once we have `y_prediction`, we loop over the positions with `for k in range(...)` over brand_numeric; wherever the brand_numeric value at k equals our prediction, we take the brand at k instead. We could also iterate over key-value pairs, but to stay on the simple side we'll do it this way. That value goes into `brand_region`, and we print `brand_region`. If I execute again, it says "'Series' object cannot be interpreted as an integer": `range()` needs an integer, not a Series, so the loop has to run over the length of the column rather than the column itself. After fixing that and executing again, it shows Europe.

Now I want to pass this region to our front end, so in the function I write `return brand_region`, and in main.py the route returns it under the key `region`. Of course, you might need to add extra validation here to make sure this runs correctly. You can also save the model using Python's built-in pickle module: you save the trained model once, it becomes a .pkl file, and
then you can load the model again without repeating all these steps, because it's already trained. We won't save and load it for now; we'll run through all the steps each time, because the data set is small and it doesn't take much time anyway.

Now let's go back to our Nuxt code and make a request. It throws an error again, so let's refresh, take some values, say 21 in every field, and see what we get. I just needed to restart the server, and then it worked fine. We only need the region, so let's create another variable, `const region = ref('')`, set `region.value = res.data.region`, and return it so we can use it in the template. Then I copy this line, because we just need to show where the vehicle should be manufactured: with `v-if="region"`, we display "The vehicle should be manufactured in {{ region }}". Let's give it a green text class (it's just a Tailwind class), so it looks like a success message. Let's refresh and try it out: it says the vehicle should be manufactured in Europe.

Now let's try some rows from the data set. If you look at the US cars, the weight is mostly more than 3000 and the cubic inches are around 400. Let's take this example straight away and see whether we get US. So let's paste it in: 8 cylinders, 400 cubic inches, 150 hp, a weight of 3761 and its time to 60, and now I should see US. Yes, we are seeing US. Now let's try something similar: say cubic inches around 450, hp around 175, and the weight
should be around 4000, and the time to 60 around 5. According to me this should give US, but let's see: yes, it still gives US, because these values fall on the same side of the tree's splits as the US cars in the training data. That is how a decision tree works: a new car answers the same questions down the tree and lands in the leaf for its class. Now let's see how it works for Japan. Take this Japan example: the displacement and the weight are low. So if I paste the values, 4 cylinders, 89 cubic inches, hp 62, a weight of, let's say, 2000, so we can see a nearby value, and 17 for time to 60, I should now see Japan. Yes, we get Japan.

So this is how we can make predictions and connect our machine learning model to our front end. That's all from me for today. I hope you enjoyed this video, and if you feel it's worth sharing, please share it with your network. Also make sure you subscribe to my channel if you haven't yet, and like this video if you liked it. That's all from me until the next video. Goodbye!

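Two pieces from the end of the video can be sketched in Python as well: the lookup that turns the numeric prediction back into a region via `target_association` (with the loop running over `len(...)`, since `range()` rejects a pandas Series), and saving and loading a trained model with the built-in `pickle` module. The helper names and the `model.pkl` path are hypothetical, and the sketch assumes `target_association` is a DataFrame with `brand` and `brand_numeric` columns.

```python
import pickle

import pandas as pd


def region_for(code, target_association: pd.DataFrame):
    """Return the brand whose brand_numeric equals `code` (None if absent)."""
    numeric = target_association["brand_numeric"]
    for k in range(len(numeric)):  # range() needs an int, not a Series
        if numeric.iloc[k] == code:
            return target_association["brand"].iloc[k]
    return None


def save_model(model, path="model.pkl"):
    """Pickle a trained model so later runs can skip training."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path="model.pkl"):
    """Load a model saved by save_model (only unpickle files you trust)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

In the decision function you would call `region_for(y_prediction[0], target_association)`. If you prefer a constant-time lookup, `dict(zip(target_association["brand_numeric"], target_association["brand"]))` builds the same mapping once.
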
Info

Channel: WebDevWithArtisan

Views: 284

Rating: 5 out of 5

Keywords: machine learning, python, decision tree, vue 3, nuxt

Id: VUvXgRcQDSI

Length: 43min 48sec (2628 seconds)

Published: Sat Aug 14 2021