Hyperparameter Optimization for Xgboost

Video Statistics and Information

Captions
Hello, today we'll be discussing hyperparameter optimization, and the algorithm I am going to use is XGBoost. Hyperparameter optimization is a very important task in any machine learning use case, because it helps you select the right parameters for your machine learning algorithm. Make sure you watch this video till the end, because it will give you an intuition for how you can apply hyperparameter optimization to other machine learning algorithms as well.

To begin with, I'm going to take a very simple dataset called Churn_Modelling.csv, which is available on the Kaggle website. It contains bank customer information, and based on features like credit score, geography, gender, age, balance, number of products, and whether the person has a credit card, we have to predict whether that customer will exit the bank in the future. If the model predicts that a customer is likely to leave, the bank can give them better offers so that they stay.

So in this exercise, first of all, I import pandas (don't worry about the code, guys; I will upload the notebook and share the URL in the description box of this video), then I read the CSV file with the help of pandas. This is how the head looks, and this CSV file has 10,000 records, so it will be a nice problem to work with.
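The loading step can be sketched as follows. Since the Kaggle file isn't attached to this page, a few representative inline rows (with the same columns as the real 10,000-row Churn_Modelling.csv) stand in for it:

```python
import io

import pandas as pd

# In the video the file is read straight from disk:
#   df = pd.read_csv('Churn_Modelling.csv')
# Here a small inline sample stands in for the Kaggle file.
csv_data = io.StringIO(
    "RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,"
    "Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited\n"
    "1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1\n"
    "2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0\n"
    "3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1\n"
)
df = pd.read_csv(csv_data)

print(df.head())
print(df.shape)  # the real file has 10,000 rows; this sample has 3
```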
Now, after getting this dataset, what I will do is find the correlation. I have discussed correlation a lot; you can go to my statistics playlist and understand what correlation actually is. Correlation basically checks how useful each independent feature is for predicting the dependent feature. You will get some positive and some negative values, and the code is simple: df.corr() gives you the correlation matrix, and then you can plot it as a heatmap using seaborn (sns.heatmap). The correlation values range between -1 and +1, with 1 being the maximum.

Here you can see the output column is Exited. RowNumber is not required because its correlation is negative, and CustomerId is unimportant, whereas Age, Balance, and EstimatedSalary are important; some of the values are negative. So I will divide this dataset into dependent and independent features: we know that the dependent feature is the Exited column, whereas all the other features are independent columns. Using iloc, I take all my independent features into X and my dependent feature into y, which is just that one column.

Notice that this dataset has features like Geography and Gender, which are categorical features: Geography has values like France, Spain and other locations, while Gender has Male and Female.
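The correlation check can be sketched like this, with a toy numeric frame standing in for the churn dataset's numeric columns (the values are illustrative, not from the real file):

```python
import pandas as pd

# Toy stand-in for a few numeric columns of the churn dataset.
df = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850],
    "Age":         [42, 41, 42, 39, 43],
    "Balance":     [0.0, 83807.86, 159660.80, 0.0, 125510.82],
    "Exited":      [1, 0, 1, 0, 0],
})

corr = df.corr()        # pairwise Pearson correlations, all in [-1, 1]
print(corr["Exited"])   # how strongly each feature relates to the target

# In a notebook you would visualise it exactly as in the video:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True)
```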
Obviously, as part of feature engineering, I have to convert these categorical features into dummy variables. For that I will be using pandas, and here it is: pd.get_dummies. First of all, I take Geography and set drop_first=True, because if I have three different geographical locations, one of them can be dropped; when both remaining columns are 0, that indicates the third location. So make sure you always keep this value as True. If in an interview they ask you why this is done, you can say it is to prevent the dummy variable trap: one column can always be represented by the others.

After that I do the same for Gender with pd.get_dummies and again drop_first=True, so I have just one column where 0 means female and 1 means male. Since I have converted Geography and Gender into dummy variables, I can drop the original columns because I don't require them; for that I write X.drop(['Geography', 'Gender'], axis=1). Looking at the head again, you can see I no longer have Geography and Gender.

One more thing to notice is that I have taken my features from the third column onwards. The reason is that RowNumber, CustomerId and Surname are not important features, as the correlation also showed. CustomerId is just a unique ID that happens to increase or decrease.
RowNumber is just like an index number, and Surname is not important either, as usual. So rather than dropping these three columns, I simply take the features from column 3 onwards. Here it is: I have dropped Geography and Gender, and this is my dataset.

After this, I concatenate my Geography dummy variables and my Gender dummy variables with my independent features. For that I use pd.concat([X, geography, gender], axis=1); axis=1 basically means they will be appended column-wise. Looking at the head, you can see they have been added, so you have Germany, Spain and Male. Very simple, very easy steps till here; most of you will be familiar with them.

Now comes the most important part: I am going to use the XGBoost algorithm. If you don't know the theoretical explanation of XGBoost, I have already uploaded a video, so go to my playlist and have a look. To apply XGBoost, make sure you import xgboost first of all. If you are not able to install it, just open your command prompt or Anaconda prompt, write pip install xgboost, and press Enter; I have already installed it, so I'm not going to execute that command. In xgboost there is a class called XGBClassifier, and this XGBClassifier has a lot of parameters: max_depth, learning_rate, and n_estimators, i.e. how many decision trees I want to use.
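The feature-engineering steps above can be sketched in one short runnable piece, using a tiny illustrative frame in place of the real churn data:

```python
import pandas as pd

df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender":    ["Female", "Female", "Male", "Male"],
    "Age":       [42, 41, 39, 35],
    "Exited":    [1, 0, 0, 1],
})

# drop_first=True avoids the dummy variable trap: with three countries,
# two 0/1 columns are enough -- (0, 0) encodes the dropped category.
geography = pd.get_dummies(df["Geography"], drop_first=True)  # Germany, Spain
gender = pd.get_dummies(df["Gender"], drop_first=True)        # Male

# Drop the original categorical columns and split off the target.
X = df.drop(["Geography", "Gender", "Exited"], axis=1)
y = df["Exited"]

# axis=1 appends the dummy columns column-wise, as in the video.
X = pd.concat([X, geography, gender], axis=1)
print(X.columns.tolist())  # ['Age', 'Germany', 'Spain', 'Male']
```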
These parameters take many different values — objective binary:logistic, booster gbtree (gradient-boosted trees), n_jobs and so on — so it is very difficult for us to say directly which values to select. For this purpose we use randomized search: RandomizedSearchCV internally tries various parameter combinations and finds out which parameters make the XGBClassifier work best.

So to begin with, I select some parameters, choosing only those that are present inside XGBClassifier; in this case you will see there is max_depth, learning_rate, n_estimators and so on. For learning_rate I will not give just one value, I'll give a list of values; my randomized search algorithm will try combinations of every value and find out which one gives the highest accuracy. Similarly, for max_depth I give different values. Don't make your learning rate much larger than these values, otherwise it may lead to an overfitting condition, and the training will also take more time. Similarly, you can give min_child_weight different values like 1, 3, 5 and 7 (min_child_weight is required in XGBoost), gamma different values, and colsample_bytree, which controls how many columns are sampled per tree, values like 0.8 and 0.9; make sure these stay at most 1.

After these parameters are selected, I import RandomizedSearchCV; here you can see that I have imported it.
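A parameter dictionary in the spirit of the one shown on screen might look like this; every key is a real XGBClassifier argument, and the exact candidate values are illustrative:

```python
# Each key maps to a list of candidate values for RandomizedSearchCV
# to sample from; all keys are genuine XGBClassifier parameters.
params = {
    "learning_rate":    [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "max_depth":        [3, 4, 5, 6, 8, 10, 12, 15],
    "min_child_weight": [1, 3, 5, 7],
    "gamma":            [0.0, 0.1, 0.2, 0.3, 0.4],
    "colsample_bytree": [0.3, 0.4, 0.5, 0.7],  # fraction of columns per tree
}
print(sorted(params))
```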
Now I call RandomizedSearchCV. The first argument is my classifier; here my XGBClassifier is just the default classifier, and I provide it here. Then there is a parameter called param_distributions, and inside it I provide the dictionary of parameter lists I just built. What RandomizedSearchCV will do is apply all those values to the XGBClassifier and see which combination of learning rate and the other parameters gives good accuracy, and those values will be selected.

So the first keyword is param_distributions set to my params dictionary, then n_iter says how many iterations you want to do, and scoring is the metric, where we use roc_auc. Setting n_jobs=-1 makes sure it uses all the cores present in your machine, whether desktop or laptop. I am taking a cross-validation of 5, i.e. five different folds, and verbose is there to print messages when I call fit: how much time it is taking, the status of the jobs, and all that information. So this is my randomized search, and all I have to do after that is write fit.

Just before writing fit, I created a timer function that notes how much time the whole RandomizedSearchCV run over the XGBClassifier takes. Surprisingly, as you'll see in this recording, it hardly took around 6.18 seconds; within that time it cross-validated five different experiments, and my randomized search finished properly.
After the execution was done, there are basically two attributes we should focus on. The first is random_search.best_estimator_; as soon as you access best_estimator_, it gives you the classifier with all the parameter values selected by the randomized search, and it is telling us to use this best estimator with those values. If you scroll to the top, you can see the learning rates I gave: 0.05, 0.10, 0.15, 0.20; going down, you can find that a learning rate of 0.10 was selected. Similarly, I had different gamma values, 0.1, 0.2, 0.3 and 0.4, and you can see which gamma value was selected. The same goes for the other parameters like max_depth and min_child_weight; the objective is binary:logistic, and all the other values were selected too.

But if you want to know exactly which of the parameters you gave were selected, and with what values, there is another attribute besides best_estimator_ called best_params_; when you access it, you get just the chosen parameters. Now, what you can do is copy these parameters and paste them inside XGBClassifier; that's one way, or you can copy the printed best_estimator_ and use it where you create the new classifier. I have done exactly that: I copied the whole best_estimator_ output and pasted it here to create my classifier.
As soon as I did this, my classifier executed perfectly. Then I implemented cross_val_score, using the same classifier with my X and y values and a cross-validation of 10 experiments, and from the scores I got 10 different accuracies, around 87%, 87%, 86%, 85% and 87%. When I do score.mean(), I get about 86% accuracy for my model. And that is how hyperparameter optimization is done for XGBoost.

Always remember, guys, the reason I used randomized search is that it is much faster than GridSearchCV, and I used XGBoost just to show you an example; you can similarly apply this to logistic regression, KNN, random forest, decision trees, anything. You just have to fix your parameter dictionary with the different parameters that are used in that particular algorithm, and you can use it the same way. That's it, and that is how you actually do it, guys.

I hope you liked this video. Make sure you subscribe to the channel if you have not subscribed, and share it with all your friends who need this kind of content. I'll see you all in the next video. Have a great day ahead, God bless you.
Info
Channel: Krish Naik
Views: 61,710
Rating: 4.9135137 out of 5
Keywords: hyperparameter optimization python, bayesian hyperparameter optimization, hyperparameter optimization keras, automatic hyperparameter optimization, hyperparameter tuning medium, neural network hyperparameter optimization python, hyperparameter tuning methods, automl hyperparameter optimization
Id: 9HomdnM12o4
Length: 14min 54sec (894 seconds)
Published: Wed Jun 26 2019