Ensembling, Blending & Stacking

Captions

Hello everyone, and welcome to my channel. In this video I'm going to talk about ensembling, blending and stacking. This has been one of the most requested videos; a lot of people have asked me to make a tutorial on it, and I finally found some time, so we are doing it today.

When we hear the words ensembling and stacking, what comes to mind is machine learning competitions. Many people say that ensembling, stacking and blending of models can be used only for machine learning competitions, but that's not true anymore. Maybe it used to be true four or five years ago, but nowadays we have a lot of computing power at cheap prices, and it's definitely possible to deploy three or four neural networks that work together in production and provide a response time of less than 500 milliseconds. I'm saying that because I've done it. Sometimes a really huge neural network can also be replaced by several models that are small in size and perform similarly to the large model, but with everything twice as fast. If that is the case, why would you choose a huge model, unless you want to go fancy? Say you have a BERT model and some simple LSTM models, and their ensemble or stack performs much better than the BERT model: I don't see any point in choosing the BERT model.

Anyway, ensembling is nothing but a combination of different kinds of models, and the models can be combined in many different ways. I think it's better if we get some data, start writing some code, and then try to understand what ensembling and blending are, rather than me just talking. So we need a dataset, and this is the dataset I have chosen for today's video. You can use any kind of dataset you want. If you do
a few things properly, you can always use stacking and blending. This dataset is a movie review dataset: you have 25,000 reviews in the training set and 25,000 in the test set. Each review has a sentiment associated with it, so it's a binary classification problem, and the evaluation metric used in this competition was AUC, so we will use AUC to see how our models are performing. We don't have much data here, just one file, so let's start writing some code and see what we can do.

I've already downloaded the dataset. This is the labeled training set. It's huge, so I wasn't going to open it here, but okay, it opens fine. It's a TSV file, a tab-separated file, with some kind of id, a sentiment (positive or negative) and the review.

So what is the first thing you have to do when you're ensembling, stacking or blending? Maybe you don't need to do anything, but it is preferred that you always create the folds first. That is useful when you're ensembling many different kinds of models from many different sources. Let's say you're working on a Kaggle competition with a team of three or four people: if you want to combine all your models effectively, you must all use the same folds. That's the first thing you have to think about. In this sentiment problem there are two classes, and we want the class ratio to be the same in every fold, so we use stratified k-folds.

Let's create the folds first. We call the file create_folds.py, and here we write the code that generates a new CSV or TSV file with a fold column. Creating folds is not difficult. Let's import some things: import pandas as pd, and from sklearn import model_selection. Now we read the dataset: inside if __name__ == "__main__" you read the data frame with pd.read_csv, and what is the name of our CSV? Labeled
train data, dot tsv, so ../input/labeledTrainData.tsv. Since it's a tab-separated file, I'm just going to add the separator, sep="\t", and that should read the data frame. Now we create a new column, call it kfold, and fill it with some value, -1. You can shuffle the data frame any way you want; I'm just going to shuffle the whole thing with df.sample(frac=1), then reset the index and drop the old index. Then you have your targets, the y column: y becomes df.sentiment.values. So the data has three columns: id, sentiment and review.

Now you initialize KFold or StratifiedKFold: let's say skf = model_selection.StratifiedKFold, and you specify the number of folds you want, n_splits=5. This is the most important thing you have to do. A different kind of dataset might require fold creation that is not stratified. Then: for f, (t_, v_), training and validation, in enumerate(skf.split(X=df, y=y)). X is your data frame and y is your y, and this gives you the fold number, the indices for training and the indices for validation. What you do is use df.loc with the validation indices, always the validation indices, and fill the kfold column with the current fold number.

Once you have that you can save the data frame with to_csv. We can save it as CSV, that's fine: ../input/train_folds.csv, with index set to False, because we don't need that index column. This should work, so let's try to run the code. For this video I'm using my Dell laptop. We go to the source folder, I activate the environment (I hope you can read what I'm writing, I think the text is big enough) and run python create_folds.py. It has generated the folds, and we can try to take a look at
the folds if we want. I hope it's correct; in the previous videos it was not. So, head of ../input/train_folds.csv: now we have a kfold column at the end. Zero, zero... everything seems to be zero, and that doesn't sound good. Maybe that's okay, maybe it's just zero at the beginning. Let's load the data and see what's happening: import pandas as pd, read ../input/train_folds.csv, and df.kfold.value_counts(). Okay, so we have the folds; we didn't do anything incorrectly. Great.

Now we create our training file. We can create multiple different kinds of models, so let's create a logistic regression model and call it lr.py. I'm going to be copy-pasting a few things from one file to another. You shouldn't do that, it's not good practice, but it's only for this video. So: import pandas as pd, from sklearn import linear_model. What else do we need from scikit-learn? We need TF-IDF too, so let's import it later.

Then we have a run_training function that takes a fold. df is the data frame, not the original one but the one with folds, so we can just do pd.read_csv("../input/train_folds.csv"). This reads our whole dataset. Now you need xtrain and ytrain. Let's say df_train is df where df.kfold is not equal to fold, with reset_index(drop=True), so here you get the data frame with only the training folds. For df_valid, this one changes to kfold equal to fold. So you've got your training set and your validation set.

Now you probably want to convert the text data to numbers, and for that we use TfidfVectorizer. Let me import it: from sklearn.feature_extraction.text import TfidfVectorizer. We are not doing anything interesting here; we are not even changing the parameters of TF-IDF, just using it as it is. Then tfv.fit: we fit it on train. There are other methods too, but let's fit it in the
proper manner: df_train.sentiment.values. Your xtrain will be tfv.transform on the training set, and similarly your xvalid will be tfv.transform on df_valid. Your ytrain is df_train.sentiment.values, and similarly your yvalid. Pretty simple. We are almost done building our logistic regression model, so fast.

Okay, we've got everything, so what do we want in the end? We want to calculate the AUC. We need to import something else: from sklearn import metrics. Then auc will be metrics.roc_auc_score, and here you have y_true first, so yvalid, and then the predictions, but we don't even have predictions yet. So first we define our classifier, clf, which is linear_model.LogisticRegression. Logistic regression, two classes, that's fine. Then clf.fit(xtrain, ytrain), and your prediction, let's just call it pred, is clf.predict_proba on xvalid, all rows, column one. So you fit on the training data and training labels and you predict on the validation data, and when you calculate AUC you need the probabilities of one class, which is what I'm doing here. Then you calculate the AUC and print fold equal to whatever your fold is and auc equal to whatever your AUC is. This should work.

Now, you have made the predictions, and these predictions are for the validation set, so why don't we return the predictions inside the data frame? We can do that: df_valid.loc, all rows, "lr_pred" (logistic regression predictions) will be pred, and then this function returns df_valid. It doesn't have to return everything in df_valid, maybe just the
ids and stuff, but let it return everything. Then I'm going to write the main block, and for j in range(5), run_training(j). Right now we are just testing the training. Okay, this should be __name__, stupid mistake. Here we are only running the training, just testing it, so let's see if this even works or whether we made some mistake. So: python lr.py. 'numpy.int64' object has no attribute 'lower'. Okay, when you're dealing with text data you have to take care of a few more things, so what I'm going to do is df.review.apply(str). Hmm, that should work... a Series object error. Okay, we did something wrong. Of course, this is a stupid mistake: it should be df.review equals df.review.apply(str), so we are converting everything to strings. It is still saying there is some problem, some data where I have an int. Let me check. It's actually a very stupid mistake that I have made: we should not fit the TF-IDF on the sentiment column but on the review column. Pretty stupid.

Okay, now I hope it runs. Yes, it is running, and you see it's quite fast. For fold 0 I got an AUC of 0.952, then 0.955. It's already quite high; I don't know if we can improve that, but let's try and see what we can do.

Okay, so those are the folds. First of all, we don't need to return the whole validation data frame, so what we are going to return is id, sentiment, kfold and lr_pred. Once we have these, we create a list of data frames, dfs: in every iteration we have temp_df and we append it with dfs.append(temp_df). Your final data frame, fin_valid_df, is pd.concat(dfs). Then you can print fin_valid_df.shape, and it should have the same number of rows as the training dataset. So let's
run it. You can see in the end I got the shape (25000, 4). One more thing I forgot to do is to save it, so let me create a new folder here and call it model_preds, predictions from the models, and we save there: fin_valid_df.to_csv, model_preds, then the model name, so lr.csv, with index=False. This will become our first model. Let's just run it and save it to make sure everything is working fine. It's done, and we can check the head of the file. What you see now is id, which are the original ids, then sentiment, then the kfold value, and then the predictions from the logistic regression model. This is how it should look for each and every model in the ensemble.

Let's change something now. Maybe we could also have printed something else here, but let's do that later. We used TF-IDF, so let's create another model, lr_cnt.py. Instead of TfidfVectorizer we will now use CountVectorizer, and everything else remains the same: the prediction column becomes lr_cnt_pred and the output file becomes lr_cnt.csv. Nothing else changes. We run python lr_cnt.py, and you see I also got a bunch of warnings. For the folds I have AUCs of 0.945, 0.945, 0.945, 0.945 and 0.943, so it's quite stable. This is a little bit lower than what we had before, but now we've got two files. I go to model_preds, and here I have one for count and one for the normal logistic regression with TF-IDF, and the values are also a little bit different: I remember this value was around 0.88 and this value was around zero point something else. So this is the second model we have created. I think we can create one more model, so let's try. Going back and
creating a new model, let's take this as a base. Just imagine you're working in a team in a Kaggle competition: everyone is building their own models, but in the end they're saving the CSV in the same format. That's all you need to take care of. So we create a new model, and what should we create? Maybe SVM or random forest. Let's create a random forest model on SVD data. There may be a lot of things here that you don't know, but you don't have to know them right now; you just have to see what the process is like.

So here we have rf_svd, and the prediction column is rf_svd_pred. We are not making any changes to the rest, but we import one more thing: from sklearn import decomposition. We will be decomposing our TF-IDF features, so here we initialize svd = decomposition.TruncatedSVD. The number of components is up to you; 120 works quite okay. Then you fit, svd.fit(xtrain), that's it. Then let's put the result in a new variable: xtrain_svd becomes svd.transform(xtrain), and similarly for validation, xvalid_svd is svd.transform(xvalid). Instead of the linear model we will use ensemble.RandomForestClassifier with n_estimators=100 and n_jobs=-1. I hope it doesn't take a lot of time to train. So we don't need linear_model; instead we need ensemble, and I think everything else is okay: rf, rf, rf_svd. Everything is fine, except this should be xtrain_svd here and xvalid_svd here. So now we are training a random forest classifier on SVD data. Let's train.

You can see my results for random forest are not as good, but it's not like they're bad; they're quite good, and we didn't do any kind of tuning. These are just experiments. So we've got one more file, and now I think we should start looking into blending these and see how that works. Let's call the first file blending. It's a simple term: you have multiple models and you
blend them together. We need to import pandas as pd. How have we saved the predictions? lr, lr_cnt, rf_svd: so lr_pred, lr_cnt_pred and, if I'm not wrong, rf_svd_pred. Okay, so we import pandas, and let me also import glob to grab all the files from the model_preds directory. Then, inside if __name__ == "__main__", what are we doing? We say files is glob.glob("../model_preds/*.csv"), so grab all the CSV files from there, and then for f in files, read them into a data frame. Let's say df = None, then for f in files: if df is None, which means it's the first iteration, df is just pd.read_csv(f). If it's not None, that means we already have something in df, so temp_df is pd.read_csv(f) and df becomes df.merge(temp_df, on="id", how="left"); the id column is going to be the same in all of them. This should give you df, so let's print df.head(10).
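The file-merging loop just described can be sketched roughly as follows. This is a minimal sketch rather than the exact script from the video. It assumes every file in model_preds has the columns id, sentiment and kfold plus one column ending in _pred. As a variation on what is said above, it keeps only id and the prediction column from the second file onwards, so the merge does not produce duplicated sentiment and kfold columns. The helper write_demo_files and its tiny made-up values are hypothetical and exist only so the example runs without the real data.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_prediction_files(pattern):
    """Merge every prediction CSV matching `pattern` into one frame keyed on `id`."""
    df = None
    # sorted() only makes the column order reproducible; glob order is arbitrary
    for f in sorted(glob.glob(pattern)):
        temp_df = pd.read_csv(f)
        if df is None:
            # first file: keep id, sentiment, kfold and its prediction column
            df = temp_df
        else:
            # later files: keep only id + prediction column(s) so the merge
            # does not create duplicated sentiment/kfold columns (assumption)
            pred_cols = [c for c in temp_df.columns if c.endswith("_pred")]
            df = df.merge(temp_df[["id"] + pred_cols], on="id", how="left")
    return df


def write_demo_files(folder):
    """Hypothetical helper: writes three tiny files in the model_preds format."""
    base = pd.DataFrame(
        {"id": ["5814_8", "2381_9", "7759_3"], "sentiment": [1, 0, 1], "kfold": [0, 1, 2]}
    )
    demo_preds = {
        "lr": [0.96, 0.10, 0.80],      # made-up values, not results from the video
        "lr_cnt": [1.00, 0.05, 0.70],
        "rf_svd": [0.77, 0.30, 0.60],
    }
    for name, preds in demo_preds.items():
        out = base.copy()
        out[f"{name}_pred"] = preds
        if name == "lr_cnt":
            # reverse the rows of one file to show that the merge aligns by id
            out = out.iloc[::-1]
        out.to_csv(os.path.join(folder, f"{name}.csv"), index=False)


with tempfile.TemporaryDirectory() as tmp:
    write_demo_files(tmp)
    merged = merge_prediction_files(os.path.join(tmp, "*.csv"))

print(merged.head(10))
```

With the real files, the call would presumably be merge_prediction_files("../model_preds/*.csv"). Merging on id rather than concatenating columns side by side means the result stays correct even if two teammates saved their rows in a different order.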
Now we run this, python blending.py, and let's see what we have. Okay, we have predictions from lr_cnt, lr and rf_svd, so predictions from multiple models, and you can see how the predictions vary: for this sample, logistic regression with count vectorizer gave me 1.0, logistic regression with TF-IDF gave 0.96, and rf_svd gave 0.77. So the predictions vary quite a lot.

Now, which columns are we interested in? Let's call them pred_cols. This can also be done automatically, but I'm just saving some time here: lr_pred, lr_cnt_pred and rf_svd_pred. These three are the prediction columns we are interested in. Let's import metrics, from sklearn import metrics, and what I'm going to calculate is the overall AUC. So for col in pred_cols, print the column and its overall AUC. Let's call this variable auc: auc will be metrics.roc_auc_score(df.sentiment.values, df[col].values). You don't even need .values, but I'm just used to doing that. Let's print some stuff and see. Okay, so we got 0.955, 0.945 and 0.877. That's the overall AUC, not the mean over folds. Okay, I should have written col inside these brackets.

Now what happens if I just take the average of these three? Let's print the average with metrics.roc_auc_score. Let me define targets somewhere, targets equal to df.sentiment.values, so we can use it here. Then avg_pred will be just the mean of these three columns; that's the simplest one. So you take df with these three columns and take the mean across them. Let's import numpy first, import numpy as np, and here we go to an array. Was it to_numpy
or was it to_array? Let's just use .values, that's the simplest one. And here we can do np.mean with axis=1. Now here you have targets and avg_pred; let's see what it gives. Okay, it gave me an average AUC of 0.953, which is a little bit lower than my lr_pred, but I would say this will generally generalize much better, because it is a combination of all three models.

Maybe we can do something different. Let me keep these three in variables: lr_pred is df.lr_pred.values, and similarly for the other two, this one here and this one here. Now I say my avg_pred is two times lr_pred plus lr_cnt_pred plus rf_svd_pred, divided by four. So I just take a weighted average: I've seen that logistic regression is giving quite good results, so I give it a much higher weight. Then we print the same thing and see if it improves anything. I think my first model was very strong; I shouldn't have chosen that, but let's try. Okay, 0.95538, which is still a little bit lower than what we had.

So the first method of blending is called averaging: you do a simple average. This method is called weighted averaging, in which you give weights. Let's say I give three times the weight to logistic regression with TF-IDF, and now I have improved the overall AUC. This AUC is much higher than what we had, and it's a combination of these three models. But have I chosen the weights correctly? Maybe not. To choose the weights correctly, I need to optimize them somehow. Imagine three monkeys turning some kind of knobs, and they have to find the exact combination. It's like a lock: you have to find the exact combination, and how you do that is totally up to you. I will show
you one way of doing that. Before going there, one thing I would also like to mention: what if you have a simple classification problem in which you don't want probabilities but predictions? Here you can do max voting. If my logistic regression model has predicted zero, this one has predicted one and this one has predicted one for a given sample, a given review, then I choose one as my prediction. Here we are looking at probabilities, but that's also something you must keep in mind.

Another thing we can do here, before I go into finding the optimal weights, is rank averaging. So we've got averaging, we've got weighted averaging, and now we have rank averaging. What is rank averaging? When you are calculating AUC, you are not interested in the probabilities themselves, which could be any real numbers; you're interested in the ranks instead. So what sometimes helps is rank averaging: instead of using the probabilities, convert them to ranks and then average them. Let's try a simple rank averaging first, and then we can take all of this and call it weighted rank averaging, in which we use the same weights: three times this one, divided by five. Let's see. My rank averaging was 0.948, not very good, and the weighted one 0.9533, which is also not good. So until now my best model has been the weighted average.

Maybe I can try to remove the logistic regression with TF-IDF, because its score is way higher, or we can modify it somehow. Let's see: if I change this and use max_features, with only 5000 features, then my score should go down. It's way too high for this video. Let's see what happens when I run the logistic regression script again. It's giving me some iteration warnings, so I probably need to increase the maximum number of iterations in logistic regression, but other than that
it looks lower, but it's still not as low as I wanted it to be, so I will probably do something else to make it even lower, or maybe I can use a different model. Let's see. I have now changed it to 1000 features, so it's going to use only 1000 features, and now it's in the 0.93 range, which I think is quite okay. The only problem is that we need to run our analysis again, which is fine since we have saved the file. So, blending: lr_pred 0.935, lr_cnt_pred 0.945, rf_svd_pred 0.877, average 0.949. Now I see some kind of difference. Weighted average 0.946, rank average 0.944 and weighted rank average 0.943. So rank averaging is not very good in this case, but rank averaging works quite well in many cases, so it's definitely something you must keep in mind.

One more problem: let's go to blending. Now my logistic regression model is not the best one, but I'm still giving it a much higher weight, and I shouldn't do that. So I will give the much higher weight to the lr_cnt prediction, and let's run this again. Awesome, now my weighted rank averaging has reached 0.9505, which is higher than the weighted average. This is the magic of blending, my friends: you can get a much higher AUC with much simpler models. I was not even able to reach this when I had logistic regression at 0.95 something, or was it 0.94 something? Anyway, here you see the improvement over the top model is almost 0.005, and that is quite good.

So we learned simple averaging, weighted averaging, rank averaging and weighted rank averaging. Now we want to find the optimal weights for the weighted averaging and see if that improves something. Let's create another Python file and call it optimal_weights. What I'm going to do here is copy-paste everything and remove all this stuff, up to here, up to where we read the files and have the targets. Now we have to find the best weights. Finding the best weights is not a very
difficult task, and it can be done in many different ways. One way is to write an optimization function and let it find the weights for you, and that's what we are going to do here. We are going to use fmin from SciPy: from scipy.optimize import fmin. We start by writing a new class; let's call it OptimizeAUC. We define our __init__ function with self, and inside it we define the coefficients, self.coef_, or any name you want. Then we define a fit function, like the fit and predict you have in scikit-learn: fit(self, X, y), which takes the training data and the targets.

Here you define what I call a partial loss, partial_loss. The partial loss will be a function that takes only one argument, coef. So let's define another function and call it _auc. This is your loss function, and it takes the coefficients, X and y. In our case, weighted averaging, the coefficients are a list of three numbers. You say x_coef equals X multiplied by coef, and predictions is nothing but the sum of x_coef with axis equal to one. So if the coefficients are 1, 1, 1, the prediction is just the sum of the predictions; if the coefficients are 0.333, 0.333, 0.333, it's an average. Then you calculate the AUC score: auc_score is metrics.roc_auc_score(y, predictions), and you return it. One thing you have to remember: you have to return the negative AUC, because you are minimizing, so it's minus auc_score.

Then you convert this function into a partial function. Let me import it, from functools import partial, and here I say partial_loss = partial, and you provide the arguments you have. The first one is the loss function itself, _auc, then X, which you already have from this fit function, and
y equal to y. This creates a function that takes only one argument, coef, with the other two already provided, and returns the negative AUC. That's it; it's not very complicated. Then you also initialize the coefficients, init_coef. We can initialize them in many different ways: all zeros if you want, all ones if you want, but today we are going to use a Dirichlet distribution, so np.random.dirichlet (did we import numpy? yes). I think this is the function, though the editor is not suggesting anything. Inside it goes np.ones, and what should the shape be? The shape should be the number of models you have, so X.shape[1]. Here X is the predictions from the three different models that we had. Then self.coef_ will be fmin with partial_loss and the initial coefficients, and there is a disp parameter; set it to True because we want to see what's going on.

So you've got this. One more thing you should remember is that you again have to do everything in folds. So you need a predict function, predict(self, X), and this is going to be the same as in the _auc function: x_coef here uses self.coef_, predictions is np.sum of that with axis equal to one, and you return predictions.

Now you create another function called run_training to find the best coefficients. It should take the prediction data frame, let's say pred_df, and the fold number. We have train_df, which is pred_df where kfold is not equal to fold, with reset_index(drop=True), the same way we have been doing before. We have to do everything the same way, because if you optimize on the full dataset your model is going to overfit. And valid
df is where it is equal to fold. Now we extract X and y. xtrain is nothing but all these prediction columns, so train_df with these columns, .values, and in the same way valid_df with these columns is my xvalid. Now, finding the best weights and creating predictions: let's say opt is my OptimizeAUC, so we initialize the class here, then opt.fit(xtrain, train_df.sentiment.values), and preds is opt.predict(xvalid). Here you can print fold and auc with an f-string. We have not calculated the AUC yet, so let's calculate it: auc will be metrics.roc_auc_score with valid_df.sentiment.values and preds. I think we've got everything; if we didn't, we will see later. Then run_training, for j in range(5). So this is also finding your best coefficients from cross-validation, which is very important.

So, python optimal_weights.py, and let's just hope it works. Okay, it's not working: run_training(j)... ah, sorry, it also takes the data frame, so pass df as well. There have been some errors, so let's take a look at what the error is now. This is very weird, and the reason is that I didn't write kfold here, so it's doing a lot of weird things that we don't want to know about. Okay, I accidentally closed it. So, .kfold, and now we can run it again. Let me clear this and see if it works. Okay, now it seems to be working, and it didn't take much time.

You can see it started with an AUC of 0.947, and when it was training the AUC was 0.951. What do I mean by training? When it's optimizing the weights. That's why you should not evaluate on the training data: you will see such high AUC values, which are of no use to you. But this is looking good: 0.947, 0.952, 0.950, 0.950, which is much better than what we had before. We can also do an
overall AUC here. So let's say pred_df.loc, all rows, "opt_pred", the prediction from the optimization we have done, is equal to preds, and we return pred_df. I think you can do the rest on your own: preds_df is an empty list, and it's not concat but preds_df.append here; then you have preds_df = pd.concat(preds_df), and again you print metrics.roc_auc_score. Here you had targets, which we defined before, but let's not use targets; use preds_df.sentiment.values, and then preds_df.opt_pred.values. Let's see if it works.

Okay, it's not working. Why is it not working? It says it cannot set something; so where is the error? At pred_df. Okay, so it's trying to set values on a copy, so I will create a deep copy of the data frame, copy with deep equal to True, was it? So that it can be changed and doesn't change the original data frame. 'copy takes no keyword arguments'. Okay, let's see. It should not be pred_df here, it should be valid_df; we have been doing it wrong all along. Now it should work.

Okay, it seems to be working, but it's giving a much lower overall AUC, so there is something wrong. The values are going to be like this, and there is nothing to be worried about, because you are optimizing on different folds. What you should actually do is not look at these final predictions but take the optimal coefficients from there. So let's try that: instead of returning valid_df, I can just return opt.coef_, so for each fold I'll be saving the best coefficients. Here I have the append of run_training, which is fine, and I will call the list coefs. This will give me the coefficients, which are
going to be the best coefficients, and then convert this to a NumPy array, so np.array of coefs, and let's see, print coefs. Yeah, so I was not doing it in the correct manner. You should not make predictions; all you need is the coefficients. So let's generate the coefficients. Okay, so everything looks good. These are the coefficients that you need: five folds, and for each fold you have three coefficients. Now the order is fixed, so you have to use them in the same order.

Let's try this. coefs is np.mean, we will take an average of all the coefficients, coefs comma axis equal to zero, and that should give me three. Let's verify. Okay, so I got three coefficients, the average of each column. And now what I can do is say, okay, my final prediction, so the weighted average is coefs zero into df.lr_pred.values, plus coefs one, and coefs two. So you don't have to average them, you don't have to divide them by anything; it's already been taken care of. So lr_pred, lr_cnt_pred and rf_svd_pred. Okay, so this is my weighted average prediction, and metrics.roc_auc_score, targets comma weighted average. Let's see what it gives. Let me just print "optimal AUC after finding coefficients". I didn't print it. Okay, now it should print.

So far so good: 0.95058. And what did we have when we were blending? 0.95056. So we do see some improvement. But humans are always good at tuning... no, that's not true, but you do see some improvement. I think for a different kind of problem you will probably see a huge improvement from this. So this is how you find the optimal weights. The only class that you need is OptimizeAUC, and you can use it for anything you want: you can have log loss, you can even extend it to multi-label problems in which you have multiple columns for targets and multiple columns coming from different models. So that's done. The next thing that we can
also do is, instead of going for this OptimizeAUC function, we can probably use logistic regression. Let's see, lr_blend. So I am going to remove the OptimizeAUC function, and instead of that I have linear_model.LogisticRegression, so from sklearn.linear_model import LogisticRegression. Okay, let's run. Okay, I'm not running the correct one: python lr_blend. Awesome. So yeah, it does run, but it doesn't give me good results. I think all of the predictions are in a good range, but maybe it does need scaling. Let's see: from sklearn.preprocessing import StandardScaler. So let's just add scl equal to StandardScaler, and xtrain equal to scl.fit_transform of xtrain, and similarly for xvalid. So we scale the data first and then we do logistic regression, and let's see now. And anyways, I think the data is already scaled, so it's between zero and one, all the probabilities, so it doesn't make sense to scale them again. But the results are going to be the same anyways. Okay, so the results are the same. So logistic regression is not performing very well; maybe you can do linear regression.

And you see, even for logistic regression you have the coefficients. So I'm getting the coefficients. I'm getting some error in the end, but don't worry about it; it's because I have to format the data properly for this. Or maybe I can just quickly do that. So what I have to do is just add another zero here, and that should work. Okay, so I'm getting an optimal AUC of 0.9504, which is also not very bad; that's okay, I would say. But the individual AUCs are quite low. Or did I do anything wrong? Oh yes, I did something wrong, actually: that should be predict_proba. I need to change it to probabilities, and that's why the AUC was low. So now you also see, for AUC you need to use probabilities, not predictions. Okay, so now we're in a good range. It's anyways not as good as what we
had before, but it's fine. Change it to linear regression, and it doesn't require a big change; I hope everything will run. Oh yeah, it doesn't have predict_proba, obviously, because it's regression. But I think it does have coef_; yeah, it does have coef_. Okay, now it uses a different format again, so it is again zero, and then I can go and run this again. Yeah, 0.9505, so a little bit better than logistic regression. So this is also giving you coefficients that you can use, and this method is also known as stacking, so you're actually stacking using a model.

What you can also do here is get rid of everything and create a new file, let's say xgb_model.py, and we will import xgboost, and we will change the model from linear regression to XGBoost, XGBClassifier. So we also need to change from predict to predict_proba again, all rows comma 1. And here I have written valid_df.loc, xgb_pred, so just creating a new column like we used to do, and return valid_df. And we can also borrow some code from the previous models, so let's take this whole thing from here. Yeah, I think that is fine. And at the end of this file we can just write this: run_training should take pred_df, which is our df in our case. And then in the end you can print metrics.roc_auc_score, and here you have y_true, which is, let's say, fin_valid_df.sentiment.values, comma fin_valid_df.xgb_pred.values. So this is your final AUC score from running the XGBoost model on top of lr, lr_cnt and rf_svd.

So let's see: python xgb_model. You can see it is working fine, but it's not as good as I would want it to be, and it might also be because I didn't care enough to tune it. So let's not tune it in this video; maybe in some other. So this is what I wanted to show you. This is also known as stacking. So now you have learned averaging, weighted averaging,
finding optimal weights for weighted averaging; all of this is also known as blending and stacking. One more thing here: when you create these predictions, you can use them as features in the original training data that you have. You can combine them with the original features, and that also helps quite a lot. The only thing that you need to take care of is the cross-validation folds. If your cross-validation folds are good, then you're not going to have any kind of problem, but if they're not and there is some kind of leakage anywhere, it's going to be a messed-up model.

So before I end, I would like to show you one of the diagrams to understand stacking much more easily, so let's go and take a look at that. This is a page from the book I've written, and here what I show is how you achieve stacking, and that's what I have shown you today, but this is more of a summary. So you have the data, you divide everything into folds, and then you train a bunch of models, and the models predict on the validation folds, and using those validation folds you rebuild your training data, and those predictions are known as fold predictions. So here we had different kinds of models, lr, lr_cnt and rf_svd; those were our fold predictions. When you have the fold predictions, you stack them column-wise, and you have to keep the IDs the same, so there must be an ID column, and you can train your models on these predictions as features. Still, you have to use the same folds that you created in the very beginning. So once you have, let's say here, a simple lr, lr_cnt, rf, we have three models, and these are M1, M2, M3, which you see here, and then you divide them by the same folds that you had created before, based on the IDs, and then train another model, and train another model, and keep training models. So now you're going to level
two. You can call it level zero, then level one, but if M1, M2, M3 is my level one, then training another model on top of it is level-two stacking. Or L0, L1; you can say whatever you want, it doesn't matter. And then in the end you have the final out-of-fold predictions and the test predictions. So every time you do this, if you have a test file, you also need to predict on the test data; that's also very important, for each and every step, and you need to combine your test data in the same way as you're doing with your training data.

So before I end, these are the final points. Divide the training data into folds. Train a bunch of models, M1 to Mn. Create full training predictions, using out-of-fold training, and test predictions using all these models. Till here is level one. Now use the fold predictions from these models as features to another model; this is an L2 model, or it can be a simple weighting. Use the same folds as before to train this L2 model or models. Now create OOF predictions, out-of-fold predictions, on the training set and the test set. Now you have L2 predictions for the training data and also the final test set predictions.

So this is how ensembling, stacking, blending, whatever you want to call it, works, and that's it, actually. I'm not taking more of your time. If you like the video, click on the like button and do subscribe, and let me know how it was. Give me your comments in the comment section, and do share the video with your friends, and see you next time. Goodbye.
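
Editor's sketch of the optimal-weights step described above: fit blending weights on the out-of-fold predictions of the other folds, score them on the held-out fold, then average the per-fold coefficients. This is not the code from the video. The class body, the use of `scipy.optimize.fmin`, the equal-weight starting point and the synthetic three-column data are all assumptions made for illustration; only the class name `OptimizeAUC`, the `coef_` attribute and the five-fold, three-model setup come from the transcript.

```python
from functools import partial

import numpy as np
from scipy.optimize import fmin
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold


class OptimizeAUC:
    """Illustrative re-creation: find blend weights that maximize AUC."""

    def __init__(self):
        self.coef_ = None

    def _neg_auc(self, coef, X, y):
        # fmin minimizes, so return the negative AUC of the weighted sum.
        return -metrics.roc_auc_score(y, X @ coef)

    def fit(self, X, y):
        # Assumption: start the search from equal weights.
        init = np.ones(X.shape[1]) / X.shape[1]
        self.coef_ = fmin(partial(self._neg_auc, X=X, y=y), init, disp=False)
        return self

    def predict(self, X):
        return X @ self.coef_


# Synthetic stand-in for three columns of out-of-fold probabilities
# (hypothetical data, not the data set used in the video).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=1000)
noise = rng.normal(size=(1000, 3)) * np.array([0.6, 0.8, 1.0])
X = 1.0 / (1.0 + np.exp(-(2.0 * y[:, None] - 1.0 + noise)))

coefs = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (fit_idx, check_idx) in enumerate(skf.split(X, y)):
    # Optimize the weights on the other folds, score on the held-out fold.
    opt = OptimizeAUC().fit(X[fit_idx], y[fit_idx])
    auc = metrics.roc_auc_score(y[check_idx], opt.predict(X[check_idx]))
    print(f"fold={fold}, auc={auc:.4f}")
    coefs.append(opt.coef_)

coefs = np.array(coefs)          # shape (5, 3): one row of weights per fold
mean_coefs = coefs.mean(axis=0)  # average each column across the folds
blend_auc = metrics.roc_auc_score(y, X @ mean_coefs)
print(f"AUC with averaged coefficients: {blend_auc:.4f}")
```

Because AUC depends only on the ranking of the scores, the weights do not need to sum to one, which matches the remark in the video that nothing has to be divided afterwards. The column order of `X` must stay the same wherever the averaged coefficients are applied.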

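Editor's sketch of the stacking recipe summarized at the end of the video: create the folds once, collect out-of-fold predictions from several level-1 models, stack them column-wise, and train a level-2 model on them with the same folds. The data set, the three level-1 models, the helper name `oof_predictions` and the choice to average test predictions over the fold models are assumptions for illustration, not the setup used in the video.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB


def oof_predictions(make_model, X, y, X_test, folds):
    """Return out-of-fold train predictions and fold-averaged test predictions."""
    fold_ids = np.unique(folds)
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for fold in fold_ids:
        fit_mask, valid_mask = folds != fold, folds == fold
        model = make_model().fit(X[fit_mask], y[fit_mask])
        # Probabilities, not hard labels: AUC needs a ranking score.
        oof[valid_mask] = model.predict_proba(X[valid_mask])[:, 1]
        # Assumption: average the test predictions of the fold models.
        test_pred += model.predict_proba(X_test)[:, 1] / len(fold_ids)
    return oof, test_pred


# Hypothetical data: 1000 training rows and 200 test rows.
X_all, y_all = make_classification(n_samples=1200, n_features=20, random_state=0)
X, y, X_test, y_test = X_all[:1000], y_all[:1000], X_all[1000:], y_all[1000:]

# Create the folds once and reuse them at every level.
folds = np.zeros(len(X), dtype=int)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (_, valid_idx) in enumerate(skf.split(X, y)):
    folds[valid_idx] = fold

# Level 1: a bunch of models M1..Mn (stand-ins for the models in the video).
level1 = {
    "lr": lambda: LogisticRegression(max_iter=1000),
    "nb": lambda: GaussianNB(),
    "rf": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
}
l1_out = {name: oof_predictions(m, X, y, X_test, folds) for name, m in level1.items()}
for name, (oof, _) in l1_out.items():
    print(f"{name}: OOF AUC = {metrics.roc_auc_score(y, oof):.4f}")

# Stack the level-1 predictions column-wise to form the level-2 features.
X_l2 = np.column_stack([l1_out[name][0] for name in level1])
X_test_l2 = np.column_stack([l1_out[name][1] for name in level1])

# Level 2: another model trained on those features with the same folds.
l2_oof, l2_test = oof_predictions(
    lambda: LogisticRegression(max_iter=1000), X_l2, y, X_test_l2, folds
)
l2_oof_auc = metrics.roc_auc_score(y, l2_oof)
l2_test_auc = metrics.roc_auc_score(y_test, l2_test)
print(f"level-2 OOF AUC  = {l2_oof_auc:.4f}")
print(f"level-2 test AUC = {l2_test_auc:.4f}")
```

Reusing the same `folds` array at level 2 is the point made in the video about leakage: every level-2 feature for a row was produced by a model that never saw that row, and the level-2 model is validated on the same split.
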
Info

Channel: Abhishek Thakur

Views: 12,391

Rating: 4.9738135 out of 5

Keywords: machine learning, deep learning, artificial intelligence, kaggle, abhishek thakur, ensemble learning, kaggle ensembling guide, how to stack models, how to stack deep learning models, how to do ensembling, how to blend models, how to find optimal weights for blending, ensembling and stacking, stacking and blending, ensembling, stacking, blending, machine learning stacking, deep learning ensembling, blending ml models, stacking ml models, stacking machine learning

Id: TuIgtitqJho

Length: 76min 18sec (4578 seconds)

Published: Sun Sep 27 2020