Pair Programming: Deep Learning Model For Drug Classification With Andrey Lukyanenko

Captions
Abhishek: Hello everyone — if you can hear me, give a shout out in the chat. Can you hear me? Awesome, great, sounds good. I'm very excited for today: we have Andrey one more time, and we're doing something new, so forgive us if we make mistakes. It's pair programming: we program together, some parts of the code will be written by Andrey, I will do some parts, and then we combine our work. We have also joined as a team in this competition, which is called Mechanisms of Action (MoA) Prediction. Andrey is going to tell you a bit more about it: he'll start with the EDA part of this competition and show you some things in the data that you might not have noticed yet, then I'll dive into the modeling part, and then we'll discuss some of the things we came up with over the last couple of days. The competition was only announced three days ago, so you can go and take a look. A lot of people have asked me how to start with a Kaggle competition — this is one you should invest some time in, because it doesn't have a lot of data, so you don't need a lot of resources and you can try different kinds of techniques. I won't take more time — over to you, Andrey.

Andrey: Hello everyone, I'm glad I'm able to talk to you, and in a few minutes we will show you some interesting things about this new competition. First of all, if you haven't read the description: it's quite interesting, because it's a multi-label classification problem. We have really a lot of labels, all of them binary, and we need to build a model to predict them. How can you start a competition? I suppose you could just load the data, throw it into gradient boosting whatever the data is, and see what happens, but I think a better idea is to read the description to understand at least something about the data. This way you get some idea about which models could work, which approaches to try, and what scores to expect.

So what is this competition about? It's a domain-based competition, by which I mean that people with no idea about the domain — like me, for example — will have some difficulty starting, and it will be necessary to read some papers to understand what is going on. This competition is about drug discovery. Nowadays drug discovery is usually done using models which use information about diseases and the biological mechanisms behind them; scientists find connections in the data, find some protein target associated with a disease, and then create molecules to modulate it. The biological activity of such a molecule is called its mechanism of action, or MoA. I hope I'm not wrong — I tried to understand this and I hope I understood it correctly. As far as I understand, the data we have comes from samples of human cells treated with some drug; then we observe the cellular responses, and these responses are matched against known patterns in huge databases to identify known mechanisms of action. I hope I didn't confuse you too much; I suggest you read the description yourself.

About the data itself: we have several files. First of all, train features — the data on which we train the models. It contains information about genes, information about cells, and some additional columns which I will show you when I start working with the data. More importantly, we have two files with targets. The first is called train_targets_scored; it contains 207 columns, one of which is the id, so we have 206 target columns. As far as I know, current gradient boosting libraries don't easily support multi-label problems, so we would have to train a separate model for each column; while each of these models could run fast, in total the time would be quite high, and the main problem is that separate models don't capture the interconnections between the targets. But if we use neural nets, we can train one single model to predict all the targets, and we can hope it exploits those interconnections. We also have an additional file, train_targets_nonscored, which has even more columns, but we don't have to predict those — it's additional information for the train data only, so we can still use it while training, and we'll show you how.

For now let's start with the exploration and see what this competition contains and what we can do about it. I usually use Python — there are some prominent Kagglers who use R, but for me Python is preferred, as you can know from my kernels. First, let's run the default code; we see the five datasets from the data description. Let's read all of them — pandas is already loaded, so we can just use it. Of course we could read the data into a dictionary in a loop, but it's easier and more common to read the files separately. I like using code completion and hotkeys to make things faster. Now we have read all the files. Since we are doing EDA we also need something for plotting, so let's use standard matplotlib.

Now let's look at the data. What do we usually do when we start? We look at the shape of the data and at a sample of it. Here we can see that the dataset is really small — people have gotten used to hundreds of thousands or even billions of rows, and here we have just a small set — but the number of columns is quite high, so training gradient boosting models on all of it would usually be slow. What do we have here? First, the id. It's always good to do sanity checks: if everything is correct, the number of rows should equal the number of unique ids. Yes, everything is okay — if the number of unique ids were lower, some ids would have several rows and we would need aggregations and so on.

Next we have cp_type. This is the treatment type, and it has two values: one for when the sample was treated, and one for when it wasn't treated — the control group. We can see that fewer than two thousand samples weren't treated. For us this means that all the targets in these rows are zero and, more importantly, it will be the same in the test data, so it makes sense to use this for post-processing: we train models, make predictions, and afterwards find these rows and set all predictions for them to zero.
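The control-group post-processing just described can be sketched in a few lines. This is a minimal sketch on toy data; the column name `cp_type` and its control value `ctl_vehicle` come from the competition files, but the target names and values here are invented:

```python
import pandas as pd

# Hypothetical miniature stand-ins for the competition files:
# cp_type == "ctl_vehicle" marks untreated control rows.
test_features = pd.DataFrame({
    "sig_id": ["a", "b", "c"],
    "cp_type": ["trt_cp", "ctl_vehicle", "trt_cp"],
})
preds = pd.DataFrame({
    "sig_id": ["a", "b", "c"],
    "target_1": [0.4, 0.7, 0.1],
    "target_2": [0.2, 0.9, 0.3],
})

# Post-processing: controls received no treatment, so every MoA label is 0.
ctl_mask = test_features["cp_type"] == "ctl_vehicle"
target_cols = [c for c in preds.columns if c != "sig_id"]
preds.loc[ctl_mask.values, target_cols] = 0.0
```

Because log loss is evaluated per cell, hard-zeroing rows that are guaranteed to be zero can only help the score.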
Good. The next column is cp_time, the treatment duration: the cells were treated for 24, 48 or 72 hours, so there are three different values in this column. We have roughly the same number of rows for each value, which is good, but it could also mean the column won't really help the machine learning. And we have cp_dose, which simply means a high or a low dose of the drug — also almost the same counts, nice. Then there are these hundreds of columns with gene and cell information; the names all start from zero, so we have 100 columns for cells, and all the other columns are about genes. This was a very basic exploration; we understand now what data we have, and we can move forward.

First it would be a good idea to look at the targets. We already know the targets form a multi-label binary classification problem, so we should have a lot of columns with only zero/one values — yes, something like this. You can see there are some real drug-mechanism names; maybe it would be worth reading about each column, but that would take too much time for now. It would be interesting to see how high the imbalance is — maybe most of the rows are zeros, maybe most are ones; it's very important to understand this beforehand. We can count, for each column, the number of rows with a value of one (skipping the first column, which only shows information about the id) and then sort the values. What do we see? There are two columns which have only one positive row — really huge imbalance — and there are other columns with low numbers of positives. More importantly, the number of positive labels is very low in general: we have more than twenty thousand rows, and the highest count is roughly eight hundred. So we have a really huge imbalance.
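Counting the positives per target, as just described, takes one line with pandas. A sketch on a toy frame (the `moa_*` names and values are invented; `sig_id` matches the real id column):

```python
import pandas as pd

# Toy stand-in for train_targets_scored.csv: binary labels, id column first.
targets = pd.DataFrame({
    "sig_id": ["a", "b", "c", "d"],
    "moa_1": [1, 0, 0, 0],
    "moa_2": [1, 1, 1, 0],
    "moa_3": [0, 0, 0, 0],
})

# Count positive rows per target, skipping the id column, then sort ascending.
positive_counts = targets.drop("sig_id", axis=1).sum().sort_values()
```

On the real data this immediately reveals the columns with only one positive row out of 20,000+.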
This will make the competition difficult for the models. For example, looking at this, quite a good strategy might be to set all predictions for these columns to zero: if the average number of positives is one among twenty thousand, maybe we shouldn't train a model for this label at all and simply predict zero. That could be a baseline that is difficult to beat.

Abhishek: Do you think that, because of this imbalance, there will also be some shake-up in the end?

Andrey: That's a very good question. First of all it's necessary to look at the metric. The metric is an averaged log loss. A very good thing about it is that, first, it doesn't depend on the size of the data and, second, we can optimize it directly — both of these things are quite good. Also, if I'm not wrong, we have 25 percent of the data on the public test — let me check... yes, the public leaderboard is 25 percent, which is not high but much better than in some competitions. So I hope CV will be stable and there will be little shake-up, because one of the main things that matters here is that I think the distribution of the data is roughly the same between train and test: all of these samples and drugs are real things, and I suppose they could be randomly shuffled. So I think the shake-up won't be really strong.

If we talk about EDA, there are already several great kernels published for this competition — for example by Heads or Tails, and by me — so I don't think it's really necessary to go deeply into all of the data, but we still need to look at some interesting things. First, let's look at the data itself: let's take a single row, for example this one, and let's take all the columns which start with g.
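The averaged log loss Andrey describes as the metric can be written out directly with numpy. A minimal sketch — the clipping guards against log(0), and the toy arrays are invented:

```python
import numpy as np

def mean_logloss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy over every (row, target) cell."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([[0, 1], [1, 0]])
y_pred = np.array([[0.1, 0.8], [0.7, 0.2]])
```

Predicting 0.5 everywhere scores exactly ln(2) ≈ 0.693, which gives a useful sanity ceiling; the two properties mentioned on stream (size-independence, direct optimizability) both follow from this per-cell averaging.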
Let's take only those columns and plot the values. Well, it's some strange sequence, and here I suppose it would be necessary to read about the domain knowledge: on the one hand it could be unconnected data — and then for feed-forward nets it would all look the same — but on the other hand there is some periodicity in this data, so maybe there is hidden structure we could use. There is also one funny thing: let's try to sort the values. If we sort them, we see this strange curve. I'm not sure — maybe I'm just a data scientist knowing nothing about the domain and sorting the data doesn't make any sense, or maybe we need to sort the data to find some real information about it. Let's leave that for experimentation later.

Abhishek: I have seen a very similar graph in some previous Kaggle competition, and then came the golden features. I'm just saying it's a bit suspicious.

Andrey: Well, I suppose we'll see in some days or weeks whether the suspicions are confirmed. Now I think it would be good to look a little more at the data. There are a lot of columns, so we can't take all of them, but I want to see some distributions. For example, let's take the first column starting with g and plot a histogram. It has a more or less normal distribution, and if we look at most of the g features they have quite similar distributions. I'm not sure what this means — maybe they were already normalized, so we lost real information, or maybe they really look like this. What's more interesting: if we look at the cell information, it also looks like a normal distribution, but there is this bump in the graph, and I think it would be interesting to dig into that.
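One quick way to probe the bump Andrey notices is to check how much mass sits exactly at the minimum value: if a column was clipped, many values pile up there instead of tailing off. A sketch on synthetic data — the clipping point -10 mirrors what the cell features look like in this competition, and the column name `c-0` is just a stand-in:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic cell-viability column: roughly normal, artificially clipped at -10.
cells = pd.Series(np.clip(rng.normal(0.0, 4.0, 10_000), -10.0, None), name="c-0")

# A pile-up exactly at the minimum suggests clipping rather than a natural tail.
at_min = float((cells == -10.0).mean())
```

A naturally continuous feature would essentially never repeat its exact minimum, so even a fraction of a percent sitting at the boundary is a strong clipping signal.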
Maybe there was some artificial clipping: perhaps there are lower values in fact, but the hosts clipped them at -10 — or maybe in the domain there is some reason for these values to stop at -10. So we can see there are interesting things hiding in the data, and I think it will be really fun to work with them. I think that's it for the introduction and the first look at the data. I'll commit this notebook and of course make it public and available to everyone. Now we can move to a step more interesting to any practitioner — building the models — and Abhishek will do this.

Abhishek: Thanks a lot, Andrey, for the initial part. Now we can talk about the boring but very useful part, which is the model. Let me share my screen — I hope you can see it; if you can, give me a shout out in the chat. Today we are not turning on my video because I had some problems setting everything up. You have seen the data now, so you should know how to approach this kind of problem. The problem is mainly a multi-label classification task — remember, this is not multi-class, and there is a huge difference: every sample can belong to many classes simultaneously. To start any problem, what I do at the very beginning is create a cross-validation scheme, so that's what we'll do first. Whatever you do, the first step is to split the data. We can create the splits by writing a simple split-data function, but there is a catch: since it's a multi-label classification problem, you have to use a kind of k-fold cross-validation that is suitable for multi-label. I was talking to Andrey this morning and he mentioned we could also use simple KFold — do you want to say anything about that, Andrey?
Andrey: Well, I think only experiments will show. Some competitions use unique validation schemes — maybe GroupKFold and so on. Here I think we need to try both ways, and the most important thing is to see how the CV score correlates with the leaderboard; if it correlates well, then that scheme is good.

Abhishek: Yeah, I think that can also work, but when I first saw this problem I started with a cross-validation scheme suitable for multi-label classification. So first we import pandas, obviously, and then, before importing the other library, let's write the main part. We're going to read the dataset — say your data is stored in the input folder in a file called... what was the name again? Let me check: train_targets_scored.csv. The next thing is to create a new column. Why? Because it's very important when you're working in a team: if you want to combine your models in the end — stacking, ensembling, these kinds of things — you have to work on the same folds. (Sorry — my screen went away for a moment; you should be able to see it again now.) So I read the CSV and create a new column called kfold, set to -1, and then we can do some randomization. This is how I like to do it — you don't have to do it this way; you can just use the shuffle argument of what I'm about to show you.

Let me also define the targets: there is one column, the id column, called sig_id, and if you drop it, what you're left with is the targets — here you can do .values or .to_numpy(), either is okay. Now we import one more library, iterstrat: from iterstrat.ml_stratifiers import MultilabelStratifiedKFold. Up to here you've seen this in many of my previous videos. Then mskf is my MultilabelStratifiedKFold with n_splits=5 — we don't need any other arguments — and we loop: for fold, (train_idx, valid_idx) in enumerate(mskf.split(X=df, y=targets)). Then I fill in the values — df.loc over the validation indices sets kfold equal to the fold number — and at the end you save this CSV as train_folds and share it with your teammates. That way, if you want to do ensembling or stacking in the end, all of you have the same folds and you won't be overfitting the data, which is very important. I'm not saying this is the best cross-validation for this problem, but it might work; for now it's quite good.

The next thing after this is the actual model itself. Or — do you want to go into the model? Should I start with the model and then you talk about the Lightning part?

Andrey: I think you should write the code in PyTorch Lightning, to show the best practices for it.

Abhishek: Okay, so I will start with code in pure PyTorch.
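The fold-creation script dictated above, condensed into a runnable sketch on a toy frame. It tries the iterstrat class used on stream and falls back to plain sklearn KFold (which Andrey mentions as an alternative) if the package isn't installed; the toy target columns are invented:

```python
import pandas as pd

try:
    # pip install iterative-stratification
    from iterstrat.ml_stratifiers import MultilabelStratifiedKFold as FoldCls
except ImportError:
    # Plain KFold exposes the same split() API and serves as a fallback.
    from sklearn.model_selection import KFold as FoldCls

# Toy stand-in for train_targets_scored.csv.
df = pd.DataFrame({
    "sig_id": [f"id_{i}" for i in range(10)],
    "moa_1": [0, 1] * 5,
    "moa_2": [1, 0] * 5,
})
df["kfold"] = -1  # placeholder: every row gets a fold number below

targets = df.drop(["sig_id", "kfold"], axis=1).values

mskf = FoldCls(n_splits=5)
for fold, (train_idx, valid_idx) in enumerate(mskf.split(X=df, y=targets)):
    # Mark the validation rows of this split with the fold number.
    df.loc[valid_idx, "kfold"] = fold

# df.to_csv("train_folds.csv", index=False)  # share this with teammates
```

Stratifying on the full multi-label matrix keeps the rare positives spread across folds, which plain KFold cannot guarantee.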
[Laughter] Is that okay? Yes? Okay, awesome. This time I'm not going to create a lot of files, because Andrey is going to show you something cool — how he converts this to PyTorch Lightning code, which is a much better way to do it — and we're going to keep one simple, single model. When you're creating a model you have to import a few things again, so let's import torch and torch.nn. But before creating the model, let's create some kind of dataset object, with an __init__ function — these are simple things that you should know. (Sorry, my screen switched in the broadcast again — it seems to switch on its own. Okay, we're back; let me know if we lose this screen again.)

So, we are going to use neural networks, and the main reason is that we can predict all the different targets at the same time. If you want to use XGBoost or any gradient boosting library like LightGBM, you have to create one model for each and every target, and there are more than 200 targets, so it's going to take a lot of time; it's simply easier to build a neural network for this problem. What I've written here: the dataset object has two inputs, dataset and features. I define self.dataset = dataset — I'll tell you what these are in a short while. The length of the dataset is the shape of self.dataset, so __len__ just returns self.dataset.shape[0], the number of samples. Now we write a very simple __getitem__ function, which takes an item — an index — and returns a simple dictionary with x and y. x is your features: torch.tensor of self.dataset indexed at item over all the feature columns, with dtype torch.float — you have to convert to tensors, nothing extraordinary. We do the same thing with the targets: here you have multiple columns of targets, and I'm reusing the features argument — I should have called it targets, but that's okay; everything else is the same. So we return this dictionary, which is pretty simple. You could also go a bit more complicated, but we'll probably discuss that at the end if we have some time left.

After you've created this dataset, you'll want to create an engine, which holds a training function and a validation function. I create a class Engine and define the __init__ function first, which takes self, model, optimizer and the current device — the device you could also get from some config file if you want — and we assign everything to self: self.optimizer = optimizer, self.device = device, which can be cuda or cpu, whatever you want to use. Remember to use cuda, because you will be training a neural network. Then you define a simple loss function, which takes targets and outputs. The loss function is also very interesting in this problem, because you can play around with a few more things — maybe you can talk about that, Andrey, if you want?

Andrey: Yes. Well, if we talk about writing code and libraries: you use your Engine approach, I use Lightning — there is no single right answer here.
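The dataset class dictated above, written out in full. This is a sketch of what was typed on stream; the second argument is named `targets` here rather than `features` (Abhishek notes on stream that `features` was a misnomer for it):

```python
import numpy as np
import torch

class MoADataset:
    """Returns one sample as a dict of feature and target tensors."""

    def __init__(self, dataset, targets):
        self.dataset = dataset  # 2-D array, shape (n_samples, n_features)
        self.targets = targets  # 2-D array, shape (n_samples, n_targets)

    def __len__(self):
        # Number of samples in the dataset.
        return self.dataset.shape[0]

    def __getitem__(self, item):
        return {
            "x": torch.tensor(self.dataset[item, :], dtype=torch.float),
            "y": torch.tensor(self.targets[item, :], dtype=torch.float),
        }

# Tiny smoke test on dummy arrays.
ds = MoADataset(np.zeros((4, 3)), np.ones((4, 2)))
sample = ds[0]
```

Because it implements `__len__` and `__getitem__`, this plain class is accepted directly by `torch.utils.data.DataLoader` without subclassing anything.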
If a person writes code only for himself, and maybe shares just the results — for example the predictions — then it's totally possible to write anything, even your own framework. I know a lot of people who write their own frameworks and use them with success; I know teams where several people each have their own framework, and it works. On the other hand, if you often need to share code, or you need to teach someone, it's usually a good idea to use a common framework, and there are many high-level training frameworks nowadays. I'm not going to bash any of them or say anything negative; I'll just say that currently I'm using PyTorch Lightning, because for me it's currently the easiest to use. There are of course some pieces you need to adopt, but the code itself is almost pure PyTorch: you don't need to write unusual classes that aren't used outside the framework, you don't need some unique way of passing the data, and so on. I'm not saying it is the best — there are frameworks which are really good, proven, and used by many people; it's just my personal preference at the current time, and maybe it will change — I have already switched between at least three frameworks. And this Engine approach is simple, fast and useful, so it's quite good.

Abhishek: Yeah, as long as you know what's going on inside, you can use any library or framework you want — but you have to know what's going on inside. So, while Andrey was talking I wrote this whole training function, and I'll explain what's happening. It takes a data loader — the training data loader in our case — then we put the model in train mode and initialize our final loss to zero. I go through each batch of data coming from the data loader and call self.optimizer.zero_grad(). The optimizer, device and model are fixed, so they live in the __init__ function, while the data loader can change, so it's an argument of the train function. Then we get the input, x, and the targets, y, pass everything through the model — which we have not defined yet — and everything goes into the loss function. Here it shows me an error, "too many positional arguments": that's because the loss function doesn't take self — I'm not using it, so I don't need it — and the quick fix is to make it a static method, and then the error goes away. That's one more thing you should know. Then I do everything else you usually do when building a PyTorch model, and at the end I add the loss for each iteration to final_loss and return final_loss divided by the length of the data loader, which is the average loss.

The same function can serve for validation if I just copy it — it's as easy as that. I'll call it validate; instead of putting the model in train mode I put it in eval mode, and I drop the optimizer parts — no zero_grad, no backward, no step — and I think we're done. So now you have written your engine with training and validation. It's quite simple, but there's still a lot of code left to write, and then we can try to improve it. The good news: once you've written it, you don't need to write it over and over again — you can just keep reusing your code.
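The Engine class described above, assembled into runnable form. The choice of BCEWithLogitsLoss is my assumption for this multi-label setup — the stream defers the loss discussion — and the tiny linear model at the bottom is only a smoke test:

```python
import torch

class Engine:
    """Holds the training and validation loops for one model."""

    def __init__(self, model, optimizer, device):
        self.model = model
        self.optimizer = optimizer
        self.device = device  # "cuda" or "cpu"

    @staticmethod
    def loss_fn(targets, outputs):
        # Multi-label binary targets with raw logits from the model
        # (assumed loss; the stream leaves the exact choice open here).
        return torch.nn.BCEWithLogitsLoss()(outputs, targets)

    def train(self, data_loader):
        self.model.train()
        final_loss = 0.0
        for batch in data_loader:
            self.optimizer.zero_grad()
            x = batch["x"].to(self.device)
            y = batch["y"].to(self.device)
            outputs = self.model(x)
            loss = self.loss_fn(y, outputs)
            loss.backward()
            self.optimizer.step()
            final_loss += loss.item()
        # Average loss over all batches.
        return final_loss / len(data_loader)

    def validate(self, data_loader):
        self.model.eval()
        final_loss = 0.0
        with torch.no_grad():  # no gradients needed for evaluation
            for batch in data_loader:
                x = batch["x"].to(self.device)
                y = batch["y"].to(self.device)
                outputs = self.model(x)
                loss = self.loss_fn(y, outputs)
                final_loss += loss.item()
        return final_loss / len(data_loader)

# Smoke test: a linear "model" on CPU and a one-batch loader of dicts.
model = torch.nn.Linear(3, 2)
engine = Engine(model, torch.optim.Adam(model.parameters()), "cpu")
loader = [{"x": torch.zeros(4, 3), "y": torch.zeros(4, 2)}]
train_loss = engine.train(loader)
valid_loss = engine.validate(loader)
```

Keeping model/optimizer/device in `__init__` and the loader as a call argument is exactly the split argued for on stream: the fixed pieces live in the object, the changing piece flows through the method.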
Now what we are going to do is read the data. We could create a new file, but I already have a train.py here, so let's start populating it. Import some libraries: import torch, import numpy (we might need numpy, I'm not sure yet), import pandas to read the CSVs, and import the utils file we created — if you write the engine this way, you can reuse it in any kind of problem. Then define some global variables, say DEVICE = "cuda" and EPOCHS = 100. Now we write a function to do the training for a given fold: def run_training(fold). Here you read the data — I'm going to skip some of the steps — with pd.read_csv, and the file is called train_features.csv. Then you can write a process function, say utils.process_data, which takes the data frame and returns another data frame.

What will process_data do? There are three categorical variables in this dataset, so we create a function called add_dummies which takes the data frame and a column name and one-hot encodes it using pandas' get_dummies function — pd.get_dummies(data[column]). This is something you could also do manually, because there are not many categories. Now we have the one-hot-encoded frame, ohe, and I build new column names — I think I didn't import pandas; let me import pandas; everything looks fine now — as an f-string: the original column name that we supplied, an underscore, and c, for c in ohe.columns. So if a variable has three categories, we end up creating three new columns; renaming them this way keeps things from getting messy. Then data = data.drop(column, axis=1) removes the original column from the frame, and finally data = data.join(ohe) adds the new columns, and we return data. Based on this you can build process_data — you could include a lot of other things here if you wanted, but we don't need to — which calls add_dummies on the data frame for the columns cp_time, cp_dose and cp_type and returns the data frame. You can use the same process_data for the test data.

I don't have any comments on skorch — do you have any comments on skorch, Andrey?

Andrey: Honestly, I think I saw it but never really used it. I know there is also torch Ignite, and there was torchbearer, but honestly I didn't use any of them; the libraries I have used are fastai, Catalyst and PyTorch Lightning.

Abhishek: Yeah, I saw skorch a long time ago; I don't know if they are still developing it.

Andrey: I think Neptune's site has a great blog post where six or eight PyTorch high-level libraries are compared, so everyone interested can find it and read it.

Abhishek: Okay, I will take a look. So, what's the next step? Next we go back to our training file; now we have the data frame, which is processed already.
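The add_dummies/process_data pair just described, as a runnable sketch on a two-row toy frame. The column names cp_time, cp_dose and cp_type are the real categorical columns in this competition; the values and the `g-0` feature are illustrative:

```python
import pandas as pd

def add_dummies(data, column):
    """One-hot encode a categorical column and replace it in the frame."""
    ohe = pd.get_dummies(data[column])
    # Prefix dummy columns with the original name to keep them unambiguous.
    ohe.columns = [f"{column}_{c}" for c in ohe.columns]
    data = data.drop(column, axis=1)
    return data.join(ohe)

def process_data(df):
    # The three categorical columns in the MoA features file.
    for col in ("cp_time", "cp_dose", "cp_type"):
        df = add_dummies(df, col)
    return df

toy = pd.DataFrame({
    "cp_time": [24, 48],
    "cp_dose": ["D1", "D2"],
    "cp_type": ["trt_cp", "ctl_vehicle"],
    "g-0": [0.1, -0.3],
})
processed = process_data(toy)
```

Since the same function runs on train and test, both frames end up with identically named dummy columns, which keeps the feature order consistent downstream.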
variables thing then you don't then then you don't need to create about creating an embedding layer which is anyways going to be three features in it so that's also going to do the same thing and uh you load the train dot fold strain fold and underscore folds dot csv which you have you have already created in the previous step so now you have the targets and targets will include all the columns except uh the signal sig id and uh k fold column access equal to one dot columns so this is all your targets now and similarly you have the features so features will be uh from the df variable which is the data from the training data frame that we have and we don't need we don't have the k fold column there but we have sig id and accept that everything else is our features so now now you got your targets you got your features but these are in two different data frames what we are going to do is we are just going to merge them so we are merging folds on the data frame df on the sig underscore id column and we are doing a left join so this pandas makes our lives easy a little bit and there are many different ways to do it maybe you can do it in a much smarter way so you don't need this and uh then we can divide our data frame into train df and test df so how that works at the df df dot k fold is not equal to fold which is our argument and we did make a mistake and here we just do a simple reset index drop equal to true because it's going to change our indices and we don't want that for now and similarly we have valid df and valid df is nothing but uh just one sign change changes and with which is this one this is not equal to sign this becomes equal to equal to fold so this becomes your uh training data frame and validation data frame and now now you can extract uh the features from it so this has features plus targets and you just want the features so train underscore df and features and here i'm using another and here i'm using another function uh dot 2 underscore 2 array 
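the fold split being described might look like this as one standalone function; the sig_id id column is from the video, the kfold column name and the .to_numpy() call (the "to array" mentioned) are assumptions about how the folds file was created:

```python
import pandas as pd

def split_fold(folds, df, fold):
    # target columns: everything in the folds frame except the id and fold columns
    targets = folds.drop(["sig_id", "kfold"], axis=1).columns
    # feature columns: everything in the processed features frame except the id
    features = df.drop("sig_id", axis=1).columns
    # left-join targets and features on the id so rows stay aligned
    merged = folds.merge(df, on="sig_id", how="left")
    train_df = merged[merged.kfold != fold].reset_index(drop=True)
    valid_df = merged[merged.kfold == fold].reset_index(drop=True)
    # always select by the same column lists so feature order never changes
    xtrain = train_df[features].to_numpy()
    ytrain = train_df[targets].to_numpy()
    xvalid = valid_df[features].to_numpy()
    yvalid = valid_df[targets].to_numpy()
    return xtrain, ytrain, xvalid, yvalid
```

selecting by the saved `features` and `targets` column lists is what guarantees the ordering stays identical between train, validation and, later, test.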
can you see my screen just so it's working fine right yeah it should be fine okay okay uh if you cannot see the screen just give a shout out in the chat so now we have xtrain which is uh yeah trained ef and features and similarly we can do uh x test or x valid sorry here we will just do valid and this will be valid df features so now if you're doing it this way your features are always going to be in the same order so you won't be making a lot of mistakes there and similarly you can do y valid which is your uh train df targets so here you have targets dot two array and another one will be y train okay sorry y train and y valid train df and validif okay so now we are good and now i i will be skipping uh some steps so what we have to do is to create a train data set so just because we are running a little bit out of time and here what you will do is you will call utils.mmoa dataset and you will fill these values which is xtrain and y train so set and features i shouldn't have called them features but i should i should have called them target budgets okay and train loader becomes uh torch dot utils dot data dot data loader and here train data set and you specify batch size what batch size you want uh so let's say i can fit larger batches in this uh for for this data set so i can just do one zero two four and that that works and num workers let's set it to eight okay so now we have uh we have all these things and uh so values gives the same thing as dot two array yes it does as well as as as far as i know right uh do you know andre if it gives us something different um so like if i do dot 2 underscore array or if i do dot values oh i think it should be the same i think i think i think so too so yeah as if i remember correctly it's the same thing but i might be wrong so do take a look at the documentation of pandas so now we will have our model which we have not created yet and we need something here and we send the model to device so yeah you see it's taking so so long 
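the MoADataset class itself isn't shown on screen, so this is a plausible minimal version; the random arrays stand in for the real features and targets, and a small batch size replaces the 1024 / num_workers=8 used in the video:

```python
import numpy as np
import torch

class MoADataset:
    # minimal map-style dataset returning a dict, as sketched in the video
    def __init__(self, features, targets):
        self.features = features
        self.targets = targets

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, item):
        return {
            "x": torch.tensor(self.features[item, :], dtype=torch.float),
            "y": torch.tensor(self.targets[item, :], dtype=torch.float),
        }

# toy stand-ins for xtrain / ytrain from the fold split
xtrain = np.random.rand(32, 10).astype("float32")
ytrain = np.random.randint(0, 2, (32, 3)).astype("float32")
train_dataset = MoADataset(xtrain, ytrain)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=8, num_workers=0, shuffle=True
)
```

returning a dict instead of a tuple is what makes the later `batch["x"]` / `batch["y"]` access in the training code work.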
to create this kind of model and then you need an optimizer and that's where uh these frameworks come in handy so you can you can try using them if you want to torch dot optim so we can we can just use we can use sgd we can use atom so it totally depends on you and model dot parameters and we will specify some kind of learning rate so it has also been seen that you can also try some kind of weight decay here if you want to that helps you improve on your results and you can also have a scheduler and now you if you uh were paying attention you might have noticed like when i created the engine there was no scheduler in it so this is something that you have to you have to remember and you have to do it on your own if if you're uh it depends on when the scheduler should be stepped so if this scheduler should step after each batch then you have to call it inside here somewhere and if it needs to be stepped after each epoch then we can we don't need it in engine we can just have it in here and we can just write it inside the for loop so then you have the engine which is utils dot engine and that should take model and optimizer and device so device and uh now you can write a simple loop to train your model so for uh epoch so for underscore and range because we are not using that value and here you have your train loss which is uh which is engine dot train and train loader and your valid loss valid loss comes here which again i did not create so here i have created only train loader but valid loader has to be created in the same way so now the only thing left for us is to so here you can have your early stopping and you can you can add early stopping you can add model savings all these kind of things you have to add it here um and now we go back to uh the very the most important thing which is the model itself so to create the model model uh so the model that we came up with or we have been looking at is uh pretty simple so it's just a simple feed for a neural network so 
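the utils.Engine isn't shown either; here is a rough stand-in that illustrates the point about schedulers — the engine only sees batches, so a scheduler that steps per epoch has to be stepped in the outer loop, not inside the engine. loss function, optimizer and scheduler choices are assumptions:

```python
import torch
import torch.nn as nn

class Engine:
    # minimal stand-in for the utils.Engine mentioned in the video
    def __init__(self, model, optimizer, device):
        self.model, self.optimizer, self.device = model, optimizer, device
        self.criterion = nn.BCEWithLogitsLoss()

    def train(self, loader):
        self.model.train()
        total = 0.0
        for batch in loader:
            self.optimizer.zero_grad()
            loss = self.criterion(self.model(batch["x"].to(self.device)),
                                  batch["y"].to(self.device))
            loss.backward()
            self.optimizer.step()
            total += loss.item()
        return total / len(loader)

def run(model, train_loader, device="cpu", epochs=3, lr=1e-3):
    # weight decay, as suggested in the video, can help the final score
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)
    eng = Engine(model, optimizer, device)
    for _ in range(epochs):
        train_loss = eng.train(train_loader)
        scheduler.step()  # stepped once per epoch, outside the engine
    return train_loss
```

a batch-level scheduler (e.g. OneCycleLR) would instead need `scheduler.step()` inside the engine's batch loop, which is exactly the caveat being made here.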
you can just do class model which generates from one two and undefined in it and serve and uh the number of features num features number of targets so what are the number of targets you have and the usual super stuff okay and then you can define your model so it can be a sequential model and it can have a bunch of uh layers so let's say it's a sequential model and python so this thing is not auto completing for me but anyways uh number of features and some some kind of uh out output features number of output features so you can you can keep it at 256 102 for these kind of values and what you can also do is uh add a batch normalization at a dropout add some kind of activation and you can keep on repeating this so you have to find the appropriate number of layers that you want so this was 256 and let's say we add a dropout 0.3 and that's that's all we add so this combination is not just about the model it's also a little bit about post processing some rows belong to control group and they just can't have any values besides zeros yeah so because for the simple procession could work yeah and anyway so if you try a simple model like this uh so one thing i forgot we forgot the output uh linear and here it should be 256 and uh number of features number of targets so this is the number of targets so a simple model like this is going to give you a pretty good score for now for now uh maybe in future it won't so what you can what you can do here and uh probably i will show it to you after andrei converts this to pytash lighting so this kind of model with some tuning is going to give you a very good cross validation score and uh when i say good i can give you a number like 0.0154 these like so like it's much better than uh what you have uh in the public notebooks and that is without post processing and it's it's because it requires a lot of tuning so you have to you have to choose how many layers you want how many what is the number of output features uh what is a dropout so 
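the feed-forward network being described could look like this; the hidden size of 256 and dropout of 0.3 are the starting values mentioned, and the exact layer count is an assumption to be tuned:

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, num_features, num_targets, hidden=256, dropout=0.3):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.BatchNorm1d(hidden),
            nn.Dropout(dropout),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.Dropout(dropout),
            nn.ReLU(),
            # output layer: raw logits, to be paired with BCEWithLogitsLoss
            nn.Linear(hidden, num_targets),
        )

    def forward(self, x):
        return self.model(x)
```

note the output layer has no sigmoid — the sigmoid lives inside BCEWithLogitsLoss during training and is applied explicitly at inference time.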
these kinds of things you have to take care of, and if you tune it well you can squeeze a lot from this simple feed forward neural network. so now we will move again to andrey, who is going to show you how to do this whole thing in a much simpler and more elegant manner. well okay, then i suppose let's switch the screen to me. yeah, i have switched the screen. i still see your screen for now. on youtube there is always some lag, so let's wait till it appears. okay, my screen is shown. so as we are running a bit late i'll repeat some things, i just copied some of these things. so first of all i have opened my notebook again, run several cells and went to my plots. now let's prepare the data. this is similar to abhishek's functions, i just did it in my style, but the result is completely the same. so i get dummies from this column with this prefix and concatenate with the original data, and then drop the original features. okay, now here is a data set which was written by abhishek, here it is, i just changed the names of the targets. here is the model; it's similar, maybe some different activations and so on, but it's almost the same, and for now i suppose it doesn't matter how it looks. so now to pytorch lightning. first of all, as i said, it is very similar to pytorch but of course has some differences. first of all we need to have internet, because for now it doesn't appear in the kaggle docker image, so we have to install it with pip, and that's why there is a little problem: if we install it then we need internet, but if we have internet then we simply can't submit from the same notebook. so we have two choices. the first choice is to train models in one notebook and make inference in another notebook; the second choice is to add wheels to some data set and then install pytorch lightning from it. okay, so i have installed it, and i suppose i can import it, and i want to say first that
uh considering that we have different styles then there are some differences in the code and uh making all the code that is the same we will make will take some time so we will see some different pre-processing but you should see the general idea yes so uh python for training the model requires three separate things first of them is data module in fact the main thing from this is to prepare the data so when you write python code usually you prepare the data you make data sets you make that loader slide right and here we do all these things but just in the separate model to um we write it in a separate class to have it in one place and to be able to use it in all projects and i want to repeat that the main bonus of lightning for me is that it has the same template for different projects so you have the same classes the same methods for image classification segmentation guns table data and so on yes so you change the things which are inside so let's see what we do have here we have this class which inherits from lightning data model we have a need first of all of course self then ash params this is hyper parameters lighting having the integration integral possible integration with different hyper parameter tuning libraries so we could pass some hyper parameters here and change the model but we don't do this still we need to have some default value and here we have the data and targets as before now prepare data this method is for doing some things for the first time for example you download some data yes you have some initial conversions and so on but it should work only once all that things are done here this is for better work on multiple gpus so uh training on multiple faults is the best way but as a default approach we simply do train test splits so we have data we have targets and we split 10 for validation 90 to train data and of course we have to fix random seek yes now we load the data set so to set we need to send here two things train data that switches 
this syntax means that we drop the first column which i remind you is id so we just drop it and that's it the same for targets we drop the column with id and pass the array i don't pass the data frame and pass numpy instead because this is faster and here we have the standard scene with trend loaders so train loader and data set batch size now workers training shuffle validation we don't shuffle and we need to return data loaders that's it we have prepared the data now for let's see what's oh i usually prefer doing some typing let's stop with visits it's advanced topic i don't think it's usually necessary on kaggle uh by the way one point is that some frameworks are cool but some of them some good choreo practices require really really a lot of code so often it isn't reasonable to write it at all in kaggle competitions uh when you are doing some exploration and first experiments yes so next is the main class i will write it right now life i show you it's similar to this thing which is in my public notebook but let's see it's usually they are called some elites something so let it be lit more so we need to inherit from a special class lightning model with this one yes so of course we start with init i got used to writing code in the ebas because they have a lot of really nice things like code compression some headings this selfs and so on so it i like well you could say i am not well i forgot how to code this in uh notebooks on google for something so i hope i good enough oh and we assigned this various visual past i in fact in about the model uh one of approaches which is uh commonly used in default tutorials is writing some the model inside this class but i think that's well it's not optimal it's more difficult to change the code it's much cooler to define model outside and then pass it here uh and use it so what do we this is usual forward method of any model we just need to repeat it so we pass i suppose like this it'd be just x and we do this now for more 
specific uh pattern typing things yes first of all we need to define optimizers and schedules so usually we do it something like this we just try to optimize a scheduler and uh use it in python i think you do you write something a bit different you have we have a special method for this configure myself itself uh so we just can copy these things here so module parameters learning rate scheduler let's see here right like this yes and now we need to return it i really don't know why people do it but there is there are some advanced cases when people use several optimizes several shadows i suppose they switch between them so but this means that it is necessary to return a list of optimizes and not optimized but itself so like this and then we can have several optimizers here and now we have a scheduler a default way of using cheddar would be to just write the same but if you want some advanced behavior it's necessary to write more code so we write a dictionary well list of dictionaries but i have one of them first of all the traddler itself then we have interval uh you know that different editors work on different intervals sound broken batches but dedication pattern works on epic so we need to write it and one more thing uh monitor vowel was very close what is this uh we know that when we use some shadow we need to pass some work to it right so here i say that later there will be some variable called very close and the trader should look at this adhe at it and decide what to do with it whether to change learning rate and or not okay that's it now for training in fact this is almost the usual training code just without the backwards and the other things which people usually forget and one more bonus thing is that model are automatically switched between model training model level and so on so you don't need fear that you can forget it here we have three things itself of course but should be the data and budget breaks i don't usually need bash breaks but it's usually 
written so let's have it so in the forward function you don't return anything oh i suppose yes you're correct it was my mistake we need to return this uh so premium uh often batch is uh when we train image classification models usually but usually in better set we return image and target so it could look like this yes but let's see what do we have here in our data set uh in our set we have a dictionary with keys x and y so we need to use the same so here we have uh data x and target now we just pass it to our model uh it's written like this we could write so forward but this also works so like this and now we need to calculate the loss so let's see if i remember it correctly i think it's yes then we just defined the criterion let's define it here it's not very correct to write such code but let's do it for speed and then so loss equals self with so this way we calculate the loss now what do we need here uh here we see the specific lightning in fact there is some newer ap currently but it doesn't work for me in all cases so i will use which is a bit older but works correctly uh first of all i define the dictionary with log with low locks yes so train loss is loss and we return a big dictionary there are several things here uh first of all loss this means that this value will be used for gradients and this is the loss which is used for optimizer to change the ways of our model next we return blocks looks these things which could be locked somewhere uh pythonx lighting has integration with weights and pre-access comet ml neptune ai and several other popular experiment tracking scenes and this here will be sent to the logic which you define i didn't define any numbers so but i will keep it here so that you could see but we also usually want to see matrix while training so we have this key progress bar and i put logs here so while training we will see the value of train loss good this is for training step so what happens at every training step but usually when training 
stops we want to do something with the results it's especially important for the cases when we have for example a rocket look metric because we have to get all predictions and all targets to could create rocco yes so how does it look we have method training apple and self outputs outputs is this word which means that this is the list of the things which we get from here so the list of losses in this case uh so we can calculate average loss as something like this as far as remember so we stack all the losses like this in outputs so we stack all losses and take them in them nice and we have and again we write logs like this oh and here we return almost the same thing but we don't need to return loss because here we just create something and return the results here is it and now we do almost the same or really the same for validation so we have just i can think that they can save us some time and write this so we have a validation step we also get batch we also calculate the loss just the names should be different this is our valid loss yes and uh of course if you have some specific things which you need for validation you can do them with them here and a similar validation and we just do the same but yes between this here so here we wrote the second important thing for python and the last thing is the class for training the model itself it's called trainer not surprising yes it has really lots of parameters you can see them in documentation but let's see which are most important for now reviews well everyone wants to try and train on gpus by the way i want to point out that the amount of data here is small so you if you like draw you can train on cpu it will be several times slower but it will still take maybe 10 seconds per epoch so it's no problem you can use cpu for this tabular computation with deep learning in any expedition in new york situation right now here we define number of epochs well let's have only a five not trained long there is some build pain 
functionality to check that everything works, but let's make it work and wait. weights summary set to 'full' will show us information about the model itself. good, now we need to initialize everything. so model equals LitMoA and hparams — i have no values for them, so let's simply put an empty dictionary here. we need to initialize the model, does it have some parameters? yes, so, well, anyway we will see whether i'm correct or not. let's rename this net — model, net — and then we have the data module and the model, and targets is the train targets scored. well, i suppose that's it, and let's see how many errors i made while writing this really fast. when you're live coding and talking you do make some mistakes. oh, i suppose, what's wrong here? ah yes, i didn't write it. oh, inputs, of course, nobody has inputs here. from sklearn dot model selection, i suppose, like this. okay, one more attempt. what's wrong now? oh, i didn't press enter on the data set cell, my mistake. okay, once more. a is in capital, so that is a problem; i think i just didn't press enter on that cell. okay. so first of all we can already see our model, we can see the number of parameters in each layer — it's fun, but something is wrong again. valid data set, i did something wrong, let's see. self, let's see the model. ah yes, i suppose i didn't use it. i suppose you're right. yes, i am used to writing everything in the ide, and that is where all these errors come from. okay, i have different names for these things, so targets. this is what happens when you use part of the code from one person and part of the code from another person — a lesson for everyone. when you run experiments it's okay, but be careful when you do some serious things. oh, something's happening: the validation sanity check. it's a thing from pytorch lightning which runs on some validation data and checks whether everything is correct or not. not everything
is correct. oh yeah, i thought that there was something wrong with the number of features. okay, we have model net equals, yes, let's see. yay, it's training! so i fixed all the problems. we see the valid loss here, and it's similar to the scores on the leaderboard, and that's how you would train a model in pytorch lightning. you can also see that it's really fast to train a neural network on this data set, so you should definitely give it a try. yeah, and i'm not sure, should we do the submit from here? let's leave submit as an exercise for people — or maybe i can talk a little bit about tuning the parameters of the neural network, and you can write the code for the submit in the meantime. okay, that would be an awesome idea. okay, then i will show you guys some code on how to tune your neural network hyperparameters so that you can squeeze a lot out of this. so now let's go to my screen. awesome, so now we are on my screen, and i wrote a bunch of code before this, even today. yeah, we switched to your screen for some time but now it's fine again — i really have to take a look at why this is happening, but okay, anyways. so what we are going to need first of all is a library to tune hyperparameters. you can use skopt, you can use hyperopt, you can use optuna. optuna is something that i have recently started liking quite a lot, so you should definitely give it a try. then everything else is more or less the same. here you can see the model is now in the training file, so i didn't have that earlier — previously i had it in the utils.py, but here we have the model right here, just so you can take a look at it and i can show you easily. and then we have the run training function. now, the run training function previously took only one parameter, the
fold parameter but now it's taking more arguments one of them is called params and one of them is save model so if save model is true then it saves the model if say model is not true if it's false and doesn't save the model uh simple as that we do load the folds and then what you can see here is uh another interesting thing i'm also using the non-scored training targets so this is also something that you can use and so this should be folds so i'm just fixing it folds okay and then uh i'm extracting targets so so how i'm using this non score is i'm just adding all the columns together so it gives me one column and i'm just using one single column from the non score target i call it nscr and this becomes my auxiliary target so you can also try to uh like create an auxiliary target using the other target file which is given to you and that's not given for test so you have to remember that you have to remove it when you're evaluating your model so everything else remains the same we create the data set we create the data loader and uh when if we use python lightning we we will use less code which is nice so uh then in model so let's let's go take a look at the model so here we have the model and now you see like in model you have a lot of uh new arguments and uh these arguments are so if uh so everything is adjustable so you just supply what you want and i also have a for loop for adding uh different layers and then in the end i create a uh sequential model using all the uh using the list of uh layers that i have so as simple as that so everything is adjustable so that that's how we wanted if we want to tune hyper parameters for the neural network or if you want to find out how many layers i need so uh now number of features number of targets this this is always fixed and the number of layers hidden size dropout everything else comes from params so params the argument params is a dictionary which has number of layer num layers hidden size drop out it also has learning 
rate and you can also modify it to have different kinds of schedulers if you want so now this is my training loop and this training loop returns best loss for a given fold and a set of parameters so the next thing that you need is an objective function that takes an argument called trial and er then you can use so this uh dictionary that i've created here the suggestions for uh the number of layers how many layers should be so it should be between one and seven uh what should be hidden size so let's try all the values from 16 to 204 right this this kind of thing you can create using optuna so i'm suggesting int for number of layers then just some kind of uniform distribution for dropout or log uniform for uh learning rate so uh these kind of things uh you can create and you can modify these ranges now when you're done with that uh you do all five folds of training uh so it's training for five folds using a set of parameters suggested by optuna and don't save the model because you don't need to save the model you're just doing hyper parameter tuning and specify the fold um so for for each fold calculator loss which is your temp loss in this case and append it to all losses and in the end return mean loss so what's happening here is uh i'm creating a partial function so um here i have this partial object so okay so this is not i don't think this is correct way of this is done correctly but anyways or or maybe i don't need it maybe i don't need partial so i can just use the objective function as it is and i'm i'm creating a new study and i'm minimizing the loss which is being returned and i'm doing 150 trials so in the end it will tell you which trial is the best trial and it will it will show you all the different parameters that it has been trying and what is the best score that you can get and you can uh you can squeeze a lot from just this simple kind of hyper parameter tuning so yeah it takes it takes it doesn't take much time uh for one uh for one epoch or for 
one fold i think everything goes in a few seconds uh i think uh every proc is less than a second even on kaggle kernels uh and then in the end you have a tuned model so now uh if we go back to andre he's going to show us how to do a submission using what you have trained let's see after youtube logs and we can see slow very slow okay great so we have trained the model here and we saw it now time to make predictions first of all we have the same pre-processing of course we need to do it the same we convert fishes to dummies and drop old features okay that's it now we need to write a test data set why class is a bit different because we don't have targets i remember mind you that we don't have targets so we have only data set we have on the x valid return and so we had to modify the original data now we create this data set as from our test features the same way as before we create test loader from not our test data set of course no shuffle and huge bus size now for reference inference itself i prefer to create the array with all the potential values at first and then fill it it usually it is usually faster next i create i trade i take the model and move it to eval if you remember that we had the lightning model it's model and it had a modular attribute so i just put it in inference model and make a work then i iterate over batches in our test loader and apply model to each batch take the prediction and pass it to numpy and put it into predictions okay just be sure how our predictions look it's a future array of some values okay uh now why do i do now we have to do this magic no well personally i hate words about magic because i think there is totally no magic in data science there is only data properties there are some unexpected links and so on but there are no leaks but well let's do the post processing first of all i read the data again why because i want to get the list of original values uh so s is uh s here is an entity frame of one column then for each target 
i create columns with zero values it's more or less idiomatic responders so for features uh so i put our predictions in the columns uh let's see i think i made a little mistake here yes my post processing is a bit off uh because i do it differently from our from our original data so let's see i have s i put all columns to zeros now i put predictions uh into this in all columns so s looks like this but now i want to remind you about our post processing so we have this data test features which is we have cpu s train features one okay of course yes made again let's see we have cp type vehicle and this is control group which should have zero values in all predictions so let's take all the all these ids yes we take this ids and now we need to take our predictions and put zeros into all rows uh let's see so they they will always be zero even for uh private test data public test data yes yeah so yes look um be in this here we have these rows 358 and we take all these columns and zero done now we can put submission but i want to remind you that i had to use internet here to install python so i can't make submission from this kernel so i did this inference to show you how it's done but i'm saving the model and it is necessary to make a new kernel with and use this model and make inference like this well i suppose that's it yeah that's uh you do you cannot use internet so if you pause internet then you will be able to do this but it's it's it's better if you train your model somewhere and then make a data set out of it and then create only create the predictions yeah definitely so um i think i think that's uh that's quite a lot of information for one video what do you want to talk a little bit about the the thing that we were discussing andre let's see i think we can talk about some ideas how to improve the models first of all shaq already showed you that you can use non-score targets as auxiliary in the mic model using them now the next point is you can think about this 
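the control-group post-processing might be written as one small function; the cp_type / sig_id column names and the ctl_vehicle value are from the competition data as described, while the frame layout is an assumption:

```python
import pandas as pd

def postprocess(sub, test_features, target_columns):
    # rows where cp_type == ctl_vehicle belong to the control group and can
    # have no active mechanism, so their predictions are forced to zero
    ctl_ids = test_features.loc[test_features["cp_type"] == "ctl_vehicle", "sig_id"]
    sub.loc[sub["sig_id"].isin(ctl_ids), target_columns] = 0.0
    return sub
```

this holds for public and private test data alike, since the rule comes from the data definition, not from a leak.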
control group if control group is always zero and can't be one maybe maybe it would be a good idea just to drop this rows from train data but well you need to test it and try whether it's work but also well you could try some cool things with the models uh in one of the kernels i saw already that well you can take these categorical columns and not do simple one-of-a-kind encoding but for example do a categorical encoding but well let's move one step further uh we have genes and cells sorry we have genes and cells they are maybe they have a sequence like i already showed you right so well why don't we use recurrent models here right could you tell us something about more more about this yeah sure so so these are signals right so as far as i know uh they are maybe they are not but i'm not a domain expert so what you can do is you can probably try to use um recurrent neural networks here like lstm or you can use uh gru so which is also very cool and this is something that i have found that might help in your cross validation score so i haven't had time to submit it uh yet but that's something that you can definitely explore with the uh with these two columns that you have in the dataset starting with from g and starting from c so um i think there are 772 uh values in this sequence and another one has 100 so yeah you can you can try to play around with models so i'm not going to show you the model um that um me and andre have been working on for uh like the lstm model or the grew model but you can you can definitely try to play around on your own yes and well in fact uh you are limited only by your fantasy maybe you can take uh convolutions and make one one-dimensional convolutions maybe you could use i don't know dirt or something like this maybe you could use i don't know uh guns auto encoders anything well you are limited only by your imagination resources well and maybe little by experience but uh we are only starting the competition was live only for several for a 
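one way the recurrent idea could be tried — purely speculative, as the discussion itself hedges — is to treat the 772 g- columns as a length-772 sequence of scalars fed to a GRU; the hidden size and head are assumptions:

```python
import torch
import torch.nn as nn

class GeneGRU(nn.Module):
    # treat the g- features as an ordered sequence, one scalar per step
    def __init__(self, num_targets, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_targets)

    def forward(self, x):
        # x: (batch, 772) -> (batch, 772, 1) so each gene is one time step
        out, _ = self.gru(x.unsqueeze(-1))
        return self.head(out[:, -1, :])  # last hidden state -> logits
```

whether the columns really form a meaningful sequence is a domain question, so this is an experiment to validate against the feed-forward baseline, not a recommendation.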
couple of days so i think we will see a lot of very interesting and unique approaches and i'm sure that domain knowledge should be quite useful in this competition yes true andre and i hope this uh tutorial uh kind of uh pair programming has been helpful to you so there were some technical difficulties but i will i will try to make sure that it doesn't happen in future this is the first time i'm doing it on my channel so uh forgive me for that and if you like the video don't forget to click on the like button and subscribe the channel and share the video with others we will be sharing some of the code that we have shown shown today andre has already shared this part i will also be sharing some some some of the code that i wrote today so don't worry about it and thanks a lot for joining thank you very much and have a good weekend bye goodbye
Info
Channel: Abhishek Thakur
Views: 12,009
Rating: 4.9561243 out of 5
Keywords:
Id: VRVit0-0AXE
Length: 101min 10sec (6070 seconds)
Published: Sat Sep 05 2020