Accelerator Power Hour for data science professionals with Kaggle Grandmasters (Cloud AI Huddle)

Video Statistics and Information

Captions
Hello and welcome. Today is Thursday, June 25th; welcome to this virtual AI Huddle. I am honored to be joined by Chris Deotte and Abhishek Thakur for this Accelerator Power Hour. My name is Yufeng Guo, and I am really excited for the next 60 minutes; I hope you are too. So let's move over and meet our excellent presenters. Here are Chris and Abhishek themselves. How are you guys doing today? We're doing great. I'm doing fine, thank you, and hopefully everybody can hear us okay. I see everyone's introducing where they're from, all over the world, just about every continent except Antarctica, absolutely amazing, great to have everybody on. So what we'll be doing is, first up, Chris will be telling us how to compete in a Kaggle competition using TensorFlow and GPUs, and he'll be using the Bengali handwriting classification competition. Chris Deotte is a three-time Kaggle Grandmaster and a data scientist at NVIDIA. He earned a BA in math and has worked as a graphics artist, photographer, carpenter, and teacher, what an amazing career, and he went on to earn a PhD in computational science and mathematics. Look at him go. Then in the second half we're going to have Abhishek giving us a rundown on his approach to deep learning using PyTorch and TPUs; he'll be showing us how to harness the power of TPUs and PyTorch to make almost any model train super fast. Abhishek is the world's first four-time Kaggle Grandmaster. He works as a data scientist, and when he's not Kaggling, working, or ranting, you can find him teaching applied machine learning on his YouTube channel, so you can check that out, it's in the description below, and currently he's also writing a book titled Approaching (Almost) Any Machine Learning Problem. So please give a warm virtual round of applause for our presenters, and with that we'll let Chris get started. He'll go into his presentation, and Abhishek, we'll have you come back for the second half and chat with you then. Sounds good. All right, Chris, let's have you go into your presentation.

Wonderful. Welcome again, everybody, to the Accelerator Power Hour with Kaggle Grandmasters. My name is Chris Deotte, and today I'll be talking about using GPUs in Kaggle competitions. I'm told there are over a thousand people here, and I have to say this is the largest classroom I've ever taught; it might take me a while to learn all of your names. Here's the diagram I use for all of my Kaggle competitions. The two most important things are setting up a reliable validation scheme and doing fast experiments. This is why accelerators are so important: if you use a GPU instead of a CPU, you can do 10 to 100 times more experiments, and that means you'll win ten to a hundred times more gold medals. People frequently ask me how to increase their leaderboard score. The answer: experiments. Try more ideas. Here are the five areas you can experiment with: one, try different pre-processing and feature engineering; two, try different data augmentation and external datasets; three, try different model architectures and losses; four, try different training schedules and optimizers; five, try different post-processing. Today we're going to practice these techniques by working through an image classification task. Kaggle held a competition a few months ago to classify Bengali handwriting. We were given a grapheme, and we had to build a model which would identify which of the 168 grapheme roots was present, which of the eleven vowel diacritics was present, and which of the seven consonant diacritics was present.
We'll be working through a Jupyter notebook together from my team's fourth-place gold medal solution. Let's get started. The first thing we're going to do is load the libraries, and you can see that today we'll be using TensorFlow. Next we're going to initialize the GPUs. This is a great thing about TensorFlow: it knows whether you have one GPU or many GPUs, it will automatically use all your GPUs, and it takes care of communication and multi-GPU training for you. Next we're going to enable mixed precision. If you have a recent GPU, using mixed precision can speed up your GPU by as much as two times, and it can allow your GPU to store as much as two times more data in RAM. Next we're going to initialize a log file. For every experiment you want to record which pre-processing and data augmentation you used, which model and loss, which learning schedule and optimizer, and which post-processing; you want to write that to a file along with the validation score, so you have a record of all your experiments.

Okay, the first area of experimentation is pre-processing. In this competition Kaggle provided parquet files, and the images are sized 137 by 236 with pixel values between 0 and 255. What we're going to do is load in these parquets, resize the images to 64 by 64, and normalize the pixels from 0 to 1. Note that this is our first choice in our experiment: if we were to do a different resizing or a different normalization, it would affect how accurate our model is later on. We see that we've loaded in 50,000 images and now they're sized 64 by 64 with 1 color channel. For the purpose of this demonstration today, we're only loading in one quarter of the images so that the notebook runs faster. Let's go ahead and display the images to make sure they were resized correctly. You can see here some examples of the Bengali graphemes, and they look right. Then let's look at some of the labels. We see that the first image, which is up here in the top left, has grapheme root number 15, vowel diacritic number 9, and consonant diacritic number 5. That's one of our 50,000 training images.

All right, the next area of experimentation is data augmentation. We're going to be performing our data augmentation inside a data loader, and it's very important that we optimize our data loader for GPU speed. While the GPU is training on one batch, our data loader, running on the CPU, will be creating the next batch, so it's very important that the data loader can make a batch faster than the GPU can train on a batch. The main thing to consider when optimizing for GPU speed is that GPUs can do a tremendous amount of work at a fantastic speed, so we always want them working. That's why your data loader has to be fast enough, because we don't want the GPU sitting around doing nothing. Furthermore, it's good to consider increasing your batch size as large as it goes, to force the GPU to do as much work as it can. When your model is training, if you look at a monitor of your GPU usage, you should see it working at 100%. Now, a common mistake people make is trying to resize their images inside their data loader; notice that was step one, pre-processing. Anything that we only need to do once, we do once and get out of the way. We're going to be building our data loader using a Keras Sequence.
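For reference, here is a minimal sketch of the kind of setup just described: a multi-GPU strategy, mixed precision, and a bare-bones Keras Sequence that yields batches of images that were already resized in the one-off pre-processing step. The class name, shapes, and batch size are illustrative assumptions, not the actual notebook code (older TensorFlow versions expose mixed precision under an `experimental` namespace instead).

```python
import numpy as np
import tensorflow as tf

# Multi-GPU: MirroredStrategy uses every visible GPU and handles
# cross-device communication automatically.
strategy = tf.distribute.MirroredStrategy()

# Mixed precision: compute in float16, keep master weights in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")


class GraphemeSequence(tf.keras.utils.Sequence):
    """Yields (images, labels) batches; images were resized to 64x64 and
    normalized to [0, 1] once during pre-processing, so __getitem__ stays cheap."""

    def __init__(self, images, labels, batch_size=128, shuffle=True):
        self.images, self.labels = images, labels   # images: (N, 64, 64, 1)
        self.batch_size, self.shuffle = batch_size, shuffle
        self.indices = np.arange(len(images))

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        ind = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = self.images[ind]
        # augmentation (CutMix, rotation, coarse dropout, ...) would be applied here
        return x, self.labels[ind]

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)
```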
Notice that for our data augmentation we'll be using CutMix, and we'll also be doing rotations, scaling, shifts, and coarse dropout. Let's go ahead and look at our data augmentation and make sure it looks correct. Whoa, that looks weird; let me explain what's happening. We want to make sure our data augmentation is doing what we want it to do. We're using CutMix, and CutMix is actually blending, mixing, two images together, but it's hard to visualize and see whether it's working correctly. So what I've done here is add a debug flag, yellow equals true, which turns the first image completely yellow. Right here we can see that: there was the first image, then CutMix removed a piece of it, and the blue is the second image. So we can see that CutMix is working correctly: it's taking out big rectangles and putting in a second image. Furthermore, you can see down here that the image was rotated and shifted, and in this picture right here you can see the coarse dropout, where we removed sections of the image. If we take off this yellow debugging, we can see these are the images that our model is actually seeing. Each time it's random, and each time the data augmentation is different. Deep learning models need lots of examples to train with, so data augmentation is very important, because now, instead of having 50,000 images, we're essentially making as many more images as we want.

All right, the next area of experimentation is our model choice. We're going to be using transfer learning. In the last five years, convolutional neural networks have begun to classify images better than humans, and in this picture you can see the names of many of the state-of-the-art image architectures: EfficientNet, DenseNet, SENet. What's great is that we don't have to build these networks from scratch; we can just download them from the internet, and furthermore we can download them pre-trained on tens of millions of ImageNet images. This means we can download a model that already has some intelligence. In this plot there are many models to choose from; we're going to pick a model up here on the top left, EfficientNetB4. We notice that the number of parameters is small and the accuracy on the ImageNet task is very high. With the parameters being small, it's going to train very fast and achieve high accuracy. How we do this in the code is very easy: in TensorFlow it's just one line of code. We just call EfficientNetB4 and TensorFlow downloads it from the internet, and we're saying to use the pre-trained ImageNet weights. Also, we're not going to include a top, because we'll be building our own top for our specific tasks. Our top is going to output three different things: one output for the grapheme root, one for the vowel diacritic, and one for the consonant diacritic. We also make sure that the output dtype is float32, because, if you remember, we're using mixed precision inside our model and we don't want float16 to come out. Additionally, we're going to be using cross-entropy loss, which is just softmax followed by cross-entropy; we're going to be using the Adam optimizer; and we're going to adjust how much attention the model gives to each of the different tasks: we're actually going to have it put a 1.5 weight emphasis on getting the grapheme root correct, and a weight of one on the others.
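For reference, here is a minimal sketch of the model construction just described: an EfficientNetB4 backbone without its top, a three-headed output cast to float32 for mixed precision, and loss weights favoring the grapheme root. Layer names, the pooling layer, and the three-channel input shape are illustrative assumptions (the competition images are grayscale, and the actual solution may have used a standalone EfficientNet package rather than keras.applications).

```python
import tensorflow as tf
from tensorflow.keras import layers

strategy = tf.distribute.MirroredStrategy()  # as in the earlier setup sketch

def build_model(img_size=64):
    with strategy.scope():
        # Pre-trained backbone, no classification head.
        base = tf.keras.applications.EfficientNetB4(
            weights="imagenet", include_top=False,
            input_shape=(img_size, img_size, 3))
        x = layers.GlobalAveragePooling2D()(base.output)
        # Three heads; dtype float32 so the outputs leave mixed precision.
        root = layers.Dense(168, activation="softmax", dtype="float32", name="root")(x)
        vowel = layers.Dense(11, activation="softmax", dtype="float32", name="vowel")(x)
        consonant = layers.Dense(7, activation="softmax", dtype="float32", name="consonant")(x)
        model = tf.keras.Model(inputs=base.input, outputs=[root, vowel, consonant])
        model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            # Count the grapheme root 1.5x as much as the other two heads.
            loss_weights={"root": 1.5, "vowel": 1.0, "consonant": 1.0})
    return model
```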
So we're using transfer learning; you see how easy that is to do. Maybe someday they'll have transfer learning for humans, and then we won't have to attend this data science conference, we can just download it from the internet and immediately know this stuff, and then we would be using transfer learning to learn about transfer learning. Okay, we could have our model output a metric, but the competition metric is macro recall, which is not included in the Keras metrics, so we're going to calculate this metric ourselves and display it at the end of each epoch. Furthermore, we're going to write it to the log file to keep a record of it, and if our model gets more accurate after an epoch, we're also going to save the model weights to disk. So everything is going to be logged. Let's go ahead and run these cells.

The next area of experimentation is the training schedule. This is the training schedule we're going to use: the x-axis is the epoch, the y-axis is the learning rate. If you remember, our model already has some intelligence, so we're going to slowly introduce it to the new images. We'll start the learning rate at zero, and in the first five epochs we're going to increase the learning rate to 0.001; we'll then keep that constant for 10 epochs, then decrease it by 75%, and continue down in a stepwise fashion like this. I've found that these step decays work very well. All right, let's go ahead and start training our model. Training a model is super easy: first we build the model, and we do it inside a strategy scope. This is what we set up earlier with TensorFlow; whether we have one GPU or multiple GPUs, TensorFlow takes care of all that for us. Next, our data will be produced by our data generator; we then create that custom callback which calculates our metric and records everything to the log file; and lastly, all we do is call model.fit. We see below that the model is learning, and this shows the progress. Today we're training on an NVIDIA V100 GPU, and each epoch takes about 20 seconds. We just got the first result back: currently our validation score is 0.05, so our leaderboard score would be 0.05, not too good yet. Let's go ahead and train this for more rounds, but first let's summarize what we've done so far. What we're looking at here is a single experiment: we chose a pre-processing, we chose a data augmentation, we chose a model and loss, and we chose a learning schedule, and now we're training this experiment and seeing the validation output. If you remember, since we have reliable validation, the score we see here is what our leaderboard score will be if we submit. And all of this experiment is being saved to disk, so we can run many experiments and then find the most accurate model. All right, we're going to let this train for a number of epochs; it's going to take some minutes, so let's go ahead and ask the audience if they have any questions. Yeah, let's do it. We've got a couple of questions from the Q&A, and let's see, there are some from the live stream as well.
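For reference, here is a minimal sketch of that warm-up-then-step-decay schedule as a Keras LearningRateScheduler callback (ramp to 0.001 over five epochs, hold for ten, then drop by 75%). The exact hold length and decay interval after the first drop are illustrative assumptions based on the description.

```python
import tensorflow as tf

LR_MAX = 1e-3

def lr_schedule(epoch, lr):
    """Warm up from 0 to LR_MAX over 5 epochs, hold for 10, then step-decay by 75%."""
    if epoch < 5:                       # linear warm-up
        return LR_MAX * epoch / 5
    if epoch < 15:                      # hold at the maximum
        return LR_MAX
    steps = (epoch - 15) // 10 + 1      # assumed: one 75% cut every 10 epochs after that
    return LR_MAX * (0.25 ** steps)

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1)
# model.fit(train_gen, epochs=60, callbacks=[lr_callback, metric_logger])
```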
I think the first one we can talk about is: some folks are asking why CutMix makes sense. Okay, so the first question is why CutMix. Here's the idea. There are many, many different augmentations, so which augmentations to use is a very important question, and you're specifically asking why CutMix. My general rule is that you choose augmentations that make images that still look somewhat realistic. In this case, a typical grapheme has different pieces, a root and consonants, so if I take one grapheme and mix it with pieces of other graphemes, we're essentially making new graphemes that look somewhat realistic. So it very much is appropriate, and as you've seen if you participated in this challenge, it's very effective. So you found this a little bit empirically; you probably tried a few different approaches as well? Yes, exactly, and that's what you'll need to do in all these competitions: it's a lot of trial and error, but you get some direction if you read the forums and look at public notebooks, because people start to discuss what works. In this competition, for instance, everybody was saying that CutMix works very well and cutout works very well, so that encourages us to try them for ourselves.

There's another question in the Q&A: why did you need to use loss_weights? I'm not sure what part of the code they're referring to. Okay, you're probably referring to what we did here. This is optional; you don't have to use loss_weights. Our model here has three outputs, and if we don't include this, then each output will be trained equally. With this, we're saying we want to apply more backward gradient learning to output one; it effectively increases the learning rate for output one. The reason we did that is that in this particular competition the metric counted your accuracy on this output twice as much as the other two, so it was important to really get that one correct, so we emphasized it. Very cool. In machine learning we often talk about weights in the context of the parameters, and here we're using the term weights the way we'd use it for a weighted average, so I can see how there's a little confusion there. Yes, exactly: we're weighting the loss that goes into each head.

Someone's asking about why you started with a learning rate of zero; did you, though? Yeah, we actually did start at zero; let's look at this plot. Here's a picture of the learning rate: in epoch zero the learning rate is zero, in epoch one maybe it's 0.0002, and you can see it increasing; we don't actually get to a learning rate of 0.001 until the fifth epoch. Now again, this is just trial and error; if you started with a high learning rate it would still work well, but you frequently see people do this when using transfer learning. The reason is that you've already downloaded a network from the internet and the weights already have values; the model can already recognize certain images. We're now going to train it on graphemes, and if you set the learning rate too high, sometimes you immediately erase all of its previous training, because the learning rate controls how fast you modify the weights.
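Since the CutMix question keeps coming up, here is a minimal sketch of what a CutMix-style augmentation does: cut a random rectangle out of one image, paste in the corresponding patch from another, and blend the (one-hot) labels by the patch area. This is a generic illustration, not the exact implementation from the notebook.

```python
import numpy as np

def cutmix_pair(img_a, img_b, label_a, label_b, rng=np.random):
    """Paste a random rectangle from img_b into img_a; mix one-hot labels by area."""
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)                        # target fraction of img_a kept
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.randint(h), rng.randint(w)         # patch centre
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]       # the CutMix rectangle
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)       # actual area kept from img_a
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label
```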
By doing this gradual increase, we start from what the model already knows about images and slowly accustom it to the new images, and a lot of the literature says that in the end this makes a more accurate model. Sometimes it does, but again, it's an experiment; you should try different ways and see what works. That's great. Should we keep doing more questions, or do you need to go back to the training? I don't know how it's coming along; we can answer more questions and peek in on it here. Yeah, wow, look, our leaderboard score is already up to 0.873, that's nice, we've gone 17 epochs; we'll let it go a couple more. All right, we'll wait for one more question and then answer a couple.

There's a question about how you efficiently experimented with the resizing of the images in the pre-processing stage; in terms of image size, I think it was referring to the dimensions you chose. You could choose any number, right? Yes, you're right, you can use any numbers. The images are given to you at 137 by 236, so obviously you can just try the original image size and not resize at all. You always have two things going on when you choose the resize. I recently made a post about this: with resizing, it's not always true that bigger is better. When you show your model a different size, your model sees different things, so you could be surprised; you might use a smaller size and all of a sudden your model is more accurate. So it's a bit of experimentation. But there's a second thing going on, which is that bigger images require your computer to have more resources. For example, if you also want big batches, you need to fit that whole batch of big images in your GPU RAM. So there are two things to balance: experiment with different sizes, play around and you'll be surprised what works and what doesn't, but also adjust to your specific hardware. If you have the advantage of multiple GPUs at home, for instance, you can use the strategy to wire them together; shoot, with four GPUs you could have images that are a thousand by a thousand and still use large batch sizes. So you want to try different sizes, but you also have to stay within what you can handle on your own computer.

And along those lines, people are asking whether you always train using Kaggle notebooks versus your own environment, and, similar to that, how to know how many epochs to train for; this ties back to the resources question. Yeah, so I've now been competing on Kaggle for a year and a half, and I can say that for my entire first year, in which I won a bunch of gold and silver medals, I worked exclusively in Kaggle notebooks; that's all I used. Kaggle notebooks come with one NVIDIA P100, which has 16 gigs of RAM; it's an older GPU and its power is a bit limited, but nonetheless, using just that GPU I was able to do very well. Recently I've started working with NVIDIA, and now, through the company, I have access to many GPUs, specifically the V100.
So now I'm lucky that I can actually train with V100s, and even wire a couple of V100s together. This has allowed me to do a lot more than I previously could: I can experiment with bigger image sizes, I can train faster, and, as we mentioned earlier, I can do more experiments. If you look at my track record, I started working at NVIDIA a few months ago, and in the last few months I've already won three gold medals. So yes, there's no denying that if you have better resources you can do more experiments faster, and that's key to finding that accurate model; it also lets you do things like bigger images. That's good, Chris.

All right, let's hop back in and see how the model is doing. It's trained for 27 epochs, and wow, our leaderboard score is already over 90%. We're going to go ahead and stop this; I'm just going to hit stop. You'd find that if you ran this notebook for 150 epochs, you'd get this number up to 98%, and note we're only training with 25% of the data at a very small image size. If you were to use 100% of the data and try, say, 128 by 128, this model can get as high as 99%.

All right, the fifth area of experimentation is post-processing. This is an often overlooked area. The reason it's important is that the loss frequently does not automatically optimize the competition metric, which in this case is macro recall; no loss is going to optimize recall directly. So we're going to optimize it by hand. If you know the definition of recall, it's the percentage of true positives you've identified divided by the true positives that exist. You want to try to find the true positives, and that fraction is increased most for the grapheme roots or consonants that are rare, where there are only a few cases to begin with. So what we're going to do is transfer predictions from the common classes to the rare classes. Let's go ahead and load our saved model, which has been trained for 28 epochs, and predict on the validation set. As we saw earlier, that has a macro average recall of 0.908. We now move predictions from the common classes to the rare classes and re-evaluate, and wow, our leaderboard score just jumped up to 0.929. Guys, this is a big secret: post-processing. I can tell you that in many competitions you can take a public notebook and apply a post-process which optimizes the competition metric, and many times you can turn public notebooks into silver medals. So don't forget about post-processing.

Let's review what we've done today. We've done one experiment, and I told you that the overall strategy is to do many, many experiments; we keep a record with our log file, keep trying different things, and try to get our local validation as high as we can. We mentioned the different areas you can experiment with: different pre-processing and feature engineering, different data augmentation and external datasets, different model architectures and losses, different training schedules and optimizers, and different post-processing. After you've done lots of experiments and found your best model, and you've been saving all your models so you can look back through your logs, you then take the final step.
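Here is a minimal, hedged sketch of the kind of post-processing described for macro recall: scaling up the predicted probabilities of rare classes, which effectively moves predictions from common classes toward rare ones before taking the argmax. The scaling rule and parameter sweep are illustrative assumptions, not the team's exact formula.

```python
import numpy as np

def rebalance_predictions(probs, class_counts, power=0.5):
    """probs: (N, C) softmax outputs; class_counts: training frequency per class.
    Dividing by counts**power boosts rare classes, so more predictions land on them,
    which tends to raise macro (per-class-averaged) recall."""
    weights = 1.0 / (np.asarray(class_counts, dtype=float) ** power)
    adjusted = probs * weights            # broadcast over the class axis
    return adjusted.argmax(axis=1)

# Example: pick the scaling power by checking macro recall on the validation set.
# from sklearn.metrics import recall_score
# for p in [0.0, 0.25, 0.5, 0.75]:
#     preds = rebalance_predictions(val_probs, train_counts, power=p)
#     print(p, recall_score(val_labels, preds, average="macro"))
```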
In every competition the final step is the same: you load your saved model and use it to predict the test set, and we'll quickly do that. We load in the test data, predict the test set, apply our post-process to the test predictions, and finally write a CSV file, and that's what we submit to Kaggle. And then there's the excitement: it's submitting, it's scoring, you're waiting, and then wow, you see your leaderboard score. So that's it. I hope you enjoyed the talk; thank you, everybody, for attending.

Thanks so much, Chris, that was amazing. As always, you are a master, a Grandmaster as it were, and I'm sure people would love for you to hop into the chat afterwards and take a look at any other questions that folks have asked; feel free to address those over the time that remains, and that will also get captured into the recorded video at the end, so folks will have those answers. For now, unless there are questions you want to address from the live chat, we can have you drop out of presentation mode, and Abhishek will be on with us shortly. Okay, sure. While we're waiting... oh, Abhishek is here, welcome Abhishek, perfect timing. Feel free to jump in with some questions, but I'll go ahead and jump into the forums and answer some there, and I'm looking forward to your talk, Abhishek. All right, perfect, let's have you take over; go for it.

Okay. So yeah, I saw Chris's talk, quite an amazing talk by Chris, and GPUs are awesome, obviously, but you know what's more awesome? TPUs. In this workshop I will show you how to build a framework that you can use with almost all kinds of image classification problems, and we will be using PyTorch and TPUs. Since we are limited in time, I might rush a little from one place to another, but if you have any questions, feel free to ask me. We will be building a system, a framework, in such a way that you can easily switch from GPUs to TPUs using PyTorch and apply it to any kind of image classification problem. The first 15 minutes of this workshop I will spend talking about the history of TPUs. I'm just kidding. My workshop involves a lot of coding, so if you have access to a laptop, or you can just fire up a Kaggle kernel, code along with me if you want. So let's start. Now you can see my screen; this is the VS Code setup that I generally use for making YouTube videos. You can see two files, dataset.py and engine.py, and I have already created a kernel which is empty, but there is another kernel which is not empty, and that's "Melanoma detection with PyTorch", so you can use that kernel as a bit of a reference if you want, but you don't have to. I'm going to interrupt for one moment to have you increase your resolution so that folks can see your code even more sharply. No problem, sorry about that; I promised I would remind you. Is it better now? Yes, it is, great. Okay, great. So you can take a look at that "Melanoma detection with PyTorch" kernel; it's a GPU kernel, but here we will be using TPUs, and we will design a data loader and some kind of engine for training and validation. So let's start. You can see some of the code here.
I have imported a few things: I have imported PIL to read the images, and I have imported torch_xla, which is what you need when you're training your models on TPUs using PyTorch. There is one more line, LOAD_TRUNCATED_IMAGES = True; that's because sometimes images are not truncated properly and then PIL won't read them. So the first step is to create a classification dataset, and we will create it in such a way that we can reuse it for any kind of image classification problem. We have an __init__ function which takes image paths, targets, a resize argument controlling whether to resize or not, and augmentations; it doesn't consist of anything else, only these things. Then I write self.image_paths = image_paths, where image_paths is a list of strings, paths to images; self.targets = targets, which is a numpy array; self.resize = resize, which is a tuple or None; and self.augmentations = augmentations. The augmentations here come from a library called albumentations, I hope I'm pronouncing it well; if you have not taken a look at it, do that now, it's quite a good library for augmentations. Then I return the length of the dataset, which is just return len(self.image_paths). Then the __getitem__ function; you have to be a little familiar with PyTorch here. It takes the item index, and then we read the image. We have a few lines which I have already written, and just to save time I'm pasting them here, but I will explain them. We open the image using PIL Image, which takes the image path at the given item index; then we have the target, which comes from the numpy array of targets; then we check if resize is not None, which means we are resizing the image, so we take self.resize[1] and self.resize[0] and resize the image, so you might want to resize it to 64 by 64 like Chris did; then we convert the image to a numpy array and apply the augmentations using albumentations, and we have the augmented image; we extract the augmented image, transpose it so the channel comes first, as PyTorch expects, and then we return the image and targets. So this is the dataset class we have implemented, which can be used for any kind of image classification problem; even if you have a multi-label or multi-class image classification problem, you can just use this dataset class and it will work fine.

Now, the next thing we need when dealing with PyTorch models is a data loader, so we will implement the data loader as a class too. I will call it ClassificationDataLoader, and it will take some arguments, so we have an __init__ function; it has to take the same arguments as the dataset, since we won't be using the classification dataset directly, and it will have some more arguments that come a little bit later. So I take all this stuff from the dataset and paste it here, and I also create a new variable self.dataset, which is a ClassificationDataset, and inside it you have image_paths = self.image_paths, and likewise targets, resize, and augmentations, all with self. in front of them. That's your __init__ function for the classification data loader.
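For reference, here is a minimal sketch of the dataset class as described: PIL for reading, optional resize, albumentations for augmentation, a channel-first transpose, and dictionary keys that match the model's forward arguments (image and targets) as used later. The details are an approximation of the talk, not necessarily the exact library code.

```python
import numpy as np
import torch
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # let PIL read partially written files


class ClassificationDataset:
    def __init__(self, image_paths, targets, resize=None, augmentations=None):
        self.image_paths = image_paths        # list of paths to images
        self.targets = targets                # numpy array of labels
        self.resize = resize                  # e.g. (64, 64) or None
        self.augmentations = augmentations    # an albumentations Compose, or None

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, item):
        image = Image.open(self.image_paths[item]).convert("RGB")
        target = self.targets[item]
        if self.resize is not None:
            image = image.resize((self.resize[1], self.resize[0]), resample=Image.BILINEAR)
        image = np.array(image)
        if self.augmentations is not None:
            image = self.augmentations(image=image)["image"]
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)  # channels first
        return {
            "image": torch.tensor(image, dtype=torch.float),
            "targets": torch.tensor(target, dtype=torch.long),
        }
```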
But you have to design it in such a way that it works for TPUs as well as for GPUs and CPUs. So I write a function called fetch, which takes self, batch_size, num_workers, drop_last, which by default is False, shuffle, which is True, and tpu, which is False for now. That becomes your fetch function; fetch should return the data loader, and your data loader will be torch.utils.data.DataLoader. There you pass self.dataset, batch_size set to the batch_size argument, a sampler which is None by default, so we keep it None for now, drop_last the same as the argument, and num_workers the same as the argument, which is the number of cores you use to process the data. You have all this, and then you return the data loader. We have not done anything interesting here; you could have done this in a simple Python script. What makes it interesting is the sampler. When you're using TPUs, you have to have a distributed data sampler which distributes the data across all the available cores of the TPU. So I say: if tpu, then I define my sampler a little differently, as torch.utils.data.distributed.DistributedSampler, and that takes some arguments. One argument is your dataset; then you have num_replicas, which is the number of TPU cores you have, usually eight. We can also take it from torch_xla: it's xm.xrt_world_size(). We imported this at the top; it's not available locally, which is why I keep it in a try/except block. Then you have rank, which is the current ordinal, xm.get_ordinal(), and finally shuffle, which is just the same as the shuffle argument. Now I replace the None with this sampler, and you have designed a classification data loader that works for all kinds of image classification datasets.

Using it is quite simple; we will come to that. But before that you also need something called engine.py. You can see I already have some imports here; the new import in this case is the ParallelLoader from torch_xla. You need to wrap your data loader inside a ParallelLoader for training on TPUs. I have also added tqdm; you might not even need it, it's your choice, I just use it to track the progress of training. Now I can define an Engine class, and inside it I have a train function. The train function takes a few arguments: the data loader, the model, the optimizer, which can be Adam or SGD or whatever you want to use, the device, which is cuda or cpu or an XLA device, a scheduler, which can be None, and use_tpu, True or False, defaulting to False. Once you have these things, you can write your training function, which will use the TPU when the flag is true. If you have to write a training function which doesn't use a TPU, it's quite simple, and you have done that plenty of times: you put the model in training mode, and then you define a tqdm instance over the data loader, with total set to the length of the data loader.
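For reference, here is a minimal sketch of the fetch method just described: a plain torch DataLoader on CPU/GPU, and a DistributedSampler driven by xm.xrt_world_size() and xm.get_ordinal() when tpu=True. It reuses the ClassificationDataset sketched above and should be read as an approximation of the talk, not the exact library code.

```python
import torch

try:
    import torch_xla.core.xla_model as xm   # only available in a TPU environment
except ImportError:
    xm = None


class ClassificationDataLoader:
    def __init__(self, image_paths, targets, resize=None, augmentations=None):
        self.dataset = ClassificationDataset(
            image_paths=image_paths, targets=targets,
            resize=resize, augmentations=augmentations)

    def fetch(self, batch_size, num_workers, drop_last=False, shuffle=True, tpu=False):
        sampler = None
        if tpu:
            # Shard the dataset across the (usually eight) TPU cores.
            sampler = torch.utils.data.distributed.DistributedSampler(
                self.dataset,
                num_replicas=xm.xrt_world_size(),
                rank=xm.get_ordinal(),
                shuffle=shuffle)
        return torch.utils.data.DataLoader(
            self.dataset,
            batch_size=batch_size,
            sampler=sampler,
            shuffle=shuffle if sampler is None else False,  # the sampler handles shuffling on TPU
            drop_last=drop_last,
            num_workers=num_workers)
```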
Now I go over each sample: for batch_index, data in enumerate(tk0), and the first thing I do is transfer everything to the CUDA device, or whatever device is there. Now, if you look at how I implemented the model in this kernel, you will see that the forward function has two arguments, image and targets, and those are the names we will be using; you also see that when I implemented the classification dataset class, I used image and targets, the same names. So you don't have to go through all the dataset items one by one; you can just do: for k, v in data.items(): data[k] = v.to(device). That way you don't have to handle each item coming from your data loader separately. Then, the way I build models in PyTorch, the first thing returned is the output and the second thing is always the loss, so the loss is calculated inside the model class, and I can do _, loss = model(**data). That saves a lot of lines of code, and it's something you can reuse anywhere you want. Now what we have to do is loss.backward(), which is quite simple, then optimizer.step(), and if the scheduler is not None, scheduler.step(). We also do optimizer.zero_grad(). So that is basically it, and this works for GPU.

When you're using TPUs, it's a little bit different. The first thing you have to do is wrap your data loader inside this ParallelLoader, so we do that first: if tpu is true, then you wrap the data loader. Wrapping it is quite simple; you can say something like para_loader = ParallelLoader(data_loader, [device]), which is what's used very often. When you're training on TPUs, you're training in a multiprocessing way; if you have ever used joblib-style multiprocessing, it's like that, it creates a copy of all the functions. The next thing you do is define tk0, which is the same as the one above, but with a little difference: instead of the data loader, it iterates over para_loader.per_device_loader(device). Yeah, that's right, I don't have linting installed. And this goes in the else branch. So now we have everything; you just have to remember that this is going to print a progress bar from every core, so you will get eight of them, but it's very easy to keep only one; I will leave that as an exercise for you. Now there is one more thing: the loss will be calculated on each core individually, so you want to combine them; you want some kind of function that reduces the values from the different cores. I will call it reduce_fn; it takes values and just returns the sum of the values divided by the length of the values. If you don't do this, the loss you see is not the real loss, since it's coming from only one of the cores, so you use the reduce function: reduced_loss = xm.mesh_reduce, there is a mesh_reduce function you can use, and you give it a name, say 'loss_reduce', the loss, and the reduce function. Once you have all these things, you are all set to train your model on TPUs.
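For reference, here is a minimal sketch of the train function following that description: the model's forward returns (output, loss), the ParallelLoader feeds each TPU core via per_device_loader, and xm.mesh_reduce averages the loss across cores. One assumption not spelled out in the talk: on TPU cores, torch_xla convention is to use xm.optimizer_step(optimizer) in place of a bare optimizer.step().

```python
from tqdm import tqdm

try:
    import torch_xla.core.xla_model as xm
    import torch_xla.distributed.parallel_loader as pl
except ImportError:
    xm, pl = None, None


def reduce_fn(values):
    """Average a list of per-core losses."""
    return sum(values) / len(values)


class Engine:
    @staticmethod
    def train(data_loader, model, optimizer, device, scheduler=None, use_tpu=False):
        model.train()
        if use_tpu:
            # ParallelLoader hands each TPU core its shard of every batch.
            para_loader = pl.ParallelLoader(data_loader, [device])
            tk0 = tqdm(para_loader.per_device_loader(device), total=len(data_loader))
        else:
            tk0 = tqdm(data_loader, total=len(data_loader))

        for batch_index, data in enumerate(tk0):
            data = {k: v.to(device) for k, v in data.items()}
            optimizer.zero_grad()
            _, loss = model(**data)           # forward returns (output, loss)
            loss.backward()
            if use_tpu:
                xm.optimizer_step(optimizer)  # assumption: standard torch_xla practice
                loss = xm.mesh_reduce("loss_reduce", loss, reduce_fn)
            else:
                optimizer.step()
            if scheduler is not None:
                scheduler.step()
            tk0.set_postfix(loss=loss.item())
```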
But that's not all; you also need to write an evaluate function. I have written it and put it in a small library that I use from time to time; it's called wtfml, "well, that's fantastic machine learning". It's nothing but a wrapper, like all the wrapper libraries you have. If you go inside it, under the data loaders for image classification you will find the classification dataset, and it consists of everything we just wrote, plus a little bit more; and if you go to the engine, you will find the classification engine, the general engine you can use. It has a bit more stuff than what we just did, but to save time I can't show you how to implement all of it. What it does is, as we saw: if you're using TPUs it handles that, and it also uses a kind of average meter, so it stores all the losses and in the end returns the average loss. So we have that, and now we can try to convert the code we wrote here to TPUs.

Here we have the classification loader, the training dataset and training data loader, and the same for validation. Let me see. We don't have to change a lot; we just have to import the things we need, and we install this library, so I'm doing it from there; you can also write it in the kernel, but that's just going to make your code messy, or you can use utility scripts. Now, this is our training code, and the training function takes a fold. Before we start, we divide the dataset into five different parts; I'm using a stratified k-fold here, which is not advisable in this specific competition, but you can probably use it. Then the training function takes a fold and splits the data into training and validation folds; then you have the device, the number of epochs, the training batch size, the validation batch size; then you load the model and send it to the CUDA device; and you have the mean and standard deviation of the ImageNet dataset. What we were using there is SE-ResNeXt, which is pre-trained on the ImageNet dataset, and you have the albumentations augmentations. The first thing you do here is normalize the image using the mean and standard deviation values; these are not made up, they come from ImageNet. Then you can apply all the augmentations you want for the training dataset, and you need the same thing for the validation dataset: the normalization is going to be there all the time, and if you want some test-time or validation-time augmentations, you can include those here as well. Then we create lists of training images and training targets, the targets being a numpy array, and lists of validation images and validation targets, also a numpy array, and then we create the data loaders. There will be a very small change, and you will see how it converts to TPUs, but most of the code is going to remain the same. Even when creating this data loader, which is really generalized, you can use it for any kind of classification problem and switch between GPUs and TPUs. So here is our kernel, the TPU kernel now, and it will be available at the same link right after the workshop. One thing you need to do is install the torch_xla package.
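For reference, here is a minimal sketch of the augmentation setup described: albumentations pipelines that always normalize with the ImageNet mean and standard deviation, plus a couple of train-time augmentations. The specific augmentations chosen here are illustrative assumptions.

```python
import albumentations as A

# ImageNet statistics, used because the backbone was pre-trained on ImageNet.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

train_aug = A.Compose([
    A.Normalize(mean=MEAN, std=STD, max_pixel_value=255.0, always_apply=True),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.7),
    A.HorizontalFlip(p=0.5),
])

valid_aug = A.Compose([
    # Validation keeps only the normalization (no random augmentation).
    A.Normalize(mean=MEAN, std=STD, max_pixel_value=255.0, always_apply=True),
])
```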
I always use the nightly version of torch_xla; you can choose one of the stable versions, but it's not really stable anyway, they're developing it continuously and making a lot of changes. Then I install the package we just built, and I have EfficientNet. In the original kernel I was using SE-ResNeXt, but since Chris was also using EfficientNet I thought of using EfficientNet here, and all the other competitors in this competition are using EfficientNet anyway. So we have this base model, which is EfficientNet-B0 pre-trained on ImageNet, and we change the fully connected output layer: we have 1280 input features and only one output feature, and in the forward function we calculate the loss. The part where we create the folds remains the same, created in the same way. Here we define the model, and this is very important: you don't define the model inside the training function, you define it outside, because torch_xla is going to make copies of it and you will probably run out of memory if you define it inside. So I define the model here, and it goes inside this multiprocessing model wrapper, MpModelWrapper, which is also from torch_xla. Then you have the training function, which is more or less the same. What changes here is the classification loader, which I've changed a little bit; you can make the same change in the old GPU kernel too. You have the ClassificationDataLoader, which takes the image paths, a list, and all the targets, and it doesn't resize anything, so we're using the original size of the stored images. Then you just call .fetch, and you pass the training batch size and drop_last=True; when you're using PyTorch with TPUs, just remember to set drop_last=True when the model is in train mode. You can also shuffle if you want, and tpu=True, because we are training on TPUs. Then we do the same thing for the validation dataset, but with the validation batch size and shuffle set to False; you can make it True if you want, we're not recording anything except the loss anyway. Everything else remains the same, and here too everything is the same except that now you have a new argument called use_tpu, which you set to True, and then you are able to train this model. I was doing some experiments, so you can just ignore this part. Whenever you train on TPUs, you need to create a multiprocessing function wrapper and spawn the processes; these two steps will remain the same for any kind of PyTorch model you want to train on TPUs. One thing to remember is that whenever you use a print, it's going to print from every core of the TPU; to avoid that you can use the master node, so instead of print you write xm.master_print, and that's what I've written everywhere. I tried this model, and I think I'm running out of time, so I won't be able to start training it now; this step takes a couple of minutes, this step takes one minute, and the model training takes some time. But in the same setting, which was a batch size of 32, you can now increase the batch size by a lot because you're using TPUs, so your model will become a lot faster. Compared to the EfficientNet kernels
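For reference, here is a minimal sketch of the TPU-side wiring described: an EfficientNet-B0 whose forward returns both output and loss, wrapped in MpModelWrapper outside the training function, and launched across the eight cores with xmp.spawn. The efficientnet_pytorch usage, the loss choice, and the helper names are assumptions, not necessarily the exact kernel code.

```python
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet      # assumed backbone package
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


class MelanomaModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = EfficientNet.from_pretrained("efficientnet-b0")
        self.base._fc = nn.Linear(1280, 1)          # 1280 features in, a single logit out

    def forward(self, image, targets):
        out = self.base(image)
        # Loss lives inside the model, so the engine can just do `_, loss = model(**data)`.
        loss = nn.BCEWithLogitsLoss()(out, targets.view(-1, 1).float())
        return out, loss


# Define the model once, outside the per-core function, and wrap it so each
# spawned process shares the weights instead of building its own copy.
MX = xmp.MpModelWrapper(MelanomaModel())


def _run(index, flags):
    device = xm.xla_device()
    model = MX.to(device)
    xm.master_print("training on", device)          # prints from the master core only
    # ... build loaders with tpu=True, then Engine.train(..., use_tpu=True)


if __name__ == "__main__":
    xmp.spawn(_run, args=({},), nprocs=8, start_method="fork")
```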
that I have seen in this competition, this thing takes two minutes versus eight minutes, so it's already four times faster, and you can make faster experiments. And with PyTorch it's much easier, because you can include any kind of augmentation you want and you don't have to convert the dataset, so you don't have to worry about that. But you have to remember one thing: it's going to create copies, so it's nice if the images are stored somewhere on disk, and in this case they are; there was a nice dataset where someone converted everything to 224 by 224 images, so the images are stored in a directory, and that's nice. Don't try to load everything into memory, because then your memory is going to crash. One thing you have to remember when using PyTorch and TPUs is this memory issue; always keep it in mind, and write your code in such a way that you don't use a lot of memory, because after a few epochs, especially if you have a bigger model, it's going to break. You can also use the garbage collector from Python; I use it after every epoch, so use it like that, but don't use it in the data loader or in engine.py, because it takes time and your training will become slower and you will think that TPUs are slow, but they are not. I will share a working version of this notebook right after the workshop; this is the working version, and if you have any questions I'm happy to answer. I know I've been a little fast, but the time was short, so thank you very much.

Thanks, Abhishek, that was amazing. We have a couple of questions coming through on the live chat and in the Q&A link; I'll select a few, random-ish, depending on what you want to talk about. Early on there were some questions about albumentations versus the PyTorch transforms, the difference between them; I think one is just an external library, right? The difference between albumentations and the PyTorch transforms... okay, for me it's that I just find it easy to use one thing, so choose what you want to use and use that. Whatever you can do with torchvision transforms, you can also do with albumentations, so I'm just using albumentations because it's much easier for me and the code remains the same; if I'm creating a training augmentation I just need to add the augmentations here, and normalization and everything is also possible. So it's a matter of choice. Sure. Then there's another question still around augmentations: they say there's a handful of libraries that perform augmentation on the GPU for increased speed, they give some examples like DALI, Kornia, and fastai v2, and with only two CPU cores on Kaggle TPU notebooks, data pre-processing is often the limiting factor, so is it possible to accelerate the augmentation step using TPUs? Interesting question. I have not tried that, but one thing that's definitely possible is that you can use all eight cores individually, so what I do, instead of training the five folds one after another, is train them in parallel, one fold per core; you can try doing that. And I've also made a public kernel on that, so you can take a look. Perfect. And there was a question about the xrt_world_size call: what would be
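For reference, a tiny sketch of the memory hygiene mentioned here: call Python's garbage collector once per epoch, in the outer training loop rather than inside the data loader or engine, so it doesn't slow every batch down. The evaluate call and its signature are assumptions for illustration.

```python
import gc

def run_training(train_loader, valid_loader, model, optimizer, device, epochs, scheduler=None):
    for epoch in range(epochs):
        Engine.train(train_loader, model, optimizer, device, scheduler=scheduler, use_tpu=True)
        Engine.evaluate(valid_loader, model, device, use_tpu=True)  # assumed evaluate signature
        gc.collect()   # free per-epoch garbage here, not inside the per-batch loop
```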
the reason for it to return one even though there are eight TPU workers available in the environment? They say they've hard-coded this value and don't use the function, but they would love to have it dynamically returned correctly. Okay, I think it's xrt_world_size, and I'm pretty sure it returns... oh, yeah, that's something I'll have to check later. That's great. And then I think some of the other questions people answered amongst themselves. What else have we got? There was a note about normalization, but I think they discussed that. Someone wonders what you meant by transferring common classes to rare classes and how you choose which ones to convert. Okay, this is a different question; it goes back pretty early. So it depends: if you have a rare class, how do you want to handle it? It all depends on that. Fair enough; that's just one technique for doing it.

All right, that was awesome, and thank you so much to you, and of course also to Chris. For everybody watching, thanks for joining us for this Accelerator Power Hour, just absolutely amazing. If you haven't already, Kaggle has a short survey to better understand what you want out of these workshops, so for those of you who are still on, please head over to that link; we'll have them post the link again so you can click on it and answer the questions in the survey, let them know what you want from these types of sessions, and then they can make sure that happens. I'll switch our view over to the two of us here. So again, thanks so much for joining me, Abhishek, and thanks to Chris as well. I guess with that we'll leave it there. Kaggle put up the link for the survey, so definitely fill that out, and Abhishek will post the notebook, please enjoy, and Abhishek has his YouTube channel you can check out, and the book that he's writing; always busy. I've really loved your live coding. Were there typos? Definitely. But did the live chat catch them? They definitely did, so they're always watching very closely. It happens when you're live coding something for an hour on air. But yeah, thank you very much, thank you for the invitation, I really enjoyed it and I hope others did too, and Chris's talk was great as usual. Awesome. All right, we'll call it there. Thanks, everybody, for attending, and until next time: be safe, be well, and have a great rest of the week.
Info
Channel: Adventures in the Cloud
Views: 13,845
Rating: 4.9768338 out of 5
Id: DEuvGh4ZwaY
Length: 69min 40sec (4180 seconds)
Published: Thu Jun 25 2020