Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-1)

Video Statistics and Information

Captions
Hi everyone, and welcome again. This is a very special episode, and it doesn't mean I have deviated from my original series about applying machine learning and building the ML framework; there are many new episodes lined up, and they will go live this week and in the following weeks. What we are going to do in this episode is pretty interesting, and it's something I've always wanted to do: code for an ongoing Kaggle competition. I've seen a lot of YouTube videos and tutorials, and they always focus on some playground competition or playground dataset, but none of them has ever touched an ongoing Kaggle competition. This competition is quite interesting, and it also has about a month left, so you can get a pretty good rank if you follow this video — obviously you have to make your own modifications so you have your own secret sauce.

The competition we are going to work on today is Bengali.AI Handwritten Grapheme Classification. So what's a grapheme? A grapheme is the smallest unit in a written language. Bengali has 49 letters — 11 vowels and 38 consonants — but there are also 18 accents, or diacritics, and this results in more than 13,000 different grapheme variations. I'm not an expert in this; everything I have read is from the competition description. We won't go into too much detail. It looks like an image classification problem: there is an image, and you classify it into different classes. From the picture we have, it looks like there are 168 classes for one component, 11 classes for another, and 7 classes for the third.

Let's look at the data. We see a number of files: some parquet files (train_image_data_*.parquet, test_image_data_*.parquet) and a train.csv file. In train.csv we have an ID for an image, plus the grapheme root, vowel diacritic, and consonant diacritic, each with its own set of classes. If you have seen my previous videos, you know what kind of problem this looks like: a multi-label classification problem. Then you have class_map.csv, which simply says class label 0 is this component, class label 1 is this component, and so on. For the grapheme root the labels run from 0 to 167, so 168 classes in total; for the vowel diacritic from 0 to 10, so 11 classes; and for the consonant diacritic from 0 to 6, so 7 classes. Clearly a multi-label classification problem.

Now, to start with this problem, as I've always mentioned in my previous videos and everywhere else, the first thing you want to do is build a good cross-validation system. You need to build some kind of folds, because without a good cross-validation system you don't know how good your model is, and you cannot just throw spaghetti at the wall and hope it sticks. There are some more details on the data page: each parquet file contains tens of thousands of 137×236 (137 pixels high by 236 pixels wide) grayscale images.

First we set up our workspace; as usual it's VS Code running in code-server. We don't have anything yet, so I create a folder called input for the input data and another folder called src for the source code. We also need a .gitignore, which will keep the files I don't want to push to git out of the repo, since some files can be really huge. I go to my terminal and copy the .gitignore from my original repo; it ignores parquet files, CSVs, NumPy files, and so on. I've shown in previous videos how to create these .gitignore files, so you can take a look there. It has a lot more entries I don't need, like embeddings; I've just kept it for now.

Then, from the input folder, we run the command shown on the competition's data page to download everything using the Kaggle API. If you're not familiar with the Kaggle API, I recommend you take a look; it's quite easy, and you can do everything from the command line. The download is quite huge, and in the meantime we can start writing some code.

The first thing we are going to do is create the folds. We have already seen that this is a multi-label classification problem, which means each sample has — in this case — three different labels. So you can have multiple models, one for the grapheme root, one for the vowel diacritic, one for the consonant diacritic, or you can have a single model that predicts all three classes. I'm not an expert in computer vision, so if I make some mistakes, let me know.

I'll start by writing create_folds.py. All our labels live in train.csv, so we need to read it, and for that we import pandas as pd. For multi-label classification there is a very good library called iterative-stratification, which has a class MultilabelStratifiedKFold — it's like StratifiedKFold from scikit-learn, but for multi-label problems. So we do `from iterstrat.ml_stratifiers import MultilabelStratifiedKFold`. Then, inside the `if __name__ == "__main__"` block, we read the CSV with pd.read_csv("../input/train.csv") into a pandas dataframe, and do a small print just to make sure it's reading correctly — it obviously is. Then we create one more column called kfold and fill it with -1 for now. It's also a good idea to shuffle the dataset, so I shuffle the dataframe with df.sample(frac=1) — frac is the fraction of rows you want to sample — followed by reset_index(drop=True), because randomizing the rows changes the indices and we don't want to keep the old ones. Now df.image_id.values is my X, and my y — here's the interesting part: instead of one column you have multiple columns — is df[["grapheme_root", "vowel_diacritic", "consonant_diacritic"]], converted to a NumPy array.

Then you initialize the MultilabelStratifiedKFold class and define the number of splits: n_splits=5, say, so we are doing 5 folds. The next step is to go through the iterator and get the training and validation indices: for fold, (train_, val_) in enumerate(mskf.split(X, y)). I'll print the training and validation indices for every fold — not even a sanity check, I just want to see what's happening. Then the important part: df.loc[val_, "kfold"] = fold — we fill the kfold column with the fold number for the validation indices, not the training indices. After the loop you can print df.kfold.value_counts() just to check, and then write the dataframe back to the input folder as train_folds.csv with index=False, since I don't need the index column. And with that we have written create_folds.py, which creates the folds for us.

Here in the terminal you can see the data download has also finished, so I'm just going to unzip it; that's going to take a while too. Now, how do these parquet files look? It's very interesting, and also a little bit weird: in these parquet files you have a column called image_id, and then one column per image pixel. We can take a look at that either in a small terminal here or in a Jupyter notebook — I think a notebook is much better for this purpose, so I'll start Jupyter. So we have downloaded all the data — you can see all the data files here — and now we can run the script we have just created.
I go to the src folder and run `python create_folds.py`. I hope it runs... no, it throws an error: invalid syntax. Where could it be? I'm missing a comma here. Let's run it again: 'DataFrame' object has no attribute 'value' — obviously it's called values. Run it again, and you can see it has printed the dataframe — image_id, grapheme root, vowel, consonant, and the grapheme itself — and now it's printing a lot of indices, which look good. Every fold has 40,168 samples, so five folds is a little over 200,000 images. That's a lot of images.

Now we can go to Jupyter. This is my Jupyter Lab — I like it more than Jupyter Notebook because you have all the files in a sidebar and can see everything. I create a folder called notebooks, and inside it a notebook, which I rename to check_data_frame. We `import pandas as pd`; I don't think we need anything else for now. We need to read a parquet file, so let's read one of the training ones: df = pd.read_parquet("../input/train_image_data_0.parquet"). (At first Jupyter was pointing at the wrong folder — something went wrong there, but I restarted Jupyter Lab and it was fine.) To read parquet you also need a package called pyarrow, so `pip install pyarrow`; otherwise this command is going to tell you that you need it. This is a huge file, so it's going to take up some RAM and some time — oh, quite fast, actually. I do a head on it, and you can see the image_id column and all the pixel values. The image is 137×236, and 137 × 236 = 32,332, so the pixel columns start from 0 and go to 32331. It's a huge file, so I stop here, restart the kernel, and clear all outputs; I don't need anything in memory at the moment.

The next thing to do is create another file; I'll call it create_image_pickles.py. Why am I doing this? Because reading the images from the dataframe every time is going to be very slow, so instead I'm going to read them from image pickles. Once you create the pickles, you can also upload them to a GCP bucket and use TPU training — you could do that with the dataframes too, but it would be super slow. So we are going to create a pickle for every image: there are about 200,000 images, so about 200,000 pickle files. I import pandas as pd, and joblib for dumping the pickles. We also need to know what the files are, so import glob, which can list them for us, and we want to see progress, so `from tqdm import tqdm`. Now we can write the main part of the script. First we list all the training files using glob with a wildcard: files = glob.glob("../input/train_image_data_*.parquet") — as you can see in the terminal, it lists the training files. Then, for each file in files, we read the parquet file with pd.read_parquet, create an array of image IDs with image_ids = df.image_id.values, and drop the image_id column: df = df.drop("image_id", axis=1) — axis=1 because it's a column. Now the dataframe has only the pixel values, so image_array = df.values is the full array; we convert it to an array because indexing the dataframe directly would be super slow. You could print a little information here, like the shape of the array, but I'm not going to. Then, for j, image_id in tqdm(enumerate(image_ids), total=len(image_ids)) — I've moved the tqdm from the outer loop to this inner one because this is the part that takes longer, and I always like to pass the total number of samples to tqdm — we do joblib.dump(image_array[j, :], f"../input/image_pickles/{image_id}.pkl"): we extract one image, meaning all the pixels of that image, and save it; since I'm using Python 3.7 I can use an f-string, and the extension can be .pkl or whatever you want. This will save all the image vectors in the image_pickles directory, which doesn't exist yet, so let's create it. We don't actually use numpy here, so I remove that import, and we just hope it works.

Let's see if it does: `python create_image_pickles.py`. We already have the folds file, and now we are creating the image pickle files. This seems to be working, and it's super fast — probably 30-35 seconds for one file, and each file is about 40,000 images. I'm not going to show you the folder, because listing it would kill my computer, but everything will be in the image_pickles folder. Let's wait for this to finish... Okay, it has finished. We have already spent 30 minutes on the video, and all we have done so far is create the folds and the image pickles, so things are going to move a little faster from here — please pay attention. You can see my VS Code is also complaining now: with so many files, it cannot keep track of them all.

The next thing is to think about the model itself. In the beginning I talked about how you can train three different models, one per class, or create a single model that predicts all three classes. A single model is much easier and consumes much less training time, so that's what we are going to focus on. One thing I forgot to mention: we will be using PyTorch for this video. The first thing we need is a dataset class that serves items from the training data, so I create one inside a file called dataset.py. We want to make it a little bit general, so that if you plan to change the model at some point you can do it with very few lines of code. I'll call the class BengaliDatasetTrain, since it's the training dataset. You need to define an __init__ function; it takes self, obviously, and folds — folds will be a list of fold numbers, like [0, 1, 2, 3], or just one number if it's a validation fold — plus the image height and image width, and, since for image problems we always need to normalize, the mean and standard deviation.

In __init__ we read the CSV — no more read_parquet — from ../input/train_folds.csv into pandas. You don't need every column: there's one extra column, grapheme, that we don't need. What we need is image_id, grapheme_root, vowel_diacritic, consonant_diacritic, and the kfold column. This dataframe contains the target information for all the images, but for a particular run you need only certain folds, so let's filter: df = df[df.kfold.isin(folds)] keeps only the rows whose kfold is in folds — remember folds is a list with values between 0 and 4 — and then I reset the index with drop=True, just for the sake of it. Then self.image_ids = df.image_id.values, and the same for grapheme_root, vowel_diacritic, and consonant_diacritic: I'm converting every column I need to a NumPy array, and also storing the image height and width.

Then, what is the length of this dataset — how many samples do you have? Define __len__(self) and just return len(self.image_ids). And now the most important function, __getitem__(self, item), where item is an index from 0 to the length of your dataset. Here we load a pickle file: image = joblib.load(f"../input/image_pickles/{self.image_ids[item]}.pkl") — for this to work I also need to import joblib. This gives me one image, but as a one-dimensional vector, so we reshape it: image.reshape(137, 236).astype(float). The 137×236 size is already given for this problem — all images are the same size — so you can just hard-code it. The next thing is to convert it to a PIL image, so `from PIL import Image`, then Image.fromarray(image), which converts the NumPy array to a PIL image, and .convert("RGB"). It's a grayscale, single-channel image, but I'm converting it to RGB because all the pretrained models we are going to use — from torchvision or the pretrainedmodels library — work on RGB, and I don't want to spend time making them work on a single channel; you can try doing that yourself, and hopefully it works.

We are still missing something important: augmentations. I'm going to import albumentations — a very good, very fast augmentation library — and define a self.aug variable inside __init__ using albumentations.Compose, inside which you can define all the augmentations you want. We have to handle training and validation separately: you don't want augmentations on the validation set, unless you run it several times and average, i.e. some kind of TTA. If len(folds) is 1, it means we're in the validation phase, so self.aug is just a Resize — I already have the image height and image width as arguments, so I supply those, with always_apply=True, because we always resize — followed by Normalize with the mean, the std, and always_apply=True (max_pixel_value is already set to 255 by default, so I don't need to touch it). If the length of folds is not 1, it's the training dataset, and for training data we can do things a little differently — or keep them the same, but just for the sake of doing something differently let's include some augmentation: ShiftScaleRotate, keeping shift_limit at 0.0625, setting scale_limit to 0.1, and — since these are characters, we don't want a lot of rotation — rotate_limit to 5 degrees. Let me make it more readable. We can also set a probability for when to apply it: p=0.9, so 90% of the time it applies this shift-scale-rotate. Then Normalize, the same as for validation.

Now we apply the augmentations, which is very simple and easy: image = self.aug(image=np.array(image))["image"]. This applies the augmentation pipeline to the image, whichever one was built — validation or training. I also need to import numpy. Then I need to transpose the image to fit the torchvision models, exchanging the channel order: np.transpose(image, (2, 0, 1)).astype(np.float32) — from height × width × channels to channels × height × width, with a cast to float just as a sanity check. Now we have everything we need from __getitem__ and can return it as a dictionary: "image" is torch.tensor(image, dtype=torch.float); "grapheme_root" — also a NumPy array, so we extract the given item — is torch.tensor(self.grapheme_root[item], dtype=torch.long), and the same for vowel_diacritic and consonant_diacritic, just replacing the name. Why have I done it this way? Take a look at how torchvision models expect their inputs and you will know. So now we are done with the dataset class. We're reaching one hour and haven't even started with the model yet, and that worries me a little.

The next part is to verify that the dataset class is working properly — also quite simple and easy — and we'll go back to Jupyter Lab for that. After this we have to define what kind of model we want to use, write a training file, and as the last step create an inference file; that shouldn't take long, though the training part will probably take a little longer.

So, without wasting any time, let's see what we can do with visualization. Going back to Jupyter Lab, I create a new notebook. First I append my source directory to the path, so I can import from dataset.py. I need matplotlib.pyplot as plt to show the images, with %matplotlib inline so they show up in the notebook; I don't need pandas; I also import torch; and, most important, the dataset class: from dataset import BengaliDatasetTrain. I think that's all I need — if not, we'll import more later. Then: dataset = BengaliDatasetTrain(folds=[0, 1], ...). Normally you would have four training folds and one validation fold, but here I'm just passing 0 and 1. For image height and width I don't want to resize, so I stick with the original 137 and 236. We also need the mean and standard deviation per channel; these are the pretty standard ImageNet values — mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225); I don't remember them exactly, I've just written them down. It didn't work: albumentations has no attribute 'compose' — a spelling mistake, it's Compose with a capital C. After fixing that I need to restart the kernel, and now it works. If I check len(dataset), it's 80,336 images — 40,168 per fold, and I'm using two folds. Now I can do idx = 0 and pull dataset[idx]; that gives you the dictionary. Just as a sanity check I look at grapheme_root, vowel_diacritic, and consonant_diacritic, then convert the image to NumPy — npimg = item["image"].numpy() — and to show it I have to transpose back: plt.imshow(np.transpose(npimg, (1, 2, 0))). Let's see if that works... 'torch' is not defined, of course — we forgot the most important thing, import torch. Run the cells again... and again a problem: 'np' is not defined; not a big deal, import numpy as np, and here we go — we can see the image, 137×236. You can change the index and see a different one, and alongside it you get the grapheme root, vowel diacritic, and consonant diacritic, which is what you have to predict. So now, finally, we can move on to creating our training script; we will build the training script and the model a little bit in parallel, so let's start with that now.
Info
Channel: Abhishek Thakur
Views: 11,705
Rating: 4.9882007 out of 5
Keywords: machine learning, deep learning, artificial intelligence, kaggle, abhishek thakur, classification
Id: 8J5Q4mEzRtY
Length: 55min 7sec (3307 seconds)
Published: Sat Feb 22 2020