MULTI-LABEL TEXT CLASSIFICATION USING 🤗 BERT AND PYTORCH

Video Statistics and Information

Captions
Hello and welcome. In this video I will be implementing a BERT classification model, specifically a multi-label classification model. We all know what BERT is: basically a stack of transformer encoders that extracts contextual embeddings from our input sentences. If you want to deep-dive into how BERT and transformers work, there is a write-up that explains it wonderfully; I'll provide the link in the description of this video.

So, without wasting any time, let's dive in. I have already uploaded the data to my Google Drive as train.csv and test.csv. We won't use test.csv for training in this tutorial; I'll work with train.csv and split it into train and validation sets. First we connect Google Drive to our Colab notebook, which takes two statements from google.colab. My drive was already mounted because I was implementing this earlier, so I'll factory-reset the runtime and run it again: I pick the account where I uploaded the dataset, copy the authorization code, paste it in, press Enter, and Google Drive is mounted on the Colab session.

Next we check which GPU we got for this Colab session: a Tesla T4 with zero memory used. Then we install the Transformers library provided by Hugging Face; it is an absolutely amazing library if you want to implement BERT on your own dataset, and a single pip command installs it into the notebook. After that we import the necessary libraries: pandas, numpy, and torch, since we are implementing this in PyTorch, plus shutil, which we will need later when we save checkpoints of our model; don't worry about that for now.

I copy the path of train.csv from the mounted drive, store it as the train path, and load the data. Let's look at what we got: a title, an abstract, and six labels that we need to classify against. The title is like a news headline, a short summary of maybe twenty words at most, while most of the text lives in the abstract. That gives us three approaches: train on the title alone against the labels, train on the abstract alone, or concatenate the two and train on that. The third would be the optimal solution, because BERT benefits from more data, so in this video I will join the title and abstract into one column and train against all six labels.
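As a rough sketch of the setup described above (the Drive path is a hypothetical placeholder, not the exact one from the video; adjust it to wherever your train.csv lives):

```python
# Colab setup sketch; the Drive path below is illustrative.
from google.colab import drive
drive.mount('/content/drive')

!nvidia-smi                  # check which GPU the session got (e.g. a Tesla T4)
!pip install transformers    # Hugging Face Transformers

import pandas as pd
import numpy as np
import torch
import shutil                # used later when copying the best checkpoint

train_path = '/content/drive/MyDrive/train.csv'   # hypothetical path
train_df = pd.read_csv(train_path)
train_df.head()              # title, abstract, and six label columns
```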
Cool, so I will combine these two columns into one. I create a new column on the dataframe, let's call it context, which is the title and abstract joined together. Now that they are combined we no longer need the id column or the title and abstract columns, because their content is already in context, so I drop them: axis is 1 because rows are axis 0 and columns are axis 1, and inplace=True drops the useless columns in place. Then I rearrange the columns so that context comes first instead of last; it is purely for readability. Good, our dataset is ready.

The next thing is to create a target list, which is simply the names of the six labels we have; I'll show you later in the implementation how it is used. Then I define some hyperparameters: maximum sequence length, train batch size, validation batch size, number of epochs, and learning rate. The maximum length is generally 512 in the case of BERT, but I have less data, so I'll keep it at 256. Hyperparameters are the parameters you can play with; you can choose your own settings, and I'll pick fairly standard values for simplicity. I'll train for only two epochs so it doesn't take too long, and the learning rate is one mentioned on the Hugging Face website; you can explore the site for datasets, models, and everything else it provides.

With the hyperparameters ready, the next step is to import the tokenizer: from transformers import BertTokenizer and BertModel. BertTokenizer provides the tokenizer that was developed for BERT, and BertModel is the BERT model itself. One line of code, BertTokenizer.from_pretrained, gives us the tokenizer; I'm going to use the uncased version of BERT base. There are two versions, cased and uncased; uncased ignores the casing of the letters. Let me show you with an example what the tokenizer does. Take a sample text, say "do like and subscribe". Calling encodings = tokenizer.encode_plus(...) on it requires the text plus a few arguments: add_special_tokens=True, max_length with truncation=True (we don't want characters exceeding the maximum length, so those get removed), and return_tensors='pt', which means return PyTorch tensors. This generates the encodings as a dictionary; I'll show you what's inside in a moment.
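A sketch of these preprocessing and tokenizer steps (the column names follow the dataset shown on screen; the batch sizes and learning rate are plausible defaults I'm assuming rather than the video's confirmed values):

```python
# Combine title and abstract, drop the now-redundant columns, put context first.
train_df['context'] = train_df['title'] + ' ' + train_df['abstract']
train_df.drop(['id', 'title', 'abstract'], axis=1, inplace=True)
train_df = train_df[['context'] + [c for c in train_df.columns if c != 'context']]

target_list = list(train_df.columns[1:])   # the six label names

MAX_LEN = 256              # BERT allows up to 512; 256 is enough for this data
TRAIN_BATCH_SIZE = 32      # assumed values, tweak freely
VALID_BATCH_SIZE = 32
EPOCHS = 2
LEARNING_RATE = 1e-5       # a typical fine-tuning rate for BERT

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

encodings = tokenizer.encode_plus(
    'do like and subscribe',     # example text
    None,                        # no second sentence
    add_special_tokens=True,     # adds [CLS] and [SEP]
    max_length=MAX_LEN,
    padding='max_length',        # pad out to MAX_LEN
    truncation=True,             # cut off anything beyond MAX_LEN
    return_token_type_ids=True,
    return_tensors='pt',         # return PyTorch tensors
)
print(encodings.keys())          # input_ids, token_type_ids, attention_mask
```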
The add_special_tokens flag is about special tokens that BERT requires during the training and fine-tuning process: the [CLS] classification token, the [PAD] padding token, and the [SEP] separator token. There is also a [MASK] token, but it is not used here; it is only used in the masked-language-modelling pre-training process. So those are our three main special tokens.

Executing the encoding, you can see it has three keys: input_ids, token_type_ids, and attention_mask. The input_ids are just the words converted to their corresponding vocabulary indices. For example, 101 is the [CLS] classification token I told you about earlier, inserted at the beginning of every sentence; then the word-to-index conversion takes place, so "do" becomes its index, 2079, and similarly for the other words; and 102 is the [SEP] separator token, which the tokenizer itself adds whenever a sentence ends. What is the attention mask? It is 1 wherever there is an actual word and 0 everywhere after, because those positions are padding tokens; wherever there is a word there is a 1, otherwise a 0, that's it. The token_type_ids distinguish between two sentences: if I passed a second segment, say "blah blah blah", the first segment would be marked 0 and the second marked 1. This is used in BERT's sentence-pair pre-processing; it is all described in the research paper if you want to study it. For training we will need all three of these tensors: input_ids, attention_mask, and token_type_ids.
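To make the three tensors concrete, here is a tiny sentence-pair illustration (101 and 102 really are the [CLS] and [SEP] ids in the bert-base-uncased vocabulary; the example strings and max_length are mine):

```python
# Encode a pair of segments to see all three tensors at once.
pair = tokenizer.encode_plus(
    'do like and subscribe',   # segment one
    'blah blah blah',          # segment two
    add_special_tokens=True,
    max_length=16,
    padding='max_length',
    truncation=True,
    return_token_type_ids=True,
    return_tensors='pt',
)
print(pair['input_ids'])       # [CLS]=101 ... [SEP]=102 ... [SEP]=102, then 0s (padding)
print(pair['token_type_ids'])  # 0 over the first segment, 1 over the second
print(pair['attention_mask'])  # 1 over real tokens, 0 over the padding
```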
That was an example; now I will move on to building a custom dataset using the torch.utils.data Dataset and DataLoader classes, which is the standard method for creating custom datasets, so you should follow this convention. I'll create a dataset class named CustomDataset that inherits from torch.utils.data.Dataset. First, the constructor: it takes the dataframe, the tokenizer, and the maximum length, and stores them on self. self.title holds the text records; one record, one sample, is for example train_df['context'][0]. The targets are all the output values, the six labels; here is where I use the target_list I mentioned above, taking the .values of those columns, which gives us the labels as a numpy array of values rather than plain dataframe integers, which is what we need for training. Then we define the two functions every dataset class has: __len__, which just returns the length, and __getitem__, which fetches one example by index and is what gets called during the training process.

Inside __getitem__ we take the title at the given index and convert it to a string; this is basically done to ensure the title contains no garbage values that would cause problems while training. Then we tokenize it with encode_plus like I showed you earlier. The second argument is None: encode_plus can take two strings, for sentence-similarity tasks for instance, but that is another kind of project and we only need one sentence. We again set add_special_tokens, max_length, padding, and truncation. Then we return the pieces of the encodings dictionary, which as I explained consists of input_ids, token_type_ids, and attention_mask. On each tensor we call flatten: right now each tensor has shape (1, 512), and flatten converts it to (512,); that's what flatten does, and it is required for training. The last key is targets, which we convert into a float tensor, also required for training. Cool, our dataset class is ready; this is the standard approach we always use when solving classification problems with BERT.

Next I split my dataset into 0.8 and 0.2 fractions; you can play with the split. I sample a fraction of 0.8 of the rows with a random_state of, say, 200, and reset the index so it doesn't get messed up after resampling; the remaining rows become the validation dataframe, again with the index reset. Now the data is ready, so we wrap the train and validation dataframes in our CustomDataset class. The next thing is to create the data loaders: the train DataLoader takes the train dataset, shuffle=True, the train batch size, and num_workers=0, which means the batches are loaded in the main process rather than by worker subprocesses. I create the same thing for validation, except with shuffle=False, because we don't want to shuffle the validation dataset. So we have our training and validation data loaders ready to be fed into the model.

Before that, we define the device onto which our tensors will be loaded, which in our case should be the GPU, because we are using the GPU runtime on Colab: torch.device selects cuda when it is available. If I print the device now, it says cuda.
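Putting the dataset class, the split, the loaders, and the device together, a sketch might look like this (argument names follow the description above; MAX_LEN and the batch sizes come from the assumed hyperparameters earlier):

```python
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, df, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.max_len = max_len
        self.title = list(df['context'])          # one text record per sample
        self.targets = df[target_list].values     # six labels as a numpy array

    def __len__(self):
        return len(self.title)

    def __getitem__(self, index):
        title = str(self.title[index])            # guard against garbage values
        inputs = self.tokenizer.encode_plus(
            title, None, add_special_tokens=True, max_length=self.max_len,
            padding='max_length', truncation=True,
            return_token_type_ids=True, return_tensors='pt')
        return {
            'input_ids': inputs['input_ids'].flatten(),        # (1, max_len) -> (max_len,)
            'attention_mask': inputs['attention_mask'].flatten(),
            'token_type_ids': inputs['token_type_ids'].flatten(),
            'targets': torch.FloatTensor(self.targets[index]), # float for the BCE loss
        }

# 80/20 split with a fixed random state, indices reset afterwards.
train_data = train_df.sample(frac=0.8, random_state=200)
valid_data = train_df.drop(train_data.index).reset_index(drop=True)
train_data = train_data.reset_index(drop=True)

train_dataset = CustomDataset(train_data, tokenizer, MAX_LEN)
valid_dataset = CustomDataset(valid_data, tokenizer, MAX_LEN)

train_loader = DataLoader(train_dataset, batch_size=TRAIN_BATCH_SIZE,
                          shuffle=True, num_workers=0)
valid_loader = DataLoader(valid_dataset, batch_size=VALID_BATCH_SIZE,
                          shuffle=False, num_workers=0)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)   # cuda on a GPU Colab runtime
```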
The next step is to create checkpoint helpers. The save function will save a checkpoint whenever we have a new best score, and a matching load function will load that best checkpoint back. The load function takes a file path, the model, and the optimizer; it loads the checkpoint dictionary into the model so that the model recovers its last saved state, and similarly restores the optimizer from its state dict. The checkpoint saves all this information: state_dict, which is the weights and biases of our model, and the optimizer entry, which is the optimization state. At the end we return the model, the optimizer, the epoch stored in the checkpoint, and the minimum validation loss; calling .item() on a one-element tensor gives us the inner value, so for example a tensor holding 0.098 becomes just the number 0.098. Similarly we define the save function, which writes the state of our model and optimizer to the checkpoint path, and, if this is the best checkpoint so far, copies the file to the best-model path. Cool, now we are done with the loading and saving of our model's history and states.

Next is the interesting part: building our BERT class, the actual model. We define a class that inherits nn.Module from torch, and after inheriting it we define a constructor method and a forward method. In the constructor we initialize the layers. The first layer is the BERT layer, bert-base-uncased, and it returns two outputs: the sequence output and the pooled output. The sequence output is an output for every token; for example, with "please" and "subscribe" there is a tensor for each word. The pooled output, by contrast, is a single output with dimension 1 x 768, one 768-dimensional vector, and it corresponds to the classification token. That is the difference between the two outputs we get from BERT's forward pass. The next layer is a dropout layer, torch.nn.Dropout with a small probability, say 0.3, and then a linear layer: the input is 768, because as I just said the BERT output has dimension 768, and the output is 6, because we have six classes. That finishes the constructor; the next function is the forward function, which takes the input_ids, attention_mask, and token_type_ids, forwards them through the model, and returns the output. We take the pooled output, pass it through dropout, so the dropout output becomes the input to the linear layer, and the linear layer's output is the final output. Then we create the model from this BERT class and push it to the GPU; model.to(device) is the syntax for moving the model onto the GPU.
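Here is a sketch of the checkpoint helpers and the model class as described (the 0.3 dropout probability and the key names inside the checkpoint dict are assumptions on my part; the video's exact values may differ):

```python
import shutil
from transformers import BertModel

def save_ckp(state, is_best, checkpoint_path, best_model_path):
    torch.save(state, checkpoint_path)                 # write model + optimizer state
    if is_best:
        shutil.copyfile(checkpoint_path, best_model_path)

def load_ckp(checkpoint_path, model, optimizer):
    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint['state_dict'])    # weights and biases
    optimizer.load_state_dict(checkpoint['optimizer']) # optimization state
    return model, optimizer, checkpoint['epoch'], checkpoint['valid_loss_min']

class BERTClass(torch.nn.Module):
    def __init__(self):
        super(BERTClass, self).__init__()
        self.bert_model = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = torch.nn.Dropout(0.3)           # assumed probability
        self.linear = torch.nn.Linear(768, 6)          # pooled 768-dim -> 6 labels

    def forward(self, input_ids, attention_mask, token_type_ids):
        # return_dict=False gives (sequence_output, pooled_output) as a tuple
        _, pooled_output = self.bert_model(
            input_ids=input_ids, attention_mask=attention_mask,
            token_type_ids=token_type_ids, return_dict=False)
        return self.linear(self.dropout(pooled_output))

model = BERTClass()
model.to(device)    # push the model to the GPU
```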
The next step is to define a loss function. I'll be using BCEWithLogitsLoss; you can read more about this loss in the torch documentation. Like every loss function it takes the model outputs and the ground-truth targets, so our loss_fn simply returns BCEWithLogitsLoss applied to the outputs and targets. Then the optimizer: Adam, with params set to model.parameters() and our learning rate. The first time I ran this I got an "empty parameter list" error; it turned out I had forgotten an underscore, which is why it didn't execute properly. After fixing it, re-running the model cell pushes the model to the GPU correctly, and we have our optimizer.

The next step is to create the training loop, which is the main part of the implementation. The training function will require the training loader, the validation loader, the model, the optimizer, the checkpoint file path, and the best-model path. We first define a tracker for the minimum validation loss. Initially its value should be as large as possible, so that any real loss will be smaller and the tracker can follow the minimum; for that we use np.inf, the infinity value. We then iterate through the epochs; I start the range from 1 because it reads better. We set the train loss and validation loss to zero and put the model in train mode, and now the training happens: we loop over the training loader with enumerate, getting a batch index and a batch. The training loader yields batch-size examples at a time, say 32, converted into their corresponding input_ids, attention_mask, and token_type_ids, and the batch index simply counts the batches as they come.

We extract the input_ids from the batch and push them to our device so that all computation takes place on the GPU, changing the dtype to long as required; similarly for the attention mask and token_type_ids, and lastly the targets, which we convert to float. We need targets here because we are in the training loop; at pure prediction time we won't have any. Then we get the output by calling the model on these inputs. We call optimizer.zero_grad() so that stale gradients don't accumulate in torch's computation graph; if you want the background on this, any beginner PyTorch playlist on YouTube covers it, as this is the general way a PyTorch model is trained. Then we calculate the loss using the loss function we defined earlier on the outputs and targets, call loss.backward() to backpropagate the gradients, and make the optimizer step to update the weights. Finally we accumulate the training loss.
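A sketch of the loss, the optimizer, and one epoch's training pass (the function and variable names are mine; the flow follows the description above):

```python
def loss_fn(outputs, targets):
    # BCEWithLogitsLoss combines a sigmoid with binary cross-entropy,
    # which suits multi-label targets of independent 0/1 flags.
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)

optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)

def train_one_epoch(model, training_loader, optimizer):
    model.train()
    train_loss = 0.0
    for batch_idx, batch in enumerate(training_loader):
        # move every tensor in the batch onto the GPU with the right dtype
        input_ids = batch['input_ids'].to(device, dtype=torch.long)
        attention_mask = batch['attention_mask'].to(device, dtype=torch.long)
        token_type_ids = batch['token_type_ids'].to(device, dtype=torch.long)
        targets = batch['targets'].to(device, dtype=torch.float)

        outputs = model(input_ids, attention_mask, token_type_ids)

        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(outputs, targets)
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
        train_loss += loss.item()
    return train_loss / len(training_loader)
```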
So that was the training loop, everything we do on the training side of each epoch. Now the validation loop. For validation we set the model into eval mode; this is basically done so that the model behaves deterministically and nothing is recorded in PyTorch's computation graph for weight updates. The body is mostly the same as the training loop, copied over with a few changes: it iterates over the validation loader instead of the training loader, and we don't need the optimizer here, since we only need to calculate the loss, not optimize it. We do still need the targets, because we must compute the validation loss against them; it is only in pure prediction that we obviously won't have any. So we extract the input_ids, attention mask, token_type_ids, and targets, get the output, and accumulate the validation loss instead of the train loss.

After the validation loop we create a checkpoint dictionary, in which we keep the epoch, the minimum validation loss, the model state, and the optimizer state, and save it using the save-checkpoint function I defined earlier with is_best set to False; when the validation loss improves on the minimum we save it again as the best model and update the tracker. At the end the training function returns the trained model.

Cool, now we have the training function ready. To train the model I set a checkpoint path and a best-model path in the root directory, something like current_checkpoint.pt and best_model.pt, and call train_model with the training and validation data loaders and those paths.
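A matching sketch of the validation pass and the epoch loop with checkpointing (the paths and names are placeholders, not the video's exact ones):

```python
def validate(model, validation_loader):
    model.eval()
    valid_loss = 0.0
    with torch.no_grad():          # no gradient bookkeeping during evaluation
        for batch in validation_loader:
            input_ids = batch['input_ids'].to(device, dtype=torch.long)
            attention_mask = batch['attention_mask'].to(device, dtype=torch.long)
            token_type_ids = batch['token_type_ids'].to(device, dtype=torch.long)
            targets = batch['targets'].to(device, dtype=torch.float)
            outputs = model(input_ids, attention_mask, token_type_ids)
            valid_loss += loss_fn(outputs, targets).item()
    return valid_loss / len(validation_loader)

ckpt_path, best_path = 'current_checkpoint.pt', 'best_model.pt'  # placeholder paths
valid_loss_min = np.inf            # start at infinity so any real loss beats it
for epoch in range(1, EPOCHS + 1):
    train_loss = train_one_epoch(model, train_loader, optimizer)
    valid_loss = validate(model, valid_loader)
    print(f'epoch {epoch}: train {train_loss:.4f}  valid {valid_loss:.4f}')

    checkpoint = {'epoch': epoch, 'valid_loss_min': valid_loss,
                  'state_dict': model.state_dict(),
                  'optimizer': optimizer.state_dict()}
    save_ckp(checkpoint, False, ckpt_path, best_path)
    if valid_loss < valid_loss_min:            # new best: copy it aside
        save_ckp(checkpoint, True, ckpt_path, best_path)
        valid_loss_min = valid_loss
```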
I'm not going to run this here, because I've already trained it for two epochs and I don't have time to train while recording; I just wanted to show you how to implement a multi-label classifier using BERT and the Hugging Face Transformers library. So let's test with the model we just instantiated from the pre-trained weights, the one that has not been fine-tuned; you can train it yourself in the meantime. If we had used the fine-tuned model it would perform much better than this one, so keep that in mind.

Let's take an example. We have the test dataframe from earlier, so I'll use one of its abstracts; printing it, it reads "we represent our understanding..." and so on. The next step is to tokenize this sentence like we did before, so I copy the encode_plus call with the example text, the max length, and the same arguments, and it returns the dictionary of input_ids, attention_mask, and token_type_ids, with the help of which we feed the BERT model. Before predicting we should put the model into eval mode; consider that a template, the basic code required every time. Then I extract the inputs just like in the loops, except from the encodings instead of a batch, and get the output from the model.

For the final output we apply torch.sigmoid to the model's output and convert it into a list: the raw outputs are unbounded logits, and the sigmoid maps each of the six scores into a probability between 0 and 1. We get values like 0.54, 0.57, 0.56, and so on, and the second one is the highest. But we don't get much information from bare numbers, like which label is which, so I take train_df.columns, skip the context column by starting from index 1, which leaves our six labels, and convert them to a list. To pick the index with the maximum value I use numpy's argmax on axis 1, which gives 1 here, since the second value, 0.57, was the highest; indexing the label list with it, after dropping the extra dimension with square braces and an int cast, prints "physics" (see the sketch at the end of this transcript). And this abstract was indeed physics, so cool, we have finally implemented a BERT classifier for multi-label text classification.

That was all for this tutorial, and I hope you enjoyed it. If you did, please do hit like and subscribe, and share it with your friends. Also keep in mind that I did not use the trained model here; that is why it was showing scores around 50 to 56 percent, but if you train it for at least 20 to 30 epochs it will obviously give you more than 70 to 80 percent. So please do train it using the code, which I will link down below in the description. That's all, thank you.
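For reference, a condensed sketch of the whole inference pass described above (the example row and variable names are assumptions; test_df is loaded from test.csv the same way as train_df):

```python
example = test_df['abstract'][0]         # one abstract from the test set
encodings = tokenizer.encode_plus(
    example, None, add_special_tokens=True, max_length=MAX_LEN,
    padding='max_length', truncation=True,
    return_token_type_ids=True, return_tensors='pt')

model.eval()                             # always switch to eval mode before predicting
with torch.no_grad():
    input_ids = encodings['input_ids'].to(device, dtype=torch.long)
    attention_mask = encodings['attention_mask'].to(device, dtype=torch.long)
    token_type_ids = encodings['token_type_ids'].to(device, dtype=torch.long)
    output = model(input_ids, attention_mask, token_type_ids)
    probs = torch.sigmoid(output).cpu().numpy()   # six per-label probabilities

labels = train_df.columns[1:].tolist()   # skip the context column
print(labels[int(np.argmax(probs, axis=1)[0])])   # e.g. 'physics'
```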
Info
Channel: The Artificial Guy
Views: 2,137
Rating: 4.8769231 out of 5
Keywords: nlp, natural language processing, bert, transformers, text classification, deep learning, machine learning, huggingface transformers, multi label text classification
Id: f-86-HcYYi8
Length: 49min 54sec (2994 seconds)
Published: Wed Jun 23 2021