Custom Training Question Answer Model Using Transformer BERT

Video Statistics and Information

Captions
[Music] Hello all, my name is Krish Naik and welcome to my YouTube channel. Today in this video we are going to do custom training of a question answering model, and we are going to do it with the help of Transformers. I have already uploaded two videos on Transformers: one on how to do the basic implementation, and another where we did text classification by fine-tuning a pre-trained model — that is, we took our own custom dataset and trained a text classification model using Hugging Face Transformers. Today we will be doing custom training of question answering models, again with Hugging Face, but for this we are going to use a separate library called Simple Transformers.

Simple Transformers is built on top of the Transformers library by Hugging Face, so if you want to quickly do some custom training and evaluate the resulting model, you can definitely use it. This is the GitHub page of Simple Transformers — amazing work by everyone who has contributed to it, because with very few lines of code you can use the Hugging Face library and handle all kinds of tasks: sequence classification, token classification, question answering and more. I will cover them one by one as we go ahead, but let's look at today's example.

For today's example I have created a small SQuAD-style dataset for custom question answering. You can see it consists of three files: prediction, test and train. First of all we need to understand how to create such a dataset, because the input structure is already defined when you use Simple Transformers; I will give you these two documentation links along with the entire dataset. If you want to create your own dataset, it has to follow the format shown on the Simple Transformers documentation page for question answering tasks: the input data can be a JSON file or a Python list of dictionaries in the correct format — so even inside a JSON file it must be a list of dictionaries.

Now, what goes inside each dictionary? In a question answering dataset we need a question and an answer in both the training and the test data. The first thing you need is a key called "context", which holds the entire passage. In the example here the context is "Mistborn is a series of epic fantasy novels written by American author Brandon Sanderson" — that whole paragraph is the context. Inside the same dictionary there is another field called "qas".
The "qas" field is a list of questions and answers about that context — the context is the whole paragraph or text, and "qas" holds the questions asked on it. Suppose I have only one question. First I provide an "id", which is just a unique identifier, and then there are a few more parameters such as "is_impossible". If you check the documentation, "is_impossible" is a boolean value indicating whether the question can be answered from the context: if it is False, the answer is present in the context; if it is True, the question cannot be answered from it. Then comes the question itself — "Who is the author of the Mistborn series?" — and just by reading that paragraph you should be able to tell that the answer is Brandon Sanderson. In the "answers" field you can provide multiple answer options as a list. Here the first answer text is "Brandon Sanderson" and "answer_start" is 71. What is this 71? If you count the characters of the context one by one, the answer starts at character 71 — count it out and you will see. So that is the complete information for one context and one question (a full example entry is shown below). Similarly you may have any number of contexts, and within one "qas" list you can also have multiple questions.

So first we will create the dataset, and it needs to be in exactly this format if you want to use Simple Transformers (and, through it, the Hugging Face library). This is very important, so make sure you understand it. With that in mind I have created three files: a train set, a test set and a prediction set; the prediction set is just a small file that we will use to try out the trained model at the end.

Let's proceed. First I am going to upload the dataset, so let me reload the notebook, change the runtime to GPU and save it. Now let me upload the three JSON files. You can prepare this dataset yourself as well — it is not a problem at all. So the data is ready, and remember that all three files use the same format: "Mistborn is a series..." along with all the other fields. Everything is laid out the same way in train.json, test.json and prediction.json; you can expand the files to see the details.
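For reference, here is roughly what one training entry looks like in this list-of-dictionaries format. It is a minimal sketch based on the Mistborn example discussed above; the "id" value is arbitrary, and "answer_start" is the 0-based character offset of the answer inside this particular context string.

# One entry of the Simple Transformers question-answering format:
# a context paragraph plus a "qas" list of question/answer dictionaries.
train_data = [
    {
        "context": "Mistborn is a series of epic fantasy novels written by American author Brandon Sanderson.",
        "qas": [
            {
                "id": "00001",            # unique identifier for this question
                "is_impossible": False,   # False -> the answer is present in the context
                "question": "Who is the author of the Mistborn series?",
                "answers": [
                    {
                        "text": "Brandon Sanderson",
                        "answer_start": 71,  # character offset where the answer begins
                    }
                ],
            }
        ],
    }
]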
Before going ahead, we need to install Simple Transformers. Remember, it uses the Hugging Face library underneath; with Hugging Face alone you do not get an easy path to this kind of custom fine-tuning — I only found limited information about training there, and everything else is quite involved — but Simple Transformers really has made it very, very simple. So first I run pip install simpletransformers; I'll hide the output and let this step finish. You can see it has started, and the first time it will ask me to restart the runtime — yes, for the first install you have to do this. Trust me, this is going to be great, because you will be able to do custom training for any kind of task: question answering, conversational AI, text summarization, text classification — everything is available through this library. Now I restart the runtime so the newly installed version can be used, and if I execute the cell again it says the requirement is already satisfied. Perfect, on to the next step.

We have already uploaded three files — prediction.json, test.json and train.json — so now we need to read them. It is very simple: import json, open train.json as a read file and load it. If I execute this and look at train, you can see the entire context along with all the questions; this is what I will train the model on. Everything is in the same format written in the documentation, a Python list of dictionaries — I have not changed anything. I use the same code to read the test dataset, which is what I will evaluate the model on, and the prediction dataset I can either copy in directly or read the same way.
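A minimal sketch of that loading step, assuming the three uploaded files are named train.json, test.json and predict.json (the prediction file name here is an assumption and may differ from the one used in the video):

import json

# Each file is a JSON list of dictionaries in the
# Simple Transformers question-answering format shown earlier.
with open("train.json", "r") as read_file:
    train = json.load(read_file)

with open("test.json", "r") as read_file:
    test = json.load(read_file)

with open("predict.json", "r") as read_file:
    predict = json.load(read_file)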
Now let's see how to implement the model. For this we use two classes: QuestionAnsweringModel, which will help us train the model, and QuestionAnsweringArgs, which is also very important because it lets us set up many custom arguments. When using this library we have a lot of options for models — it supports essentially all the Hugging Face Transformers models: ALBERT, BERT, CamemBERT, DistilBERT, ELECTRA, Longformer, MPNet, MobileBERT, RoBERTa and so on. So I import these two classes from simpletransformers.question_answering.

If you look at the documentation page for QuestionAnsweringModel, it takes two parameters. The first is the model type, which can be any of the supported types above, and the second is the model name, which is the name of the pretrained checkpoint we will use. If you click through to the pretrained model names, they come straight from the Hugging Face hub — bert-base-uncased, bert-base-cased and so on — and you can select any one of them. What I have done is make the model name depend on the model type with a simple condition: if the model type is "bert" I use the cased BERT base model (you could also use the uncased one), if it is "roberta" I use the corresponding RoBERTa checkpoint, and if it is "distilbert" I use a DistilBERT checkpoint. My model type here will be "bert", so the condition selects bert-base-cased as the model name. In short, I am going to use BERT for this entire custom training of the question answering model.
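A sketch of that selection, assuming the condition-based setup described above. Only bert-base-cased is confirmed in the video; the RoBERTa and DistilBERT checkpoint names are plausible Hugging Face hub names filled in as assumptions.

from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs

model_type = "bert"

# Map the chosen model type to a pretrained checkpoint from the Hugging Face hub.
if model_type == "bert":
    model_name = "bert-base-cased"
elif model_type == "roberta":
    model_name = "roberta-base"           # assumed checkpoint name
elif model_type == "distilbert":
    model_name = "distilbert-base-uncased"  # assumed checkpoint name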
After selecting a specific model from Hugging Face, we can also set up some parameters, and I'll show you two different techniques for configuring the model. The first uses the QuestionAnsweringArgs class we imported. If you click through to it, you can see all the parameters you can play with, each with its own functionality. For example, there is max_answer_length, which limits how long a predicted answer can be, and there is n_best_size, which means that when the question answering model produces its output it returns up to that many candidate answers — the default is 20, so if I ask one question I may get up to 20 different answers back. So with QuestionAnsweringArgs I create an instance and set, say, the train batch size — 16, 32 or 64 — plus things like evaluate_during_training = True, n_best_size = 3 (give me at most three answers) and the number of training epochs. Later, when you initialize the model, you pass these arguments in. I know I named the variable model_args, but I'm actually going to use a separate train_args, because I only wanted to show you how the class-based configuration works.

The other way is to directly use a dictionary of key–value pairs for the parameters you want. In my case I have set reprocess_input_data to True, overwrite_output_dir to True, use_cached_eval_features to True, an output directory — all the training checkpoints and weights get saved there — and a best_model_dir, where the best-performing model is saved. Along with that I set evaluate_during_training, the maximum sequence length and the number of training epochs, so you can play with every parameter you care about right here. You may be wondering, "Krish, what are wandb_project and wandb_kwargs?" If we want to see the training visualizations, we use another library called wandb (Weights & Biases), which I'll talk about in a moment — first let me run pip install wandb, and you can see the requirement is already satisfied. Now let's run the training-arguments cell — ah, an invalid syntax error on the train batch size; I missed a comma there. Let me fix that and execute it again. Perfect, it ran.

Now all we have to do is use QuestionAnsweringModel: pass the model type and model name that we initialized at the top, and pass all the training arguments through the args parameter. Notice that no folders have been created yet; once the training happens, the folders — the output directory, the best_model directory and so on — will start getting created. So let's initialize the model (I'll add a comment, "initialize the model"). The model will now download the pretrained checkpoint, because we have given it the model type and model name, and it will take into account all the arguments we specified. If you want to know which arguments exist, just hover over them, or check the documentation for what each argument does; you can even do things like early stopping, and I'll show that in upcoming videos in much more depth.
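Here is a minimal sketch of the two configuration styles and the model initialization described above. The argument names and the bert-base-cased checkpoint match what is shown in the video; the directory paths, max_seq_length value, wandb project name and the explicit use_cuda flag are illustrative assumptions.

# Option 1: configure through the QuestionAnsweringArgs dataclass.
model_args = QuestionAnsweringArgs()
model_args.train_batch_size = 16
model_args.evaluate_during_training = True
model_args.n_best_size = 3            # return at most 3 candidate answers per question
model_args.num_train_epochs = 5

# Option 2: configure through a plain dict of key/value pairs
# (paths and the wandb project name below are placeholders).
train_args = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "use_cached_eval_features": True,
    "output_dir": f"outputs/{model_type}",
    "best_model_dir": f"outputs/{model_type}/best_model",
    "evaluate_during_training": True,
    "max_seq_length": 128,
    "num_train_epochs": 5,
    "train_batch_size": 16,
    "n_best_size": 3,
    "wandb_project": "simpletransformers-qa",   # assumed project name
    "wandb_kwargs": {"name": model_name},
}

# Initialize the model with the chosen type, checkpoint and arguments.
model = QuestionAnsweringModel(
    model_type, model_name, args=train_args, use_cuda=True
)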
The model has now been downloaded. I don't see any output folders yet; if you ever want to retrain from scratch, you can simply delete the output folders that get created, which is why I keep a cleanup cell around without executing it for now. All we have to do next is call model.train_model with the training dataset and pass the test data as the evaluation data. How many epochs will it run? Five, from the arguments, so let me execute it. It really is about as simple as an sklearn model: just prepare the dataset and start the training — that is how simple this library has made it — and here I am using BERT. You can see epoch 0 along with the loss values; by the way, this data format is essentially SQuAD 2.0, and if you don't know about SQuAD 2.0 I'll make a separate video on it.

Notice the "project page" link that appears in the output: that is the Weights & Biases run, and if you open it you can see the graphs — evaluation loss, incorrect, correct, similar, training loss — all logged for bert-base-cased and updated every epoch. Remember that I can open this page directly because I'm already logged in; you will get an authentication prompt, so you need to create a wandb account first, otherwise you will not be able to see it. Let me reload it now that all five epochs have completed: you can see all the information for bert-base-cased — training loss, global step, correct, incorrect, similar — all these charts are available, so if you want to try different things you can definitely explore them. The training loss is decreasing and the evaluation loss is decreasing too; since there are five epochs there are five values — two incorrect and one correct in the first epoch, and so on — and the train loss starts around 4.8 and ends around 3.6. I only trained for five epochs; if you train for ten you should get noticeably better answers.

Now let's evaluate the model on the test data. eval_model gives us two values, a result and the texts. Let me print the result — oops, that errors because the variable is result, not results. Fixing that, it shows correct = 0 and incorrect = 2, which basically means we still need to train more: none of the test questions were answered correctly yet. Fair enough — I only trained for five epochs, so let's do one thing and train it for ten.
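The training and evaluation calls used in both runs look roughly like this, assuming train and test are the lists loaded earlier:

# Fine-tune on the training list, evaluating on the test list during training.
model.train_model(train, eval_data=test)

# Evaluate on the held-out test data: `result` holds counts such as
# correct / incorrect / similar, `texts` holds the matched answer texts.
result, texts = model.eval_model(test)
print(result)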
So I change the number of training epochs to ten and re-initialize the model. From the first run you can see the output directory has been created, including the best_model folder: the model weights, scheduler.pt and the other files are all in there, and if you open eval_results you can see every performance metric we are tracking. Since I want to start the training again cleanly, I first remove the outputs folder with a shell command, and after executing it you can see the output folders are gone. Now I start the training again, and this run will go for ten epochs, so we wait while it happens.

Let me open the project page and the run page again: you can see the loss decreasing with the global step — it starts around 4.9, then 4.4, then 3.94 — and with every epoch the loss keeps coming down, so hopefully we will get some good answers. Keep in mind I have only taken two or three examples here; you may have a huge dataset. You can see how quickly the training is progressing — third epoch, fourth epoch — obviously because the dataset is small. If you really want to try a bigger dataset, you will have to create it yourself, and yes, you need to learn the technique of manually building that dataset correctly. Now the running loss is around 3.09; with larger datasets this would of course take longer. By the sixth epoch the running loss is about 2.59, with another four epochs left, and meanwhile here are the results: with every epoch this keeps getting better — the training loss is decreasing, the evaluation loss is decreasing, around 2.14 by now, and train and test both look fine. At the eighth epoch we are at about 2.147.

You might be thinking, "Krish, why is the correct count not increasing?" Understand that I have taken a tiny dataset just to give you an idea of how to work with this; if you use a proper, larger dataset it will definitely improve. In this run, incorrect stayed at one and the remaining one was correct — I had given two questions in the test set — and you can see how the training loss kept decreasing, finally reaching about 1.84. If I now look at the texts output, you can see incorrect_text, correct_text and similar_text: for "Where does the series take place?" it found "a region called the Final Empire", which matches our test data — have a look at it. Looking at the result, since our dataset is small the numbers are not great, but the evaluation loss is there, incorrect is one and similar is two.

Now let's test with a fresh example. I have taken one prediction entry where the context is "Vin is a Mistborn of great power and skill" and the question is "What is Vin's speciality?". Let's execute this and run the prediction on that to_predict data to see what answer comes back.
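A minimal sketch of that prediction step; the exact wording of the context and question strings is reconstructed from the video, and the "id" is arbitrary:

# Ask the fine-tuned model a question about a context it has not seen in training.
to_predict = [
    {
        "context": "Vin is a Mistborn of great power and skill.",
        "qas": [
            {"question": "What is Vin's speciality?", "id": "0"},
        ],
    }
]

answers, probabilities = model.predict(to_predict)
print(answers)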
If I execute it, the model returns its answer: for "What is Vin's speciality?", with the context "Vin is a Mistborn of great power and skill", it has pulled its answer straight out of that context sentence. Again, if we train it more and more we will definitely get better accuracy, so do train it for more epochs — try 15 and then look at the answers, because as you could see the loss was still decreasing. You can also use the early stopping option available in the arguments; otherwise just experiment with 15 or 20 epochs and check the results. And as you saw, the output folder gets created with the best model and all of its configuration stored inside.

Do try this from your side — I will be sharing this entire notebook along with the dataset — and let me know whether you liked this tutorial. Again guys, it takes a lot of effort to create a tutorial like this, to build the dataset and prepare everything for you, so please make sure you subscribe to the channel and press the bell notification icon. I'll see you in the next video, have a great day, thank you, bye-bye.
Info
Channel: Krish Naik
Views: 7,809
Rating: 4.9400749 out of 5
Keywords: yt:cc=on, question answering model using bert, huggingface question answering tutorial, xlnet question answering github, fine tune bert for question answering pytorch, simple transformers question answering, huggingface fine tune bert, bert question answering github, bert for question answering huggingface, krish naik machine learning, krish naik deep learning
Id: 3XiJrn_8F9Q
Length: 25min 11sec (1511 seconds)
Published: Thu May 27 2021