Train a Small Language Model for Disease Symptoms | Step-by-Step Tutorial

Video Statistics and Information

Captions
Hello everyone, welcome to the AI Anytime channel. In this video we're going to train a very simple, small language model on our own dataset. We'll take a healthcare dataset from Hugging Face about disease symptoms and leverage GPT-2, an open-source language model; I won't call it a large language model, it's a very simple one. By the end of the video we'll have a new model that performs the task we want: for example, if I give it a disease name like kidney failure, it should generate the relevant text associated with that disease, such as urination problems, obesity, and so on.

In 2024 we're going to see the rise of small language models. People have started talking about smaller models: Microsoft has released the Phi series of language models, there is Orca, and a lot of other models are being described as small language models. We've been living in a generative AI hype cycle, so whatever happens in the community, people start talking about it, and we'll see how we can do something small ourselves. This is a beginner-friendly video. If you already know how to leverage GPT-2 to create a new model that generates tokens, you can skip it and save your time; if you don't, I'll show you how to leverage these models on your own dataset to create a model of less than, say, 500 MB that performs a specific task. That's the agenda of this video, so let's jump in and start building.

Here I am on the Hugging Face model hub, currently on the distilgpt2 model card. GPT-2 was created by OpenAI back in their more open-source-oriented days. The card says DistilGPT2 is an English-language model pre-trained with the supervision of the smallest version of the Generative Pre-trained Transformer, GPT-2. Like GPT-2, DistilGPT2 can be used to generate text, and that's exactly what we want: to see whether it can learn the patterns in our data and generate text from them. I'll also tell you what kinds of use cases you can build on top of this; while working on this video I realized I'm going to use it in some of my own use cases, which I'll cover later. DistilGPT2 was developed by Hugging Face, which is one reason I'm using it, and it's released under Apache 2.0, so you can use it commercially. The smallest version of GPT-2 has 124 million parameters, while DistilGPT2 has 82 million and was built using knowledge distillation to be faster; you can find the details on the model card. You'll also see a lot of Spaces there, but I'll show you that we're going to build something cool ourselves. I'm very optimistic about
it, and I know you will love it. I'm going to write the code in Colab on a simple T4 GPU. This is the dataset I'm going to use; it was published by another Hugging Face user, so the credit for the data goes to its creator, not me. I like the dataset and the idea, and that's why I'm making this video, so please go and like or star that dataset repository.

We'll have to do a bit of pre-processing on this dataset, and we'll go a fairly traditional machine-learning route in this video. Keep in mind that language models (not just large language models) are nothing new. We've had models that generate tokens for years: Markov chains and plenty of other algorithms and simulation techniques have been used to generate tokens. The simplest language model can be an n-gram model, where n can be one, two, three, or anything else, giving you unigram, bigram, and higher-order models. In an n-gram model, the next token in the sequence is predicted conditioned on the previous n minus 1 tokens. Take a simple sentence such as "I like to ...": whether the next word is "eat" or "draw" depends on the previous tokens. So don't assume that only Llama 2 or GPT-4 count as language models.

Let's write some code. First I'm going to install a few libraries. We need torch, which comes pre-installed in Colab nowadays, as does transformers, but I'll install them anyway since I don't know whether you're using Colab, a GPU rental, or a local machine. I'll also install sentencepiece, a dependency of transformers that helps with tokenization, plus pandas, tqdm (last time a subscriber pointed out that I misspelled tqdm; hopefully I got it right this time), and datasets. Once those are installed, the imports come first: from datasets we need load_dataset, DatasetDict, and Dataset; then import pandas as pd, import datasets, from tqdm import tqdm for the progress bars, and import time. A minimal sketch of this setup follows below.
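Here is a minimal sketch of the installation and imports described above, assuming a Colab-style notebook environment (package versions are not pinned in the video):

```python
# Install the libraries mentioned in the video (Colab-style shell command).
# !pip install torch transformers sentencepiece pandas tqdm datasets

# Imports used throughout the notebook.
from datasets import load_dataset, DatasetDict, Dataset
import pandas as pd
import datasets
from tqdm import tqdm
import time
```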
Now let's load the data. I'll call the variable data_sample and use load_dataset, passing the dataset ID copied from the Hugging Face page. This is how you can load any dataset; if you have data locally, for example a CSV file, you can load that too. Check the Hugging Face datasets documentation if you get stuck, and I may put the command in the description box.

If we inspect data_sample, you can see it's a DatasetDict with four columns (Code, Name, Symptoms, and Treatments) and 400 rows, which is very little. That's why, once we train it for more epochs, it will start overfitting: 80% of the rows go to training and 20% to validation, so you end up with roughly 320 training rows and 80 validation rows, which is too few. There's no hard rule here, but I'd recommend having at least around 2,000 good-quality rows if you're training any kind of question-answering or instruction-style model.

Next we convert this DatasetDict into a pandas DataFrame so we can use it later. I'll build updated_data as a list of dictionaries: for each item in data_sample['train'], take the Name and the Symptoms as key-value pairs. I only need those two columns; I'm dropping Code and Treatments, because for treatments this model won't perform well, since we're going to cap the maximum sequence length (I'll explain that later). This video will still help you with other, somewhat larger models, for example LaMini at around 700 million parameters, or Flan-T5 from Google, which I recommend; those can handle somewhat longer sequences. Once updated_data is built, I create the DataFrame from it.
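A sketch of the loading and conversion step. The dataset ID shown here is my best reading of the name mentioned in the video and should be treated as a placeholder; substitute the exact repo ID you copy from the Hugging Face page. It also assumes the split is called train with Name and Symptoms columns, as described in the narration.

```python
from datasets import load_dataset
import pandas as pd

# Placeholder dataset ID; replace with the exact repo ID from Hugging Face.
data_sample = load_dataset("QuyenAnhDE/Diseases_Symptoms")
print(data_sample)  # DatasetDict: Code, Name, Symptoms, Treatments; 400 rows

# Keep only the two columns we need: disease name and its symptoms.
updated_data = [
    {"Name": item["Name"], "Symptoms": item["Symptoms"]}
    for item in data_sample["train"]
]

df = pd.DataFrame(updated_data)
df.head()  # shows the first five rows by default
```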
With pd.DataFrame(updated_data) built, df.head() (five rows by default) shows the Name and Symptoms columns, and there are a lot of use cases you can build around this; let me mention a couple, including the one I'm working on, which is quite interesting. Look at Turner syndrome in row three, for example: short stature, gonadal dysgenesis, webbed neck, and so on. If you open a healthcare or medicine website like Practo and search for a disease or a medicine, the page shows you its symptoms, which are essentially tags attached to that name. That suggests one use case: if you're building a caching mechanism, you can generate tags for every disease name, store them in a database, and the next time someone hits your system or API you fetch the answer from the database instead of calling a large language model API you're paying for (a tiny sketch of this idea appears at the end of this passage). You can also do the reverse: take symptoms as input and generate the disease name. It depends on what kind of use case you want to build.

For this dataset, if you split the Symptoms column on the commas, each individual symptom is only one or two words, three or four at most, and that suits smaller models. As a rough convention in natural language processing, one token is about four characters; there's no deep reason behind it, it's simply how it's usually stated, and for this video I'll treat words and tokens as the same thing. If your sequences are much longer, these small models won't help: context window aside, you simply can't feed all of that text through the layers and expect it to be accommodated when generating the next token. Those are some of the limitations of smaller models, and note that these are really small. When Microsoft says "small language model" they mean something over 1 billion parameters, whereas here I'm talking about 300 to 400 million, so be clear about your terminology. Those are the use cases we'll look at in this video; let's see if we can get there.
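To make the caching idea concrete, here is a small hypothetical sketch; the dictionary, the function names, and the model-call placeholder are mine, not code from the video. The idea is to look up a disease in a local store of generated symptom tags first, and only fall back to the model (or a paid API) on a cache miss.

```python
# Hypothetical tag cache: disease name -> list of symptom tags.
symptom_cache: dict[str, list[str]] = {}

def generate_symptoms_with_model(disease: str) -> list[str]:
    # Placeholder: in practice this would call the trained model or an external API.
    return ["placeholder-symptom"]

def get_symptoms(disease: str) -> list[str]:
    """Return cached symptom tags, generating them only on a cache miss."""
    if disease in symptom_cache:
        return symptom_cache[disease]             # cheap: no model or API call
    tags = generate_symptoms_with_model(disease)  # expensive path
    symptom_cache[disease] = tags                 # store for next time
    return tags
```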
Now let's extract and clean up the Symptoms column. By the way, generative AI code completion has been integrated into Colab, and it's suggesting the right thing here: apply a lambda to df['Symptoms'] that splits each value on the commas and joins the pieces back together, so the separators come out consistent. That's all the preprocessing we need.

Next, the Transformers imports. From transformers we need GPT2Tokenizer and the language-modelling-head version of the model, GPT2LMHeadModel. We also need torch itself for CUDA and the rest, torch.nn as nn for the loss function, torch.optim as optim for the optimizer, and, from torch.utils.data, DataLoader, Dataset, and random_split. A sketch of the cleanup and these imports is below.
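A minimal sketch of the symptom cleanup and the imports just described. The column name Symptoms matches the DataFrame built earlier; the strip() on each piece is a small refinement of the plain split/join used in the video, added so the spacing stays consistent.

```python
# Normalise the comma-separated symptom lists so each separator is ", ".
df["Symptoms"] = df["Symptoms"].apply(
    lambda x: ", ".join(part.strip() for part in x.split(","))
)

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split
```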
Now the device. Even if you're not on Colab, say a different system, a GPU rental, or a local machine, this still applies: if torch.cuda.is_available(), bind the device to torch.device('cuda'). If you're on an Apple laptop with Apple silicon, you'd use MPS instead; some of you have asked about this in the comments when using C Transformers and other libraries to run or infer LLMs, where Metal comes into play. Let's wrap the fallback in a try/except (except, not catch, since we're writing Python, not Java) and drop down to CPU if nothing else is available. Printing device confirms we're on CUDA; I'm on Colab Pro with a T4 GPU, and you don't need the high-RAM runtime, I just happen to have it.

Next, the tokenizer: GPT2Tokenizer.from_pretrained, and instead of plain gpt2 I paste the distilgpt2 ID copied from Hugging Face. Then the model, GPT2LMHeadModel from the same checkpoint, bound to the device so it uses the GPU; it downloads roughly 353 MB and finishes quickly. Printing the model shows the architecture: the embeddings, dropout, the GPT-2 blocks with their attention and projection layers, residual dropout, and so on. It's worth reading through it to understand in a bit more detail how DistilGPT2 is put together.

Now some model parameters. If you're familiar with deep learning and have worked with CNNs and RNNs, you'll recognise these hyperparameters; it's actually nice to see them again, because nowadays we mostly work through high-level classes and don't go this route. We have 400 rows, so let's keep a small batch size of eight. Let me also look at the data once more with df.describe(): the count is 400 but only 392 values are unique, so there are a few duplicates we probably should have removed, but it's fine. The most frequent entry is a condition I'm not sure how to pronounce, something related to nerves and leg pain, with symptoms like swelling, pain, and dry mouth. A sketch of the device selection and model loading is below; after that comes the important part, the dataset preparation.
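Here is a minimal sketch of the device selection and checkpoint loading, assuming the distilgpt2 checkpoint from the Hugging Face Hub. The video wraps the fallback in a try/except; this sketch uses torch.backends.mps.is_available() instead, which is the more reliable way to detect the Apple-silicon backend.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Prefer CUDA, then Apple-silicon MPS (Metal), then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(device)

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2").to(device)
print(model)  # prints the DistilGPT2 module hierarchy
```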
We're going to wrap the data in a class called LanguageDataset. This piece of code isn't mine, by the way; like the dataset, I took it from the internet, but it's really good and will be useful to you in a lot of other places. It takes the DataFrame and the tokenizer (the suggested signature includes a max_length as well, but I'm not going to pass that explicitly for now).

Inside __init__, the first thing is self.labels, which stores the column names of the DataFrame, df.columns. Next, self.data holds the rows as a list of dictionaries via df.to_dict(orient='records'). Then self.tokenizer keeps a reference to the tokenizer, and finally self.max_length is set by a helper called fittest_max_length, which takes the DataFrame; we'll write that function below, and I'll explain why it matters for small language models once the class is complete. After __init__ comes __len__, which simply returns the total number of samples in the dataset. Then __getitem__, which takes an index: x is that row's value in the first column (the disease name), looked up through self.data[idx] and self.labels, and y is the value in the second column (the symptoms) in the same way. We format them into a single string, text, as the name, a pipe separator, and then the symptoms, and tokenize that string.
For the tokenization we use the tokenizer's encode_plus method on the text, with return_tensors set to 'pt' so we get PyTorch tensors back, max_length set to 128, which is a reasonable choice here, padding='max_length', and truncation=True, and we simply return the tokens; we'll deal with unpacking them later.

Now fittest_max_length. It computes the maximum sequence length needed for padding: it takes the length of the longest string in each of the two columns of the DataFrame, keeps the larger of the two, and then finds the smallest power of two that is at least that large, returning that value. Concretely, max_length is the max over both columns of the longest entry; then x starts at 2 and doubles in a while loop until it is no longer smaller than max_length, and we return x.

Let's run this and cast the DataFrame into the class: data_sample now becomes an instance of LanguageDataset built from df and the tokenizer; inspecting it just shows a dataset object. Next, the train and validation splits: train_size is 80% of the length of data_sample, the validation size is the remainder, and random_split gives us train_data and valid_data. Then the iterators: train_loader is a DataLoader over train_data with our batch size and shuffle=True; shuffling the validation data doesn't make much sense, so the validation loader just takes valid_data with the same batch size and no shuffle. Finally the number of epochs: let's set num_epochs to eight. With only 300-odd training rows, eight is probably too many and it will start overfitting after four or five, but let's keep eight and see. We also keep the small batch size, set model_name to distilgpt2, and, since we have a single GPU, record gpu as 0. A sketch of the class and this preparation follows.
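A sketch of the LanguageDataset class and the dataset preparation, reconstructed from the narration. The column order (Name first, Symptoms second), the pipe-separated format, and the 128-token cap follow the video; the exact variable names are my reading of it.

```python
from torch.utils.data import Dataset, DataLoader, random_split

class LanguageDataset(Dataset):
    """Wraps a two-column DataFrame as 'name | symptoms' strings for GPT-2."""

    def __init__(self, df, tokenizer):
        self.labels = df.columns                   # e.g. ['Name', 'Symptoms']
        self.data = df.to_dict(orient="records")   # rows as dictionaries
        self.tokenizer = tokenizer
        self.max_length = self.fittest_max_length(df)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx][self.labels[0]]   # disease name
        y = self.data[idx][self.labels[1]]   # symptoms
        text = f"{x} | {y}"
        tokens = self.tokenizer.encode_plus(
            text,
            return_tensors="pt",
            max_length=128,
            padding="max_length",
            truncation=True,
        )
        return tokens

    def fittest_max_length(self, df):
        # Smallest power of two at least as large as the longest string
        # in either column.
        max_length = max(
            len(max(df[self.labels[0]], key=len)),
            len(max(df[self.labels[1]], key=len)),
        )
        x = 2
        while x < max_length:
            x *= 2
        return x

batch_size = 8
data_sample = LanguageDataset(df, tokenizer)

train_size = int(0.8 * len(data_sample))
valid_size = len(data_sample) - train_size
train_data, valid_data = random_split(data_sample, [train_size, valid_size])

train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(valid_data, batch_size=batch_size)

num_epochs = 8
model_name = "distilgpt2"
gpu = 0
```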
Our data is ready and our parameters are ready; next come the loss function and the optimizer. For the criterion I use nn.CrossEntropyLoss, which is at heart a classification loss: it assigns probabilities to the next target token and combines a softmax with the negative log-likelihood. I pass ignore_index=tokenizer.pad_token_id so that padding tokens are skipped during the loss computation; that makes it a bit faster and keeps the padding from affecting the loss, which is exactly why we use it. For the optimizer I use optim.Adam (Adam adapts the learning rate) over model.parameters() with a learning rate of 5e-4. We also set tokenizer.pad_token = tokenizer.eos_token, so the end-of-sequence token doubles as the padding token, since GPT-2 doesn't define one of its own.

Now let's keep our results in a DataFrame, which makes everything easier to track. results is a pd.DataFrame with columns for the epoch, the transformer (the model name), the batch size, the GPU, the training loss, the validation loss, and the epoch duration in seconds. These are good practices: track everything. A minimal sketch of this setup is below.
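A sketch of the loss, optimizer, and tracking setup as described; the 5e-4 learning rate and the column names follow the narration. The pad token is assigned before the criterion is built so that pad_token_id is defined when it is passed to ignore_index.

```python
import torch.nn as nn
import torch.optim as optim
import pandas as pd

# GPT-2 has no pad token, so reuse the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token

# Cross-entropy over next-token predictions; padding positions are ignored.
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)
optimizer = optim.Adam(model.parameters(), lr=5e-4)

# One row per epoch to track the run.
results = pd.DataFrame(columns=[
    "epoch", "transformer", "batch_size", "gpu",
    "training_loss", "validation_loss", "epoch_duration_sec",
])
```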
Now the training loop. For each epoch in range(num_epochs) we first record start_time with time.time(); tracking time like this can also help you estimate your carbon footprint, by the way, which is a good way of monitoring your training and even a differentiator you can put in your pitch deck. Then model.train() and epoch_training_loss initialised to zero. For the progress bar, train_iterator wraps train_loader in tqdm with a description showing the current epoch out of num_epochs, the batch size, and the transformer, which is our model name.

Inside the batch loop we call optimizer.zero_grad(), build inputs by taking the batch's input_ids, squeezing out the extra dimension with .squeeze(1), and moving them to the device, and set the targets to a clone of the inputs, since this is causal language modelling. The outputs come from calling the model with input_ids=inputs and labels=targets; loss is outputs.loss, then loss.backward() and optimizer.step(). We update the progress bar with train_iterator.set_postfix showing the current loss via loss.item(), accumulate it into epoch_training_loss, and after the loop compute the average epoch training loss by dividing by the number of batches, that is, the length of the train iterator (equivalently, of the train loader). That completes the training step; now we move on to validation.
For validation we call model.eval() and initialise epoch_validation_loss to zero, along with a total_loss that also starts at zero. The validation iterator wraps valid_loader in tqdm with a description showing the validation epoch out of num_epochs; we don't need the batch size and transformer in the description again, it would just repeat the same information, and you can drop this verbose tracking entirely if you don't want it. It's only there to help you see exactly what's happening.

The validation pass runs under torch.no_grad(), so no gradients are computed. For each batch in valid_iterator we build the inputs the same way (input_ids, squeeze(1), moved to the device), clone them as the targets, call the model with input_ids=inputs and labels=targets, and take loss = outputs.loss. We accumulate total_loss += loss.item(), update the progress bar with valid_iterator.set_postfix showing the validation loss, and add loss.item() to epoch_validation_loss as well, following the same pattern we used for training. After the loop, the average epoch validation loss is epoch_validation_loss divided by the length of the validation iterator. Then we record the end time with time.time().
The epoch duration in seconds is simply end_time minus start_time. Now we add a new row to the results DataFrame with everything we tracked: the transformer (the model name), the batch size, the GPU, the epoch (epoch + 1), the training loss (the average epoch training loss), the validation loss (the average epoch validation loss), and the epoch duration in seconds. We append it with results.loc[len(results)], and also print an f-string with the epoch number and a validation loss computed as total_loss divided by the length of the validation loader.

Now let's run this and see what we get; it takes a bit of time. Looking at the results, the model generalizes well for the first two epochs. As I said, we have roughly 320 training rows and 80 validation rows, and you can see it starts overfitting after about three epochs: it begins memorizing the data rather than generalizing. The training loss keeps going down, of course, because over eight epochs it keeps looping over the data and learning its patterns, but that's not what we want. After eight epochs the validation loss is around 0.9, having started around 0.7, so it starts overfitting around the fourth epoch; for this dataset you should train for about four or five epochs if you want it to fit well. The full training and validation loop is sketched below.
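Here is a sketch of the full training and validation loop as narrated, assuming the loaders, tokenizer, model, optimizer, and results DataFrame defined above; the progress-bar text, the logging columns, and the variable names follow the narration as best I can read it.

```python
import time
from tqdm import tqdm
import torch

for epoch in range(num_epochs):
    start_time = time.time()

    # ---- training ----
    model.train()
    epoch_training_loss = 0
    train_iterator = tqdm(
        train_loader,
        desc=f"Training Epoch {epoch + 1}/{num_epochs} "
             f"Batch Size: {batch_size}, Transformer: {model_name}",
    )
    for batch in train_iterator:
        optimizer.zero_grad()
        inputs = batch["input_ids"].squeeze(1).to(device)
        targets = inputs.clone()                      # causal LM: labels = inputs
        outputs = model(input_ids=inputs, labels=targets)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        train_iterator.set_postfix({"Training Loss": loss.item()})
        epoch_training_loss += loss.item()
    avg_epoch_training_loss = epoch_training_loss / len(train_iterator)

    # ---- validation ----
    model.eval()
    epoch_validation_loss = 0
    total_loss = 0
    valid_iterator = tqdm(valid_loader, desc=f"Validation Epoch {epoch + 1}/{num_epochs}")
    with torch.no_grad():
        for batch in valid_iterator:
            inputs = batch["input_ids"].squeeze(1).to(device)
            targets = inputs.clone()
            outputs = model(input_ids=inputs, labels=targets)
            loss = outputs.loss
            total_loss += loss.item()
            valid_iterator.set_postfix({"Validation Loss": loss.item()})
            epoch_validation_loss += loss.item()
    avg_epoch_validation_loss = epoch_validation_loss / len(valid_iterator)

    # ---- logging ----
    end_time = time.time()
    epoch_duration_sec = end_time - start_time
    new_row = {
        "transformer": model_name,
        "batch_size": batch_size,
        "gpu": gpu,
        "epoch": epoch + 1,
        "training_loss": avg_epoch_training_loss,
        "validation_loss": avg_epoch_validation_loss,
        "epoch_duration_sec": epoch_duration_sec,
    }
    results.loc[len(results)] = new_row
    print(f"Epoch: {epoch + 1}, Validation Loss: {total_loss / len(valid_loader)}")
```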
Now let's try it out. Set input_str to "Kidney Failure", the example from the beginning. First we need the input IDs: input_ids comes from tokenizer.encode on the input string with return_tensors='pt', moved to the device; if you print input_ids you can see the token IDs sitting on the CUDA device, and decoding them would give you the text back. Then generation: output comes from model.generate with our input_ids, a max_length of around 20 rather than 128, and the inference parameters: num_return_sequences=1, do_sample=True, a small top_k (15 felt too high, so something like 8), a top_p, a temperature of 0.5 so it isn't too creative, and a repetition_penalty of around 1.2. Running it prints a warning that we should have set the EOS/pad token for generation, but that's fine. The output is a tensor, so we decode it to get text back: decoded_output is tokenizer.decode on the output with skip_special_tokens=True.

And we get what we wanted: something like "severe sore throat, difficulty swallowing, nausea or vomiting", which is medically wrong for kidney failure, but we are able to generate tokens. The model has started overfitting, so we'd need fewer epochs or better hyperparameters and another training run. Let's try something else: pneumonia gives another response that isn't right, but it does generate; that matters, even though a wrong answer obviously isn't useful, so we have to tweak the hyperparameters. Let's try something from the training data; we had Turner syndrome, and there it gives the right response, because the model has memorized the training set and can generate the correct tokens for it. With a larger training set, and after tuning the hyperparameters, this same model might perform quite a bit better. Depression, which the model has also seen during training, comes back correct as well. A sketch of this generation step is below.
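A sketch of the generation step, using the sampling parameters mentioned in the narration; the exact top_k and top_p values were being adjusted on the fly in the video, so treat these as illustrative.

```python
input_str = "Kidney Failure"
input_ids = tokenizer.encode(input_str, return_tensors="pt").to(device)
print(input_ids)  # tensor of token IDs on the selected device

output = model.generate(
    input_ids,
    max_length=20,               # short generations for this demo
    num_return_sequences=1,
    do_sample=True,
    top_k=8,                     # illustrative; the video tries 15, then lower
    top_p=0.95,                  # illustrative value
    temperature=0.5,             # keep it from being too creative
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id,  # silences the pad/EOS warning mentioned
)

decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_output)
```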
Now let me save the model. You can just use torch.save, and you can also push it to a Hugging Face repo if you want. Let's give it a name, something like small Med LM or, better, small disease LM, and save it with a .pt extension. By default it saves into the Colab working directory, and you can see it there, but let me save it to my Drive instead, since I'm on Google Colab; I'll show you in an upcoming video how I use this model to build an app that generates tags, which can then feed caching and similar mechanisms. It first complains about the parent directory because I hadn't mounted Drive, so let's mount it and run the save again while I open my Drive.

That was a very interesting exercise, and I hope you learned something in this video about how to take a model and generate text tokens for a specific use case with something very small in size; remember we can generate up to the 128-token max length we set. The mount finished, the small disease LM has been saved, and it shows up in Drive after a refresh; searching for the name finds the file, and downloading it shows it's about 318 MB. So in roughly 300 MB you have a language model that executes a specific, domain-specific task: you took your data, leveraged an existing model, and created a new one that can run that task or serve as the base for use cases built on top of it. I won't call it the best model, not even a particularly good one yet, but it can become better after some tuning, a slightly different approach to the hyperparameters, and, above all, better data, which is the weak point right now. Let me try one more input, something like "mental disorder": it gets that one wrong, which is expected, because the model has overfitted, and of course you should not deploy this model to production. The whole agenda of this video was to show you how to take a small model and a small dataset and create something that generates tokens for you.

Once the download finishes, at around 319 MB, you can use a better version of this model in an application to perform a specific task: in medical and healthcare settings, or in financial services, for example recommending insurance plans or products based on a person's age, all with a very small model. That's all for the video; this is what I wanted to achieve. The code will be available on my GitHub repository, so please go find the notebook there and improve it further, and let me know what you think of it. A sketch of the save-and-mount step is below.
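A sketch of saving the model to Google Drive from Colab; the file name and the Drive path are illustrative, matching the naming in the video as best I can read it.

```python
import torch
from google.colab import drive  # only available inside Colab

drive.mount("/content/drive")   # mount Google Drive first, as done in the video

# Save the whole model object; the path below is illustrative.
torch.save(model, "/content/drive/MyDrive/small_disease_lm.pt")

# Later, load it back for inference (commented out here):
# loaded_model = torch.load("/content/drive/MyDrive/small_disease_lm.pt", map_location=device)
```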
If you have any thoughts or feedback, please let me know in the comment box. And if you haven't subscribed to the channel yet, please do subscribe, like the video if you enjoy the content I'm creating, and share the video and the channel with your friends and peers. Thank you so much for watching; see you in the next one.
Info
Channel: AI Anytime
Views: 9,282
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, gpt, gpt2, small language model, slm, SLM, hyena, mamba, stripedhyena, RMKV, train your own llm from scratch, fine tune llm, train your own LLM, train LLM, train small language model, phi-2, phi2, orca, orca llm, microsoft, medical llm, medpalm, medical, doctors
Id: 1ILVm4IeNY8
Length: 57min 6sec (3426 seconds)
Published: Tue Dec 26 2023