Finetune LLAMA2 on custom dataset efficiently with QLoRA | Detailed Explanation| LLM| Karndeep Singh

Captions

Hello all, and welcome to my YouTube channel. Today, in this video, we are going to talk about how to fine-tune the Llama 2 model. We are also going to talk about what LoRA is and why it is required, and we are going to understand what quantization is and how it helps in fine-tuning Llama 2 kind of models. So this is the whole agenda that we are going to cover in this video. Let's get started.

Before we start with this tutorial, I want you to understand the fine-tuning steps that we are going to go through in this video. But before fine-tuning, we should also understand why fine-tuning is required for such generative models. So let's start with an example. Let's say you are working in the medical domain, and you want to extract or generate some information based on your domain knowledge by interacting with a generative model. Once you start interacting with the model, you start prompting, you give it the context, and then the model is unable to understand the domain that you are talking about. Maybe the language that you use in the medical domain is not captured in the knowledge base of that model, so it won't be able to give you the best answers for this specific domain. That's where the requirement of fine-tuning comes into play.

By the way, when you try out prompting to extract information, and you play around with prompts to make the model understand your data, that step is called in-context learning. In-context learning is generally when
you're trying out prompting steps, playing with different prompts to extract better output from your model. That step is known as in-context learning: you are trying to extract information from the knowledge base of the model. We are not training anything; we are just playing with prompts to extract knowledge from the model and get the right result out of it.

The first step should always be to take an already existing model and try in-context learning, that is, try out some prompts and understand whether that model is suitable for you or not. Based on the output of those prompts, you may decide that this model is unable to understand the language you need it to understand, and then you can think about fine-tuning the generative model. I think this is very clear for you.

Now let's understand why fine-tuning would be helpful in your case. Whenever you see that the domain is a challenge for a model, you can start fine-tuning that model to understand the language of your own domain. Take the medical domain: it has technical terms that the majority of models won't cover, so we specifically need a model that understands the language you are working with. If it is the medical domain, then you should fine-tune the model on medical-domain data. So that's the need for fine-tuning a generative model: to update the knowledge of the model. Okay. So, with that understanding, we
can move ahead to understand what the process of fine-tuning looks like in this tutorial specifically, because multiple steps are involved in achieving good fine-tuned results. When you look at a ChatGPT kind of model, to get to that stage we might have to include multiple things when we say we are fine-tuning a model. But let's talk about this in a particular order.

Okay, so on the screen you can see the overview of SFT and RLHF. SFT is nothing but supervised fine-tuning, and RLHF is nothing but reinforcement learning from human feedback. These are the two steps that have generally been applied in training the ChatGPT models. The initial step for any model, before even fine-tuning, is that we have to pre-train the model. If you look at this image, there is some pre-training data available. It could be any internet data, maybe Wikipedia or any data that is openly available on the internet. You put it into a model and train on it, and from that training you get a pre-trained model. This pre-trained model is now able to understand the language that you wanted the model to understand. This pre-trained model could be BERT, it could be a GPT model such as GPT-1 or GPT-3, it could be a FLAN model, or it could be any other generative or natural language processing model. It could also be a Llama model. It first has to be trained on the pre-training data so that it can understand the contextual information, understand the words and build embeddings around them, and that's where the knowledge base of a pre-trained model comes from. So
this is the first stage. Then the next stage comes in, where we fine-tune. Once we have this pre-trained model, we fine-tune it on a certain domain. Let's say Llama 2 is available at the pre-trained model stage, and then we do supervised fine-tuning. What's happening in this supervised fine-tuning step is that we take domain data, let's say medical-domain data, and we fine-tune the model using an instruction dataset. We are preparing this pre-trained model to understand commands or instructions, so we have to prepare the data in that format. You might have done some kind of classification or regression problem on this kind of pre-trained model; maybe you have taken the BERT model and fine-tuned it to classify something. But here, in fine-tuning a generative model, we modify the training process by giving it instruction data. We construct instruction data that has an instruction, a context and a response. This is where this fine-tuning step differs from classification fine-tuning, and it is very important to understand that we are doing supervised fine-tuning on an instruction dataset.

So let's say we pick up Llama 2, take your domain-specific data, prepare that data as instruction data, and fine-tune the Llama 2 model on it. Then we take up the RLHF method, where we take this fine-tuned model and apply human feedback using reinforcement learning, so that we can make sure the model doesn't hallucinate and the output of the model shouldn't
be toxic, and there are other aspects where the RLHF method helps us further tune the output of this supervised fine-tuned model. This is the whole flow for a generative model when you compare it with ChatGPT. It is the flow that a ChatGPT model has followed, and that's why we see these steps. After that you can again do in-context learning, so you can start prompting once these methods are done.

In this tutorial we are going to focus only on the supervised fine-tuning method. We are not going to go beyond supervised fine-tuning, so we are not going to do RLHF. But in this tutorial we are going to get a comprehensive understanding of what supervised fine-tuning is, how you can construct an instruction dataset, and how you can fine-tune Llama 2. So this was a kind of overview. Obviously there are many details, and I will go through the important ones that you need to know before doing supervised fine-tuning.

So let's go ahead. First you have to install some dependencies. I'm installing bitsandbytes and transformers, and I'm also using PEFT. PEFT is the parameter-efficient fine-tuning library from Hugging Face; you can use it on top of a quantized model to make your training very efficient. I will talk about it, and LoRA is one of the PEFT methods that you can use. We also use accelerate, and we are going to use the TRL library, which is a very good library for fine-tuning generative models. It has all the capabilities to do supervised fine-tuning and also reinforcement-learning fine-tuning, so this TRL library is very helpful.
You can take a look at its GitHub page. So we are going to use all of these dependencies for this fine-tuning. These are the imports that I'm doing.

I'm going to load a dataset that has dialogues, that is, conversations between different persons, together with a summary of each conversation. That is what we are doing in this tutorial: we take the dialogue between those persons and ask the model to generate a summary of the conversation that happened between the two or three persons. So this is the dataset that we are going to use to fine-tune the model to summarize conversations. We load this data from Hugging Face, and you can see the number of rows and the columns available in this dataset.

Now we are going to prepare this data as an instruct dataset. I talked about it, right? You have to prepare the data as an instruction dataset so that you can train your generative model. So this is what I am doing here. You can see I have to put the dialogue and the summary into this format, where "Summarize the following conversation" becomes your instruction, the dialogue becomes your input, and the summary of the conversation becomes your response. This is the format that I'm going to prepare. To prepare it, I have written a function, called something like format instructions, which looks like this. It takes in
the instruction: I give an instruction like "Summarize the following conversation", then an input tag, and below the input tag I put all the dialogue of that conversation, and then I also pass in the summary. That's how this function prepares the data. I am also using another function, which is a wrapper around this one. It takes the dialogue and summary, formats them into the instruct format that I have given here, and stores the result in the text key. So all of the instruct data that I prepare is stored in this text key, and the function returns a dictionary with the dialogue, the summary and the prepared instruct text.

Now we process the dataset using this process dataset function. It just randomly shuffles the dataset and applies the generate instruction dataset function to each row of the data. We also remove the id and topic columns, which are not required for our use case, so we just get rid of them. Now we have to apply all of these steps, so I take the dataset and apply all of this to it. Then we select some rows: we take 500 rows from the train split and 50 samples each from the test and validation splits. We have three columns available: dialogue, summary and text. For training we have 500 rows, and for testing and validation we have 50 rows each. If you want, you can print one of the text entries to see what it looks like. So now we have the instruct dataset prepared.
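The formatting steps described here can be sketched roughly as follows. This is a minimal illustration and not the code shown in the video: the function names, the `### Instruction` / `### Input` / `### Response` tags and the toy rows are assumptions. Only the instruction text, the `text` key, the shuffling and the dropped `id` and `topic` columns come from the walkthrough.

```python
import random

# Instruction text as spoken in the walkthrough.
INSTRUCTION = "Summarize the following conversation."


def format_instruction(dialogue: str, summary: str) -> str:
    """Build one instruct-style training example (tag names are assumed)."""
    return (
        "### Instruction:\n"
        f"{INSTRUCTION}\n\n"
        "### Input:\n"
        f"{dialogue.strip()}\n\n"
        "### Response:\n"
        f"{summary.strip()}"
    )


def generate_instruction_dataset(data_point: dict) -> dict:
    """Wrapper: keep dialogue and summary, store the formatted prompt under 'text'.

    Any other columns (for example 'id' and 'topic') are dropped.
    """
    return {
        "dialogue": data_point["dialogue"],
        "summary": data_point["summary"],
        "text": format_instruction(data_point["dialogue"], data_point["summary"]),
    }


def process_dataset(rows, seed: int = 42) -> list:
    """Shuffle the rows and format each one."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [generate_instruction_dataset(row) for row in rows]


# Toy rows standing in for the real dialogue/summary dataset (hypothetical data).
toy_rows = [
    {"id": "a", "topic": "greeting",
     "dialogue": "#Person1#: Hi.\n#Person2#: Hello.",
     "summary": "Two people greet each other."},
    {"id": "b", "topic": "lunch",
     "dialogue": "#Person1#: Lunch?\n#Person2#: Sure.",
     "summary": "They agree to have lunch."},
]

processed = process_dataset(toy_rows)
print(processed[0]["text"])
```

With the Hugging Face `datasets` library the same wrapper would typically be applied through `Dataset.map`, but plain lists keep the sketch runnable without any downloads.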
Now we can start downloading the model. Before we download the model, we have to decide what type of model we are using. Here you can see I am using the Llama 2 7-billion-parameter model. It is not the chat model; there is also a Llama 2 chat 7-billion-parameter model. In this use case we are not using the chat model because we want to take the base model and fine-tune it on our own task. That's why we are taking the plain 7-billion-parameter base model. You could also try other versions of Llama 2. I specifically chose the 7-billion-parameter model because it is feasible to train here in this Colab environment.

I'm also using the bitsandbytes configuration, which helps me load this 7-billion-parameter model in a quantized format. I'm using the load-in-4-bit option to quantize it. A 7-billion-parameter model is something like 28 GB in full precision, and if you load it in 4 bits you can fit that whole model into about 5 GB of memory. That's why the bitsandbytes configuration is helpful: it reduces the size of the model from its base size to the quantized size of about 5 GB, and then you can use these quantized weights to fine-tune your model. Obviously you get a small reduction in accuracy, because you are reducing the precision of the floating-point weights and quantizing them to a limited set of values. That is a trade-off we should accept when we fine-tune a model with limited resources. If you have a lot of resources, a lot of GPU RAM, then you can use the whole 7-billion-parameter model in full precision, and it will take longer to train because you have to update all the parameters.
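The size figures can be checked with a little arithmetic. The sketch below counts only the raw weight storage for a round 7 billion parameters. The helper name is made up for illustration. The raw 4-bit figure comes out below the roughly 5 GB quoted, presumably because a loaded model also carries layers kept in higher precision and other overhead; that explanation is an assumption, not something stated in the video.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Raw storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # "7 billion parameters", rounded

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>6}: {weight_memory_gb(NUM_PARAMS, bits):5.1f} GB")
```

Full 32-bit precision gives the 28 GB mentioned, and every halving of the bit width halves the footprint.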
So that's the need, and because of that we are using the bitsandbytes configuration, which helps to quantize the base model for limited resources so that we can fit the model on the GPU. That's what we are doing here. Then we use AutoModelForCausalLM, because Llama 2 is a generative model, so it's a causal LM. We pass in the model ID and the bitsandbytes configuration so that the model is loaded in quantized format, and that's how the model gets loaded. We also load the tokenizer, set padding on the right side, and add the end-of-sequence token to the tokenizer, which is very important for Llama 2 so that it stops generating whenever the end of sequence is reached. These are the important things that you have to start with. Remember, we are not loading the full-size base model: we take the base model, quantize it, and use that quantized Llama 2 7-billion-parameter model for our fine-tuning.

With that model I tried zero-shot inference. This is where in-context learning comes into play. I tried it with only one prompt; you can try other prompts. I gave a simple prompt like "Summarize the following conversation", passed in a dialogue from the dataset that we have, and looked at the output that we get before fine-tuning. This is the base model, not a chat model, so it is just a generative model: it looks at the previous tokens and tries to generate new text from them.
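A hedged sketch of the loading step follows. It assumes the `transformers`, `bitsandbytes` and `torch` packages and a GPU are available, and it is not the exact code from the video: the model ID, the NF4 and double-quantization settings, the float16 compute dtype and the use of the end-of-sequence token as the padding token are all assumptions. The heavy imports sit inside the function so that the rest of the block can run without them.

```python
# Assumed Hub ID of the base (non-chat) 7B checkpoint; access to it is gated.
MODEL_ID = "meta-llama/Llama-2-7b-hf"

# 4-bit loading options; only load_in_4bit is taken from the walkthrough.
BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",       # assumption
    "bnb_4bit_use_double_quant": True,  # assumption
}


def load_quantized_model(model_id: str = MODEL_ID):
    """Load the base model in 4-bit together with its tokenizer."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.float16, **BNB_KWARGS)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.padding_side = "right"           # pad on the right, as described
    tokenizer.pad_token = tokenizer.eos_token  # assumption: reuse EOS for padding
    return model, tokenizer


def build_zero_shot_prompt(dialogue: str) -> str:
    """Prompt for a zero-shot check of the base model (layout is assumed)."""
    return (
        "Summarize the following conversation.\n\n"
        f"{dialogue.strip()}\n\n"
        "Summary:\n"
    )


print(build_zero_shot_prompt("#Person1#: Hi.\n#Person2#: Hello."))
```

Calling `load_quantized_model()` downloads several gigabytes, so it is left out of the demonstration; only the prompt builder is exercised here.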
Right, so it's not a chat model, where you give instructions and it gives you an answer based on those instructions. If you want that, you have to use the chat model. This is a plain base model; it doesn't have the capability of understanding your instructions. Hence the output looks something like this: it's almost the same as the input. It is just trying to continue the conversation that it was given. At the top you see the baseline human summary, which is the actual summary given for this conversation in the dataset, and then this is the model-generated summary. You can see it's not doing very well at generating a summary, because it's not a chat model and doesn't have the capability of understanding instructions.

So, with this information: you have the quantized model, you have tested it on some prompts, and you know that the prompts are not working because it's a base model. Now what? Now you have to fine-tune this model. But if you try to fine-tune the base model directly, then you have to update all the parameters, and there are billions of parameters, about seven billion, and that is not possible with the GPU VRAM that you have; it might go out of memory. With that in mind, there is a concept called LoRA, which is low-rank adaptation. Instead of fine-tuning all 7 billion parameters of the model, you fine-tune only a small number of parameters, and that lets you fine-tune the model by modifying some parameters rather than all 7 billion. So that is the concept behind
LoRA. What it is doing is this: you choose some of the weight matrices of the model, not all of them. For each matrix that you target, you create a parallel matrix of the same shape, but you decompose it into low-rank matrices, and then you train those additional matrices and finally save them. These additional matrices that we create are called adapters.

Look at this image. Let's say this is a weight matrix. If you have heard about the Transformer model, we have query, key and value matrices in Transformers, so let's say this is the query matrix. If you fine-tune directly, you have to fine-tune all of the query, key and value matrices. But LoRA says you don't have to fine-tune all the parameters; you can choose particular ones. So I am choosing a single one, the query, and that query is nothing but a matrix. With LoRA, we create two matrices, A and B, and if you multiply A and B, the product has the same shape as the original matrix. So in parallel you are creating a matrix with the same shape, but in a decomposed form. That's why it is called low-rank adaptation: it takes the same shape of matrix but in a decomposed form with a certain rank, where the rank is the inner dimension shared by A and B, that is, a certain number of rows or columns. So now you have this
original matrix, and you have also created, in parallel, a matrix of the same shape in a decomposed format with a certain rank. Now you are not fine-tuning the original matrix; you are fine-tuning this new matrix that you created. By the way, this is also known as the adapter. Once you have fine-tuned it, you merge this change into the original matrix. That's the learning you did: you had the initial learned weights, and you wanted to learn some new information, so you created a new matrix with the same shape, learned its parameters, merged it into the original matrix, and got a new query matrix. This is what happens for each and every matrix that you target. I explained it with the query, but you can also apply it to the value matrix and to the key matrix. So this is what LoRA is all about. The number of trainable parameters is very small, because you choose particular matrices, fine-tune only their adapters, and then merge them with the original weights, and that's how the fine-tuning is completed.

If you look at the diagram at the bottom, there are the pre-trained weights of the model, and with LoRA we created a new matrix for a certain weight. We fine-tune the newly created matrix and then merge it with the pre-trained weights to get a new matrix. That is what is being explained here: this is what happens during training, and after merging we get this. You can see this is the merged weight: you merge these two.
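As a rough sketch of the merge just described: the frozen weight W is left alone, two small matrices B (d_out x r) and A (r x d_in) are trained, and the merged weight is W + (alpha / r) * B A. The tiny matrices, the helper names and the example rank of 16 below are illustrative assumptions; the 4096 x 4096 shape matches one attention projection in the 7B model.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merge(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight after training."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Trainable values in B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


# Toy 3x3 example with rank 1; the B and A values stand in for trained weights.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
B = [[1], [0], [2]]
A = [[0.5, 0.0, -1.0]]
merged = lora_merge(W, B, A, alpha=2, r=1)
print(merged)

# One 4096 x 4096 projection: full matrix versus a rank-16 adapter.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, 16)
print(full, adapter, adapter / full)
```

The merged matrix has the same shape as W, so it can replace W after training, while only the much smaller A and B had to be updated.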
Then you get a merged weight, which is the new fine-tuned weight after training. So this is the comparison that I wanted to show you: the need for LoRA and how LoRA actually works at the back end. With this information, I think it will be much easier to understand what is happening in the upcoming code.

Remember, we took an instruct dataset and prepared it, and we have our quantized model. We saw that the quantized base model is unable to understand the instruction data. Now that we have understood the LoRA concept, we are going to apply it. This is a helper function that tells you how many parameters you are going to fine-tune in this fine-tuning step. The first step is to import prepare_model_for_kbit_training from PEFT. You have to enable gradient checkpointing, and then you pass the model that you loaded from Hugging Face into prepare_model_for_kbit_training, and that's how the model is made ready for k-bit training. Then I have printed the model. You can see the modules here: this is the self-attention, this is LlamaAttention, and this is where the Transformer part of the Llama model sits. Inside LlamaAttention we have the query, key and value projections, there is an output projection, and then the rotary embeddings. You could fine-tune all of these layers, but since we have talked about the concept called LoRA, we don't have
to focus on training all the parameters. We are just focusing on the query, key, value and output projections, so these are the four matrices that we are going to focus on. To these four matrices we apply LoRA: for each of the query, key, value and output matrices we additionally create a matrix with the same dimensions in decomposed form, we fine-tune all of those adapters, and finally we merge the fine-tuned weights into the original weights. I hope this is making sense to you. By the way, you can see there are 32 of these attention layers inside this Llama 2 model, and in each of them we are going to focus on the query, key, value and output projections. So this is what we are going to apply.

From the PEFT library you import LoraConfig and get_peft_model. In the LoRA config you have to mention the rank, that is, the inner dimension of the decomposed matrices, and the LoRA alpha, which is a scaling factor applied to the adapter update. You can change these parameters and try and test different values. Next are the target modules, which is what I was talking about: the query, key, value and output projections. These are the modules that we are going to target, and the names are model-specific. For Llama models these are the target module names, so you have to write them here. Suppose you are fine-tuning FLAN-T5: then its query, key and value modules become your target modules. You have to open your model, search inside it for the modules that you want to focus on, and then you can fill it in over
here. So that's the focus: in this setup we are going to focus on the query, key, value and output projections. For these four matrices we create additional matrices in decomposed format, fine-tune those matrices, and finally merge them into the original weights. Then we set the LoRA dropout, and you can also set the bias option, which controls whether the bias terms are fine-tuned as well, but I'm choosing not to train them. You can refer to the LoraConfig documentation to understand it. The task type is for your LM: because we are using a causal model, you have to specify causal LM. Suppose you are using a FLAN-T5 model, which is a sequence-to-sequence model; then you have to change the task type to the sequence-to-sequence type. That's how the LoRA config is ready.

Now you pass the quantized Llama 2 model that you loaded into get_peft_model, along with your LoRA config. This step essentially creates the additional matrices for all of these target modules. Once these additional matrices have been created inside your model, that is what we call an adapter, so the adapter has been created. Now you can use the method called print_trainable_parameters, and it helps you understand how many parameters are trainable. You can see there are millions of parameters, and you are training only about 0.47 percent of them, which is very small compared to the total. You can see the trainable parameters: it's a very small number of parameters, so you can expect the adapter weights to be very small too, and we can still get good performance even by training this small number of parameters.
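The configuration walked through above might look roughly like this with the `peft` library. It is a sketch under assumptions, not the video's exact code: the rank, alpha and dropout values are placeholders, since the spoken walkthrough does not state them, and the `q_proj`, `k_proj`, `v_proj`, `o_proj` names are the projection module names used by the Hugging Face Llama implementation. The `peft` import sits inside the function so the parameter count at the end runs without it.

```python
# Keyword arguments for peft.LoraConfig; numeric values are placeholders.
LORA_KWARGS = {
    "r": 16,              # rank of the decomposition (assumed value)
    "lora_alpha": 32,     # scaling factor (assumed value)
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.05, # assumed value
    "bias": "none",       # bias terms are not trained
    "task_type": "CAUSAL_LM",
}


def attach_lora_adapters(quantized_model):
    """Prepare a k-bit model for training and wrap it with LoRA adapters."""
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    quantized_model.gradient_checkpointing_enable()
    quantized_model = prepare_model_for_kbit_training(quantized_model)
    peft_model = get_peft_model(quantized_model, LoraConfig(**LORA_KWARGS))
    peft_model.print_trainable_parameters()
    return peft_model


def expected_adapter_params(num_layers=32, hidden_size=4096,
                            rank=LORA_KWARGS["r"],
                            num_targets=len(LORA_KWARGS["target_modules"])):
    """Adapter size when every target is a hidden_size x hidden_size projection."""
    per_matrix = rank * (hidden_size + hidden_size)  # A and B together
    return num_layers * num_targets * per_matrix


print(expected_adapter_params())
```

Under these assumed settings the adapters add up to roughly 16.8 million trainable values across the 32 layers, which is tiny next to the billions of frozen base weights.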
Training such a small number of parameters is the beauty of LoRA training, and that is why you use LoRA: to make your fine-tuning very efficient and also hardware-friendly. So this is the beauty of the PEFT module with LoRA adapters.

Once you have the quantized model with the LoRA config initialized and all your additional weights ready for training, your model setup is ready and you can proceed with the training part. You have the instruct dataset, you have the LoRA model, and now you are going to train it. For training I am setting up TensorBoard for better visualization of the training loss and validation loss. The next step is to define the training arguments. You can see that I am using a different kind of Adam optimizer, the paged AdamW optimizer. This is a special type of optimizer. Adam stores some of its state, some additional values per weight, in GPU RAM, and paged AdamW can also use CPU memory to store the optimizer state. So that's a different kind of Adam optimizer that you can use for training your own custom generative models. You can play with these parameters; I am specifically using a cosine learning-rate scheduler. You can also set config.use_cache to False to avoid some warnings.

Finally, we use the TRL library, and from it we import SFTTrainer, the supervised fine-tuning trainer. This is a very useful class provided by TRL. It's very handy and removes a lot of boilerplate code.
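A sketch of training arguments along these lines is given below. Only the paged AdamW optimizer, the cosine scheduler, TensorBoard logging and the two epochs come from the walkthrough; the output directory, batch size, accumulation steps, learning rate and logging interval are hypothetical. The small `cosine_lr` helper is an illustration of what a cosine schedule without warmup does, not the library's own code.

```python
import math

# Keyword arguments for transformers.TrainingArguments; most values are placeholders.
TRAINING_KWARGS = {
    "output_dir": "llama2-7b-qlora-summarizer",  # hypothetical path
    "num_train_epochs": 2,
    "per_device_train_batch_size": 4,            # assumed
    "gradient_accumulation_steps": 4,            # assumed
    "learning_rate": 2e-4,                       # assumed
    "optim": "paged_adamw_32bit",                # paged AdamW from bitsandbytes
    "lr_scheduler_type": "cosine",
    "logging_steps": 10,                         # assumed
    "report_to": "tensorboard",
}


def build_training_arguments():
    """Create the TrainingArguments object (needs the transformers package)."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)


def disable_cache(model):
    """Turn off the generation cache during training to avoid warnings."""
    model.config.use_cache = False
    return model


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Learning rate on a cosine decay from base_lr down to zero."""
    progress = step / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 50, 75, 100):
    print(step, cosine_lr(step, 100, TRAINING_KWARGS["learning_rate"]))
```

The schedule starts at the full learning rate, passes through half of it at the midpoint and approaches zero at the final step.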
that you would otherwise have to write before training the model. So you can see we just prepared the instruct dataset, initialized the LoRA config, and then jumped straight into the training part using this SFTTrainer.

In SFTTrainer you pass the LoRA model (the LoRA-prepared model), then the train data and the validation data, which are nothing but the instruct dataset. You also pass the LoRA config, and then the field where the instruct data lives inside the dataset; remember that the text key holds all of the instruct data. Then you set the max sequence length. I took 1024; you can take it up to 4096, because that is the maximum sequence length of Llama 2. Then you specify the tokenizer and the training arguments that you defined above. Those are the things you have to pass to SFTTrainer, and then you just run the trainer.train() command. That's how you train the model on your own.

You can see it ran for about 20 minutes per epoch, and it trained for a few epochs. I tracked it using TensorBoard: this is the training epoch, this is the learning rate, and this is the training loss, and you can see that the loss is going down during training. That's how you can keep a watch on your training and testing. This was after about half an hour of training for two epochs, because I had just a limited dataset, and I used that limited dataset so I could fine-tune in a small amount of time. You can train for longer for better results.

Once you have trained the model, you can save it. So
when you save this model, it is not going to save both the base model and the additional adapter weights, the LoRA weights that you have created. It is going to save only the adapter weights, that is, the additional LoRA weights that you have trained. So here I am saving this trained model with only the additional weights, the adapter weights. When I save it, you can see some additional files being created: the tokenizer config, obviously, a special tokens map and tokenizer.json, and also some adapter files, including an adapter .json file, which store the trained LoRA weights and config. These are the additional weights that have been trained, and they are all that gets stored when you save the model.

Now, suppose you want to use it for inference. How can you use the additional adapter weights? Once the model has been trained and saved, it is not the whole model that is saved, just the adapter weights, so how can you use that for inference? That's the beauty of LoRA. You can import PeftModel from peft, you can import AutoPeftModelForCausalLM, and you can also import AutoTokenizer. Then you specify the directory where you saved the adapter weights, pass that directory into this class, and set whatever other parameters are required. When you run this cell, it will start downloading the base model associated with your adapter model, and it will also merge the adapter into this
particular pretrained base model. That's what it is doing here: it takes your adapter weights, internally loads your base model as well, and merges the adapter with the original base weights. That's how you get the trained model. Then you also load the tokenizer, and with that you are ready to start inference.

For inference I took the same example. I use the test dataset, picking only its first row, and I prompt with the same prompt format that I used in the training data. I give the same prompt, but this time without the summary: I just give the dialogue and try to get the summary out of it. So I took the trained model and generated the output from it.

Now let's see the output. This is the input prompt, this is the instruction, and this is the input, which is the dialogue. This is the human summary that the dataset has, which is the ground truth, and this is the text generated by the trained model. You can see the model has generated a summary, and if you read it, it is doing fairly well. Likewise, you can fine-tune this model for other use cases, and you can compare a baseline against the model-generated text. That's a very important thing.

I will also talk about something else that is very important. Let's say you did one task, summarization of text, and now you want to do a different task, say translation, but you are going to fine-tune the same Llama model for it. In production you cannot
use (well, you can, but it is very difficult to use) the same compressed format, that is, the merged weights from the base model. Suppose you have trained a summary model, merged it with the base model, and loaded it onto the server. Now you have a single use case, but this time you want to do translation with the same base model, the Llama model. What you would generally do is take the base model, train an adapter model for translation, merge it, and then serve that on the server as well.

But with LoRA, you can train these different adapters. You can have a trained summary adapter, like the one we had here, and then you can train a translation adapter. These adapters will be in megabytes, like 10 megabytes or 5 megabytes; they are very small in size. So you can place one base model of Llama 2 in your main repository on the server, and then, depending on whether the requirement is translation or summarization, you can merge in the adapter that your task needs. If you have a summarization requirement, you merge the summary adapter with your base model and do the summarization task. If you have a translation task, you take your translation adapter and plug it into your base model. The efficiency becomes very high here, because you are not importing the whole base model again and again for different tasks. You are just keeping the same base model available in your RAM, and you only import and merge the adapters for the
different tasks that you have trained, and that's the beauty of this approach. That's why we are very fond of training generative models using QLoRA and LoRA; it is a very remarkable benchmark that we are able to witness in the training of generative AI. I hope this makes some sense, and maybe you can utilize it for your own use case.

Once you have done the inference, you can take this trained model, merge it, and push it to a repository, or push it to the Hugging Face Hub. That's the last part, and I think this is the whole tutorial that I wanted to make. If you want to train a custom model on your own dataset, you can use this approach: work on building a dataset and then fine-tune the model on it using the LoRA approach. If you have any questions about the training part, please mention them in the comments and I would be happy to help. Hopefully you enjoyed this tutorial. Thank you.
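
The adapter-swapping idea from the end of the tutorial (one base model kept in memory, a small adapter merged in per task) can be sketched with a toy example. This is only an illustration in plain Python, not the code from the video and not the PEFT API. The 8x8 matrix stands in for a single base weight matrix, and the names merge_adapter and weights_for_task, the rank of 1, the alpha of 2 and all the numbers are assumptions made up for the example. It uses the standard LoRA update, W + (alpha / r) * B A.

```python
# Toy illustration of LoRA adapter swapping. Pure Python, no ML libraries.
# Every name and number here is hypothetical and chosen only for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_adapter(base, lora_b, lora_a, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a); base is left untouched."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

def count_params(matrix):
    """Number of values stored in a matrix."""
    return sum(len(row) for row in matrix)

def column_with_one(size, index):
    """A size x 1 matrix with 1.0 at the given row."""
    return [[1.0 if i == index else 0.0] for i in range(size)]

def row_with_value(size, index, value):
    """A 1 x size matrix with `value` at the given column."""
    return [[value if j == index else 0.0 for j in range(size)]]

DIM, RANK, ALPHA = 8, 1, 2

# The frozen "base model": one 8x8 identity matrix shared by every task.
base_weight = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]

# One small adapter per task: B is DIM x RANK, A is RANK x DIM.
adapters = {
    "summarization": {"B": column_with_one(DIM, 0), "A": row_with_value(DIM, 1, 0.5)},
    "translation": {"B": column_with_one(DIM, 2), "A": row_with_value(DIM, 3, 0.25)},
}

def weights_for_task(task):
    """Merge the adapter for `task` into the shared base weights."""
    adapter = adapters[task]
    return merge_adapter(base_weight, adapter["B"], adapter["A"], ALPHA, RANK)

summary_weights = weights_for_task("summarization")
translation_weights = weights_for_task("translation")

base_params = count_params(base_weight)
adapter_params = (count_params(adapters["summarization"]["B"])
                  + count_params(adapters["summarization"]["A"]))
print(f"base parameters: {base_params}, parameters per adapter: {adapter_params}")
```

Here the base matrix holds 64 values while each adapter holds only 16. For a d x d matrix the base has d*d values and a rank-r adapter has 2*d*r, so the gap grows quickly at real model sizes, which is why the saved adapters are only a few megabytes.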

Info

Channel: Karndeep Singh

Views: 4,101

Id: psR6kMjDttM

Length: 45min 20sec (2720 seconds)

Published: Wed Sep 13 2023