Fine-tuning Llama 2 on Your Own Dataset | Train an LLM for Your Use Case with QLoRA on a Single GPU

Video Statistics and Information

Captions
How can you improve the performance of GPT-like models for your specific use case? One approach is to fine-tune a large language model, such as Llama 2, on your own dataset. So when should you fine-tune a large language model such as Llama 2? The answer is: when you have to, that is, when prompting doesn't work for your use case and when you know how and what to do. When you're using a large language model with your own data, you basically have two approaches for integrating the model with your own datasets. On one side we have the fine-tuning approach, and on the other we have retrieval-augmented generation (RAG), which is probably the more popular approach, since it requires just a prompt into which you insert parts of your own text.

So why might you use RAG? The simple answer is that it gives you an easy way to use one or more knowledge bases. For example, if you have some financial data or, say, legal documents spread across multiple PDFs, you can pull parts of that text into the prompt and analyze it with the large language model. You keep a single model that you can ask to perform multiple tasks, and everything works with just that one model. The drawbacks of this approach are that the model has no knowledge of your data beyond what you put into the prompt, that you pretty much have to experiment with multiple prompts and evaluate their performance based on how good your prompt-engineering techniques are, and that getting a consistently formatted output, for example Markdown or JSON, can be hard as well.

Fine-tuning actually solves those two problems in the general case, though it of course adds complexity. The benefits of fine-tuning a large language model are that the model, if fine-tuned correctly, will probably perform much better on your tasks or use cases than a general-purpose large language model, and that you need a lot less prompt engineering: your prompts can be much shorter than the ones you'd use with a general model. As a result, more tokens remain available within the model's context limit, and you can use them to provide even more data. The drawbacks of fine-tuning are that it can be hard to do, it requires a lot of resources and a lot of time to experiment, and you of course need a lot of high-quality data to get good results. Also, if you plan to combine the fine-tuned model with external knowledge, as in retrieval-augmented generation, it might not work as well as a general large language model, since it is tuned to your specific cases.

In this video, I'm going to show you how to fine-tune a large language model, Llama 2 in our case, to summarize conversations on Twitter between customer support agents and users. The task is to turn each conversation into a summary that is short and easy to read. We are going to use a custom dataset to do that.
If you want to follow along, there is a complete text tutorial called "Fine-tuning Llama 2 on a Custom Dataset" available on mlexpert.io for MLExpert Pro subscribers. There you'll also find a link to the Jupyter (Google Colab) notebook along with the full write-up, code, and examples, so you can follow everything we're doing here. If you want to support my work, please consider subscribing to MLExpert Pro. Thanks!

Of course, the dataset you use will be specific to your own use case. In our example, I'm taking the data from Salesforce DialogStudio, a collection of unified dialogue datasets and instruction-aware models for conversational AI, provided by Salesforce. From it, I'm using the TweetSumm dataset, which already comes with splits ready for us to use; you can see that we have the summaries and the conversations themselves. I'm going to show you how to process this data into a format suitable for training a Llama 2 model.

The next logical step is to choose a large language model to fine-tune on your own data. For that, you'll probably want something open that you can use in a commercial setting, so here we're going with pretty much the state-of-the-art open model provided by Meta AI, called Llama 2. This model has a lot of advantages: it comes in different sizes and has a context length of over 4,000 tokens. It's pretty much the state-of-the-art model right now, which is visible on the Open LLM Leaderboard, where pretty much every top spot is held by a fine-tuned version of this Llama 2 model.

I have a Google Colab notebook already running, and you can see that we're using a Tesla T4 GPU with about 16 GB of VRAM. I'm not using the high-RAM option, so this should be available to Google Colab free users as well. I'm installing pretty much the standard libraries, Transformers and Datasets, then bitsandbytes for the LoRA/QLoRA technique, and TRL, the Transformer Reinforcement Learning library, for the trainer; I'll show you that in a bit. We have the imports, and most of this is again pretty standard: loading the model, then a BitsAndBytesConfig for the quantization we're going to do with the QLoRA technique, the training arguments, and the trainer itself from the TRL library.

I'm going to use the Llama 2 7B base model, not the instruction- or chat-tuned one, so it will do just the summarization without the instruction tuning. Here we're loading the dataset I showed you a while back. I'm going to do some preprocessing to extract the conversations, and I'll show you the resulting conversation from a single example. You can see the conversation is formatted as a user/agent/user/agent exchange; we also have the summary, and then the instruction that we pass in as a prompt to our model. Note that we're using an Alpaca-style template for the prompt: we have the instruction ("Below is a conversation between a human and an AI agent. Write a summary of the conversation."), then the input with the conversation itself, and then the response, which is the summary.
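If you want to reproduce this part, here is a minimal sketch of loading the data and building the Alpaca-style training prompt. It assumes the Hugging Face Datasets library and the "Salesforce/dialogstudio" dataset with the "TweetSumm" configuration, matching what is shown on screen; the function name and exact template wording are illustrative rather than the tutorial's verbatim code.

```python
# Minimal sketch: load the TweetSumm portion of DialogStudio and build
# an Alpaca-style prompt (instruction, input, response) for each example.
from datasets import load_dataset

dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")

DEFAULT_SYSTEM_PROMPT = (
    "Below is a conversation between a human and an AI agent. "
    "Write a summary of the conversation."
)

def generate_training_prompt(conversation: str, summary: str) -> str:
    # Alpaca-style template described in the video.
    return (
        f"### Instruction: {DEFAULT_SYSTEM_PROMPT}\n\n"
        f"### Input:\n{conversation.strip()}\n\n"
        f"### Response:\n{summary.strip()}"
    )
```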
If you want to use your own dataset, you may have to format it in a similar way. Of course, the Alpaca style is not required, but I find it easy to use and read, so I prefer to format my fine-tuning examples like this. Note that the base model doesn't actually expect any particular prompt format, so you can use whatever you like.

To transform the dataset into that format, I'm creating a function called generate_text. The most important parts are that I take the first of the summaries and build the conversation text by iterating over the turns of the conversation. I'm using a clean_text function to remove URLs, Twitter mentions, multiple spaces, and some strange characters that I found, and I'm prefixing each turn with "user" or "agent" to produce the conversation text itself. Finally, I'm calling generate_training_prompt, which uses the default system prompt I showed you ("Below is a conversation between a human and an AI agent. Write a summary of the conversation."; you can of course try something different) and formats the system prompt, the conversation, and the summary into the Alpaca template we're using. This is the result we get. I'm going over the whole dataset, shuffling it, calling this function, and removing all of the original columns, since we don't need those. That gives us the training, validation, and test splits, where each data point has just the conversation, the summary, and the text.

Then I'm logging in to Hugging Face from the notebook and downloading the model, which again is Llama 2 with 7 billion parameters, and I'm loading it in 4-bit using the normalized float 4 (NF4) data type; this is the quantization we're going to use. Then I'm getting the tokenizer for Llama, which should take some time. What's pretty nice about recent versions of the Transformers library is that the quantization config now lives right within the model, so this is supported out of the box and works with bitsandbytes. It also works with AutoGPTQ, so if you plan to use that, give it a try; it should be supported as well.

Then I'm using a LoRA config, very similar to what I've done previously: the rank of the update matrices is 16, the scaling factor is 32, and there is some dropout (a sketch of this setup follows below). You can play around with these settings; other values might work better on your own dataset. I'm using the causal language modeling task type, since the model will just predict the next token.

I ran the training for about 12 minutes on a V100 GPU; on the T4 it will probably take two to three times longer, but it's still very feasible. Looking at the scalars, you can see that the evaluation loss was decreasing nicely, and if you run this for more than the two epochs I used, you might get an even better validation loss. The training loss was also trending downwards, so I'd say the training went pretty well for just two epochs: we converged to a nice validation loss.
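Here is a minimal sketch of the 4-bit (NF4) loading and the LoRA configuration just described. The rank of 16 and scaling factor of 32 come from the video; the model identifier, compute dtype, and exact dropout value are assumptions on my part.

```python
# Minimal sketch: load Llama 2 7B in 4-bit NF4 and define the LoRA config.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normalized float 4
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,  # quantization lives in the model config
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

lora_config = LoraConfig(
    r=16,               # rank of the update matrices
    lora_alpha=32,      # scaling factor
    lora_dropout=0.05,  # assumed; the video just mentions "some dropout"
    bias="none",
    task_type="CAUSAL_LM",  # plain next-token prediction
)
```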
To train this, I used these training arguments; you can find the exact parameters in the full text tutorial available on MLExpert. One of the more important things here is that I'm using a cosine learning rate schedule. You can see how the learning rate actually changed during training, and this particular scheduler seems to work quite well, at least for this example, so I'd suggest using a similar learning rate schedule as well. The other important thing is that we're using the bitsandbytes-compatible AdamW optimizer (Adam with the weight decay fix), which is pretty nice and appears to work quite well, at least for this training run.

Another thing to note is that I'm passing the PEFT config, the config for LoRA/QLoRA, right into the SFTTrainer (sketched right below). This is the trainer that accepts the training arguments; we give it a maximum sequence length equal to Llama 2's context length and the field to train on, in our case the text field, which contains the prompt in the correct format. This is the table we got during training, and you can see that the validation loss was converging nicely; the training loss was, of course, dropping a lot faster. That said, both the validation and test sets are pretty small, so it's hard to tell from them alone whether the results are really good, but at least we're seeing convergence and loss minimization.

After the training is complete, I'm just saving the model, which saves only the QLoRA adapter. If you want to merge it, you can use the code shown here to load the LoRA adapter and apply it to the whole model, so you end up with a single model without needing to attach the adapter each time.

Either way, for the inference and the comparison between the fine-tuned and base models, I'm generating the prompt with the instruction and the conversation and asking for the summary, in the correct Alpaca format. I took five examples from the test dataset, so these are examples the model hasn't seen, and formatted them the same way as the training set. I created the model, then wrote a small function that summarizes a text: it takes the tokenizer, tokenizes the text, and cuts the generated output off after the input length, using a very small temperature and a maximum of 256 new tokens (a sketch of this helper appears after the example outputs below).
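Pulling the training pieces just described together, here is a minimal sketch of the training arguments and TRL's SFTTrainer call, continuing from the sketches above (it reuses model, tokenizer, lora_config, and dataset). The cosine schedule, the bitsandbytes-backed AdamW, the text field, and the maximum sequence length follow the video; numeric values such as batch size, epochs, and learning rate are placeholders, not the tutorial's exact settings.

```python
# Minimal sketch: training arguments plus TRL's SFTTrainer with the LoRA config.
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="llama2-tweetsumm",  # hypothetical output directory
    num_train_epochs=2,
    per_device_train_batch_size=4,  # placeholder
    learning_rate=2e-4,             # placeholder
    lr_scheduler_type="cosine",     # the cosine schedule mentioned above
    optim="paged_adamw_32bit",      # bitsandbytes-compatible AdamW
    evaluation_strategy="steps",
    save_strategy="steps",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    peft_config=lora_config,
    dataset_text_field="text",  # the column holding the formatted prompt
    max_seq_length=4096,        # Llama 2's context length
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model()  # saves only the LoRA/QLoRA adapter
```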
Let's look at a single example with the base model. I took this conversation and this summary, and it took about 24 seconds to summarize the conversation. The response of the base model is pretty much junk: you can see that we're getting essentially the same output as the input conversation, so not very good. The second example is much the same, a conversation about changing the phone number on an account (I think this one may actually be from the fine-tuned version of the model, I'm not sure), and here we get another response that is pretty much the input itself, so again not very good.

Now let's look at the outputs of the fine-tuned version. For this conversation we got this response, and if you take just this part, we have a very long summary, but still one that contains all the information from the conversation. I'd say the fine-tuned model performed much better than the base model here, though not great, since the summary is very long. For the second example, I took all of the output up to the first newline, and here the summary is much better: "Customer is complaining that his account is linked to an old number and now he is asked to verify his account." That's a very good summary of this conversation, at least from the fine-tuned version. And for the third example, where the customer is complaining about the new updates on iOS 11 (we have the conversation right here, you can read it for yourself), the predicted summary is "Customer is complaining that the new update iOS 11.6. Agent asked to DM and they will work from there." Again a pretty nice summary, so the fine-tuned version of this model is performing very well.

If you found this tutorial helpful, please let me know in the comments below. If you have any further questions, use the comments as well, and please like, share, and subscribe. Also, please consider joining MLExpert Pro to get the full text tutorial and the Google Colab notebook. Thanks!
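For completeness, here is a minimal sketch of the inference side described above: merging the adapter into the base model, then generating a summary with a very small temperature and 256 max new tokens. It continues from the training sketch (reusing tokenizer and the "llama2-tweetsumm" output directory); the exact temperature value is an assumption.

```python
# Minimal sketch: merge the LoRA adapter into the base weights and summarize.
import torch
from peft import AutoPeftModelForCausalLM

merged_model = AutoPeftModelForCausalLM.from_pretrained(
    "llama2-tweetsumm", torch_dtype=torch.float16, device_map="auto"
).merge_and_unload()  # fold the adapter weights into the base model

def summarize(model, tokenizer, text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    input_length = inputs.input_ids.shape[1]
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.0001,  # "a very small temperature" (assumed value)
            do_sample=True,
        )
    # Keep only the newly generated tokens, i.e. everything after the prompt.
    return tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True)
```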
Info
Channel: Venelin Valkov
Views: 41,968
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning
Id: MDA3LUKNl1E
Length: 18min 28sec (1108 seconds)
Published: Mon Sep 04 2023