How to fine tune Falcon LLM on custom dataset | Falcon 7B fine tune tutorial | Google Colab

Video Statistics and Information

Captions
Hello friends, welcome to another video on my YouTube channel. In today's video you will learn how to fine-tune Falcon 7B, the open-source large language model, on your own custom dataset, or on a custom task that you want to perform with that model. So let's get started.

Before we move forward, I would like to thank Hugging Face and TII UAE for Falcon 7B, because we are using their model and it is free to use. Next, we are going to use the TRL (Transformer Reinforcement Learning) library, which is helpful for training transformer models, and PEFT (parameter-efficient fine-tuning); TRL and PEFT together will let us train a large language model with a minimal amount of GPU memory. We are going to use the ybelkada/falcon-7b-sharded-bf16 model, a 16-bit sharded version of Falcon 7B. Finally, for the dataset I have used the billsum dataset, where each row has a text and a summary of that text. What we are going to do is create a prompt with an instruction and an input, plus an output, the response from the assistant, and we will train using those two fields.

So here I am in my Google Colab notebook. The first thing we'll do is install a few packages: trl, transformers, accelerate, peft, datasets, bitsandbytes, einops, and tiktoken. Let me run that cell; if I check the runtime you can see I am using a T4 GPU, and everything is being installed. The next thing we will do is load the dataset, which will download it as well. It has around 18,000, close to 19,000 rows, each with a text, a summary, and a title. The first example shows exactly that: a text, then a summary and a title. Next I have a format_input function which takes an example and creates a prompt like this: "Human: Summarize the given text", then the text, then "Assistant:" followed by the summary.
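In code, the prompt-building step just described might look like this. This is a minimal sketch: the exact template wording and the function name format_input are my assumptions based on the video, while the text and summary field names match the billsum dataset.

```python
# Hypothetical prompt builder matching the "Human ... Assistant" template
# described in the video. The exact wording of the template is an assumption.
def format_input(example: dict) -> dict:
    prompt = (
        "### Human: Summarize the given text.\n"
        f"{example['text']}\n"
        f"### Assistant: {example['summary']}"
    )
    # datasets.Dataset.map merges this dict back into each row,
    # adding a new "formatted_text" column.
    return {"formatted_text": prompt}

sample = {"text": "A bill to rename a post office.",
          "summary": "Renames a post office."}
row = format_input(sample)
```

With the Hugging Face datasets library you would then call something like `dataset.map(format_input)` to add the formatted_text column to every row.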
So that is the kind of input we will create for the Falcon model, and in this way you can create your own prompts. For example, for language translation, the human part could say "translate the given text into French", you provide your input there, and whatever the translation is, you put it against the assistant. I'll run this function and map all the input data through it, which adds a formatted_text key to the dataset. You can see we now have a formatted_text key; if we look at the first sample again, it starts with the human asking to summarize the given text, then the text, and the summary sits against the assistant. For demonstration purposes I'll sample the original dataset and take only 100 examples. Let's look at one of the samples again. Now let's make sure we have a GPU: yes, we have around 16 gigabytes on the T4.

The first thing we need to do is prepare our model. We first import a few things, then provide the Hugging Face repo name to download the model. Let me run this first because the download takes time. This is the repo name we are using, and here is the bitsandbytes configuration: it loads the model in 4-bit and quantizes it, so that we can train with a small amount of GPU memory; all of these settings exist just to make that possible. Next we use AutoModelForCausalLM.from_pretrained, passing the model name and the quantization configuration, which is our bitsandbytes config.
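A sketch of the 4-bit loading step under discussion. The quantization details (nf4, float16 compute) are my assumptions, since the video only says "4-bit and quantized", and this snippet needs a GPU and the model download to actually run, so treat it as a configuration sketch rather than a definitive recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

MODEL_NAME = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # fit the 7B model on a 16 GB T4
    bnb_4bit_quant_type="nf4",             # assumption: not stated in the video
    bnb_4bit_compute_dtype=torch.float16,  # assumption: not stated in the video
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    trust_remote_code=True,   # Falcon ships custom modeling code
    device_map="auto",
)

model.config.use_cache = False          # required with gradient checkpointing
model.gradient_checkpointing_enable()   # trade compute for memory
model = prepare_model_for_kbit_training(model)
```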
We also pass trust_remote_code=True, otherwise it asks you to acknowledge Falcon's custom modeling code, and device_map='auto'. Then two settings, model.config.use_cache = False and model.gradient_checkpointing_enable(), make the training fit comfortably in this GPU's memory. Finally we pass the model to prepare_model_for_kbit_training, which gives us a model ready for fine-tuning. Next, the tokenizer: I'm using AutoTokenizer.from_pretrained with our model name and again trust_remote_code, so it doesn't prompt you for confirmation. You can see there are eight shards of the model in total, of which four have downloaded, so let's wait; once the eighth shard is down, it will load all the shards as checkpoints. The model loading part is now complete, and we can look at the model as well. Now we'll create the tokenizer, which should be fast.

Next, let's create a PEFT config, that is, a LoraConfig, for the training. I'm going to use lora_alpha 32, dropout 0.05, r=16, bias "none", task type CAUSAL_LM, and target module query_key_value. With the help of this PEFT config we will wrap our model accordingly, and then it will be very easy to train. Then come the TrainingArguments: an output directory, per-device train batch size, gradient accumulation steps, one epoch, fp16 precision, the paged_adamw_8bit optimizer, a cosine learning-rate scheduler, and a warmup ratio of 0.05; you can read more about these parameters in the Transformers library documentation. Now, with the model, the tokenizer, and these training arguments, we will create our SFTTrainer, where two things are important.
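The LoRA and trainer setup can be sketched as follows, assuming the trl API of mid-2023 (where SFTTrainer accepted dataset_text_field directly). The batch size and gradient accumulation steps are illustrative values, since the video does not read them out, and the snippet assumes the model, tokenizer, and dataset objects from the earlier steps.

```python
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)

training_args = TrainingArguments(
    output_dir="trained_model",
    per_device_train_batch_size=1,   # illustrative; not stated in the video
    gradient_accumulation_steps=4,   # illustrative; not stated in the video
    num_train_epochs=1,
    fp16=True,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
)

trainer = SFTTrainer(
    model=model,                          # the prepared 4-bit model from earlier
    train_dataset=dataset,                # the 100-sample formatted dataset
    peft_config=peft_config,
    dataset_text_field="formatted_text",  # the column built by the prompt mapper
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
trainer.save_model("trained_model")
```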
One, we provide our dataset; two, we specify which column to use, which in our case is formatted_text. I'll run this; it maps the model into the trainer, and finally we can call the train method on our trainer to train the model on our custom dataset. Since we used only 100 samples it will be quick, so let me come back once the training is complete.

Now the training of our model on our custom dataset is done. If we go to the output directory, you can see the training results inside runs, with the event files and everything, so you can look into those. Next, let's save the trained model into a trained-model directory. Running this creates the folder, and inside it you get the adapter_model.bin and adapter_config.json files.

Now let's load the trained model. For that we use PeftConfig and PeftModel: the PEFT config comes from the adapter_config.json file, and the model comes from that folder. I again use AutoModelForCausalLM.from_pretrained for the base model, then PeftModel.from_pretrained with that base model and the trained-model directory, and for the tokenizer I again use the base_model_name_or_path. Let me run this; it will take a minute or two at most, and you can see the checkpoint loading is about to complete.

Next we will create a generation config for our inference, starting from the trained model's generation_config and changing a few things: max new tokens, temperature, top_p, number of return sequences, and the pad and eos token IDs. Let me do that and look at the generation config. I'm going to use CUDA as my device, and here is my query: a piece of text that I want to summarize.
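Reloading the saved adapter and building the generation config might look like this. The directory name and the sampling values (temperature, top_p) are illustrative assumptions, and running it requires the GPU session from the earlier steps.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

# Read adapter_config.json to find the base model this adapter was trained on.
peft_config = PeftConfig.from_pretrained("trained_model")
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    trust_remote_code=True,
    device_map="auto",
)
# Attach the LoRA adapter weights (adapter_model.bin) to the base model.
model = PeftModel.from_pretrained(base_model, "trained_model")
tokenizer = AutoTokenizer.from_pretrained(
    peft_config.base_model_name_or_path, trust_remote_code=True
)

generation_config = model.generation_config
generation_config.max_new_tokens = 50
generation_config.temperature = 0.7   # illustrative value
generation_config.top_p = 0.9         # illustrative value
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
```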
So I'll take that and build my prompt the same way we did for training, and run it. First we need the encodings; these give us the input IDs, the attention mask, and everything else. Once that is there, I'll run the inference: on our trained model I call the generate method with the input IDs, attention mask, generation config, and max new tokens; for simplicity let's use 50. Keep in mind that the more max new tokens you allow, the longer it takes to generate the answer. I'll run that; the inference is done, so let's see the output. We use the tokenizer to decode it, and here is what we got: it went into a kind of infinite loop, "Human: summarize the text...", "Assistant: I'm sorry, I don't get that, please repeat the question", "Human: I'm sorry...", and so on. I know this doesn't make sense, but if you first train on the whole dataset instead of a 100-example sample, and second train for more epochs instead of just one, then it will surely work much better than this. In this video I just wanted to show that you can train your own large language model on your own custom dataset, and I think I have made my point. So I guess this is it from my side. If you like my work, please consider subscribing to my channel; that helps. Thank you for watching, peace.
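Putting the inference step together as a sketch (assuming the model, tokenizer, and generation_config objects from the previous steps; the query text here is made up):

```python
import torch

device = "cuda"
query = "The bill directs the agency to submit an annual report ..."
# Same template as training, but with the assistant's turn left empty.
prompt = f"### Human: Summarize the given text.\n{query}\n### Assistant:"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
        max_new_tokens=50,   # more tokens means longer generation time
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```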
Info
Channel: Raj Kapadia
Views: 5,241
Keywords: raj kapadia, fine tune falcon 7b, fine tune falcon, fine tune falcon 40b, fine tune llm hugging face, fine tune llm locally, fine tune llm tutorial, fine tune llm on custom data, custom data llm, llm with custom data, llm fine tuning
Id: CvSP0ZYoMCs
Length: 15min 55sec (955 seconds)
Published: Wed Aug 02 2023