Fine-Tune Llama 3 with LLaMA-Factory in Free Colab

Video Statistics and Information

Captions
Just a few months ago, fine-tuning a large language model was only for machine learning engineers and people who knew the models inside out. But now, almost every week there is a new tool that enables us to fine-tune a model easily, locally, and even in Colab on the free tier. LLaMA-Factory is one such tool, which lets us easily and efficiently fine-tune large language models. Not only that, but you can use various techniques to fine-tune the models, including reward modeling, SFT, DPO, ORPO, and the list goes on. It supports not just the well-known models but 100+ models, and that list is growing very fast. Another great thing about it is that it is free and open source, under the Apache license, which is always great.

By the way, when we say fine-tuning a model, it refers to the process of adjusting a pre-trained or base model's parameters to fit a specific task or dataset, improving its performance and accuracy. This involves feeding the model a new dataset and adjusting the weights and biases to minimize the difference between the model's predictions and the desired outputs. This process allows the model to adapt to the new task or dataset without starting from scratch. For example, whenever a new model is created, it gets trained on a huge corpus of data, which sometimes includes toxic or harmful content. So after a model's pre-training, it is fine-tuned with safety alignment, that is, it gets trained on a dataset which ensures the model avoids responding in a harmful or toxic way. Of course this is not a perfect solution, so sometimes a model still responds with harmful content, but that is a discussion for another time.

Now we know what fine-tuning is and why it is needed, and we have LLaMA-Factory, a tool that enables us to fine-tune lots of models easily and freely from our own systems, or even
in Colab, or on bigger servers if you have more pressing needs.

So LLaMA-Factory enables you to fine-tune over 100 models efficiently. You can employ the methods I mentioned, like ORPO, DPO, and PPO, and it also supports 16-bit and 32-bit full tuning and low-rank adaptation (LoRA), with options for 2-, 4-, and 8-bit quantization (QLoRA) to reduce GPU memory usage. I will drop the link to its GitHub repo in the description, and you can read through which options are available yourself; there are a lot of them. As for which models it supports: various flavors of Llama (Llama 2 and Llama 3), LLaVA, Mistral, the Mixtral mixture-of-experts models, Qwen, Yi, Gemma, Baichuan, ChatGLM, and the list goes on. It also supports advanced algorithms like GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, and agent tuning. There is also a lot of other good stuff, like FlashAttention, Unsloth, RoPE scaling, and rsLoRA, and the list goes on. You can also plug in experiment monitors like TensorBoard, WandB, and MLflow, and you can do faster inference with a Gradio UI, the CLI, or vLLM. I'm not going to go into all of that detail; let me now show you on a free Colab how you can fine-tune a model.

As you can see, I'm already logged into my Colab. Simply go to Runtime and select the free T4 GPU from Google (really good of Google to provide this T4 GPU), then save it. Now, first up, let's install some of the dependencies. You can see that we are simply cloning the LLaMA-Factory repo.
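Before the walkthrough, here is a toy illustration of the weight-adjustment idea described above. This is not LLaMA-Factory code, just a one-parameter sketch: we repeatedly nudge a weight to shrink the gap between the model's prediction and the desired output.

```python
# Toy illustration of fine-tuning's core loop: adjust a weight to minimize
# the squared error between prediction and target (here, fitting y = 3x).
def fine_tune_step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    pred = w * x               # the model's prediction
    grad = 2 * (pred - y) * x  # gradient of squared error with respect to w
    return w - lr * grad       # move the weight against the gradient

w = 0.0
for _ in range(50):
    w = fine_tune_step(w, x=1.0, y=3.0)

print(round(w, 3))  # converges toward 3.0
```

A real fine-tune does the same thing across billions of parameters and many batches, but the principle is identical.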
We are also installing Unsloth, bitsandbytes, and xformers. If you don't know what Unsloth is, I have done various videos on it, so I would highly suggest you watch them; it is a very fine tool by Daniel, who lives where I live, in Sydney, Australia. I also did an interview with him, so if you're interested, search the channel; I think you are going to enjoy it.

Let's wait for all of these prerequisites to finish, and then we will proceed further. It is almost there. Once that's done, I'm going to check the GPU: all I'm doing in this command is importing torch and checking the GPU. All the prerequisites are done, so let's import torch and check our CUDA device, because we are using a GPU; it should be fairly quick. That is done. Now let's load our identity dataset from the GitHub repo which we have cloned. That is also quickly done, as you can see, and you can of course use your own dataset easily; you just specify it here instead of identity.json.
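The GPU check in the notebook is just a couple of lines of torch. A small sketch of the same idea follows; the helper name is mine, not from the video, and the torch calls are shown in a comment so the snippet runs anywhere.

```python
def describe_device(cuda_available: bool, device_name: str = "") -> str:
    # Report the GPU when CUDA is usable, otherwise fall back to CPU.
    if cuda_available:
        return f"CUDA GPU available: {device_name}"
    return "No CUDA GPU, using CPU"

# In Colab you would feed it torch's answers, e.g.:
#   import torch
#   name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else ""
#   print(describe_device(torch.cuda.is_available(), name))
print(describe_device(True, "Tesla T4"))
```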
Once the dataset is ready, you can execute this command to bring up LLaMA Board, the Gradio web UI. Let's wait for it to run; it shouldn't take too long. By the way, you can also run everything through the CLI, which I will show you shortly. It has generated this Gradio link; you can just open it from here, and you see how good that is. This is your language (English), and you can select which model you want to use; I'm just going to go with a Llama model, and I'm checking if I have Llama 3 8B, because I just want to keep it short. Then this is your adapter path: if you want to do LoRA, you can select your adapter here, or you can go with QLoRA, freeze, or full fine-tuning (LoRA is better). You can refresh the adapters; we don't have any adapter for the moment, but you could download one separately. Then you can either train with supervised fine-tuning or go with DPO or PPO; because I have selected a dataset which is for SFT, that is what is selected, but you can use any of your own datasets in DPO, PPO, or ORPO format. You can also select your dataset from here easily, whatever you like, and then specify your hyperparameters: how many epochs you want (1, 3, and so on), the maximum gradient norm, max samples, and a lot of other options which I have already described in my other videos. Then there is your LoRA configuration; a lot of stuff is there. You can simply click Start and off you go. But I'm not going to go with this one; I'm going to show you the CLI instead.

These are the CLI commands, as you can see, and they are pretty straightforward. We are just importing some libraries and specifying our fine-tuning technique as supervised fine-tuning (SFT), which tunes a model on a specific dataset with labeled data. Then we specify do_train=true, meaning we are training, and then we specify our model, our
dataset, and the chat template of the model. We also specify that our fine-tuning type is LoRA (low-rank adaptation), which uses adapters to modify the pre-trained model, and lora_target, which says which layers to attach the LoRA adapters to; "all" means we are attaching them to every supported layer. Then there is our output directory. per_device_train_batch_size means the batch size for training on each device, whether CPU or GPU, and we are training with a batch size of two. gradient_accumulation_steps is the number of steps to accumulate gradients over before updating the model. lr_scheduler_type is the type of learning-rate scheduler to use; we are using cosine here, a scheduler that adjusts the learning rate over time. logging_steps is how often to log training progress, and we are logging every 10 steps. warmup_ratio is 0.1, which means we warm up over 10% of the total steps. Then we save checkpoints every 1,000 steps, the learning rate is the initial learning rate, and num_train_epochs I have set to just one to speed things up (I think it is still going to take a bit of time). max_samples from each dataset is just 500, max_grad_norm is the maximum gradient norm for clipping, and I'm using 4-bit quantization. Towards the end, loraplus_lr_ratio is 16, which is the ratio of the LoRA+ learning rate to the base learning rate. So that's it; meanwhile, I'll run it. Let's see, it is training. If I scroll down, you can see it is loading the model, which is 5.7 GB; this is the pre-trained base model. Once it loads, it starts training on our dataset using the adapter.
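The hyperparameters walked through above can be collected into one argument dictionary and saved as JSON, which is roughly how a LLaMA-Factory training run is driven from a script. The key names below are modelled on LLaMA-Factory's conventions and should be checked against the version you install; the model id, gradient-accumulation value, and learning rate are my placeholders, since the video doesn't state them.

```python
import json

# Sketch of the training arguments described in the video; verify the exact
# key names against your installed LLaMA-Factory version.
train_args = dict(
    stage="sft",                      # supervised fine-tuning
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # assumed id
    dataset="identity",
    template="llama3",
    finetuning_type="lora",           # low-rank adaptation
    lora_target="all",                # attach adapters to all supported layers
    output_dir="llama3_lora",
    per_device_train_batch_size=2,    # batch size of two per device
    gradient_accumulation_steps=4,    # value assumed; not stated in the video
    lr_scheduler_type="cosine",
    logging_steps=10,                 # log every 10 steps
    warmup_ratio=0.1,                 # warm up over 10% of total steps
    save_steps=1000,                  # checkpoint every 1,000 steps
    learning_rate=5e-5,               # placeholder; not stated in the video
    num_train_epochs=1.0,
    max_samples=500,
    max_grad_norm=1.0,                # gradient clipping
    quantization_bit=4,               # 4-bit quantization (QLoRA)
    loraplus_lr_ratio=16.0,           # LoRA+ LR relative to the base LR
)

with open("train_llama3.json", "w", encoding="utf-8") as f:
    json.dump(train_args, f, indent=2)
# A CLI-style run would then point the trainer at train_llama3.json.
```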
It is almost loaded; there you go. Let's wait for the training to begin, and there you go, the training is in progress. It seems there are 74 steps, and the ETA is around 6 or 7 minutes, but that could go up, you never know. Let's wait for it to finish.

So the fine-tuning of the model has finished, and you can see, if you scroll down, that the loss is coming down, so we are in pretty good shape. You can also run inference with this model and the new adapter if you like. All you need to do is run this cell, where we import some libraries, specify our base model plus the adapter which we have just saved, and specify LoRA. These are just chat templates, and then all we are doing is specifying the user and assistant roles and appending the messages. If you run it, it is going to do inference here, and there you go; how good is that? You can do a lot of inference from it if you like, and you can change the prompt template and all of the parameters as per your use case, and you can use any model here. You see that within minutes we were able to fine-tune the base model with our own dataset, and you can do the same. The user can ask something here, like "What is the capital of Australia?", and it answers that the capital of Australia is Canberra; or you can say "Explain the meaning of happiness", press Enter, and it replies that happiness is a positive emotional state. You see how good that is? You have just built a chatbot on your T4 GPU after fine-tuning the model. If you like, you can simply use the Hugging Face CLI to push the model to the Hugging Face Hub for the whole world to see.

So that's it, guys. I hope you enjoyed it. If you are looking for more information on datasets and other methods of fine-tuning, please search my channel; hopefully you will find something. If you like the content, please consider subscribing to the channel, and if you're already subscribed, then please share it among your network; it helps a lot. Thanks for watching.
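As a recap, the inference cell described above boils down to a simple chat loop: keep a message list, append the user turn, generate, append the assistant turn. Here is a model-free sketch of that loop; the `generate` stub is my stand-in for the fine-tuned model's generation call, not LLaMA-Factory's API.

```python
# Model-free sketch of the chat loop from the inference cell: the history is
# a list of {"role", "content"} dicts, extended on every turn.
def generate(messages):
    # Stub standing in for the fine-tuned model; a real run would pass the
    # full message history (rendered with the chat template) to the model.
    last = messages[-1]["content"].lower()
    if "capital of australia" in last:
        return "The capital of Australia is Canberra."
    return "(model reply)"

def chat(history, user_input):
    history.append({"role": "user", "content": user_input})
    reply = generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
print(chat(history, "What is the capital of Australia?"))
```

Swapping the stub for a real generation call gives you exactly the T4-hosted chatbot shown in the video.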
Info
Channel: Fahd Mirza
Views: 3,315
Id: ucJJ7tM8xkM
Length: 13min 10sec (790 seconds)
Published: Tue Apr 30 2024