Anyone can Fine Tune LLMs using LLaMA Factory: End-to-End Tutorial

Video Statistics and Information

Captions
hello everyone, welcome to the AI Anytime channel. In this video we're going to explore LLaMA Factory, a very recent framework built to fine-tune or train LLMs with ease. As the title says, anybody can fine-tune a large language model, and I'm not joking: technical or non-technical, whether you've been working in artificial intelligence or in some other industry or domain, it doesn't matter, as long as you have the compute power. That's what frameworks and low-code/no-code tools like Axolotl and LLaMA Factory are showing: how easy it is nowadays to fine-tune these large language models on confidential or proprietary datasets, or for a specific use case.

We're going to do it in Google Colab Pro on a V100 or A100 GPU, and we're targeting something like a "Docker GPT". I'm a Docker fan; I love Docker, and we'll take a dataset from Hugging Face called Docker natural language commands. For example, say you want to delete a Docker container. You could go to ChatGPT and write "give me a command to delete all my containers", and ChatGPT will generate a response (and charge you for it, though there is a free version). But imagine putting your confidential data into ChatGPT; it doesn't work like that. So I wanted to try an interesting use case: we'll take the dataset and use this amazing framework, LLaMA Factory ("easy and efficient LLM fine-tuning", as it says), to fine-tune Mistral, specifically Mistral 7B chat, the instruct version, on the Docker NL command dataset on Hugging Face by MattCoddity. You can use any other dataset you want: build an SQL GPT that generates SQL queries from natural language, an insurance GPT, a finance GPT, and lots of other GPTs, basically a system where you use your fine-tuned LLM through an API endpoint. That's the agenda, so let's jump in without wasting any time.

You can see it says LLaMA Factory; the creator is hiyouga (I'm probably not pronouncing that right, but all credit to him). It's a fine-tuning framework that supports more than 100 datasets and more than 50 LLMs, with all the different techniques (PPO, DPO, SFT, reward modeling, you name it) and all the novel ways of training like LoRA and QLoRA, or full fine-tuning without them. It supports almost everything and makes it very easy to fine-tune these LLMs through a web UI; they have a Gradio integration as well, and I'll show it. If you come down it says "one-stop web UI for getting started with LLaMA Factory"; you can read the GitHub yourself. Credit goes to them: go and star the repository, they're going to reach 10,000 stars very soon.

Now let's talk about the dataset. It's by MattCoddity, called Docker natural language commands, and you can see some examples: there's an input, an output, and an instruction. The instruction is the same for every input/output pair ("translate this sentence into a Docker command"), so there's uniformity in the instructions.
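Each record (instruction, input, output) eventually gets rendered into the model's chat template before training. Here is a minimal sketch of a Mistral/Llama-2-style single-turn rendering; it is simplified (the real templates in LLaMA Factory also handle system prompts and multi-turn history), and the function name is just illustrative:

```python
def render_example(instruction: str, user_input: str) -> str:
    """Render one (instruction, input) pair into a Mistral-style prompt."""
    return f"<s>[INST] {instruction}\n{user_input} [/INST]"

prompt = render_example(
    "Translate this sentence into a Docker command.",
    "List all containers with ubuntu as their ancestor.",
)
print(prompt)
```

The `[INST] ... [/INST]` wrapping is the part Mistral and Llama 2 share, which is why a Llama-2-style template can be reused for Mistral.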
Most of you have the question of how to create, prepare, or format your own data: watch the previous video I posted on my channel (I'll share the link in the description) where I show how to create one, and you can also watch my question-answer generator videos, where we show how to use closed-source and open-source LLMs to generate question-answer pairs from your text corpus. So the dataset has input, output, and instruction. Some examples: "give me a list of all containers, indicating their status as well", "list all containers with ubuntu as their ancestor", and so on, each with its answer. If you have question-answer pairs, probably keep them in English; for other languages you'd have to look at further fine-tuning the pre-trained model through vocabulary extension, expanding the vocabulary and grammar and all of that, which is a different use case. English probably works better. You can build your dataset locally or host it on Hugging Face.

Now let's jump into actually setting this up in Google Colab Pro. I'm using a V100; let's connect to the runtime. You can also try RunPod, Lambda Labs, Vast.ai, Amazon SageMaker, Vertex AI, wherever you have compute power. The first thing we do is git clone: copy the HTTPS URL of the LLaMA Factory repository (https://github.com/hiyouga/LLaMA-Factory.git) and git clone it. Once it's done, refresh the file browser and you'll see a LLaMA-Factory folder created. The reason we clone the repository is that we need to take some scripts, some Python files, and make small changes so we can load the Gradio UI. It's an open-source project, by the way, so if you're concerned about putting your data into LLaMA Factory: you can also set it up locally in your own infrastructure, on-premise or on a public cloud, wherever your infrastructure lives, and use it as you wish. I'm doing it on Colab, which doesn't mean Google will take your data; it doesn't work like that.

Expand the LLaMA-Factory folder and you'll see all the folders; we'll use src, data, and the others in a bit. Next we cd inside it, because we have to install the dependencies: transformers, peft, bitsandbytes, accelerate, and some others are what you need to work with LLaMA Factory. If you want to use QLoRA, loading the model in 4-bit quantization, you need bitsandbytes. If you have no idea about these terminologies, I recommend (not request, recommend) watching my LLM fine-tuning crash course video, about 1 hour 20 minutes, to understand what 4-bit, QLoRA, NF4, low memory usage, and the other 50 or 100 terms we work with actually mean. So: cd LLaMA-Factory (you can run pwd, present working directory, to confirm which directory you're in), then pip install -r requirements.txt, which installs all the requirements you need. I've also made a few notes that I want to explain in this video.
While it's installing: this is important for everyone who has no idea how large language models work. You can take code, you can watch videos, but you have to build the skill of understanding large language models, the different techniques, the algorithms. Put your effort there, not just into looking at things from the top via low-code/no-code tools; that won't help you in the longer run. When I said anybody can fine-tune, of course anybody can, but for sustainable growth in your career you have to go the extra mile, wherever you work, and become an architect in your field, and that requires very versatile knowledge of fundamentals and basics, not only drag-and-drop and low-code/no-code tools.

Now, pip install -r requirements.txt has installed all the requirements: accelerate, aiofiles, datasets, and so on. The next thing: we're going to work with 4-bit quantization, QLoRA (quantized low-rank adaptation), and for that we need the library that handles the quantization techniques, bitsandbytes (BNB). So: pip install bitsandbytes. While that installs, let me show you the changes you have to make. You don't have to run a lot of commands or write a lot of code here, because we're going to use the web UI. Go to src and expand it; there are two options. One option is bash, if you want to do bash scripting in Colab Pro or wherever you're running (you can set that up too), but I want to show you the web UI.
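Since bitsandbytes is being installed for 4-bit loading, here is a toy sketch of what quantization does at its simplest: symmetric absmax rounding to 4-bit integers plus one scale factor. Real NF4 in bitsandbytes uses a non-uniform 4-bit codebook with per-block scaling, so treat this only as the core idea of trading precision for memory:

```python
def quantize_absmax(values, bits=4):
    """Toy symmetric absmax quantization: map floats to signed ints of `bits` width."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes and the scale."""
    return [qi * scale for qi in q]

weights = [0.31, -0.92, 0.05, 0.74]
q, scale = quantize_absmax(weights)
approx = dequantize(q, scale)
print(q)                                  # small signed integers
print([round(a, 3) for a in approx])      # close to the original weights
```

Each weight now needs 4 bits instead of 16 or 32, at the cost of small rounding errors, which is why QLoRA trains small LoRA adapters in higher precision on top of the frozen quantized base.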
In the Gradio UI app you just click and select a few things, like the dataset name, the model name, and some hyperparameter selections, and that's it: you can fine-tune and also evaluate. So let's do that. Double-click train_web.py; it opens the Python file, and you have to make one change here. It's a Gradio application and I want a public link to work with, so instead of share=False make it share=True; it will then open in a separate link in the browser. Ctrl+S and close the file. First change done.

The next change is in the datasets. Go into the data folder and open dataset_info.json. You can see the n-number of datasets LLaMA Factory already supports, and you also have the option to make a custom one, which is what I'm going to show: how to bring your own data here. Ignore all the existing entries for now; your focus should be "I have to register my own data". Click into the file and write your dataset name (I'm going to call mine docker_nl), then its JSON object. The first thing you put is the Hugging Face Hub URL, hf_hub_url (HF is the acronym for Hugging Face), because our data is on Hugging Face right now: copy the repository id of this amazing guy Matt, who has shared his data, and put it there. That's the Hub URL that holds my dataset. After that give a comma (it's a JSON schema we're working in), and then you can define the columns.
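Put together, the dataset_info.json entry sketched in this walkthrough would look roughly like the dict below (serialized with the stdlib json module). The column mapping mirrors the dataset fields (prompt to instruction, query to input, response to output); the dataset name is your choice, and the Hub repo id here is from memory, so verify it on Hugging Face:

```python
import json

# Hypothetical entry mirroring the walkthrough's registration in data/dataset_info.json.
entry = {
    "docker_nl": {
        "hf_hub_url": "MattCoddity/dockerNLcommands",  # repo id as I recall it; verify on the Hub
        "columns": {
            "prompt": "instruction",   # dataset's instruction field
            "query": "input",          # dataset's input field (the question)
            "response": "output",      # dataset's output field (the Docker command)
        },
    }
}

# Serialize exactly as it would sit inside dataset_info.json.
print(json.dumps(entry, indent=2))
```

The entry just has to be valid JSON alongside the built-in datasets; putting it first in the file makes it show up first in the web UI's dataset dropdown.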
Because Mistral follows the Llama 2 family of prompt formats, I'm going to use the Llama-2-style data format; there are different formats, like Alpaca, that you can work with too. Now define the columns: inside a "columns" object, the first is "prompt", which maps to the instruction field over here (you can see this instruction, so that's the mapping). After prompt comes "query", which maps to your input; the query is your question, basically. Then you have "response", which maps to your output. That's all we need, perfectly fine. Give a comma after the entry, and with that I've brought my dataset in through Hugging Face, as simple as that. When you have your own dataset, you bring it here and pass that information into dataset_info.json the same way. Ctrl+S to save, then close the file. That change is done.

Now one more change, which is probably not required, but let's at least look at it: the prompt template, the prompt structure. Go to src, then llmtuner, expand that, then data, and double-click template.py. You'll see the different templates that have been defined for different kinds of LLMs, each with its own way of accepting inputs: BlueLM, ChatGLM2, ChatGLM3, CodeGeeX, DeepSeek, DeepSeek Coder, Falcon, and a lot of others. Come down to line 554 or so, wherever you want to make the change. I'll copy a template, remove the Japanese or Chinese text from it, and call mine something like llama2_chat_docker, then save. You can see Mistral and Llama 2 follow very similar prompt templates; the only change is the prefix, where Mistral doesn't use the system part the same way, but this is fine, they're very similar.

Once that's done, run the command: CUDA_VISIBLE_DEVICES=0 (we're on a V100, which has only one GPU, so device zero) and then python src/train_web.py. (I typed CUDA twice the first time; excuse me.) Once you hit enter it will open a Gradio link for you; it takes a bit of time. You'll see a local URL, but we're not interested in that; we're interested in the gradio.live link, so let's open it.

Once the gradio.live page comes up (let me make it a bit bigger, that makes sense), this is the screen you see. It's not much different from how we do it programmatically: normally we write the bitsandbytes config, the training arguments, and use the SFT trainer or DPO trainer; here somebody has already written that code, and you're just using the system to fine-tune at a high level. Language: I'll go with English. Model: you can select BLOOM, BlueLM, ChatGLM, Chinese-LLaMA, Falcon, LLaMA, Mistral... I'm interested in the Mistral model. You could also try Phi-2; I recently made a Phi-2 fine-tuning video, please watch it to see how to fine-tune Phi-2 on your datasets. Surprisingly, you even have the MoE model, Mistral's sparse mixture of experts called Mixtral 8x7B, but I don't have the compute to fine-tune that (it would crash), so I'm going to use Mistral 7B chat, one of the best performing models we have right now. I've selected Mistral-7B-Chat as the model name, and the model path changes to the Mistral 7B Instruct v0.1 version. Fine-tuning method: LoRA. You might say "hey, why can't I select anything in adapters?" You'll be able to in a few minutes: once we fine-tune, we'll see our fine-tuned LoRA adapters there, to use for inference and also for merging with the base model and pushing to Hugging Face. Now, coming down to the Advanced configuration: I will not touch RoPE scaling or the booster options.
Though I wanted to explain FlashAttention, that's a different video; one on FlashAttention and FlashAttention-2 is coming very soon. On the right-hand side, quantization is important, because we're going to load the model in 4-bit: set the quantization bit to 4, so we're using QLoRA here. I'll keep the prompt template as mistral.

If you come down you have four tabs: Train, Evaluate & Predict, Chat, and Export. We'll talk about Chat once we've fine-tuned. For evaluation they have BLEU and ROUGE, but I'm not touching those because I don't find them that relevant for this video; my focus is Train and Chat, and if we're fast we'll talk about Export. Everything is happening live; we're not editing anywhere. If you look at the training stage, it has multiple options, as I said: SFT; reward modeling, where you have RLHF-annotated datasets and train a reward model; PPO; and DPO, one of the most famous techniques. I'm working on a DPO-from-scratch video where I explain each and every thing in DPO; it will take a few days to release. DPO is the most sought-after stage for training or fine-tuning LLMs right now; people are going gaga about it, which makes sense because it's really powerful: it basically removes the RL part from the RLHF equation. And then there's pre-training, but I don't have the compute to pre-train an LLM here, so we keep SFT as it is.

Let's select our dataset: if you put your dataset at the top of dataset_info.json, it will be first in the list (that's why I did it that way), and you can see docker_nl; select it, with the data directory as data. I'll cut the cutoff length in half, to 512, for memory consumption and compute, but keep it as you want. I'll also set the learning rate to 2e-4, which is 0.0002. Please remember: if you are not using an A100, be careful with the compute type (bf16 versus fp16), because most of the time you'll face an error; that's also how you debug it. I'll reduce max samples from one lakh, 100,000, to 10,000, and keep it at one epoch, 1.0 (it's a floating point, which makes sense because you could also train for 1.3 epochs or so). So: one epoch, a maximum of 10,000 samples from my dataset, learning rate 2e-4, cutoff length 512. I'll also increase the batch size, the number of samples to process per GPU, to 16, and make the maximum gradient norm smaller, 0.3. This now looks good.

If you look at the extra configuration, there are logging steps, save steps, warm-up steps; I'm not making changes there, but you can. You have the LoRA config where you can define the target modules, but I think this is set automatically; if you look at the source code, for Mistral and other LLMs they've done it by default. You can increase the LoRA rank; eight is good. This is very intuitive, by the way: if your language model is smaller in size, for example Phi-2, which they say is around 3 billion parameters, and you want to fine-tune it, your LoRA rank should be higher to get better performance.
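That rank intuition is easy to make concrete. A LoRA adapter on a weight matrix of shape (d_out, d_in) learns two factors, A of shape (r, d_in) and B of shape (d_out, r), so the added parameter count grows linearly with the rank r. A small stdlib sketch (the 4096 by 4096 shape is a typical projection size for a 7B-class model, not Mistral's exact config):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r

# One 4096x4096 projection matrix versus its LoRA adapter at various ranks.
full = 4096 * 4096
for r in (8, 16, 64):
    added = lora_params(4096, 4096, r)
    print(f"r={r:>2}: {added:,} trainable params ({added / full:.2%} of the full matrix)")
```

Even at r=64 the adapter is a few percent of the full matrix, which is why raising the rank for a smaller model is a cheap way to buy back capacity.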
It's like reciprocal in nature: the smaller the model, the higher the rank, and that's important. I'm not covering the details here; I've covered them in my fine-tuning crash course, please watch that video. The RLHF configuration I'm not touching; we don't have a reward model or DPO here, so it doesn't apply. Let me collapse this.

Now, Preview command: once you click it, it shows you all the arguments, everything we've set. Let's click Start. Once I click Start you should see the loss appear over here soon... but it says "your setup does not support bf16; you need a newer torch or an Ampere GPU". That's the A100 point I made, and this is why it's important: let's set the compute type to fp16, click Preview command again, and then Start. Now it's running, and it shows you everything over here. This is important, guys: if you are not using an A100 or A10 GPU, the ones that support bf16, turn it off, because most of the time you'll get this error; you can also read up on it on the internet.

Back in the notebook, what's happening now is that it's downloading the model weights: about 9.94 GB from the Hugging Face repository. They would be using some caching mechanism, so it's a bit faster, but let's see how long it takes; I'll give it 20 to 30 seconds, otherwise we'll pause the video. On the LLaMA Board you can see everything, because verbose is true by default; that's why all the descriptions appear, and they'd be using some kind of socket programming to push the status updates. Once the training starts (by "training" I mean the fine-tuning), the loss will appear here. You can see it says running, 0 out of 37 steps, and you'll see a curve: all the minima, if there are minima, the local minima or the global minima of the world, you'll find out from the loss graph. It's not going to take much time, because the dataset is only about 2.4k rows, we have a very minimal configuration (sequence length and so on), only one epoch, and we're on a V100, so let's wait and see.

Now the loss has appeared: the original curve is the lower one, the smoothed one is the orange one. If you come back to the notebook you'll also see the learning rate and the loss per step, and the good thing is that our loss is decreasing; we have a reduced loss after each chunk of the epoch, for every step. So far the model is generalizing well; it's not overfitting as far as we can see, though a better understanding will come from the evaluation part that LLaMA Factory provides. Back on the LLaMA Board, it's going to complete very soon: 27 out of 37 steps, status running. After that, we're going to use the adapter path you see here: we'll refresh the adapters.
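The Preview command button essentially shows arguments like these. Here they are collected as plain values (the dict keys are descriptive, not LLaMA Factory's exact flag names), together with the back-of-envelope memory arithmetic behind loading a 7B model in 4-bit:

```python
# Hyperparameters as chosen in this walkthrough's web UI run (descriptive keys).
config = {
    "model": "Mistral-7B-Instruct-v0.1",
    "finetuning_type": "lora",
    "quantization_bit": 4,          # QLoRA: load base weights in 4-bit
    "lora_rank": 8,
    "cutoff_len": 512,
    "learning_rate": 2e-4,
    "num_train_epochs": 1.0,
    "max_samples": 10_000,
    "per_device_train_batch_size": 16,
    "max_grad_norm": 0.3,
    "compute_type": "fp16",         # bf16 needs an Ampere-class GPU, not a V100
}

# Why 4-bit: approximate weight-only memory for a 7B-parameter model.
params = 7_000_000_000
gib = 1024 ** 3
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{params * bits / 8 / gib:.1f} GiB")
```

Weights alone drop from roughly 13 GiB in fp16 to just over 3 GiB in 4-bit, which is what leaves room on a single V100 for activations and the optimizer state of the small LoRA parameter set.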
Once you refresh the adapters (these get saved in your Colab runtime; I'll show you where the weights are saved), you can load your adapter from there and also merge the PEFT adapter with the base model and push it to Hugging Face, so you'd have your own model called docker-LM or something; that's very interesting. So let's try it and see what happens. Come down and you can see it says finished. You have an output directory, called train followed by a timestamp or something like that, and your loss graph: it generalized really well by now, though with more steps and more epochs you'd see a lot of local minima there if you try to read findings out of the loss function. It reports the epoch, a training loss of 0.24, the train runtime, train samples per second, and so on, plus a note about dropping inputs that don't have all the necessary fields.

The first thing now: open the saves folder, and on the left-hand side you'll see a Mistral-7B-Chat folder has been created. Inside it you'll see your LoRA output (because we used LoRA) and your adapters, where two things are very important: adapter_config.json and adapter_model.safetensors. The other things come from the Mistral base. Fantastic. Back on the LLaMA Board, click Refresh adapters; when you click the dropdown you'll see that since we've trained once, there's one adapter there. Select it, go to the Chat tab, and click Load model, so the fine-tuned model is loaded with that adapter.

Now I'm going to ask some questions and see whether it can generate the answers. It takes a bit of time to load, and then: model loaded, now you can chat with your model. There's a system prompt field, but let's first try without it. I'll keep max new tokens at 512, top-p at 0.7, and temperature at 0.95. Let's click Submit and see what happens. I'm expecting it to generate some response... and it did, fantastic, and it's the right one: we got docker ps -a with a filter for exited status. I loved it, and it streams the response as well. You can ask other questions too: "get me the docker image IDs", and it gives docker images with a format option. Let me ask "how to delete a docker container", and it answers docker rm -f, because we're forcefully deleting that container. Another correct response. So we can generate the commands for Docker queries: you've built a fine-tuned Mistral for Docker-related question answering, which is fantastic, and you have max new tokens and a lot of other inference parameters you can tweak.
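About those inference knobs: temperature rescales the logits (0.95 sharpens them slightly), and top-p, also called nucleus sampling, keeps only the smallest set of highest-probability tokens whose mass reaches p before sampling. A minimal stdlib illustration on a toy distribution (not the model's real vocabulary):

```python
import math

def top_p_filter(logits, p=0.7, temperature=0.95):
    """Return (token_index, prob) pairs kept by nucleus (top-p) sampling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                   # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    kept, cum = [], 0.0
    for prob, i in probs:
        kept.append((i, prob))
        cum += prob
        if cum >= p:                                  # smallest prefix reaching mass p
            break
    return kept

# A peaked toy distribution: most probability mass lands on the first token.
kept = top_p_filter([5.0, 2.0, 1.0, 0.5])
print(kept)
```

With a peaked distribution like this toy one, p=0.7 keeps just the top token; with a flatter distribution more candidates survive, which is how top-p adapts the sampling pool at every step.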
We've loaded the model in this runtime and it's running right now; you can download it and also push it to Hugging Face. But also click on Export, because I want to show you this. In Export you can see a lot of options: an export quantization bit you can play around with, and the maximum shard size for the model, which is fantastic. It gives you a lot of personalization options for your fine-tuned model, like what the shard size should be, because you might want to share it with your friends or deploy it somewhere. And if you go to Evaluate & Predict you can also evaluate; we don't have a validation dataset here, but you can try it yourself. This is the chat; loved it.

So you can see how easy it is to fine-tune. It took us less than 30 minutes to fine-tune Mistral 7B chat on a Hugging Face dataset, completely live, no pausing, no editing, everything raw, and we can generate responses that are completely correct; maybe with more complex questions we'd have to see how it behaves. The code will of course be available; there will be a notebook with instructions on the changes you have to make if you want to fine-tune along with this video. This is the GitHub repository of LLaMA Factory: go ahead, use it, and let us know your findings.

A few things I'd like to end with: don't rely on low-code/no-code tools until you have decent enough experience to survive in your job. This is important because if you want to grow in your career, you should learn the fundamentals and basics; then you reach a stage where you understand how all this works and you don't have time to write a lot of code, because you have a lot of other managerial, leadership, architect things to do, and then these drag-and-drop tools help you with productivity as well. But if you're a beginner, I won't recommend using them; you should understand the algorithms hands-on.

That's all for this video, guys. If you like the content, please hit the like icon; if you have any comments, thoughts, or feedback, please let me know in the comment box. You can also reach out to me through my social media channels (find the details in the channel's About section or banner). Please share the video and the channel with your friends and peers. Thank you so much for watching; see you in the next one.
Info
Channel: AI Anytime
Views: 9,884
Keywords: ai anytime, AI Anytime, generative ai, gen ai, LLM, RAG, AI chatbot, chatbots, python, openai, tech, coding, machine learning, ML, NLP, deep learning, computer vision, chatgpt, gemini, google, meta ai, langchain, llama index, vector database, llama factory, LLaMA Factory, llama factory fine tuning, fine tuning tool, llama factory llm, PEFT, LoRA, QLoRA, SFT, DPO, PPO, RLHF, Reward Modeling, direct performance optimization, fine tune LLM using llama factory, no code tool, mistral 7b, 8x7b
Id: iMD7ba1hHgw
Length: 35min 10sec (2110 seconds)
Published: Sun Jan 07 2024