Fine-tune LLama2 w/ PEFT, LoRA, 4bit, TRL, SFT code #llama2

Video Statistics and Information

Captions
Hello community. Today we create a synthetic dataset for fine-tuning a Llama 2 model, and this dataset is created by GPT-4. The application is fine-tuning a Llama 2 model, and of course we want to make it a little bit challenging, so we create the synthetic dataset with one single prompt. Since we also want the whole code to run on a free Colab notebook, we have to use all the tricks that we know to reduce the memory footprint. So we're gonna use PEFT, parameter-efficient fine-tuning, and we choose one methodology: we go with LoRA, the low-rank matrix approximation for high-dimensional tensor structures. Of course we also use quantization, and since we want to run on a free Colab notebook on a T4 GPU from Nvidia, we have to go down from an 8-bit to a 4-bit quantization of the weight tensors. What we will use here is Transformer-based reinforcement learning (TRL) and our good old script from Hugging Face for supervised fine-tuning (SFT). If this is all done, we have the result: a fine-tuned Llama 2 model for a specific user-defined task.

If you have any questions, these are my videos: I have a coding video on PEFT with LoRA and 8-bit quantization, and if you want to work with quantized LoRA in 4-bit, this is the video. We will not use any LangChain because we will do everything here in a more direct way, and if you have questions about reinforcement learning, this is the video for you. That reinforcement learning video of course includes robotics, but never mind, it is also valid for a pure large language model and not a vision-language-action robotics model.

So here we go. What I want is this: I have one primary prompt, then I have one script (all of this is one single code script I'm gonna show you), and the result is an inference-ready system. So we have a fine-tuned Llama 2 system that is fine-tuned for a very narrow, specific task, and the task is defined in my one prompt. We now have an agglomeration of intelligence where I just input one prompt and get out a complete fine-tuned Llama 2 system for the task that I define in that prompt. What do you think of an example? I say: hey, fine-tune a Llama 2 model on the latest scientific data of all the planets in our solar system, so maybe activate an agent to collect all the internet data from NASA or the European Space Agency, and write, with another agent or with a pure Llama 2, a short science fiction story about an adventure traveling from Mars to Ganymede, but based on the real scientific data on the atmosphere, the temperature and the hard-radiation levels of the different planets; plus activate a fact-checking agent while our agent writes the science fiction story. So you see, this is a simple example, or you can use it for some business application; if you are without any fantasy, you use it for business.

So what do you think, shall we write the code now? This is our Jupyter notebook by Matt Schumer; I give you the link in the description of this video, so download it and follow along. Today we want to fine-tune our Llama 2 model. In my last video I showed you the same thing where we fine-tuned a ChatGPT model from OpenAI (that is now also possible), but a lot of questions said: hey, what about Llama 2, how do we do this? Let's do it. I have a V100 high-RAM GPU, great. So at first, our primary prompt. I told you what the specific task is, and I take the same prompt as in my last video so you can compare fine-tuning ChatGPT to fine-tuning Llama 2.
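To make this concrete, the top of such a notebook boils down to three inputs. This is a minimal sketch assuming the variable names from Matt Schumer's notebook (prompt, temperature, number_of_examples); the prompt wording is the example discussed here and the values are illustrative.

```python
# The one high-level task description that drives dataset generation and fine-tuning.
prompt = (
    "A model that takes in a puzzle-like, reasoning-heavy question in English, "
    "and responds with a well-reasoned, step-by-step, thought-out response in Spanish."
)

# 0.0 = very fact-based output from the data generator, 1.0 = very imaginative.
temperature = 0.4

# 100 is treated here as the practical minimum for a useful fine-tuning set.
number_of_examples = 100
```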
Say: a model that takes in a puzzle-like, reasoning-heavy question in English and responds with a well-reasoned, step-by-step, thought-out response in Spanish. As I told you last time, we can go with a temperature from 0 to 1, so from a very fact-based to a very imaginative setting, and the number of examples that it will generate we set to 100; this is the minimum. We go here with the Llama 2 version, and we ask GPT-4, which is why we need the OpenAI API key. So we use the OpenAI API interface, here it is: openai.ChatCompletion.create with the gpt-4 model, and we have here our messages.

Now GPT-4 creates the dataset based on our system prompt, and here we have that prompt. Of course we address the system and say: hey GPT-4, listen, you are generating data which will be used to train a machine learning model; you will be given a high-level description of the model we want to train, and from that you will generate data samples, each with a prompt/response pair. This is exactly what we need: fine-tuning data samples. Either we already have data, which would be great, then you can skip this step, or we have no data for our specific task and GPT-4 creates it, and GPT-4 knows, let's say, the complete internet, respecting all intellectual property rights of course. Then we define the format: the samples are generated as prompt/response pairs with a fixed delimiter, and only one prompt/response pair should be generated per turn, because we really want GPT-4 to focus on this one thing, and for each turn it should make the sample slightly more complex than the last while ensuring diversity. This is the same as last time: here is the type of model we want to train, and this is the prompt. Great, so we have our example, we have GPT-4, and we generate the data.

Next we generate the system message: create it with GPT-4 or any other model that you want, but GPT-4 is really good at reasoning and performing this task. You will be given a high-level description of the model we are training, and from that you will generate a simple system prompt for that model to use. So we are no longer generating a system message for data generation; we are now generating a system message to use at inference time. And of course we have to define the format: make it as concise as possible, include nothing but the system prompt itself, and a response example is given right here, so we give it a one-shot example. Great. You can set the temperature and the max tokens, yes, yes, yes.

Now we put everything into a pandas DataFrame because we want to create JSON files that we will use later on. So we create a DataFrame, we remove the duplicates, you know this, and then, as the classical way forward, we split the data into a training dataset and a test dataset (maybe you also use an evaluation dataset or whatever). Then we write our training DataFrame to JSON and our test DataFrame to JSON, and we have our two JSON files. Great.
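A compressed sketch of this generation and export stage, assuming the pre-1.0 openai Python package (where openai.ChatCompletion.create exists) and the prompt, temperature and number_of_examples variables from the sketch above. The helper name generate_example, the delimiter and the exact system-prompt wording are simplified stand-ins for what the notebook actually does.

```python
import openai
import pandas as pd

openai.api_key = "sk-..."  # your OpenAI API key

def generate_example(task_description, prev_examples, temperature=0.4):
    """Ask GPT-4 for one new prompt/response pair for the given task."""
    messages = [{
        "role": "system",
        "content": (
            "You are generating data which will be used to train a machine learning model. "
            "You will be given a high-level description of the model we want to train, and "
            "from that you will generate data samples, each with a prompt/response pair, "
            "separated by '-----------'. Generate exactly one pair per turn, make each one "
            "slightly more complex than the last, and ensure diversity.\n"
            f"Model description: {task_description}"
        ),
    }]
    # Feed a few earlier samples back in so GPT-4 can increase complexity and avoid repeats.
    for ex in prev_examples[-8:]:
        messages.append({"role": "assistant", "content": ex})
    response = openai.ChatCompletion.create(
        model="gpt-4", messages=messages, temperature=temperature, max_tokens=1000
    )
    return response.choices[0].message.content

examples = []
for _ in range(number_of_examples):
    examples.append(generate_example(prompt, examples, temperature))

# Split each generated sample on the delimiter requested in the system prompt.
rows = [e.split("-----------") for e in examples]
df = pd.DataFrame(
    [{"prompt": r[0].strip(), "response": r[1].strip()} for r in rows if len(r) >= 2]
).drop_duplicates()

# Classical train/test split, then export as JSON Lines for the fine-tuning step.
train_df = df.sample(frac=0.9, random_state=42)
test_df = df.drop(train_df.index)
train_df.to_json("train.jsonl", orient="records", lines=True)
test_df.to_json("test.jsonl", orient="records", lines=True)
```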
Now, I told you, not GPT or ChatGPT: now we go with Llama 2. We're gonna use every trick in the box that we know: Accelerate from Hugging Face, PEFT in the latest version, bitsandbytes (please update to the latest version), our beautiful Transformers, and our Transformer-based reinforcement learning library TRL. We import everything: the LoRA configuration, the PEFT model, the supervised fine-tuning trainer from Hugging Face, everything is here.

In simple terms, we use the Llama 2 model from Hugging Face. Since we want to run in a free Jupyter notebook, we go with the smallest 7B model; if you want to spend some money, please switch to bigger models. And we have here all the LoRA parameters. We say: use 4-bit equals true. You might say: are you sure, not 8-bit, you will really go down to 4-bit, are you really sure? And I tell you yes, because 4-bit is small, it is fast, it is quick, maybe not as precise as 8-bit, but this is just a demonstration. As I showed you already in my video about 4-bit quantized LoRA, we use a specific quantization type, nf4. What else? Device map, training flags, yes, yes, yes. You have your learning rate, your weight decay, everything that you can optimize. If you want to deep-dive and modify these parameters and you're an expert, go for it; if you are a newbie, just leave the default parameters, they were set by an expert. Great.

So, as always, when we train we say: let's load the dataset, let's preprocess our dataset, let's have here our bitsandbytes configuration in 4-bit, yes, yes, yes. And then, like in the Stone Age of programming, we call the Hugging Face Transformers AutoModelForCausalLM.from_pretrained with our model name, our bnb quantization configuration and the device map; you know all of this. Nice. Of course we also have the tokenizer: AutoTokenizer.from_pretrained with trust_remote_code set to true, the padding token, and we use the right padding side. Then we have the PEFT configuration, beautiful, and the training parameters; the same as I told you, if you're an expert, dive in, modify, optimize the code to your delight, otherwise just go with the default values. We define a Hugging Face trainer, or rather a supervised fine-tuning trainer, where we have the model, the training dataset, the evaluation dataset, the PEFT configuration, the maximum sequence length, our tokenizer, and our training arguments. Beautiful. So go and train it, and then we save our newly fine-tuned model. Great.
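Put together, the quantized loading, LoRA configuration and supervised fine-tuning step look roughly like the sketch below. It assumes a mid-2023 vintage of transformers, peft, datasets and trl (where SFTTrainer still accepts dataset_text_field, max_seq_length and tokenizer), the two JSONL files from the data step, and illustrative hyperparameter values rather than the notebook's exact settings; the model name and the prompt template are assumptions as well.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_name = "NousResearch/Llama-2-7b-chat-hf"  # 7B variant, fits a free T4 in 4-bit

# 4-bit NF4 quantization so the 7B model fits into T4 memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# LoRA: train only small low-rank adapter matrices instead of the full weights.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

def to_text(example):
    # Merge prompt and response into one training string (template is illustrative).
    return {"text": f"### Question:\n{example['prompt']}\n\n### Answer:\n{example['response']}"}

train_dataset = load_dataset("json", data_files="train.jsonl", split="train").map(to_text)
eval_dataset = load_dataset("json", data_files="test.jsonl", split="train").map(to_text)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,
    logging_steps=25,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
trainer.model.save_pretrained("llama-2-7b-finetuned")  # saves only the LoRA adapter
```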
And of course, as I told you last time, there is a little bit of a difference between fine-tuning ChatGPT and fine-tuning Llama 2, and here we get the Llama 2 output, and that's it. Now we run inference, and you're gonna say: unbelievable, is it that easy? When you see the pipeline, yes, we keep it really simple: you merge the model and store it on your Google Drive if you want, beautiful; reload the base model and merge it with the LoRA adapter, beautiful; reload the tokenizer, yes; save the merged model, beautiful; load your fine-tuned model from Drive and run the inference. We have here our classical commands: AutoModelForCausalLM.from_pretrained with the model path and AutoTokenizer.from_pretrained with the model path; you know this, and we're ready to go.

And now you see my prompt: explain quantum entanglement to a 75-year-old professor of theoretical physics and mathematics. And you say: go. And no, no, this is the wrong approach, because remember, we did the fine-tuning of Llama 2 for a very specific task. Remember, our primary prompt was: a model that takes in a puzzle-like, reasoning-heavy question in English and responds with a well-reasoned, step-by-step, thought-out response in Spanish. The quantum entanglement prompt is what I would use for a normal, general chat system, but no, we have a Llama 2 fine-tuned for this specific task. So please do not make the mistake of going with such a general prompt; your prompt has to be in line with the primary prompt: put in a puzzle-like, reasoning-heavy question in English and you get a step-by-step result in Spanish.

I will leave you the link to this complete Jupyter notebook in the description of the video. All the rights and all the respect go to Matt Schumer; this is his Twitter account, and I also leave you the link to his GitHub repo, so try out fine-tuning Llama 2 there. And by the way, if you already have your own dataset, great, you do not need to generate a synthetic GPT-4 dataset. You start here instead: install the libraries, go with Llama 2, define your hyperparameters, and watch out that the format of your dataset matches the input that Llama 2 expects for the training dataset. Then, if you want, just copy the code, go and run with it, experiment yourself, optimize, learn, make it better. And if you have some great ideas, please leave a comment below this video and share your insight with the community. I hope it was a little bit informative for you; it would be great if you have some feedback, and if not, maybe I see you in my next video.
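As a companion to the merge-and-inference walkthrough above, here is a rough sketch of that last step. It assumes the LoRA adapter was saved as llama-2-7b-finetuned in the training sketch and that peft's merge_and_unload() is used to fold the adapter into the base weights; the paths and the example question are hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_model_name = "NousResearch/Llama-2-7b-chat-hf"
adapter_path = "llama-2-7b-finetuned"   # LoRA adapter saved by the trainer
merged_path = "llama-2-7b-merged"       # e.g. a folder on your Google Drive

# Reload the base model in fp16 and fold the LoRA adapter weights into it.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
model.save_pretrained(merged_path)
tokenizer.save_pretrained(merged_path)

# Inference: the prompt must match the narrow task the model was fine-tuned for,
# i.e. a puzzle-like reasoning question in English, answered step by step in Spanish.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
question = ("Three friends split a restaurant bill of 96 euros. One of them pays twice "
            "as much as each of the others. How much does each person pay?")
print(generator(question, max_new_tokens=256)[0]["generated_text"])
```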
Info
Channel: code_your_own_AI
Views: 14,250
Id: zcMQXID447s
Length: 14min 36sec (876 seconds)
Published: Fri Aug 25 2023