The ALPACA Code explained: Self-instruct fine-tuning of LLMs

Video Statistics and Information

Captions
Hello community, welcome to the most-watched part of how to instruct fine-tune your large language model. After the amazing success of part 1 and part 2, here is now part 3. We start with a look at the Stanford Alpaca prompt. We take exactly this prompt — remember, this is step 1, as I showed you — and now we go through it one by one. So let's execute it and have a look.

The prompt says: you are asked to come up with 20 diverse task instructions. So step 1: create 20 instructions. As I told you, there are some requirements: try not to repeat the verb; the language should be diverse; the type of instruction should be diverse; a GPT language model (ChatGPT or similar) should be able to complete the instruction; it should be in English; the instructions should be one to two sentences long; and you should generate an appropriate input to the instruction — not all instructions require an input. That's exactly what I showed you in my last video: "What is the highest peak in the world?" does not need any specific context, so in that case we simply put "<noinput>" in the input field. The output should be an appropriate response to the instruction and the input — make sure the output is less than 100 words. So: ChatGPT, generate a list of 20 tasks. (Stanford University used a slight variation of this, but this is exactly how it is done.)

Now the system comes back: "Generate a short story with the theme of betrayal." Input: "A man finds out his best friend is stealing from him." Output: "As he confronted his friend he learned the truth — he had been betrayed not just in business but in their friendship too." So here you have exactly the structure: an instruction, an input and an output. This is the same form I showed you in my last video.

Next example: "Classify the following text as positive or negative sentiment." Input: "I love this movie so much." Output: "Positive sentiment." You see, it is always the same repeating pattern. Next example: "Provide a recipe for some vegan lasagna" — oh Jesus. Since the object, the lasagna, is already defined in the instruction, we do not need an input, so we only have the output. Next topic: "Summarize the main idea of the news article about politics." Input: "The bill was introduced ..." Output: "The Senate introduced a bill focused on climate change" — that would be so nice. So you see, we can generate all of this.

Number eight: "Generate a poem about the ocean." This is our instruction; since the ocean is already given as the topic, we do not need a specific input, and the output — generated here by ChatGPT, the March 23 version by the way — is the poem itself. This is the input data that goes into the fine-tuning of our LLM, so that it understands: this output and this instruction belong together, and it learns all the interlinked semantic context.

What else do we have? OK, you see, sometimes it goes wrong — no problem, you can sort that out if a record does not have the right length or structure; I'll show you in a second how. "Generate a haiku about autumn" — this is our instruction; the input: we know it is autumn, the haiku needs nothing more; the output: "leaves falling gently / colors of gold and red glow / autumn's beauty shows." OK — so now I have shown you the first step, the Stanford Alpaca prompt. Beautiful.
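To make the record format concrete, here is a small illustrative sketch in Python. The two example tasks are taken from the walkthrough above; the field names and the "<noinput>" marker follow the convention described here, not the verbatim model output.

```python
# Illustrative instruction/input/output triples in the format the prompt asks for.
# These mirror the examples discussed above; they are not verbatim ChatGPT output.
generated_tasks = [
    {
        "instruction": "Classify the following text as positive or negative sentiment.",
        "input": "I love this movie so much.",
        "output": "Positive sentiment.",
    },
    {
        "instruction": "Generate a poem about the ocean.",
        "input": "<noinput>",  # the instruction already carries all the context
        "output": "Waves roll in beneath a silver sky ...",
    },
]
```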
Now you might ask: what were the 175 human-defined tasks? Let's take one and put it in, and see what happens. This was the original human input: we have an ID — this is seed task 12, the human-written task number 12. Then we have the instruction, "explain human behavior", and a name, "explain behavior", for internal reasons. Then we have the instances, and as I showed you in my last video, there are two fields: an input field and an output field. The input field holds the behavior referred to by the instruction — here the behavior is crying — and the output is: "There could be many reasons why a person might cry: they could be feeling sad, scared, angry, frustrated ..." and so on. So you see, these are the human-written examples.

Now look: if I paste this into ChatGPT and say nothing else, look what it does. It comes up with a modification of the instruction: "Can you provide an explanation for why people behave in certain ways?" The original human wording was "explain human behavior", and this is phrased much more nicely. Then of course we have our instances: the input field is the same as before, and the output field reads "Human behavior can be complex; it is influenced by a variety of factors such as emotions, experiences and personality." This is so much nicer than the human-written sentence, and this is the power of GPT-4 or ChatGPT (the March 23 version, whichever). This is why you need one "alpha" GPT: it has been trained for tens of millions of dollars on more than a thousand GPUs. You have to have one alpha intelligence, and then you can copy, extract, modify and recombine whatever you like. That is the beauty of it. Anyway, I wanted to show you the human input, as in my last video.

Now let's go into the code. Where are we? We are in the Stanford Alpaca repository — thank you, Stanford — "Alpaca: an instruction-following LLaMA model". They call it instruction-following; I call it self-instruct fine-tuning, whatever. There we have the code for generating the data and the code for fine-tuning the model. Let's have a look at how they generate the data — and you will see that what I explained to you last time is exactly what they did. License, data (JSON — yes, JSON is exactly what we will look at in a second), and here: generate_instruction — this one. So we go in, and what I want to show you: we start to generate instruction-following data, our self-instruct generated data for fine-tuning. What do we need? A path to our seed tasks — the .jsonl file I just showed you. Then we need an OpenAI API key and a model — we go with text-davinci-003, so this is where you have to have your credit card ready. The number of prompt instructions is three — remember, instruction, input and output are the structure of our instruction dataset. You can set the temperature, the number of CPUs available, and so on. Great. So what happens now? We simply load, from the seed task path — the seed tasks .jsonl file — our data: id, name, instruction, everything you know and love, and the instances with their input and output. Here we are.
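As a minimal sketch of that loading step — the file name and field names follow the walkthrough above; the exact argument handling in generate_instruction.py differs slightly:

```python
import json

# Minimal sketch: load the 175 human-written seed tasks and flatten each one
# into an (instruction, input, output) triple, as described above.
seed_tasks = [json.loads(line) for line in open("seed_tasks.jsonl")]

seed_instruction_data = [
    {
        "instruction": task["instruction"],
        "input": task["instances"][0]["input"],    # first field of the instance
        "output": task["instances"][0]["output"],  # second field of the instance
    }
    for task in seed_tasks
]
print(f"Loaded {len(seed_instruction_data)} human-written seed instructions")
```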
Back to the data. What do we do? We now have our seed instruction data: we take the instruction, we take the input — remember, this is a field of the instruction's instances — and we take the second field of the instances, which is the output. So we simply read the 175 human-created seed instructions to bootstrap our own dataset. Later we also want to compute a similarity score, but I will show you that in a moment. First we tokenize all the seed instructions — yes, there is a tokenizer here too.

Then comes the generation of the machine instructions, and it is so easy: we take the instructions of all the seed instruction data and all the instructions of the machine-generated data — what a surprise — and then we sample. Let's look at this in detail. At the beginning we sample only from the seed tasks, a random sample, and then comes the beauty: we encode our prompt from the sampled prompt instructions, append everything together, and send it to OpenAI via the utilities. OpenAI will then create, as I just showed you, new diverse tasks. You can set the temperature — with the OpenAI API you can set a lot of parameters, top-p and whatever you like — and then you just call the completion function from the utilities, and it generates all the data you need.

But the most important piece — where is it, here it is — is encode_prompt. Let's have a look: def encode_prompt. We encode multiple prompt instructions into a single string; we have to bring our instructions into our input data pipeline for fine-tuning. And you are not going to believe it, but it is the same pattern: for each enumerated prompt instruction we have the task's instruction, the task's input and the task's output — who would have guessed — and to combine them, instruction, input and output are concatenated, and we return a prompt that is a single string with all three information fields included. Exactly the form we need.

Then there is the post-processing of the GPT response: after GPT-3 or GPT-4 or whatever has generated the data, you have to clean it. You make sure it has a certain length — not too short, not too long. As I told you last time, you can also filter out instructions that mention things a language model cannot produce — video, audio, flowcharts, diagrams — and you can take care of the punctuation, but that is less important. The most important part is what I showed you here: we started, we encoded our prompt, generated the prompt, and then generated from OpenAI — exactly as I showed you — some 50,000 similar task descriptions, and then we have our dataset.
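Here is a hedged sketch of what encode_prompt does: pack a few sampled (instruction, input, output) triples into one string so the completion model can continue the numbered list. The separators and labels are paraphrased from the walkthrough above, not copied from the repo; seed_instruction_data reuses the earlier loading sketch, the prompt header file name is an assumption, and openai_completion is a placeholder for the repo's completion utility.

```python
import random

def encode_prompt(prompt_instructions, prompt_header):
    """Sketch: pack several (instruction, input, output) triples into one prompt
    string. Field labels and numbering are illustrative, not the exact Alpaca format."""
    prompt = prompt_header  # the requirements text shown earlier
    for idx, task in enumerate(prompt_instructions, start=1):
        task_input = task["input"] if task["input"] else "<noinput>"
        prompt += f"\n###\n{idx}. Instruction: {task['instruction']}\n"
        prompt += f"{idx}. Input:\n{task_input}\n"
        prompt += f"{idx}. Output:\n{task['output']}\n"
    # Leave the next slot open so the model continues the list with new tasks.
    prompt += f"###\n{len(prompt_instructions) + 1}. Instruction:"
    return prompt

# Sample a few seed tasks (num_prompt_instructions = 3 in the walkthrough)
# and build the prompt that gets sent to the completion endpoint.
sampled = random.sample(seed_instruction_data, k=3)
prompt = encode_prompt(sampled, prompt_header=open("prompt.txt").read())
# response = openai_completion(prompt, ...)  # placeholder for the repo's OpenAI utility
```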
One thing about those 50,000 records: you do not want them all to be near-duplicates — "today is a beautiful day", "today is a nice day", "today is a very good day". You want to compute a similarity. Remember BERT, where we compute sentence similarity by constructing vector embeddings and taking the cosine similarity in vector space? Something similar happens here: the similarity is computed against the pre-tokenized instructions, and everything that is too close to what we already have is simply filtered out. You want a diverse dataset. Beautiful — and that is the code implementation: you go to Stanford Alpaca, generate_instruction, a Python file, and that's it; now you know how to generate your data.

You have your model card; we had a look at the prompt — yes, of course, the prompt whose execution I showed you in ChatGPT; you know the seed tasks; and the requirements — yes, of course: we need numpy, we need something to compare the similarity, you need your special token — meaning your credit card information for the OpenAI API — we are operating with transformers, we are operating in PyTorch, we have SentencePiece tokenization, a particular version of the tokenizers library, and wandb (Weights & Biases). Everything you know, everything you love, is here. Then the utilities — anything I missed? Yes: the OpenAI decoding arguments. You have the temperature (a float), you have the optional stop sequences — this is OpenAI-specific; you can read in the OpenAI documentation exactly what they expect from you, the decoding arguments in the form they want them. It is a beautiful explanation, so you can go and really do the same job as Stanford University for about 600 dollars.

What else? Training. We want to fine-tune our system, our LLM, because up until now we have only created a dataset — my goodness, I almost forgot the most important part: fine-tune our LLM with the self-instruct dataset. Let me start with the main core of this example — this is it, and it is always the same. We have a model: from the Hugging Face transformers library, an AutoModelForCausalLM loaded from a pretrained model, and we give it the model_name_or_path. Then we need a tokenizer: again from Hugging Face transformers, an AutoTokenizer, also from_pretrained, with its own parameters. So you see: model and tokenizer, as always. Then you take care of the padding — the pad token and padding side in the tokenizer — and if you have a LLaMA model there are some special tokens to handle; forget about that for now.

Then we have the data module. This is one of the most interesting parts right now, but let me skip it for a moment and say: look, we now use the Hugging Face Trainer class. The Trainer class is already defined for us; we just have to pass in the model, the tokenizer — this is our model, this is our tokenizer — and of course the data. So we have the Hugging Face Trainer class with the model, the tokenizer, the training arguments and the data module, and then you just say trainer.train(), as always in my last 100 videos. This is now the command that fine-tunes your LLM. Afterwards you can save the model, even push it to the Hugging Face Hub under your name so that everybody else can use it, and that's it.

So the only question left is our data module, and for that there is a function make_supervised_data_module — oh, beautiful — whose input is of course a tokenizer and the data arguments. Let's have a look at it, line 185 — maybe even a little bit bigger. What it does: it builds a dataset and a data collator for supervised fine-tuning — well, of course — a supervised dataset with a tokenizer and our data path, plus a data collator.
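Before diving into the data module, here is a minimal sketch that puts the training entry point described above together. Assumptions: the checkpoint path is a placeholder, the TrainingArguments are reduced to the bare minimum, and make_supervised_data_module / data_args stand for the repo's own function and argument dataclass (the dataset side is sketched further below).

```python
import transformers

# Minimal sketch of the fine-tuning entry point described above.
# "path/to/base-model" is a placeholder for whatever causal LM checkpoint you use.
model = transformers.AutoModelForCausalLM.from_pretrained("path/to/base-model")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "path/to/base-model",
    padding_side="right",
    use_fast=False,
)

training_args = transformers.TrainingArguments(output_dir="./alpaca-finetuned")

# data_args holds the path to the generated instruction JSON (repo-defined dataclass);
# the returned dict contains train_dataset, eval_dataset and data_collator.
data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)

trainer = transformers.Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
)
trainer.train()
trainer.save_model("./alpaca-finetuned")  # or push it to the Hugging Face Hub
```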
Now, the data collator — you know, we use these a lot — and what comes out of make_supervised_data_module is a dictionary containing our train dataset (we have no evaluation dataset for the moment) and our data collator. This is exactly what we need for the fine-tuning. So we have two pieces: the supervised dataset and the data collator — and it is no coincidence that right here we find the class definition for the data collator for the supervised dataset. What it does is collate examples for supervised fine-tuning, and if you remember my other videos, this is just about the maximum length and the padding: the pad_sequence call makes sure everything has the same length, everything is homogeneous, no mix of short and long sequences. I leave that part to you.

The really interesting thing is the SupervisedDataset built on the data we just created — this is our dataset for supervised instruction fine-tuning. Beautiful. What do we need? A tokenizer, because we have to translate our human words into a numerical vector — a tensor — representation, and we take one of the pretrained tokenizers from Hugging Face transformers. If I remember correctly — actually I don't know exactly which one Stanford is using, sorry. Then there is a logging warning, we load the data path, a warning about the prompt ... You know what? Why not let ChatGPT work for us and explain this piece of code? Let's see — __getitem__, is that really the last method? Yes, it is. Beautiful.

You see, you do not even need me anymore, because I just ask ChatGPT: explain this SupervisedDataset class. This is our dataset for supervised fine-tuning; we have our tokenizer, our pretrained tokenizer from Hugging Face; we have our data path; we have the warning. And then, I think, the most important thing: prompt_input and prompt_no_input — ah, this is for whether an example has an input field or not. OK, I get it: it handles the case where we do not have an input. Beautiful. So we have sources and we have targets, a warning, and then data_dict = preprocess(sources, targets, tokenizer). I think the most interesting pieces are sources, targets and the data dictionary; from the dictionary we get our input_ids and our labels — the classical setup — and then everything is returned.

So let's see what ChatGPT comes up with: "Two variables, prompt_input and prompt_no_input, are assigned values from the prompt dictionary. A list comprehension is used to create a list of sources; each element is created by formatting the prompt_input or prompt_no_input string with values from each example dictionary in the list, depending on whether the 'input' key in the dictionary is empty or not." Ah — so easy to understand. "Another list comprehension is used to create a list of targets; each element is created by concatenating the value of the 'output' key in each example dictionary with the EOS token attribute of the tokenizer object. Then a preprocess function is called with the sources, the targets and the tokenizer; this function tokenizes the input and target text using the provided tokenizer and returns a dictionary containing the resulting input and label tensors."
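To make that concrete, here is a hedged, simplified sketch of such a supervised dataset. The prompt templates paraphrase the Alpaca-style prompt dictionary (check train.py for the exact wording), the preprocess step is condensed into the constructor, and IGNORE_INDEX = -100 is assumed as the label value the loss ignores.

```python
import json
from torch.utils.data import Dataset

# Paraphrased Alpaca-style prompt templates; the exact wording lives in the repo.
PROMPT_TEMPLATES = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes "
        "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
    ),
}

IGNORE_INDEX = -100  # assumed masking value, skipped by the cross-entropy loss


class SupervisedDataset(Dataset):
    """Simplified sketch of a dataset for supervised instruction fine-tuning."""

    def __init__(self, data_path, tokenizer):
        examples = json.load(open(data_path))  # list of instruction/input/output dicts

        # Source = formatted prompt; pick the template by presence of an input field.
        sources = [
            PROMPT_TEMPLATES["prompt_input"].format_map(ex)
            if ex.get("input", "")
            else PROMPT_TEMPLATES["prompt_no_input"].format_map(ex)
            for ex in examples
        ]
        # Target = the desired output followed by the end-of-sequence token.
        targets = [f"{ex['output']}{tokenizer.eos_token}" for ex in examples]

        # Condensed "preprocess": tokenize source+target, then mask the source
        # tokens in the labels so the loss only covers the response part.
        self.input_ids, self.labels = [], []
        for src, tgt in zip(sources, targets):
            full_ids = tokenizer(src + tgt, return_tensors="pt", truncation=True).input_ids[0]
            src_len = tokenizer(src, return_tensors="pt", truncation=True).input_ids.shape[1]
            labels = full_ids.clone()
            labels[:src_len] = IGNORE_INDEX
            self.input_ids.append(full_ids)
            self.labels.append(labels)

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, i):
        return dict(input_ids=self.input_ids[i], labels=self.labels[i])
```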
So that is where the magic happens. Beautiful. You see, sometimes it is nice: even with the free ChatGPT version of March 23, you just paste in the code, ask it to explain this PyTorch code step by step, and it really explains it. The only thing I might still be curious about is the tokenizer — the pretrained tokenizer from transformers, exactly the one we use — but otherwise, isn't this a beautiful explanation of what is going on?

And there you have it: we went through the complete code of Stanford Alpaca. I showed you how to build your self-instruct dataset, we then went through the code for instruction fine-tuning your language model together, and now you know all the secrets of how to instruct fine-tune your own large language model. I hope you enjoyed it.
Info
Channel: code_your_own_AI
Views: 6,415
Keywords: ALPACA LLM, Fine-tuning LLM, Fine-tuning BERT, Self-instruct, AI, NLP
Id: jQL0ZeHtXFc
Length: 25min 9sec (1509 seconds)
Published: Mon Apr 10 2023