How Large Language Models (LLMs) in Generative AI Are Trained?

Captions
Hello all, my name is Krish Naik and welcome to my YouTube channel. For more than three years now I have been uploading videos related to data science, and I have tried to cover everything: machine learning, deep learning, neural networks, NLP problem statements, along with a lot of end-to-end projects with deployment and plenty of knowledge sharing. I have always focused on the basics first. If you have been following my channel for a while, you have seen videos where I write out the mathematical concepts and show how they relate to a specific problem statement; all of that is already covered on my channel. Now it is time to start learning more advanced things.

In this video I am going to talk about how ChatGPT is trained. If you look at everything I am going to cover here, you will see that most of the building blocks are things I have already covered on my channel; here they are combined, in a cumulative way, into the ChatGPT we are all using today. A lot of exciting things are coming up, because, as I announced on LinkedIn, I will be talking more about generative AI and LLMs, that is, large language models, and ChatGPT is itself a large language model. There will be a playlist focused on generative AI, and the playlist link will be in the description of this video. I feel that in the coming years a lot of startups will be working on generative AI, because there is a lot of work to be done, amazing use cases can be solved with it, and many companies are already working in this space.

First, a brief definition of a large language model. LLMs are very big models that have been trained on huge amounts of data to solve specific kinds of problem statements. An LLM can be a text-to-text model, which can be used as a chatbot for conversation; it can be a text-to-image model, where you give a text prompt and it generates an image; and similarly there are text-to-video and text-to-audio models. As an example, GPT-3.5 has about 175 billion parameters; just imagine how much data it has been trained on. As I explained in my previous video, ChatGPT has been trained on internet data: many different kinds of data available on the internet were used to train it. GPT-4 has even more parameters than GPT-3.5.
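To get a feel for the scale behind "175 billion parameters", here is a small back-of-the-envelope calculation (my own illustration, not something from the video) of how much memory just storing the weights would need:

    # Rough memory footprint of storing 175B model weights (illustrative only).
    params = 175_000_000_000            # reported GPT-3 / GPT-3.5-class parameter count

    bytes_fp32 = params * 4             # 32-bit floats: 4 bytes per parameter
    bytes_fp16 = params * 2             # 16-bit floats: 2 bytes per parameter

    print(f"fp32 weights: ~{bytes_fp32 / 1e9:.0f} GB")   # ~700 GB
    print(f"fp16 weights: ~{bytes_fp16 / 1e9:.0f} GB")   # ~350 GB

And that is only the raw weights; training also needs optimizer state, activations, and a massive data pipeline, which is why models of this size are trained across many GPUs.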
That was the basic definition of an LLM; I will talk more about it as we go ahead, but in this video let's focus on how ChatGPT is trained, which is super important. Let me share my screen. I have created these diagrams to make it easier to understand, and to prepare this I have spent the past month exploring a lot of research papers, the different resources and blogs available on the internet about ChatGPT, the OpenAI website, and the research paper behind it, and I pulled together several things to explain here. I will try to break this down, but before I go ahead: there is an amazing article written by Pradeep Menon that you should definitely check out; I will put the link in the description of this video. He has explained it really well, and the last two diagrams you will see here are copied from that article, so a lot of credit goes to him. If you read it you will get a good amount of understanding on your own; what I will do here is add a lot of examples so that you can relate to how ChatGPT is trained.

To start with, the ChatGPT model is trained in three stages. The first stage is called generative pre-training. Before you dig into ChatGPT, you should also check out my YouTube video on Transformers (search for "krishnaik Transformers" and you will find a live session I uploaded about two years ago), because Transformers are the base: models like ChatGPT or Bard use the transformer architecture, which has an encoder and a decoder. So in stage one you train a generative pre-training model, which gives you a base GPT model. In stage two you do supervised fine-tuning, and I will explain how both the generative pre-training and the supervised fine-tuning happen. Finally, in stage three, you do reinforcement learning using human feedback (not just human input, but human feedback, which is why it is written as "reinforcement learning from human feedback"), and out of that you get the ChatGPT model. What kind of data is required for stage one, generative pre-training? Data from the entire internet: websites, articles, books, public forums, tutorial sites, and so on. All of that internet data is used to train the base GPT model. So let's deep dive and talk about this.

But first, a basic example of what an LLM is like. Say I am a person who is really interested in dogs, so I go and read five or six big, 500-page books about dogs.
Having read them, I now know many things about dogs, and you can ask me any question: I will try to answer anything about dogs, as long as it is present in those books. That gives a rough idea of an LLM: it is able to answer something after being trained on, or "reading", many books.

Now, what exactly happens in stage one, the generative pre-training model? Let me make this full screen. You have a huge amount of input text data from the internet, and you pass this data to a Transformer, which, again, has an encoder and a decoder; I have explained Transformers in detail in my live session, so please watch that video. Once this huge amount of data has been trained with the Transformer, it produces a base GPT model. You already know the kinds of tasks a Transformer can do: language translation, text summarization, text completion, sentiment analysis. Just by training on this huge amount of data, we get a model that can solve these kinds of problems. Transformers are built on the idea from the paper "Attention Is All You Need", and based on that, all of these tasks can be implemented. I also showed a practical implementation of this in that live session, in a Google Colab notebook; if you have a good amount of data, you will be able to implement it too.
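To make "generative pre-training" concrete, here is a minimal sketch of the underlying objective: next-token prediction (causal language modeling) on raw text. This is my own illustration using the Hugging Face transformers library with a small GPT-2 model as a stand-in; the real base GPT model is trained the same way in spirit, but at a vastly larger scale:

    # Minimal next-token-prediction (causal language modeling) sketch.
    # A small GPT-2 model stands in for the much larger base GPT model.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    text = "Transformers use attention to model long-range relationships in text."
    inputs = tokenizer(text, return_tensors="pt")

    # Passing labels = input_ids makes the model compute the standard
    # next-token cross-entropy loss used during generative pre-training.
    outputs = model(**inputs, labels=inputs["input_ids"])
    print("pre-training loss on this text:", outputs.loss.item())

    # Real pre-training repeats this over billions of tokens:
    # outputs.loss.backward(); optimizer.step(); optimizer.zero_grad(); ...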
Once those tasks are available, though, our main aim is to use this model for conversation. What we want is a conversational chatbot: we send a request and the chatbot gives a response. We do not want those capabilities as separate, independent functionalities; they can be folded into the response itself. In short, after generative pre-training we have a model that can do those sub-tasks, language translation, text summarization and so on, but we need to convert this into the form of request and response, and that is exactly why the three stages are required.

So we move on to the next stage: supervised fine-tuning, also called SFT. In SFT, picture a human on one side, say I am sitting there (this, by the way, relates to the prompt engineering role that has become so popular nowadays; we will also talk about that), and on the other side another human who acts like a chatbot agent. Whenever the first human sends a request, say "Hello, how are you?", the other human replies with something like "I'm very good, I'm fine." Then the first human sends another request, the second human sends another response, and so on: a continuous exchange of requests and responses. These are real conversations, and all of them get captured and converted into an SFT training data corpus. The corpus is in the form of request and response, where the request is the input and the response is the output; for a similar request there can be multiple responses. This training data corpus is not just one or two records but millions of records. Once it is created, with the request being the conversation history and the response being the best, ideal response, it is sent to the base GPT model for training. According to the research paper, the optimizer used here is stochastic gradient descent, and from this you get an SFT ChatGPT model, SFT standing for supervised fine-tuning. That is what I obtain after stage two.
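Here is a small, simplified sketch of what that supervised fine-tuning step could look like in code: a tiny, made-up request/response corpus formatted into single text sequences and trained with the same next-token loss. I use plain SGD only because the video mentions stochastic gradient descent; the exact optimizer, prompt formatting, and model in the real setup may differ:

    # Illustrative supervised fine-tuning (SFT) step on request/response pairs.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # A tiny stand-in for the "SFT training data corpus": request -> ideal response.
    sft_corpus = [
        {"request": "Hello, how are you?",
         "response": "I'm doing well, thank you! How can I help you today?"},
        {"request": "Summarize what a transformer is.",
         "response": "A transformer is a neural network built around attention layers."},
    ]

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

    model.train()
    for example in sft_corpus:
        # Concatenate request and response into one training sequence.
        text = f"User: {example['request']}\nAssistant: {example['response']}"
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"SFT loss: {loss.item():.3f}")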
Still, there are problems with this model. Obviously it will give you answers, but the SFT ChatGPT model can only produce outputs based on the data it was trained on. If I ask it a question that was not in the training data, it can start giving awkward answers you have never seen before, behaving in ways where you cannot even tell what it is trying to say. The researchers faced these problems while building it, and that is why they came up with stage three: reinforcement learning from human feedback. Because of this step, the resulting model is what we call the ChatGPT model, and whatever you are using today as GPT-3.5 or GPT-4 relies on this reinforcement learning through human feedback. Let's understand what happens here; I will give plenty of examples, because this is the genuinely complex part. Data creation is a task we routinely do as data scientists, and stage one is not that difficult either: we use the same transformer architecture and simply train it on a huge amount of data. The most important step is this one, because it is what increased ChatGPT's accuracy so dramatically.

So what is reinforcement learning from human feedback? Starting from the SFT model, a human agent gives a request and the SFT ChatGPT model gives some response; for the same request, there can be multiple different, alternative responses. A new human is then brought in, and this human agent ranks all of those responses: which is the most suitable, which is the best? Based on that, a ranking is assigned, for example response B ranked best, then response A, then response D, then response C. From these rankings the researchers created a reward model, which assigns a score to every response. The score is based on a probability, so it becomes a binary classification: probability ranges between 0 and 1, and if the probability is high, that response is a very good response; if it is low, the score for that response is low.

Explaining it that way is a bit abstract, so let me give you an example. Say there is a chef who knows how to cook any kind of food. Suddenly a customer in the restaurant makes a request: "I want a really good non-vegetarian dish, say chicken, for dinner." The chef does not initially know exactly what to cook; left alone, a chef from India or from some other country would simply cook whatever they themselves like. So the chef first asks many people what kind of food they would like, and collects all of their responses. Then the chef ranks them: if many people gave similar responses, those dishes rank higher, so this one is better than that one, that one is better than another, and so on. Once the responses are ranked, the chef builds a very simple reward model, essentially a binary classification where cross-entropy is used, so that whenever the chef is about to give a response, the model can say whether to go ahead with that output, that is, whether to cook that particular dish or not. That is the reward model, and it is built from the feedback coming from human beings.
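The reward model can be sketched as a small scoring network trained on pairs of ranked responses. The pairwise loss below, which pushes the score of the human-preferred response above the score of the less-preferred one, follows the formulation used in the InstructGPT line of work; everything else here (the tiny network, the made-up embeddings) is only an illustration:

    # Illustrative pairwise reward-model loss: the human-preferred ("chosen")
    # response should get a higher scalar score than the "rejected" one.
    import torch
    import torch.nn as nn

    class TinyRewardModel(nn.Module):
        """Stand-in reward model: maps a response embedding to a single score."""
        def __init__(self, embed_dim: int = 16):
            super().__init__()
            self.score = nn.Linear(embed_dim, 1)

        def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
            return self.score(response_embedding).squeeze(-1)

    reward_model = TinyRewardModel()

    # Pretend embeddings of two responses to the same request, ranked by a human.
    chosen = torch.randn(1, 16)    # response the human labeler ranked higher
    rejected = torch.randn(1, 16)  # response the human labeler ranked lower

    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)

    # Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()   # an optimizer step would then update the reward model
    print("reward-model ranking loss:", loss.item())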
I hope that gives you the idea. Once the reward model is created, reinforcement learning is applied using a technique called proximal policy optimization (PPO). With PPO, the reward model scores the responses coming from the ChatGPT model, and the model is updated so that responses earning higher rewards become more likely. I will not explain proximal policy optimization in detail right now; I will make a dedicated video about it. But understand that it is a reinforcement learning technique through which we keep improving the ChatGPT responses: based on the human feedback that comes in, it increases the reward when a response is genuinely correct (a short illustrative sketch of the PPO objective appears right after these captions). Finally, out of this you get the ChatGPT model, and the reward updates and the policy updates via proximal policy optimization keep happening continuously as conversations happen. That is how the entire process of reinforcement learning from human feedback works.

From all of this, you are already very familiar with stage one and stage two, and you probably also know how you could create that kind of dataset; it can be a manual approach, but you can definitely do it. The genuinely complex part is this last stage, but if you understand the ideas, writing code for it is quite doable; what puts it out of reach for most companies is that it requires a huge amount of data. So this was an overview of how ChatGPT is trained. I would again like to point you to the article written by Pradeep Menon; I will give you the link, so just go through it, try to understand whatever you can, and let me know whether it makes sense. Going forward I am going to discuss generative AI and LLM models, breaking things down with lots of examples the way I did here. That is it from my side; I hope you liked this video. I will see you in the next one. Have a great day, thank you all, take care.
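As mentioned above, here is only the core idea of PPO in code form: a simplified sketch of the clipped surrogate objective, not the full RLHF training loop, with toy numbers standing in for the advantages that the reward model would provide:

    # Simplified sketch of the PPO clipped objective used in the RLHF stage.
    import torch

    def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        """Clipped surrogate objective: limits how far the updated chat model
        (the 'policy') can drift from the previous policy in one update."""
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()

    # Toy numbers: in real RLHF the advantages come from the reward model's scores.
    log_probs_old = torch.tensor([-1.0, -2.0, -0.5])
    log_probs_new = torch.tensor([-0.8, -2.1, -0.4], requires_grad=True)
    advantages = torch.tensor([0.5, -0.2, 1.0])

    loss = ppo_clipped_loss(log_probs_new, log_probs_old, advantages)
    loss.backward()
    print("PPO loss:", loss.item())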
Info
Channel: Krish Naik
Views: 45,768
Keywords: yt:cc=on, ChatGPT training process, GPT-3.5 architecture, OpenAI language model training, Natural language processing (NLP) training, AI language model development, Deep learning for conversational AI, Neural network training for chatbots, Training GPT-3.5 for conversational applications, Language model training techniques, GPT-3.5 training data and methods
Id: rcxXiLhxhsk
Length: 20min 38sec (1238 seconds)
Published: Fri May 26 2023