LLM Ecosystem Explained: Your Ultimate Guide to AI

Video Statistics and Information

Captions
Hello community. Let's have a look at the complete ecosystem of large language models as of April 2023.

We start in 2022, when GPT-3.5 was released. You could use GPT-3.5 through two different access paths: either, as today, ChatGPT, which is more or less a free version you can use, or, if you want to do it professionally, OpenAI offers its own API, where you pay to get access, run your workload on the LLM, and get your data back.

The first access path is a prompt. A prompt is simply a line where you type something in. This is where prompt engineering comes in, because you can write a clever prompt: instead of "hey, show me what you know about photosynthesis" you can write "give me a detailed explanation of the biological and chemical processes on a molecular level regarding photosynthesis". And of course, if you are a professional company, you can write a Python program and interact directly with the GPT-3.5 API.

It took OpenAI about six months to come up with GPT-4. GPT-4, as you can see from its size, is now able to handle text, images and code. If GPT-3 had about 175 billion trainable parameters, we expect the size of GPT-4, which is top secret, to be around 500 billion trainable parameters; let's work with that assumption. We have the same access paths to the system: there is ChatGPT Plus, which is also prompt-based and costs about 20 US dollars plus tax per month, and if you are, say, Bank of America, a company that needs massive access, OpenAI provides an API where you write your own small program and have an automated interface to GPT-4. This is where we are now, in April 2023.

But let's look back just three years, to 2020, when Google came up with its own model based on the Transformer architecture. It was called T5, which simply stands for Text-to-Text Transfer Transformer, and it came in sizes of 3 billion and 11 billion trainable parameters; at the time, in 2020, that was quite impressive.

What happened next is that Google noticed something about data. On the left side you have the internet data; GPT-3 and GPT-4 have both been trained on all available internet data, and of course we assume that all intellectual property rights were respected when that data was copied from the internet. But there is also data that is not available on the internet, on the right-hand side: special data that companies keep secret, and your private data, your photographs, your medical records, whatever. Here we focus on company data. If you are, for example, a biomedical company somewhere in the world with a research team, you do not put all your research on the internet for free. So there is a lot of special data that is private and protected, and as a company you want to use this data.

Google found out the following: take one piece of this special data, for one task only. Say it is a very specific translation task, from French to a language spoken somewhere in the South Pacific; you can be sure there is not much internet data for that. For this one particular task, and this task only, you want a very focused, special, high-performance system. And Google said: we can get that if we fine-tune the training.
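As a concrete illustration of the professional access path described above, writing a small Python program against the OpenAI API, here is a minimal sketch using the openai Python package as it existed in spring 2023; the API key is a placeholder and the prompt is the photosynthesis example from the transcript.

```python
import openai

openai.api_key = "sk-..."  # placeholder: your own OpenAI API key

# One request to the GPT-3.5 chat endpoint with an "engineered" prompt.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                "Give me a detailed explanation of the biological and "
                "chemical processes on a molecular level regarding "
                "photosynthesis."
            ),
        }
    ],
    temperature=0.7,
)

# The generated answer comes back as structured data.
print(response["choices"][0]["message"]["content"])
```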
Our systems, GPT-3 and GPT-4, have been pre-trained: they digested all the internet data according to their algorithms, and with that training they are able to perform tasks. But if you are not satisfied with the pre-training alone, you can fine-tune your system for one particular task. This is what Google explored and called fine-tuning, and in 2022 Google came out with the FLAN-T5 models. They have the same sizes as the T5 models, but the FLAN prefix indicates fine-tuned language models. Google, having access to the Google Cloud compute infrastructure, did not fine-tune on one task but on hundreds and thousands of tasks. That happened in 2022. So you see the data that exists on the internet, and the highly protected data inside companies, and you want to use your company data with the artificial intelligence of an LLM, of a Transformer architecture. People therefore started doing this fine-tuning with their private data on their own dedicated hardware.

Just as a rough picture: for GPT-3.5 you need about 1,000 GPUs on the Microsoft supercomputer cluster to do the pre-training; for GPT-4 we are somewhere in the bracket of 1,000 to 10,000 GPUs as the hardware configuration needed to pre-train it. The real numbers are secret, and of course, whether you access GPT-4 through the prompt or through the API with your Python file, you are given no information about what happens inside GPT-4: it is a black box for those of us who pay for access. And as of April 2023 you cannot even fine-tune GPT-4 for your own task; fine-tuning of GPT-4 is currently not offered, although I am quite sure they will offer it eventually, because they can charge money for it.

With fine-tuning of GPT-4 out of reach, something interesting happened. In the last year or year and a half people found out that you cannot only prompt-engineer your way through the free access, you can also do something called in-context learning. It is an advanced prompt-engineering methodology, if you like: you take a piece of your non-public data, slice it up into tiny chunks, and then, for example in ChatGPT Plus, where you have that one input line, you put in one piece of information, then a second piece of information, and then your question. So you can feed in one, two, three, four, five small chunks of information and then ask the system something. This does not change the weights, the learnable parameters, of GPT-4 at all, because those are top secret. It is therefore not fine-tuning: in fine-tuning, as we saw with Google, all the learnable parameters, all 3 billion or 11 billion parameters of the FLAN-T5 models, were updated on the new training data. Here, your special company data cannot change the learnable parameters inside GPT-4, because you have no access to them. Imagine this huge system where only a very small, tiny surface area is exposed to you: your information interacts only with that surface, it "learns" there for a tiny moment, and you get a result back. That is the beauty of in-context learning.
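The in-context learning idea just described, feeding small chunks of non-public data into the prompt together with a question while no GPT-4 weights ever change, can be sketched like this. The document chunks, the question and the system message are invented placeholders, and the sketch again assumes the openai Python package of spring 2023 and API access to the gpt-4 model.

```python
import openai

openai.api_key = "sk-..."  # placeholder

# A few small chunks of non-public company text (placeholders).
chunks = [
    "Internal memo: compound X-17 showed a 12% higher binding affinity ...",
    "Lab report: the second assay confirmed the result at pH 6.5 ...",
    "Note: synthesis route B is cheaper but produces more by-products ...",
]

question = "Based on the notes above, which synthesis route should we prefer, and why?"

# Assemble context plus question into one request. The model's weights never
# change; the private information lives only inside this single prompt.
context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
messages = [
    {"role": "system", "content": "Answer only from the provided notes."},
    {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
]

response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
print(response["choices"][0]["message"]["content"])
```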
If you chain these prompts together, you have a methodology for which people have developed freely available tools. But everything you chain here for in-context learning, whatever you use, is an attempt to interact with that learnable surface area where your input data finally hits GPT-4; you are never able to train the system on new data.

This, of course, opened Pandora's box, because people said: hey, I have an idea. After Microsoft and Google there is now, in 2023, also Meta; Meta, you remember, is Facebook. They did not develop a new kind of model, they simply trained a model on data, and they came up with a model they call LLaMA. This model is available in four sizes, small, medium, large and XL if you want: 7 billion, 13 billion, 33 billion and 65 billion trainable parameters. Compared to GPT-4 this is really, really tiny, but the advantage is that 7 billion parameters is something you can somehow handle on your own compute infrastructure, at home or in your company, because who can afford thousands of GPUs apart from the Microsoft supercomputer center or Google Cloud?

So for some months now these Meta LLaMA models have been available. But Meta says: our weights are top secret, so you have to fill out a form, give us your address, your name, your email, your telephone number, list your scientific publications, and then we decide whether we grant you access to our model. And without the weights you have nothing. Now it happened that somehow the weights leaked, I don't know how, and they became available on the internet, and people suddenly used them, because now they could build on this model. If there is one singular source behind much of the scientific community that is working with Meta's LLaMA models today, it is this leak. Whether we should worry about a leak we know nothing about, from a company that wants to acquire market share in this particular area, is a discussion I leave completely up to you. But suddenly we had models: apart from FLAN-T5, which was open-sourced by Google, we have here a protected model whose weights somebody, let's say, temporarily borrowed from the internet.

This opened up something else, because people discovered that they need training data, a lot of training data. If you have data that is intelligently constructed, that carries a lot of inherent information, you can feed this condensed, let's call it intelligent, data to the very tiny, smallish 7-billion-parameter model. Just look at the size difference between this little blue cube and GPT-4; this is what ninety percent of the AI community has access to in hardware, and I will show you later how much it costs just to get access to 8 GPUs that can handle this amount of data.

So people discovered: great, even if I do not have secret company data, a small set of data with a particular form is enough. Think about it like this: you want to write a poem. You, as a human, sit down and write one poem, and then you go to GPT-4 in this interface and say: I show you a poem, please write 50,000 versions of this poem; change the verb, change the object, change the time, change the person in the poem, change the outcome, change whatever you can change in a poem. It is just an example, but you see that we now have this, let's call it, intelligence at our disposal.
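A toy version of this "one poem in, thousands of variations out" step might look like the following sketch. The prompt wording, the number of variations per call, the loop count and the output file name are all assumptions for illustration; the real self-instruct pipelines are more elaborate.

```python
import json

import openai

openai.api_key = "sk-..."  # placeholder

seed_poem = "The quiet harbour sleeps, / the gulls fold up the evening light ..."

synthetic_examples = []
for batch in range(3):  # in practice you would loop until you have tens of thousands
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Here is a poem:\n\n" + seed_poem + "\n\n"
                "Write 5 new versions of this poem. Change the verbs, the objects, "
                "the time, the person and the outcome. Return them as a numbered list."
            ),
        }],
        temperature=1.0,
    )
    synthetic_examples.append(response["choices"][0]["message"]["content"])

# Store the raw generations; they still need to be split and cleaned
# before they can serve as a fine-tuning data set.
with open("synthetic_poems.json", "w") as f:
    json.dump(synthetic_examples, f, indent=2)
```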
Of course the free chat interface is not meant for this kind of volume, but if you pay for the API you get your 50,000 similar poems back. This synthetic, artificially generated data is called self-instruct data, because we start from one human-written instruction, or better a hundred, and then let the system itself create 50,000 similar copies for our training. And now we use this minimal system: with a training set of roughly 50,000 examples we have the right amount to fine-tune our mini system for one particular task. This was done by Stanford University, and they called their fine-tuned system Alpaca; maybe you have heard of it. So this is the process: you have some data, or you write some data, you turn it into an instruction, and you ask GPT-4 to create thousands of similar instructions, because we need a huge training data set to start the training process of this tiny little model; it takes tens of thousands of training examples to develop, if you want, the system intelligence of an LLM. You end up with a new model, which Stanford University called Alpaca.

You will have noticed that in the last weeks a lot of models appeared: you have Vicuna, you have GPT4All, and in the coming weeks there will be many more, but they all follow the same path: you construct synthetic data out of GPT-4 and feed it as training data to the tiniest model we can imagine. This is more or less the latest model, but if next week another company comes up with the next model, we will all jump to the updated one, so whenever you watch this, just substitute your company and the latest model, whatever it is called, for LLaMA. And of course, since T5 is open source, you can use T5 instead; I mention T5 because the LLaMA weights are semi-secret, semi-protected, no, they are absolutely protected, but there is a way the research community can use them anyway.

So this is more or less the path to where we are now, in April 2023, and why. I have also given you some additional information about the hardware infrastructure you need, because to fine-tune such a model with a huge data set you have two options. If you do not have enough money to afford a brilliant compute infrastructure, to go to Microsoft and rent a small supercomputer there, you have a method called PEFT, parameter-efficient fine-tuning; the other option is classical fine-tuning. Both are fine-tuning methods, but in the first one we accept that our compute infrastructure is so limited that we have to freeze the model and only work with less than one percent of the weight tensors in the layers of the Transformer architecture of our large language model. Simply because full training would need a huge compute infrastructure, we find a way to reduce the trainable mathematical objects to less than one percent of what the model really is. The classical alternative, if you have the infrastructure, if you have the money to rent your supercomputers, is classical fine-tuning: all weights of all tensor operations in all layers of the Transformer architecture of your large language model are set to trainable and are included in the mathematical update. So 100 percent of the tensors, and AI is tensor operations, matrix multiplications, will be updated. This gives the optimally tuned model for the task, but since we now train 100 percent of our mathematical objects instead of one percent, we need a huge infrastructure.
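To make the first option, PEFT, a bit more concrete: with Hugging Face's peft library, a LoRA configuration (LoRA being just one of several PEFT techniques) freezes the base model and adds small trainable adapter matrices, so that well under one percent of the parameters are updated. A minimal sketch follows; the model path is a placeholder for a locally available LLaMA-7B checkpoint, and the exact parameter fraction depends on the chosen rank and target modules.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Placeholder path to a locally stored LLaMA-7B checkpoint.
base_model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b", torch_dtype=torch.float16
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # rank of the low-rank adapter matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in LLaMA
)

# Wrap the frozen base model with trainable LoRA adapters.
model = get_peft_model(base_model, lora_config)

# Prints the trainable share, typically a small fraction of one percent
# of the roughly 7 billion base parameters for this configuration.
model.print_trainable_parameters()
```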
That infrastructure is typically an NVIDIA system with the Hopper architecture: the H100 Tensor Core GPU, and you need eight of them, bundled together in one configuration. Alpaca by Stanford University, for example, was done exactly in way number two: they used classical fine-tuning after they had generated the synthetic data, let's call it self-similar data, in huge quantity using the intelligence of GPT-4.

Of course, if you already have enough data in your company, you do not need to go the synthetic route at all. Think about it: you are a huge company, you have so much data, you can train your own model. And this is the third way I would like to show you; we call it instruction fine-tuning, and we use it for complex data. So: first way, second way, third way to fine-tune your model on a particular data set. Here we do not need to go to GPT-4 for synthetic data; instead we have, in our huge company, secret data that we do not want to put on the internet. For example, you have a lot of data on quantum chemistry, a lot of information on quantum physics, and let's say you also work in the medical area and have secret data about the latest research in pharmacology. If you want to find new medications, or whatever you are researching, you can combine the physics, the chemistry and everything else you know. Say there are 1,000 research papers here, 1,000 research papers there, and another 1,000 over there, and you are looking for patterns, for common patterns, for interlinks no human has discovered yet. You feed this kind of information into what we call instruction fine-tuning, which we use for complex data sets that have a hidden pattern structure. I would not recommend the smallest system for this, but if you are a mid-sized company you can afford to run, for example, a model with 30 or 60 billion parameters. You do not need the huge size for everything and everybody; you are only interested in this particular data set, and you want to train with a strict focus on it, for new medications, for example. You are absolutely fine with 30 or 60 billion trainable parameters, since you only have one task: forget about political news, forget about architecture, forget about poems, you do not need all of that.
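For this third way, instruction fine-tuning on your own complex data, much of the work is turning documents into (instruction, input, response) records and rendering them with a fixed prompt template before training. Here is a small sketch; the Alpaca-style template is an assumed format and the records are invented placeholders for a company's internal research data.

```python
# Turn domain records into plain-text training examples for instruction
# fine-tuning. Template and records are illustrative assumptions.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that completes the "
    "request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

records = [
    {
        "instruction": "Summarise the key finding of the paper.",
        "input": "Paper 0421: coupling constants in molecule Z shift under ...",
        "output": "The paper reports that the coupling constants of molecule Z ...",
    },
    {
        "instruction": "List compounds that share this binding pattern.",
        "input": "Binding pattern B7 was observed in assays 12, 19 and 44 ...",
        "output": "Compounds C-3, C-9 and C-17 show the same B7 pattern ...",
    },
]

# Each rendered string becomes one training example for the tokenizer/trainer.
training_texts = [PROMPT_TEMPLATE.format(**record) for record in records]

for text in training_texts:
    print(text, "\n---")
```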
You can build your own highly dedicated system that is really focused and really powerful for just one or two tasks like these. So you see, this is where we are today; we call it the LLM ecosystem of April 2023. I just wanted to give you an overview so that when you hear terms like PEFT or LoRA you understand what is happening: what the interfaces between the systems are, what is public data, what is secret, highly protected company or corporate data, how fine-tuning happens, how in-context learning works and what the difference is, and what today are more or less our three main methodologies for taking a small LLM, the naked architecture in only its pre-trained version, and then investing in fine-tuning for your particular corporate interest, for the aim, the goal you want to achieve with the system.

The last thing I want to show you: if you take the case of Alpaca from Stanford University, they paid in total about 600 dollars. Just to generate the data they had to pay about 500 dollars, and then actually running the training on the compute infrastructure, on the supercomputer center, cost only about 100 dollars. So the generation of data, the value of high-quality data: in this particular example the data cost five times more to generate and to have than it cost to run the operation on the supercomputer, where you do the calculations and build up this specific artificial intelligence for your dedicated task. Data versus compute to create a specific intelligence: five to one in the case of Alpaca.

I think this is it for the first video. If you have any questions, please leave a comment under this video. I hope I brought some ideas, some clarification, some understanding of what is going on in the world today. This is my knowledge, I like to share it with you, and I hope you enjoyed it.
Info
Channel: code_your_own_AI
Views: 44,200
Id: qu-vXAFUpLE
Length: 27min 2sec (1622 seconds)
Published: Sun Apr 16 2023