LLaMA & Alpaca: “ChatGPT” On Your Local Computer 🤯 | Tutorial

Video Statistics and Information

Captions
In this video I will show you how to run language models that are almost as good as ChatGPT on your local computer. First I'll show you the LLaMA model, a foundational language model that predicts subsequent words for a given input sequence. On top of that, I'll also show you the Alpaca model, a fine-tuned version of LLaMA that is able to follow instructions, a behavior we already know from ChatGPT. So, more or less, we can run our own ChatGPT on our local computer. Isn't that cool? I'll show you today how to do it.

Before we start setting up LLaMA, let's first get a quick understanding of what the model actually is and why it's so interesting. The first thing to know is that LLaMA is called a foundational language model. What that means is that it models probability distributions over sequences of words, tokens, and characters. In practice, it does next-token prediction: the LLaMA model predicts the most plausible next word given a sequence of input words. But what makes this model special? The abstract of the paper already reveals it: the LLaMA model with 13 billion parameters outperforms the GPT-3 model, which has 175 billion parameters, on most benchmarks. So LLaMA contains far fewer parameters and is still able to outperform GPT-3, which is very impressive. Another fact: it is open, which is only partially true. It's open for research, meaning you can request the model weights from Meta. But it's also partially true in another sense, because somebody already leaked the weights, and you can follow the thread about this in a pull request on GitHub where somebody tried to more or less sneak in the link to the model weights,
which you can see here. The conversation that happened afterwards was quite funny, with people hopping on with "looks good to me," and as you can see, many started approving the pull request. So technically the model weights are now open, and that's also why we can use the model on our local computer.

Before we start setting up LLaMA on our local computer, I want to mention two more things. One is that the model was published in four different versions, which vary in the number of parameters each model contains. Not very surprisingly, the more parameters a model contains, the better its results on different benchmarks. What's quite interesting, though, is that even the version with the fewest parameters achieves results quite comparable to the GPT-3 model, sometimes slightly better and sometimes slightly worse, while having tremendously fewer parameters. That's also why we are able to run it on our local computer. The last thing is a fun note: one person was even able to run the 7B model on a Raspberry Pi 4 with 4 GB of RAM. Apparently it took him 10 seconds per token, which is very slow, but hey, that's definitely an achievement. Isn't it crazy that these large language models can now run, however slowly, on a Raspberry Pi? Who would have thought that one year ago?

Okay, now let's run the LLaMA model on our local computer. For this we will use the Dalai library. You can see what's going on here: Dalai Lama. I personally like this kind of humor. To run the LLaMA model on our local computer we only need four commands, which is very handy. It's absolutely impressive how easy it is to use such a huge language model on our own machine by running four commands.
Okay, but before we install the LLaMA model, I first want to show you that Dalai works on Linux, Mac, and Windows, which is perfect: unless you have a very old computer, it should definitely work for you. One thing I figured out while installing: the Alpaca model only needs about 4 GB of disk space, which may sound like a lot, but the LLaMA model takes up to 31 GB, and the larger LLaMA versions are even bigger, so I hope you have some space left on your computer. I won't complain though; the library is really awesome, thank you so much for creating it. Another thing: my Node version was too low, so I had to update to Node 18 or higher, as required by this module. So make sure you have at least Node version 18. For Windows there are three requirements: make sure Python, Node.js, and C++ are installed on your computer. One way to do this is to download Visual Studio and check those three boxes so they get installed in the background; I'm pretty sure you can theoretically also install Python, Node.js, and the C++ build tools separately, and that should also work. One last note for Linux: there you should use a Python version lower than or equal to 3.10, and again make sure you have Node version 18 or higher. By the time I wanted to run the four commands I mentioned earlier, the API had already changed again, actually twice during the same day. So definitely expect that after this video is published there might still be changes to the library, and in case the commands you see in this video don't work, make sure to check out the GitHub page, because there you'll find how to install both models using the Dalai
library, which is very convenient for us. So what we will do is just type in npx dalai llama install 7B. We only install 7B, because we don't need the 13B model right now, and downloading the 7B model will already take long enough. Once the installation is done, we can start using our LLaMA model: we just type npx dalai serve and our server will be started. Then you can enter a prompt like "today is a good day because" and see what LLaMA generates for us. I had an issue earlier today where the model produced a weird stream of characters, which is not the expected behavior of the LLaMA model, and I found a GitHub issue for exactly this problem that was opened 17 hours ago. I'm pretty sure one of the recent changes led to this, and I will keep you updated in the comments or description on how to fix it, but I expect it will be fixed soon. I wish it were working now, but maybe by the time you're installing LLaMA on your computer it's already fixed and you won't encounter this behavior at all. The expected behavior would be that for a prompt like "today is a good day because", the model predicts something plausible, for example "the sun is shining", which would be a very reasonable explanation for why the day is good. For now, let's move on to the Alpaca model, because I already tested that one, it works on my computer, and I definitely want to show it to you.

But first, let's talk about why the Alpaca model is such a big deal right now and why it's so impressive. The Alpaca model is fine-tuned on the LLaMA 7B model, the version with the fewest parameters, and it behaves qualitatively similarly to OpenAI's text-davinci-003, the GPT-3.5 model. What's very interesting is that the model achieves qualitatively similar
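To have the commands in one place, here they are as they worked at the time of recording; the Dalai CLI changed frequently, so check the project's GitHub README if they fail:

```shell
# Download and install the 7B LLaMA weights via the Dalai CLI
# (requires Node 18 or higher; the download is several GB).
npx dalai llama install 7B

# Start the local server; the web UI should then be reachable
# in your browser (by default at http://localhost:3000).
npx dalai serve
```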
capabilities and results while being small and easy to reproduce: the fine-tuning cost less than $600. If we look at this tweet from Will Summerlin, who works for the investment firm ARK, he notes that in 2020 training a model like GPT-3 cost around 5 million dollars, while by 2030 it's expected to cost around $500. Now we have basically traveled seven years into the future, because the Alpaca model already achieves results close to the text-davinci-003, or GPT-3.5, model while costing less than six hundred dollars, which is roughly the $500 from that post. Isn't it insane that with a few hundred dollars you or I could train a model that is almost as good as the GPT-3.5 model behind ChatGPT? I think that's really a huge achievement. Looking at how they were able to do it, though, this is not the whole story, because the model was trained in a supervised manner, and for that they used the GPT-3 model to generate the instruction data. You can think of it like this: GPT-3 is the teacher and the Alpaca model is the student, and the student gets better and better, closer and closer to the level of the teacher. So from my understanding, GPT-3 is the upper bound for the Alpaca model, but Alpaca gets almost as good. For companies like OpenAI this also means that, theoretically, with far fewer resources you can create models that are almost as good as theirs using this self-instruct technique.

Okay, now let's run the Alpaca model on our local computer. For this we will type in npx dalai alpaca install 7B, and this again will take some time, though not as long as for the LLaMA model, since we still have to download about three to four gigabytes. Once this is done, we can run npx dalai serve again, and our local server is running. Voilà, now we can start using the Alpaca model in the Dalai web interface. I thought I would ask the model a somewhat philosophical question: what
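As with LLaMA, the Alpaca setup boils down to two commands; again, the CLI may have changed since recording, so fall back to the GitHub README if needed:

```shell
# Install the fine-tuned Alpaca 7B weights
# (a smaller download, roughly 3-4 GB).
npx dalai alpaca install 7B

# Serve the web UI; it exposes every installed model,
# so LLaMA and Alpaca can both be selected there.
npx dalai serve
```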
is the most important technological innovation made by humans? Let's see what Alpaca answers. Isn't it somewhat ironic that the language model answers that the most important invention is language itself, which allows us to communicate and share ideas, thoughts, and emotions? It's definitely a valid point, and I actually like the answer; I didn't expect that. One last thing for today's video: I want to show you that Dalai also has an API. In case you already have ideas for how to use this, you can use the API to pass your questions or prompts to the model and get replies, in an interface that looks different from the one we saw earlier. I'm pretty sure many of you are getting creative right now, so let me know in the comments about your ideas and your projects; I'm keen to hear about them. And that's it for today's video. I hope it helped you run the Alpaca and LLaMA models on your local computer and get creative with them. If so, give this video a thumbs up, I would really appreciate that, and subscribe to my channel if you like my videos. Until then, have a great time and I'll see you in the next video. Bye!
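As a sketch of what using that API could look like, here is a minimal Node script based on the programmatic interface the Dalai README documented around the time of the video. Treat the method name, the model identifier string, and the option names as assumptions that may have changed; check the current README before relying on them:

```javascript
// Hypothetical sketch of Dalai's Node API as documented at the time.
// Requires `npm install dalai` and a previously installed model
// (e.g. via `npx dalai alpaca install 7B`).
const Dalai = require("dalai");

new Dalai().request(
  {
    model: "alpaca.7B", // which installed model to query (assumed identifier)
    prompt:
      "What is the most important technological innovation made by humans?",
    n_predict: 128, // how many tokens to generate (assumed option name)
  },
  (token) => {
    // Tokens stream in one by one as the model generates them.
    process.stdout.write(token);
  }
);
```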
Info
Channel: Martin Thissen
Views: 199,755
Id: kT_-qUxrlOU
Length: 13min 7sec (787 seconds)
Published: Sat Mar 18 2023