How to Run a ChatGPT-like AI on Your Raspberry Pi

Captions
Hey there, my name is Gary Sims and this is Gary Explains. Generative AI is changing the whole tech landscape almost on a weekly basis, and here on this channel I've shown you how you can run a ChatGPT-like AI chatbot on your own local computer, using the llama.cpp project and using LM Studio. You can do it on Windows, on a Mac, on Linux. But here's the amazing thing: did you know you can also do it on a Raspberry Pi? The humble Raspberry Pi is able to run generative AI. So in this video I'm going to take you through it step by step; even if you're not too familiar with the command line you should be able to do this, because I give you all the commands for getting a ChatGPT-like chatbot running on your Raspberry Pi. No cloud servers, no big GPUs, all just running on your Raspberry Pi. So if you want to find out more, please let me explain.

First of all, you are going to need a Raspberry Pi 4 with 8 GB of RAM. It might be possible on one with 4 GB, but I'm doing this on one with 8 GB. You also need to make sure you've got enough storage space, so use a bigger SD card, let's say a 128 GB one, or some kind of storage plugged into one of the USB ports, because a lot of the files we'll be downloading are four, five, six, seven, eight gigabytes in size. So you need the memory and you need the storage. Okay, let's head straight over to the command line.

So here we are on the command line on my Raspberry Pi; this one has 8 GB of RAM, as I was saying earlier. Now we need to install some software. The first thing to do before that is to make sure that all the repository information is up to date. You do that with sudo apt update, which downloads the latest repository information so that you can go ahead and install packages. Next you need to make sure you've got a C compiler installed, so you say sudo apt install git g++ build-essential, and that makes sure everything you need for compiling C/C++ software, using git, running make, all that kind of stuff, is installed.

Once you've done that, you need to get hold of the source code for the llama.cpp project. This is a project that allows you to run these large language models using only the CPU, which is of course great for a Raspberry Pi. You do that with the git clone command followed by the repository address. All of these instructions, including the ones we just ran a moment ago, are available in a document in my GitHub repository, and I'll leave a link to it in the description below so you can cut and paste these commands easily. Basically, you're going to get a copy of the source code downloaded onto the Raspberry Pi. This should happen fairly quickly, and once it's done you'll have a directory called llama.cpp. We'll change directory into there and then build the software: you run make, and you can use -j to speed it up a little by building in parallel. This will take a few moments, so we'll come back once it's finished building.
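Collected in one place, the setup steps described above look roughly like this. This is a sketch: the video doesn't read out the clone address, so the GitHub URL below is the standard upstream llama.cpp repository and should be checked against the document linked in the description.

# Refresh the package lists, then install git, a C++ compiler, and the build tools
sudo apt update
sudo apt install git g++ build-essential

# Fetch the llama.cpp source code (standard upstream repository assumed)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build the project; -j compiles in parallel to speed things up
make -j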
Once the software is built, you're going to need a large language model to actually use with it. There are quite a few of them knocking about today, and the ones producing the most interest are based on Llama, a large language model that comes from Meta, in other words Facebook. There are different versions of it, some with seven billion parameters, some with 13 billion, some with 34 billion. Obviously we're running on a Raspberry Pi here, with limited I/O, limited memory, and limited CPU power, so we're going to pick a small one: the seven billion parameter version. If you go to this account on the Hugging Face page (again, links are in that document), you can see it has 1,151 models available that we could pick from. Many of them are based on Llama, Llama 2, Llama Instruct, Llama Code, but you have to pick the model you find the most interesting. When you click on a model you get to see its heritage: where it's come from and why it exists. For example, this one is the standard Llama 2 seven-billion-parameter version, and it's in the right format for the llama.cpp project, so this is the one we need to download.

When you click on "Files and versions" you'll see lots and lots of files. These are all the same model, but they've been quantized. Quantization takes the model's weights and tries to represent them in less data: rather than being in, say, 16-bit floating point, it reduces them down to, say, eight bits or four bits. With neural network models that isn't as bad as it might sound; if you were measuring the amount of money in your bank account that way it might be quite frightening, but with these models it works quite well. There are different methods for doing this, and that's what all these variants are. You can see the difference in file size: one is seven gigabytes and another is 2.83 gigabytes. What I'm going to do is download the Q2 version, because it's the smallest one. You may have some success with the Q4_K_S version, but the smaller ones run quite well on a Raspberry Pi; the bigger you go, the slower and more demanding it's going to be.

So you need the link to that file: right-click on it and choose "Copy link address", and then we'll use that over on the command line. Back on the command line, change into the directory called models, which is where the models are stored, and then use wget, which is a way of fetching data over the internet, pasting in that URL. It will start to download, and because it's two gigabytes that will take a while, so we'll wait for it to finish.
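As a sketch, the download step looks like this. The account shown in the video isn't named in the captions, so the repository and filename below are assumptions (TheBloke's Llama 2 7B repository is used for illustration); copy the real link for whichever Q2_K file you pick from its "Files and versions" tab.

cd models
# Hypothetical URL; replace with the link copied from the Hugging Face page
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q2_K.gguf
cd ..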
Now that it's downloaded, you can see the model file there: the Llama 2 seven-billion-parameter model, Q2_K version. Now we want to test it. Go up a directory, back to the llama.cpp directory, because that's where there's a program called main, which is the executable we're going to run. To test it we can paste in a command line: we run main, with -m pointing at the model file in models/, and then there's a quick way of testing it, which is to pass -p for prompt: "Building a website can be done in 10 simple steps: Step 1:". The other parameters make the model behave in certain ways; -n, for example, controls how many tokens it should generate. If you run that with the built-in prompt, it will start up and actually answer that question, as if that's what you'd typed into ChatGPT or LM Studio, as I showed you in my previous video.

Now of course, this is a Raspberry Pi, so we're not expecting it to go blazingly fast. It does take a little while at the beginning while it loads all that data into memory so it can run the model, but once it starts running it will produce tokens as you'd expect. So after a bit of a delay for everything to load, you can see it's producing the tokens which are the answer to the prompt: "Define your website's purpose and target audience." It's not producing them lightning fast, but as I said, you're running this on a Raspberry Pi, which is just absolutely amazing. For me it gives hope that as this technology gets more refined, and as we understand it more and more, we're going to see the ability to run these things on relatively modest hardware, and that will be quite interesting for the consumer market, for what we can get built into, you know, fridges, let's just talk at that level. If you let this run (we're onto step two already) it will just go ahead and produce probably six or seven steps; it won't actually produce ten in this particular case.

Okay, so we'll interrupt that now, and we'll look at how to make it more like the ChatGPT experience, the interactive one where you can write back and forth. There's another command we can run here (again, it's all in the document) that puts it into interactive mode. The -i is what turns on interactive mode, and the other settings turn it into more of that chatty style and make sure it talks back to you. So we run the model again, and again it takes a little while to load up, and here we go, I've now got a prompt. So I can ask it a question like "What is the best time to visit London?", the kind of chatty question you'd ask a chatbot, and it goes ahead and starts producing tokens to give me the answer: "The best time to visit London depends on your preferences and priorities." Notice what a GPT is doing here: these are tokens, not words. The word "priorities" wasn't one word to the model, it was actually two tokens, so it's interesting to watch. Don't think it's just producing words; when you're dealing with a GPT you're dealing with tokens, which can be syllables, fractions of words, or symbols. "London has a mild oceanic climate", and the comma there was a token too. So we can just let it carry on with the general back and forth here. It's not that fast, because we're running on a Raspberry Pi, but if a Raspberry Pi is all you've got and you want to see what you can achieve on a relatively low-end piece of hardware, look at this: it's absolutely amazing.
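The two invocations described above look roughly like this. These are sketches based on the standard llama.cpp examples of the time: the model filename carries over from the hypothetical download step above, and the exact values (token count, reverse prompt, chat prompt file) should be checked against the document linked in the description.

# One-shot test: -m selects the model, -p supplies the prompt,
# -n caps the number of tokens generated
./main -m models/llama-2-7b.Q2_K.gguf -p "Building a website can be done in 10 simple steps: Step 1:" -n 400

# Interactive, ChatGPT-style session: -i enables interactive mode, -r hands control
# back to you whenever the model emits the reverse prompt "User:", and -f seeds the
# conversation with an example chat shipped with llama.cpp
./main -m models/llama-2-7b.Q2_K.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt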
If you'd told me a couple of years ago that we could have software that answers questions directly like this, gives us sensible answers, and runs on a Raspberry Pi, well, I don't know if I'd have believed you. But here we are: we are in the era of generative AI, music, images, text, and it runs even on low-end hardware, and it's only going to get better. So there it is: the humble Raspberry Pi can run the Llama 2 model, and you can get the same answers out of it as you would out of a much bigger computer. Of course it takes a little longer, but it's great for playing with, great for experimenting. Do let me know in the comments below if you've had any success running this on a Raspberry Pi. Okay, that's it. My name is Gary Sims, this is Gary Explains, and I really hope you enjoyed this video. If you did, please give it a thumbs up, and if you like these kinds of videos, do subscribe to the channel. After this you'll see all of my social media handles where you can follow me. Okay, that's it, I'll see you in the next one.
Info
Channel: Gary Explains
Views: 28,025
Keywords: Gary Explains, Tech, Explanation, Tutorial, ChatGPT, Raspberry Pi, Raspberry Pi 4, Pi 4B 8GB, 8GB, Llama, llama.cpp, LLM, Large Language Model, GPT, Generative Pre-trained Transformer, AI-powered language model
Id: idZctq7WIq4
Length: 11min 46sec (706 seconds)
Published: Tue Sep 12 2023