I Ran Advanced LLMs on the Raspberry Pi 5!

Video Statistics and Information

Captions
The image is of a man standing in his living room, smiling and posing for the camera. He is wearing a brown hooded sweatshirt and making a peace sign with his hand. In the background there are a few skyscrapers visible, suggesting that he might be located in an urban area.

That is crazy. Holy shoot! GPT-4 is believed to feature more than 1.7 trillion parameters, which, if you do the math, means you would need hundreds of gigabytes of VRAM and likely over 100 CPUs to run it yourself. What is this, Egyptian cotton? Because that's a lot of threads. But I want to know what we can do with more humble means, like this Raspberry Pi 5 that sells for just $80. No doubt GPT-5 and Google's Gemini models are sure to be great, but what's the state of the art when it comes to open-source, free small language models like Orca and Phi? Are they practical for small computers, and can we accelerate their performance with Coral AI Edge TPUs?

Now, I've been doing big things on small tech for almost a decade, but this endeavor of deploying local LLM-based chatbots on the Raspberry Pi is undeniably the pinnacle of that journey. The tech is truly impressive, and the implications of these kinds of jailbroken LLMs are definitely worth thinking about. So my objective is to test every major LLM available, including privateGPT, which I'll train on local documents on this external SSD, working our way all the way up to the new, hyped Mistral 7B, and examine how this model is so fast and capable at such a small size.

Now, if you don't have a Raspberry Pi 5, no worries; you can follow along with most any SBC, mini PC, or even a personal laptop. I'm going to be using the new Raspberry Pi 5 with 8 GB of RAM running the 64-bit OS. I'd also suggest getting some fast storage in place: I'll be downloading dozens of models, each around several gigabytes, and microSD cards are pretty slow, so I'm going to use this fast 256 GB microSD, but you can use external SSDs or even NVMe for even better performance. And I'll add a step-by-step
guide in the description below, so if you miss something, don't worry.

Now, I wanted to use LM Studory, but it doesn't appear to run on Arm architecture yet, so that didn't work. But there's a great new tool called Ollama that provides similar functionality: it allows you to download, test, and swap major LLMs by running them from the command line.

So we have our idle Raspberry Pi 5 right here. It is wired into a power detector so we can observe the power draw as we go, and on the right side here we are connected to the Raspberry Pi over SSH. I have a command line on the left, and I'm tailing the resources using htop on the right. This Raspberry Pi has no internet connection right now; it's 100% private, off-grid, and offline. Normally the site would return ICMP traffic, but because there's no internet, it is unable to do so. So we're going to be running these models completely locally.

I want to start with a model called LLaVA, which claims to be able to analyze images. The first thing we're going to want to do is upload the selfie that I just took to my Raspberry Pi, so I'm going to do that using FileZilla; I should be able to just double-click it, and there it goes. Now if I come over to Warp, on the right I have htop open so we're monitoring the resource utilization, and we can see that the selfie image is on the Raspberry Pi. Okay, so now the model is running, so we can pose our queries. I'm going to say, "What's in this image?" When I kick this off, we should see the power go up, the fan turn on, and the resources bump up. "The image features a man standing on the edge of a pool in front of high-rise buildings, most likely in Miami or another big city. He appears to be posing for the picture while standing near the pool with his hands rested on his chin. In the background, there are several cars parked and scattered around the area. The scene captures a casual moment with an urban atmosphere." Yeah, it's just crazy
how accurate this is and how it got my city, my posture, all the ambient objects in the image. It's just super impressive.

Okay, let's run Llama 2 Uncensored. "Write me a dangerously spicy mayo recipe." "Ingredients: 1 cup mayonnaise, 2 tbsp yellow mustard, 2 tsp hot sauce (your choice), two pinches of cayenne pepper, 1/2 teaspoon chili powder, 1/2 teaspoon garlic powder. Instructions: 1..." So yeah, that's a pretty solid recipe right there; it even gives you instructions for how to prepare it. So Llama 2 is a great general-purpose model.

Okay, let's run some of the smaller models, starting with Phi-2. I'm going to do "ollama run phi". All right, let's ask it a little historical trivia: "Who was the Venezuelan president in 1980?" "The Venezuelan president in 1980 was Carlos Andrés Pérez." Okay, let's do some coding stuff. "Give me the Linux command to delete a folder recursively." "You can use the rmdir command in Linux to delete a folder recursively. Here's an example: sudo rmdir path/to/folder. Replace path/to/folder with the path to the folder you want to remove. The sudo command is required because it allows root access to the system for this operation. Consider a hypothetical scenario in which you are an IoT engineer..." Yeah, I'm going to say it's rambling at this point. All right, I'm going to give it more of a softball: "Why is the sky blue?" One thing I will say about this model is that it's definitely one of the faster models. All the models take some time to think initially, but once they start going, I would say they go at a pace comparable to what I could read at, so they're pretty fast. "The sky appears blue due to a phenomenon known as Rayleigh scattering. The Earth's atmosphere scatters sunlight in all directions, but blue light is scattered more than other colors because it travels as shorter, smaller waves. This scattering of blue light is what gives the sky its blue color. Additionally, the sun is positioned directly above us during daylight hours, which further enhances this effect by making the blue
wavelengths more prominent." Sounds right.

All right, let's try another small model: Orca Mini, so "ollama run orca-mini". A lot of people ask about languages, and as far as I can tell these models are designed to be used in English, but that doesn't mean they can't understand or help you with translations. "Translate this sentence into Spanish: I love creating intelligent computers." "esto Mondo career and tentes computer." Again, the smaller models are decent, pretty good, definitely usable, definitely practical, and faster than the other models. But let's now use some of the more capable, better-performing models.

So let's go with Llama 2: "ollama run llama2". All right, let's see how it does with basic facts. "Who was the US president in 1952?" "The US president in 1952 was Dwight David Eisenhower." All right, let's do some code stuff. "Write me a Reddit-like time-decay function in JavaScript." "Here's an example of a simple time-decay function in JavaScript..." I use GPT-4 on a regular basis, and this feels a lot like GPT-4: it gives examples, makes few mistakes, uses modern ES6 syntax, and provides explanations that are not too long and not too concise. This is a really good model.

All right, let's run Code Llama. "Explain the concept of async/await." "Async/await is a programming construct that allows developers to write asynchronous code that is easier to read and maintain. It was introduced in ECMAScript 2017 and is used to handle the results of async operations." So this is a fantastic answer: a full explanation, callback hell, a code example. This is really solid. Let's do another one. "Give me a Linux command to print the current time every 10 seconds." "You can use the sleep command in Linux to print the current time every 10 seconds. Here's an example of how to do this: while true; do date +%T; sleep 10; done. This will repeatedly print the current time using the date command and then wait for 10 seconds before printing the next time. You can use this in a shell script or directly in your terminal."
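For reference, the "Reddit-like time-decay function" prompt above corresponds to Reddit's published "hot" ranking. The video's answer was in JavaScript and isn't shown in full, so here is a minimal Python sketch of the same idea; the epoch and the 45,000-second divisor follow Reddit's open-sourced formula, but treat the exact constants as illustrative:

```python
import math
from datetime import datetime, timezone

# Reddit's epoch, used as the zero point for the time bonus.
EPOCH = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)

def hot(ups: int, downs: int, posted: datetime) -> float:
    """Reddit-style time-decay rank: the vote score is damped
    logarithmically, while newer posts earn a steadily growing
    time bonus, so fresh content can outrank stale high scorers."""
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    age_seconds = (posted - EPOCH).total_seconds()
    return round(sign * order + age_seconds / 45000, 7)
```

The key design point is the logarithm: the first ten upvotes matter as much as the next ninety, so recency, not raw score, dominates the ranking over time.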
So Code Llama is excellent for code pairing and reference for developers of any sort. But I'm curious: can we run a 13-billion-parameter model? All right, so we're going to try to run the Llama 2 13B model. And it looks like it won't run, which I guess makes sense based on memory requirements. My understanding is that, as a loose estimate, you need about the model's parameter count in gigabytes of RAM; for instance, for a 7-billion-parameter model you should have 7 gigs of RAM, and I have 8 gigs of RAM, etc.

With these open-source, free models, our interactions are 100% private, many of the models are uncensored, and costs are reduced since they can run on more household hardware. This process also raised the question of whether we could accelerate the model inference using an edge TPU like the Coral AI, but it looks like for LLMs the process is bound by VRAM, and the Coral TPU only has 1 GB of RAM and 8 MB of SRAM, which isn't adequate for running even the smallest LLMs. It also got me wondering if we could run these effectively on a cluster of parallelized Raspberry Pis, but it doesn't seem like there are any open-source projects to enable this.

Here's another fun use case: what if we wanted to train our model on files on an external drive? Well, we can actually do that using privateGPT. So I did want to show you privateGPT real quick. I don't think I'll have time to go through the entire installation here, but it is in the full blog tutorial in the description if you want to pick that up. The gist is: I have a directory called source documents, and you can put PDF files, text files, whatever you want in there. I put a free biography of Susan B. Anthony in there and trained the model on it. So now, if I back out and run privateGPT ("python privateGPT.py"), essentially we've trained the last layer of the model on the source documents, so I can start posing questions
around the source documents, and it will answer in a chatbot type of way. "What was Susan B. Anthony's primary mission?" Okay, so now you can see it's starting to respond: "Susan B. Anthony was a prominent social activist who focused primarily on women's suffrage..." And it will also cite its references throughout any of the documents that you've supplied. So this is how you talk to your documents.

So let's try another model: Mistral 7B, which is claimed to be the most capable 7-billion-parameter model. Okay, so we have Mistral running; let's start asking some questions. "Who was the second person to step on the moon?" "The second person to walk on the moon was Buzz Aldrin. He followed Neil Armstrong, who was the first man to walk on the moon, during the Apollo 11 mission on July 20th, 1969." All right, let's ask it some Spanish. "¿Qué tal? is a Spanish phrase that translates to 'how's it going?' or 'what's up?' in English. It is often used as a casual greeting or a way of asking someone how they are doing."

"Write me a rhyming poem about semiconductors." Pretty sure my Raspberry Pi is going to blow up on this one. "In a lab where the brilliance does flow / creations emerge, both new and old / Silicon Valley's heart beats fast / where semiconductors hold the past / transistors small, with power to gain / and circuits complex, their role is sustaining / voltage is high, amperes in tow / they dance together at an electric show / microchips intricately designed / with logic and memory they're aligned / in microprocessors they compute / solving problems that once seemed moot / born from silicon, pure and bright / pushed to the limit with endless might / semiconductors, in their endless quest / unlock the secrets of this technological feast / as technology marches on apace / they power our world with grace / through phones and computers they've become / a staple, a testament to human ability / in a realm where science meets art / semiconductors play their part / in this ever-evolving digital age / their importance can't be assuaged." But still, see, it's really
impressive, particularly for a model that's not optimized for literature. So Mistral 7B wins the award for me; it's my personal favorite, and it's really impressive.

And I actually didn't know this, but a very astute viewer on my last video posed an interesting question, noticing that if the model has the broad strokes around most historical questions, does that mean the model contains the entire world's knowledge? And the answer is actually yes. For reference, the entire corpus of Wikipedia is about 22 GB, so it makes sense that these small models probably contain about 25% of the most significant information. For me, what's so interesting about this is that, hypothetically, if a catastrophic event occurred and the internet cut out, I'd have a dozen or so local LLMs that hold all of the world's history about language, science, and practical how-tos, which could be a game changer for the preppers among us. It's like having your own local, private AI in a box. It would even be pretty entertaining: when you get bored, you can just talk to it.

The strides in this space have been super compelling, and I think there's a case to be made that in the future, LLMs might run primarily on the edge. For more interesting videos, check out this next video. Thanks!
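The memory rule of thumb from the video (a model's parameter count, in billions, roughly equals the gigabytes of RAM it needs) can be written as a quick estimator. This is a loose sketch assuming about one byte per parameter, which matches 8-bit quantized weights; actual usage varies with the quantization level, context length, and runtime overhead:

```python
def est_ram_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Loose RAM estimate for running a quantized LLM locally.
    bytes_per_param=1.0 assumes 8-bit quantization; use 2.0 for
    fp16 weights or roughly 0.5 for 4-bit. KV cache and runtime
    overhead are deliberately ignored in this back-of-the-envelope."""
    return params_billions * bytes_per_param

# Consistent with the video: a 7B model fits in the Pi 5's 8 GB
# of RAM, while a 13B model does not.
```

This also explains why the 13B Llama 2 failed on the 8 GB Pi while the 7B models ran comfortably.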
Info
Channel: Data Slayer
Views: 153,941
Keywords: Data Slayer, ollama, raspberry pi 5, projects, large language models, llm, mistral, small language models, gpt5, gemini, mixtral, llava, orca, phi, vram, llama2, huggingface, edge tpu, coral ai, ollama run mistral, privategpt, zima board, phi-2, vicuna, wavenet, local llm, chat gpt, autocode, langchain, chatbot, warp.dev, genai, docker, open source llm, prompt engineering, Raspberry Pi 5, local ai, AI in a Box, alpaca
Id: Y2ldwg8xsgE
Length: 14min 42sec (882 seconds)
Published: Sun Jan 07 2024