Using Ollama to Run Local LLMs on the Raspberry Pi 5

Captions
This tiny computer is a Raspberry Pi. It's made for schools and loved by makers, and more specifically this is the Raspberry Pi 5, which was released a few months ago. This version is the 8 GB of RAM model and costs just £80 in the UK or $80 in the US, if you're lucky enough to be able to get hold of one. This tiny computer can be used for many things, but in this video I want to show you how you can use that 8 GB of RAM to run an open source large language model on your own network, and what sort of benchmarks we get versus something like the MacBook Pro that I've used Ollama on in the past. So, all that said, let's get started.

So I'm on my Pi 5. I'm going to try and install Ollama on this and see how it goes. I should be able to follow the Ollama install instructions and just see how they pan out, so let's copy that curl command, paste it in, and see how it does. OK, cool, that seems to have just gone in and installed straight away. If you're not familiar with Ollama, you can go and pick up any of the models it's got listed here: we've got Mixtral, Llama 2, TinyLlama, Code Llama. I'm just going to try and run TinyLlama at this point; we can just run it and it'll pull down that model, so let's see how we do. I've never run TinyLlama before, so this is going to be a new one for me. I'm running Raspberry Pi OS, as you can see, and I've updated everything and installed all the latest packages, but I haven't installed anything else; I literally just installed Ollama there.

OK, cool, it's pulled down everything. Let's ask a question, the classic: why is the sky blue? See how that does. "The sky blue is a natural color..." Oh, that's interesting in the way it's phrased; I'm guessing that's basically down to TinyLlama not being as big a model as the other options. OK, so it actually works, which is superb. I'm pretty surprised that it got installed so quickly and was so easy. I'm going to try out a few things. The fan on the heatsink did kick in when I was trying things, so it's obviously using the CPU a bit.

It'd be interesting to get some numbers, and we can run this with the verbose flag. If I do "ollama run tinyllama --verbose", I think it is, and then ask the same thing, we should get some stats out in terms of how fast it's generating those responses. Now, when I was doing this on my M1 Pro with Llama 2, not TinyLlama, we were getting about 20 tokens a second, I think, and on my M1 it was something like 17. So, eval rate 12.9 tokens a second, that is not bad. Ah, that's the prompt eval rate; the eval rate is 10 tokens a second, so roughly half what I was getting on the M1 Pro, which is not too shabby.

We could actually do a fairer comparison if we pull down the other model. So we type /bye to come out and then run Llama 2. I'm actually going to pull down the uncensored one, because Llama 2 is pretty restrictive; it's pretty aggressive with the restrictions it applies, so you can't ask it for anything really spicy. In my other video I asked for a regex in Python and it wouldn't give me the answer, because it felt the regexes were inappropriate and that I might be trying to do nefarious things with them. This is saying it's going to take about 10 minutes, so it's obviously about a 4 gig model; we'll just wait a second and let that pull down. OK, cool, that's all finished downloading. As well as the Llama 2 uncensored model, I've pulled down LLaVA, because I wanted to check out how well it copes with image interpretation.
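For reference, the steps so far boil down to a handful of commands. This is a sketch: the install one-liner is Ollama's published script, and the model tags follow the Ollama library naming.

    # Install Ollama via the official install script
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull and chat with the TinyLlama model
    ollama run tinyllama

    # Add --verbose to print timing and token-rate stats after each response
    ollama run tinyllama --verbose

    # Exit the chat REPL with /bye, then pull the roughly 4 GB 7B model
    ollama run llama2-uncensored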
Let's first run the Llama 2 uncensored model and see how that fares, and in fact let's do that with the verbose flag again. I'm going to prompt it with "can you write a regular expression to match email addresses". In a previous video when I did this, it got caught by the restrictions; that's the reason for using the uncensored version, because this one doesn't get caught. Like I said, the stock model is a little overzealous, and that's generally down to the initial system prompt. You can see that this is much slower than the TinyLlama we were running. OK, so it's doing it in JavaScript; I didn't actually specify that, or Python, but there we go, that's fine. I have no idea if that's going to match an email address. Well, this is really slow in comparison, so you'd probably want to be using one of those smaller models. This is the 7 billion parameter model, I didn't state that, but it says on the Ollama website, under the Llama 2 uncensored model, that 7 billion parameter models generally require 8 gig of RAM, which is what we've got here, but you can see that it's not fast.

OK, so you can see there we've got an eval rate of 1.78 tokens a second, tiny in comparison to what we had just now with TinyLlama, but obviously the model is that much bigger. I think TinyLlama is a 3 billion parameter model; let me have a squiz at the website... in fact, no, it's a 1.1 billion parameter model, which is obviously a lot smaller. So we're going from 1.1 billion to 7 billion parameters and getting a much slower eval rate. This is probably not the way you want to go; you'd probably want to be using something like Mistral on this, or in fact TinyLlama is a good option, because it seemed to be going pretty fast.

I'm going to try an image as well, so I've downloaded a picture of the Raspberry Pi into Downloads. Let me see if I can get it to understand that, because it would be pretty awesome to know it can do that as well. So let's run LLaVA, and we're going to run that verbose too. (Man, there's an absolute tweet storm going on in a tree in my garden; this happens all the time.) OK, so: let's see what's in this picture, home, Downloads, image.jpg, I think that's what it was called. OK, let's go. Wow, this is slow, and you get no feedback, which is the other thing here; we're not seeing anything aside from the spinner. And it's finally responding with an answer, here we go: "The image features a close-up of the back of a computer circuit board. The green and yellow computer board has many screws on it attaching various components. The detailed view showcases the inner workings of electronic devices such as laptops or computers." So it's obviously looked at that image and it understands it, and it's done all that locally, which is really impressive; it's not gone out to a third party service to do that. And it hasn't been able to pick anything out from the image file name, because I've made sure it's not identifiable from what I've named the file. So that's really impressive, but it's incredibly slow. How long did that take? Total duration: 5 minutes 33 seconds, so a long time. We've obviously got all of the features that Ollama has as well, such as the API; you can go and check my previous videos if you want to see how to use that.
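For reference, the LLaVA run and the API he mentions look roughly like this. A sketch: the image path is illustrative (the actual filename in the video is deliberately non-identifying), and /api/generate is Ollama's standard local REST endpoint on port 11434.

    # Run the LLaVA multimodal model; including a file path in the prompt
    # makes the CLI attach that image to the request
    ollama run llava --verbose
    >>> What's in this picture? /home/pi/Downloads/image.jpg

    # The same local server also exposes a REST API
    curl http://localhost:11434/api/generate -d '{
      "model": "tinyllama",
      "prompt": "Why is the sky blue?"
    }'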
But yeah, I hope you found this useful. Let me know if you're going to try it out on your own Raspberry Pi. I'll speak to you soon in a new video, and check out one of my other videos on Ollama; there'll be one popping up in a minute, probably. OK, bye for now. Bye.
Info
Channel: Ian Wootten
Views: 29,671
Id: ewXANEIC8pY
Length: 9min 29sec (569 seconds)
Published: Wed Jan 17 2024