Private AI Revolution: Setting Up Ollama with WebUI on Raspberry Pi 5!

Video Statistics and Information

Captions
Hey, robot makers, hope you're having a good day so far. So you want to learn how to build your own private ChatGPT-style server that runs completely locally and securely, using a Raspberry Pi 5, or possibly a Raspberry Pi 4? Then this is the show for you, so let's dive straight in. My name's Kevin. Come with me: we build robots, bring them to life with code, and have a whole load of fun along the way.

Okay, let's get over to our slides and then we'll get straight into the demo, like we've done for the past three weeks; that seems to have gone down really well. Today's session is all about something called Ollama. It's a private, offline AI, a bit like a ChatGPT clone, but arguably better because it's free and you can run it without sending any data to the cloud. We're going to show you how to set up Ollama and its web UI, which is the bit that makes it look just like ChatGPT, on a Raspberry Pi 5 using Docker. It couldn't be simpler. I've actually created a Docker tutorial, a little course, on kevsrobots.com; go to the free courses and you'll find it, just in case you haven't got Docker installed yet. It's a piece of cake, honestly. We're going to have lots of demos today: we'll look through Ollama and have a play with it, and we'll look at LangChain, which is something we can use to talk to Ollama from our Python programs. We'll build a really simple program using it, and I'll demo that working. And if you're here for the live stream, you can hang out afterwards; we'll have a bit of a Q&A, a bit of a chat, and I'll show you some cool things that have arrived in the mailbox recently, including quite a few new robots that I can't wait to show you.

Oh, the other thing I can't let today go by without mentioning: we've just passed the 25,000 subscriber mark on the channel. Thank you to everybody who's joined me so far. If you've not subscribed already, you're missing out: you won't get notifications when new stuff comes out, and members also get a few bits and pieces from behind the scenes.

So what is Ollama? It's a tool for running large language models, which is why it's got "LLM" in the name; they came up with this slightly silly name, Ollama, but it's a really cool product. It's an offline, private AI, similar to ChatGPT: you can chat away, type things in, send code or pictures or documents, all kinds of stuff, and it can interact with that. It runs locally on your computer, so you could run it on a Mac, Windows, or Linux machine; we're going to use a Raspberry Pi 5, because why not. It does not require an internet connection, so there's no data being sent up to the cloud: you simply download a pre-trained model and then use it. That ensures user privacy and security, because you're not sharing anything with anyone.

Then there's the web UI component. Instead of Ollama being just an API and a command line, the web UI makes it look just like ChatGPT: a user-friendly, graphical interface for interacting with Ollama. It's easy to use and customizable, so you can configure it however you want, and you can use all the different types of model as well, which we'll look at. It simplifies the process of sending queries and receiving responses, and like I said, it's just like ChatGPT but slightly better, because it's free.

So why would you possibly want to use this?
First, privacy and security. If this matters to you, all your data is processed locally, ensuring confidentiality. You're not sending or sharing any data with anyone; it's just running on your local machine, so there's no risk of your data being shared with third parties or analysed, any of that. It's your data, and you keep it.

Second, accessibility and reliability. You can access all of this without the internet, so you don't need to be connected; this could be running on a solar farm or wherever. It doesn't require any internet connectivity whatsoever and has no dependencies on external servers. It just runs locally, and the fact that it runs on a Raspberry Pi 5 means it fits in a really small form factor too.

Third, you can customize it very heavily. One of the great things with Ollama and LangChain is that you can configure every layer of the AI you could possibly conceive of; you get full control. You're not forced to use a particular model, you can use whichever one you like, and you can tailor the AI to your specific needs. Say you're doing lots of coding: there are coding-specific language models. If you work with images quite a bit, there are models specific to that kind of analysis too. So you can pick the perfect model for your usage.

And cost-efficiency-wise, there are no ongoing subscription fees and no "come back later, we're busy": it works whenever you want it to work. There's also reduced internet bandwidth: once you've downloaded the models (and you don't even have to download them over the internet, you could side-load them from USB), no bandwidth is being used or charged for. So, lots of reasons to use this.

Let's have a look at it straight away, shall we? I'm going to get over to my computer, grab a terminal, and share my screen so you can see this. On the left-hand side I've got a terminal already logged into one of my Raspberry Pis. I've got four of them, dev1 to dev4, and I've already installed Ollama on this one; I'll show you how to install it later, but first I want to show you just how easy it is to use. On the right-hand side is another terminal on the same Raspberry Pi, running htop, which is a little process viewer, a bit like Task Manager: it shows you what's running and what the current CPU load is. You'll see something interesting happen there when we start to run this.

So, with Ollama already installed, we type "ollama run dolphin-phi". dolphin-phi is a model I've already chosen and downloaded; I'll show you how to do that shortly. Now it's asking for a prompt, so we could say "tell me a joke", which seems to be the popular hello-world of AIs. It has a bit of a think, and pretty quickly it comes back with a response: "Why don't scientists trust atoms? Because they make up everything." It even explains that this is a pun on making things up. You can get it to tell you all kinds of other things as well, but that's Ollama at its most basic.
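For reference, that terminal session looks roughly like this (a sketch; the prompt text and output will vary on your machine):

```bash
# Run the dolphin-phi model interactively; if it isn't downloaded yet,
# Ollama pulls it first. Assumes Ollama is already installed and running.
ollama run dolphin-phi

# At the >>> prompt, type a message, for example:
#   >>> tell me a joke
# and type /bye to leave the interactive session.
```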
Under the hood, that command line is actually calling a REST API in the background; the terminal is just a simple front end to it. Let me pull my notes up. Watch what happens when I ask it something: if I say "tell me a fact about the Roman Empire", look at the right-hand terminal, and you can see all four cores on the Raspberry Pi 5 are fully maxed out. Interestingly, where it says "Mem", that shows the amount of memory in use: out of the 8 GB on board, it's only using 3.19 GB. "Only", I say. So it's not using all the memory; that's down to the size of the model we're using (we'll check the exact size in a minute). It is using all the CPU on all the cores, so there's quite a lot of work going on, but the speed is acceptable: it returns things at about the same kind of speed as ChatGPT-4, which I think is more than acceptable.

So that's the ChatGPT experience without the web. Let's see what it looks like on the web. I'll load up a browser (we could probably even keep that CPU view alongside, if you want to see it). Look at the screen: on the left-hand side we've got New Chat, Modelfiles, Prompts, Documents, and Search, and there's an account down at the bottom. When you first log in (we'll see this when we install it on another machine in a minute), you set up an account, just like ChatGPT, but there's no cost involved and it's not sending your email address off to some newsletter somewhere; it just creates an account locally. Up here we've got "Ollama Web UI", which is just the name of the interface, and then "dolphin-phi:latest, 1.6 GB", so there you go: that's the size of this particular model.

It says "How can I help you today?" and offers some ready-made prompts, so you can click one of those, like "tell me a fun fact about the Roman Empire". What happens now is that it queries that same REST API on the Ollama service in the background, then comes back with the text and types it out, just as you'd expect from ChatGPT. There you go: the Roman Empire is known for its extensive conquests and architectural accomplishments, but did you know they were also avid patrons of the arts? And it carries on giving us more facts. It'll run for quite a while, so we can click the stop button down here if we want it to stop, just like regular ChatGPT, but I'm going to let this one finish, because when it's done it takes the query you typed and renames the chat on the left-hand side, just like ChatGPT does, so you can come back to it later. And there you can see it's called this one "Roman Trivia Blast".
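Since the CLI and the web UI are both talking to Ollama's local REST API, you can also call it directly. Here's a minimal sketch in Python, assuming Ollama is listening on its default port 11434 and dolphin-phi has already been pulled; the fields used (response, eval_count, eval_duration) come from Ollama's /api/generate endpoint:

```python
import requests

# Ask the local Ollama server for a one-shot (non-streaming) completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "dolphin-phi",
        "prompt": "Tell me a fact about the Roman Empire.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,  # CPU-only inference on a Pi can take a while
)
resp.raise_for_status()
data = resp.json()

print(data["response"])

# The stats the web UI displays come back alongside the text:
# eval_count is the number of tokens generated and eval_duration is in
# nanoseconds, so the generation rate in tokens per second is:
print(data["eval_count"] / data["eval_duration"] * 1e9, "tokens/sec")
```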
We've also got some buttons down here: you can go back and edit the query if you want, copy the response to the clipboard, or give it a thumbs-up to say you liked the response (I'm not sure what the volume button does). And then we've got this little "i" information icon, which gives you some stats, like the token rate, 4.59 tokens per second here. Tokens are sections of a word; that's the best way to think of them. "Tell me" could be one token and "a random fun" could be another; it depends how they've tuned the particular model. Then there are things like the evaluation count and the evaluation duration; is that 24 seconds? So you get to see just how fast, or slow, it actually responds. It says 35 in total, so it probably took about 35 seconds to do that one. And if we go back to the other chat... it's forgotten those stats now; they're only kept for the current query.

Over there we've got the actual name of the model, so let's look at how we can change the default model (sorry if I banged the microphone there). If we click on this little cog we can go to the Models dialog and type in a different model, so we could type in something like "mistral". The names have these tags on the end, like ":7b"; we'll look at what that means a bit later on one of the slides, so don't worry about it for now. This is where you'd type in, say, llava, or llama 2, I think... if I type that in, that particular model doesn't exist, so we do need to know the proper names of the models, and we'll get to those in a minute. But I think I've already got some other models loaded on here. If I start a new chat, I can click on this drop-down and pick the models I've already downloaded: I've got mistral and I've got orca. You can see the size of each one; that's the size of the model file, but with that size comes complexity, which affects the speed at which it can process things.

So let's pick mistral, go down to "send a message", and try "tell me a joke"; let's see if it comes up with a different joke, or the scientist one yet again. Also notice how quickly it responds, or doesn't: it takes a bit longer, because it's twice the size of the other model we used, but with that complexity comes more knowledge and more ability to be specific, or more accurate perhaps, depending on what the model has been trained on. You can see it's a little slower, and it's the same joke, which I find fascinating: why don't scientists trust atoms? Because they make up everything. We can see that one took a total of 29 seconds to run.

Now, the other cool thing we can do here, and I've not seen this on ChatGPT, is run more than one model at the same time. If we start a new chat, we can click on this add button and add a few different models: dolphin-phi, mistral, and let's have that third one, orca, which is 1.8 GB. If we do "tell me a dad joke" and hit run, it's probably going to take at least 30 seconds, if not longer, because it's got all three models to run through. While it's doing that, I'll bring up htop on the left-hand side, so you can see the CPUs being maxed out and then dropping back down. Let's go back over here... it's thinking about something, and it's responded on one of them; going back again, it's still thinking about the others.
You can actually navigate around and come back to these while they finish. Once it's done processing on all three, we'll be able to step through the response from each model, which is pretty cool; I've not seen that on ChatGPT or anything like it. It does take a little longer, like I said, because it has to run each of them, and I think it runs them in serial: it does one request, waits for it to finish, does the next, waits again, then does the third, and once they're all done it comes back with the answers. You can see it's still maxing out the CPU, because it's really having to crunch through that one.

While it's doing that, I'll read my notes on the other screen, because I've got quite a few demo things to show you. We've looked at "tell me a joke" and we've looked at the default model, and we can change that default. I can go back to the screen, click New Chat, and decide that I actually want orca to be the default; I click on that, and whenever I start a new chat now, orca will be the default model it uses. Or you can set it to dolphin-phi, which, being the smallest, is also the quickest, and it's more than good enough.

Right, we've got our responses back; you can see there, three of three. "Why did the tomato turn red? Because it saw the salad dressing." Click on to the next one (you can also see the name of the model that returned each response): "Why don't scientists trust atoms? Because they make everything up." That was mistral, and dolphin-phi has come up with the same one in a slightly different format, not on a new line. And orca has a different response, so we might decide we want to try orca for future queries instead.

Right, I'm going to go back to the slides now, because there are some other bits and pieces I want to show you. One of the cool things you can do, and you might think this is cool or you might think it's dangerous and scary, is run uncensored models. Cloud-based large language models such as ChatGPT, Google Bard, and whatever the Bing one is called are censored: there are certain safeguards, what they call guardrails, built in, and on some of the local models you can choose not to have those. That means there are things you need to consider.

First, there's a broader range of content. Sensitive, controversial, or typically censored political, adult, or ideological content might normally be blocked; with an uncensored model you get the raw output. So if you wanted to ask how to make explosives or something like that, it probably would tell you. I wouldn't recommend you try that, though.

Second, there are ethical and societal implications. If you've got concerns about the propagation of harmful content, the reinforcement of biases, or the potential misuse of the technology, you probably want to stay away from uncensored models. There's a reason they censor them: the text corpus some of these models are trained on, particularly older content, might not reflect a modern view of the world, so it can contain things that are racist or just plain bad ideas, eugenics, things like that.
Third, there's responsibility and governance. Users and developers need to be aware of the potential impact and misuse of the outputs, and need to implement additional layers of oversight and ethical guidelines. So if you're going to use this, use it sensibly; but the fact that you can use it at all is something you don't get with the other large language models.

To run all this, I'd say the best setup is a Raspberry Pi 5 with 8 GB. You've seen that the models we've run so far used up to 3.9 GB or so of RAM, and you need some headroom for the operating system and anything else running on your Pi. You can get away with a 4 GB board, and we'll actually have a play with running it on a Raspberry Pi 4 to see how that performs. It will run on a Raspberry Pi 4; it's just a bit slower, and that might be a deal-breaker depending on your expectations. I've also created a simple Docker setup that makes it really easy to install, and we'll look at that in a second.

So how do you choose the model to use? Here are quite a few of them, straight off the bat; these are some of the ones Ollama lists on their GitHub. dolphin-phi, which we've been using, is 2.7B; the B means billion parameters. Parameters are like the knobs, wheels, and levers, the things the training process configures: the more parameters, the richer the information in the model, but the slower and larger it will be. By comparison, GPT-3 is 175 billion parameters, which is quite a lot, so our dolphin-phi at 2.7 billion is roughly two orders of magnitude smaller than ChatGPT; still quite fast though, as we saw. Then we've got phi-2 at 1.7 GB, orca-mini at 1.9 GB with three billion parameters, and llama2, one of the most common ones, generally accepted to be the benchmark. llama2 is developed by Meta, the Facebook company, and has 7 billion parameters at 3.8 GB; it's a little slower to run, and there's an uncensored version of it if you want. Then you've got vicuna at 7 billion, mistral at 7 billion, neural-chat at 7 billion, starling, llava at 7 billion, llama2 at 13 billion, and a llama2 70B, which is 39 GB; that one is really for some pretty hefty horsepower, probably something with a GPU in it, because GPUs can really speed these up. So have a play with some of the models if you've got space on your drive; if you've got a Raspberry Pi 5, I'd recommend one of the new internal NVMe SSDs with a HAT, and then you'll have plenty of storage.

Okay, let's have a demo of how to set up Ollama on a computer. That was our dev2 machine; I'm now bringing up another machine, dev3, where I've created a script. Let me move my screen around a little so you can see things more easily: shrink that down a bit, make this one larger. On here I've got two files: a Docker Compose file, and this "get-webui" script. Let's look at the Docker Compose file first; a sketch of what such a file can look like follows below.
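Here's a minimal sketch of the kind of docker-compose.yml being described. The host path, the internal model path, and the exact environment variable name are assumptions based on the ollama-webui project of that era, so treat this as illustrative rather than a copy of the file on screen:

```yaml
version: "3.9"

services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    restart: always
    volumes:
      # Keep downloaded models on the host so they survive container rebuilds.
      # (Host path is an assumption; point it at your own home directory.)
      - /home/kev/ollama:/root/.ollama

  ollama-webui:
    container_name: ollama-webui
    build: ./webui            # the folder the get-webui script clones into
    restart: always
    depends_on:
      - ollama                # the web UI needs the Ollama REST API
    ports:
      - "3000:8080"           # listens on 8080 inside, exposed as 3000
    environment:
      # Where the web UI finds the Ollama API (variable name assumed from
      # the ollama-webui project of the time).
      - OLLAMA_API_BASE_URL=http://ollama:11434/api
```

With something like this in place, cloning the web UI repository and then running "docker compose up -d" should bring both services up in the background.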
So, this is the file Docker uses to build the containers that run all this software. "version: 3.9" is the Docker Compose format version, and there are two services: ollama, and ollama-webui, which is the nice front end we've been playing with. Ollama runs in a container named "ollama"; the image is ollama/ollama:latest, so it grabs the latest version of Ollama off the web. We've set it to always restart, and we've mapped a volume in there too; this is where the models get stored. It just maps an ollama folder in my home directory to Ollama's model folder inside the container.

The next service is the web UI. We build that one from a local copy of its GitHub repository; we download the repo and Compose builds it from there. It's got a volume as well, it depends on the ollama service (the REST API), and you can see the ports it uses. This is internal wiring within the container: it listens on port 8080 inside, but that's exposed externally as port 3000, so to access it we type the IP address of the machine it's running on, then a colon, then 3000. There are also environment variables telling it how to reach the other service, plus a few other bits and pieces. So that's the Docker Compose file.

The other thing I've got here is a file called get-webui. If we cat get-webui (you can do these steps manually, I've just made it a bit easier), you can see it does a git clone of github.com/ollama-webui/ollama-webui and names the folder "webui". So if I run ./get-webui, it just clones the web UI git repository; we need that to be able to build the container. Then if we simply run "docker compose up" and hit return, it builds the container, builds the web user interface, and launches it ready to go. If you run "docker compose up -d" instead, it runs detached, meaning you won't see all the gubbins, all this terminal output; it runs in the background and you can carry on with other things.

You can see it says it's running on 0.0.0.0:8080, but that's inside the container. What we need is the IP address of this machine, which is 192.168.1.111, and the port is 3000, because that's what we told it in our Compose file. So if I go over to a web browser and enter 192.168.1.111:3000, this is what we're met with: Ollama Web UI is up and running, and it's asking us to sign in. Let me type in my email address and a password... oops, Sign Up is what we need; it just wants my full name as well, so let's create the account and save that. Now we're running this on dev3, which is a fresh Raspberry Pi, so there are no models on here yet; you can see "select model" has nothing in it. We need to click the little cog, go to Models, and type in something like dolphin-phi. Now, I'm mindful that I'm streaming over the internet and I don't want to kill my own feed here, but if I type dolphin-phi and hit the little download button, it starts downloading that model.
And with that, we're basically back at the beginning of the tutorial we started this stream with, so you can see just how easy it is to get up and running. It's pretty quick; it didn't take long. I did build this before the show, just so all the downloads weren't stealing my bandwidth, but it takes maybe five minutes to do all those steps. You saw how easy it was, it's pretty robust, and you can download as many of those models as you want, as long as you've got the space, and give them a try.

Okay, I'm going to close that now and go back to... let me just check my notes to see if there's anything else to cover on that. I think we're good. The actual Docker Compose file and that get-webui script: I'll put them in a git repository and link it in the show notes in the description below, so you'll be able to find them, and I'll also write up a blog article about this after the show is finished.

Let's get back to our slides, because there's another thing I want to show you, which is LangChain. Before we do that: if you like this content and you want to help me grow the channel to the next milestone, 50,000 subscribers, then please give this video a like, drop me a comment, and let me know if you've used Ollama or any other web interface to something like it. And if you've not already subscribed, you know what to do; hopefully you'll help me get to that next big milestone. I go live every single Sunday at 7 o'clock GMT, so you can join me live on the stream if you want a bit of a hangout after the main part of the show.

Right, we talked about those models; here they are in a bit more detail. You can see that llama2 is the de facto one, developed by Meta. I know Facebook have pivoted away from all that VR stuff they were so heavily invested in and are now looking at AI, so that's probably where a lot of their money is being spent at the moment; it'll be interesting to see what happens there.

There's a question about which operating system I'm using on the Raspberry Pi 5: I'm running Raspberry Pi OS, which I think is Debian behind the scenes. So that answers that. I also recommend dolphin-phi; I think it's a really good general all-rounder. It's really small, so it runs really quickly, and if you're not happy with the kind of answers coming back from it, you can move up to a more sophisticated model. In those little demos I didn't really get into some of the beefier stuff you can do, like code generation, just like ChatGPT, or getting it to summarize documents. There's a lot there. I know the web UI is in alpha, or early beta, at the moment, so some functionality isn't quite working properly yet, but you should be able to just drag and drop a document in; it tokenizes it and then you can chat with your documents. You can drop in hundreds of documents, it'll tokenize all of them, and you can query the information within them; for particularly heavy Excel sheets and the like, it can help you find insights.
Some other things worth noting: some of these models are tuned for images, so you can chat with your images. You can ask what's in a particular image, or for a description of what it can see; some of them, llava I think, are good for that, so that's worth knowing. orca-mini I also like; it's quite a lightweight and fast one, so definitely give them a go. You can see the download command on the slide (pause the video here if you want): you type "ollama run" and then the model name if you're on the command line; otherwise you click the little cog, go to Models, and type in the name from the left-hand column, like orca-mini.

Cool, let's go to the next slide. LangChain. So LangChain is how you use Ollama from your own programs; it's perfect for Python, and they support JavaScript as well, which is pretty cool. They've got this nice diagram, which I've replicated here, showing all the different components of LangChain. In the middle we have the LangChain "cognitive architecture" (they've got some really fancy words for this), but it's essentially how you put together a large language model application, and therefore how you can tweak each of the different bits and pieces.

Beneath LangChain, which is a kind of wrapper around all of this, we have langchain-community. That's where all the models live, like the Ollama ones we've been playing with, dolphin-phi and so on. Then the prompts: you can get it to respond as a particular personality, a bit like you can in ChatGPT, so all that prompt engineering lives there. The example selector is where you provide an example of the output you're looking for and build that in. And the output parser is how it interprets its own responses and provides them back to you. So there are quite a few steps.

There's also retrieval: the document loader, and the vector store, which is how it stores document tokens. If it takes a big Word document, it breaks it apart into tokens and stores those tokens in the vector store (very fancy name). The text splitter, I assume, does exactly what you'd expect: it splits up text. And the embedding model: embedding is how it extracts the knowledge from those tokens, essentially from the order they're in. Then you've got the agent tooling, the tools and toolkits that go along with all this. And finally the protocol, the LangChain Expression Language: parallelization, fallbacks, tracing, batching, streaming, async, and composition. That's all the machinery behind the scenes; you don't have to worry about it if you don't want to, but if you want to dive in and configure it all, have at it.

So I'm going to show you how to write a program using LangChain and Python, because it couldn't be simpler, and it's really nice. Thinking about the Python AI we've been developing on this stream, we can now add some really nice language capabilities to it.
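To make that "composition" idea concrete, here's a small sketch of how a prompt template, the Ollama model wrapper, and an output parser chain together with the LangChain Expression Language. The imports match the langchain/langchain-community package layout of early 2024; treat the exact module paths as assumptions, since LangChain reorganizes frequently:

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import Ollama

# Prompt component: a template with a placeholder filled in at invoke time.
prompt = PromptTemplate.from_template(
    "You are a cheerful assistant. Tell me a fact about {topic}."
)

# Model component: the local Ollama server, here using the llama2 model.
llm = Ollama(model="llama2")

# Composition via the LangChain Expression Language: prompt -> model -> parser.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"topic": "the Roman Empire"}))
```

Each stage in the pipe is one of the components from the diagram: the prompt, the model, and the output parser, composed left to right.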
Let's have a demo of this. I'm going to stop that container we had over there, and let me quit out of this one as well; "/bye" is how you get out of the interactive Ollama interface. Let me shrink that down, move that over here, and move this one. For LangChain I've got a very simple Python program; I'll zoom in so you can see it. If I do ls, there's demo.py, and if I do "cat demo.py" you'll see just how simple this program is.

We've said "from langchain_community.llms import Ollama"; our friend Ollama. We then set the model to llama2, the one Meta developed, and do a print with an f-string, "initializing {model} large language model"; those little curly braces just mean "print out llama2 here". Then llm = Ollama(model=model), so this llm object becomes the thing we talk to. The first question is "tell me a joke", the hello world of AI: we print "asking a question", then we store the response we get back from invoking that question on our large language model, and finally we print out the response. That's all it does; I'll show you how to set it up in a minute, because it's pretty simple too.

So if I now run "python3 demo.py", it says it's asking the question, to tell a joke. In the background, we're at 100% again; if we watch htop we'll see the CPUs jump up as it starts to process properly. It's interesting that it doesn't do that right away; I guess it's unpacking the model and doing whatever it needs to do first. You can now see it's at 100% CPU, so it's really crunching through to produce the answer. Because this isn't streaming, we get the whole answer back in one go. You can do it a different way, where you ask for the tokens as they're generated, and then you see that ChatGPT-style typing-it-out response instead of everything at once. You'll know it's responded because you'll see the output, and you'll also see that 100% CPU drop right back to zero. There we go: once again it's the same joke, why don't scientists trust atoms, because they make up everything, and it also says "I hope you found this amusing, do you want to hear another one?" That's slightly different, and it's probably down to LangChain and the wrappers it puts around the responses. You saw how simple the code is; it couldn't be simpler.
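Here's a reconstruction of demo.py based on that walkthrough, plus the streaming variant just mentioned. It's a sketch: the variable names are guesses where they weren't read out on stream:

```python
from langchain_community.llms import Ollama

model = "llama2"
print(f"Initializing {model} large language model...")

# The llm object is what we talk to; it calls the local Ollama REST API.
llm = Ollama(model=model)

question = "Tell me a joke"
print(f"Asking a question: {question}")

# Blocking call: the whole answer comes back in one go.
response = llm.invoke(question)
print(f"The response is: {response}")

# Streaming alternative (the ChatGPT-style typing effect): print each
# token chunk as it arrives instead of waiting for the full answer.
for chunk in llm.stream(question):
    print(chunk, end="", flush=True)
print()
```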
Now, the way I got this set up. First, let me come out of this environment with "deactivate" and clear the screen. The first thing I did when building this was "python3 -m venv venv", which creates a virtual environment. I've actually done this already, so it may or may not work because the folder already exists... but there you go, it's created it. Then you use "source" to activate the virtual environment; virtual environments are there to separate out your Python code and all its dependencies from everything else. Then I did "pip install langchain", and I think I also did "pip install langchain-community"; that just grabs and installs all the dependencies it needs (you can see it's already downloaded those). If we now run "pip freeze", you can see what's actually installed: all these different dependencies, just from that pip install.

And if we run "python3 demo.py" again, you can see it doing everything it needs to do to get a response back. It'll probably be the exact same response again; that's down to the complexity of the model, and the fact that it hasn't learned anything from running the first time. If we kept a session running the whole time, it would be less likely to give the same response, because it would know it had already said that. But there we go: how easy is that, bringing a ChatGPT-alike locally to your Raspberry Pi using Python? It couldn't be simpler, and I love it. There's so much technology stacked on top of itself here, but they've made it really easy to get at. And interestingly it says "I hope you find this amusing; if you're interested, I can definitely share more jokes or engage in some funny banter with you, just let me know what you'd like to chat about". So there we go.
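Collected together, the setup and run steps from that demo look roughly like this (assuming a Debian-flavoured system such as Raspberry Pi OS, with Ollama already running locally):

```bash
# Create and activate an isolated Python environment for the project.
python3 -m venv venv
source venv/bin/activate

# Install LangChain plus the community package with the Ollama wrapper.
pip install langchain langchain-community

# Confirm what ended up installed.
pip freeze

# Run the demo program.
python3 demo.py

# Leave the virtual environment when you're done.
deactivate
```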
Okay, let me check my notes to see if there's anything else I wanted to show you; yes, one more thing. If I grab a web browser and go to ollama.ai, you can see they've got the blog, Discord, GitHub, and models, plus this download button. If you click on Download you get three options: macOS, Linux, and Windows. For Linux you simply copy the command shown: it curls a shell script, the install script, and the pipe just means "once you've grabbed it, run it"; that's what the "| sh" at the end does. You can actually grab that script and read it if you're interested: paste the URL into your browser and you can see exactly what it does. It's quite a robust script, because it has to run on lots of different versions of Linux, but essentially it's very straightforward: copy the command, paste it into a terminal, and you've got Ollama installed. Then the only thing left is to run a model, and if it isn't cached, it downloads it first. Could not be simpler.

If you look on their blog, you'll see examples of how to run it locally; there's an interface that doesn't look quite as nice as the web UI we've been running. And there you go: "ollama run codellama:7b", the 7-billion-parameter version. It says you need 16 GB of RAM for the 13-billion version, and 32 GB for the 34-billion one, so you can pick which parameter size you want to use. Not sure why they picked 7, 13, and 34; not quite round numbers, are they? But there we go. Cool, I just wanted to share that with you; I think that's everything, so let's get back to our slides.
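For reference, the native (non-Docker) Linux install and first run look roughly like this. The URL is the one ollama.ai published at the time; check their download page for the current command:

```bash
# Download Ollama's Linux install script and pipe it straight into a shell.
curl https://ollama.ai/install.sh | sh

# Run a model; if it isn't cached locally yet, it gets downloaded first.
ollama run codellama:7b
```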
If you want to get started today learning things like how to set up Docker or how to build Python programs, head over to kevsrobots.com/learn, or just click the Free Courses button you can see there in the navigation, and it'll take you straight to all the courses that are freely available, so you can get up and running right away. Check it out if you haven't already; I know quite a lot of people have. I've been looking through the logs recently, and the website is really getting quite a lot of traction now. If you want one of these Robot Maker hats, I do have other colours as well; let me grab my red one. I like the red, but I think in the US red hats have a different meaning, haven't they, so I've started wearing my black one. You can buy those, and mugs, and all kinds of good stuff at kevsrobots.com/merch. And if you've not joined our Discord community, head over to kevsrobots.com/discord for a sign-up link, completely free, and you can get help there with any kind of question. I've written so much code on this channel that I don't necessarily know the answer to every issue, but there are certainly people in the chat smarter than me who can help out. So help yourself over there.

If you want to follow me on social media, there are a number of ways. I'm on Threads as kevinmcaleer (I think I use that one more than Instagram or Twitter at the moment), on TikTok as kevinmcaleer6, on Instagram as kevinmcaleer, on Mastodon as kevmac at mastodon.social, and on Bluesky as kevmac at bsky.social. So be sure to check those out.

Okay, there's a question about how to join the Discord: if you go to kevsrobots.com/discord, there should be a sign-up link. Let me just show you that... if that doesn't work, I'll look into why. Yeah, if it's not working, it's probably just a glitch on my part from when I updated the website recently; I'll have a look and get back to you on why that isn't working.

If you want to help support the show, there are a number of different ways to do that, and you can get your name in the end credits, which we'll share in a second. If you go to kevsrobots.com/coffee, you can get your name in those credits. If you're watching live now, you can do a Super Chat (I'll make sure that's switched on from now on); it'll pop up on screen saying you've done one. You can also do a Super Thanks if you're watching on replay, which is a way of saying thank you for the video, for about the price of a coffee. If you want to do that more regularly, you can join the YouTube membership programme, which is the price of a coffee per month, just to support the channel; it means I can buy more things for the channel, robots and components and whatnot, because it's an expensive thing to do.

Okay, my supporters. This is the point where I give everybody a shout-out for supporting the channel. We've had quite a few people buy coffees recently: Steve Robinson, somebody who wanted to remain nameless, and Marie Louise Mayer. On the membership side on Buy Me a Coffee we've got Alvaro Diaz, Marie Louise Mayer, Jeff Johnson, Dian Cy, Marin, Brent, Tom Shmy, and Steve Phillips. And we've had quite a few new members join the YouTube channel: Warren Steel, Steven Cross, John Lamu (is that Lamu? Sorry if I pronounced that wrong), Jonathan R, Vince, Alistair Wear, John Paul, Jolly Cassie, Dale from Hybrid Robotics, Tinkering Rocks, JDM, Johnny Bites, OXR 39, Hansman, Cheer Likes, Michael, and of course Tom as well. If you want your name in these credits, go to kevsrobots.com/credits for information on how to do that.

Great. Hopefully YouTube will now be showing you, over on this side, a related video that you might find interesting, so do click on that. And this is the point in the video, if you're watching on replay, where I say thank you so much for watching, and I shall see you next time.
Info
Channel: Kevin McAleer
Views: 21,502
Keywords: ollama, raspberry pi 5, raspberry pi, pi5, webui, ollama webui, diy chatgpt, chargpt, ai, ai assistant, llm, large language model, local ai, chat gpt, artificial intelligence, large language models, local llm, dolphin-phi, dolphin phi, llama 2, llama 2 local, ollama docker
Id: jJKbYj8mIy8
Length: 41min 49sec (2509 seconds)
Published: Mon Jan 29 2024