Run Mistral, Llama2 and Others Privately At Home with Ollama AI - EASY!

Video Statistics and Information

Captions
hey everybody and welcome back to Jim's Garage. It's official: our AI overlords are here and they're not going away. So what can we do? Well, in this video I'm going to show you the simplest way I've found to self-host your own AI instances, so you can choose any of the large language models and run them privately on your own infrastructure. That means your requests, your queries, your responses, your data all stay local, without being sent off to somewhere like ChatGPT to be farmed for more data, with all the privacy concerns that come with it. In this video I'll show you two options: one is a simple command line interface that runs in Linux, and the second is a much more user-friendly experience that looks pretty much the same as ChatGPT, which you can access through the web browser.

To accomplish this we're going to be using Ollama. Ollama is basically the engine that runs all of the large language models, so by the end of this video you'll have a nice GUI where you can choose whichever large language model you want to run and then execute your queries through it. If you head over to their website they've got some really easy instructions to follow. If you click on download you've got a few options: you can run this on macOS, which I won't be covering in this video; I will be covering Linux, where you can install via a convenience script pasted into your terminal and be up and running; a Windows version is coming soon; and there's also a Docker option, which I'll focus on in the second half of this video.

Before we get into the deployment, let's have a look at the virtual machine. Anybody who watches my videos knows I run Proxmox, but this should be hypervisor agnostic, and you can even run it on bare metal if you want to. This virtual machine I've just called AI, and if we look in the hardware tab you can see that it's quite beefy: I've given it 32 GB of memory, 20 cores and about 50 GB of hard drive space. That's overkill, but it gives me sufficient headroom if I want to run, or at least store, multiple models on this machine, so you can scale it down. The documentation recommends a minimum of 8 GB of RAM, and the more CPU cores you throw at it the better, but importantly, if you have an Nvidia GPU you will see significant performance improvements. The demonstration today only uses the CPU, for two reasons: I don't have an Nvidia GPU to put in this machine, and most of you probably don't either, so a CPU is what you've got. Unfortunately GPU acceleration currently only works on Nvidia, but there's a lot of noise around AMD and Intel support, so hopefully that will arrive in the near future.

Once you've got your virtual machine (or bare metal) set up, you simply need to log into it via the CLI. For option one, interacting with the large language models through the Linux command line interface, I'm simply going to use the convenience script. I'm going to copy it, and if you want to see what it's actually doing you can click on the link: it takes you to their GitHub, where you can see all of the commands it runs in the background. If we head over to our terminal now, we can paste that command in, and hopefully by the end of the process we'll be up and running. I'm going to skip ahead and I'll see you on the other side once the script has completed.
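For reference, the install one-liner looks something like this; double-check the download page for the current URL, since the script has been hosted at both ollama.ai and ollama.com:

```bash
# Install Ollama on Linux via the official convenience script
# (verify the exact URL on the Ollama download page before running).
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the install worked and the CLI is available
ollama --version
```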
Now that the installation has finished, you can run the ollama command and see all of the options you have available. It can do things like serve the models through an API so you can connect remotely, which we'll make use of in the Docker installation, but more importantly you can pull and install new large language models, so let's do that first. If we head back to the Ollama website we can see all of the models that are supported: go to Models in the top right-hand corner and you'll see there's a ton to choose from. There's been a lot of noise recently around Mistral and their 7B model, so you can choose that one if you want. To do that you'd click on Mistral, grab the tag, and it's as simple as running "ollama run mistral". For any other model you'd just look through the list: if you wanted Mixtral, you'd run "ollama run mixtral" and you'd have Mixtral downloading. So we're going to copy one of these, hop back into the terminal, get it running, and then hopefully we can interact with one of these LLMs in the command line interface. Do note that some of these have really heavy requirements; the 8x7B here, for example, requires 48 GB of RAM, so choose one that's suitable for you. I'm going to choose the Dolphin one, which (a) is uncensored and (b) is a lot smaller. For this demonstration I'm going to run Dolphin 2.1 Mistral, just because it's really small and my internet's terrible, but obviously you can run any of the ones I've shown you; just make sure you have the hardware to run it. Now we paste that command, "ollama run" plus the LLM you want to use, hit return, and it goes away and pulls the model; once that's completed you're ready to use it. I'll skip ahead and see you on the other side. Now that it's completed we can do "ollama run" and send a message, so what should we try? Let's say: who's the best homelab YouTuber? This might be a little slow because, as I said, this is running on the CPU, but it should respond within a few minutes. Unfortunately it doesn't look like I made the cut, so I'll just have to keep trying harder, but as you can see we've now got the Dolphin LLM running, and it was really simple.
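To recap, the CLI workflow boils down to a handful of commands (the model tags come from the Ollama model library and may change over time):

```bash
# Download and start an interactive chat with a model in one step
ollama run mistral

# Larger models need far more memory (Mixtral 8x7B wants around 48 GB of RAM)
ollama run mixtral

# The small, uncensored model used in this demonstration
ollama run dolphin2.1-mistral

# See which models are already downloaded locally
ollama list
```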
So let's now hop over to something that looks a bit more friendly. Hopefully this looks pretty familiar to you, albeit this is all running locally using Ollama and its web GUI. So how did I get here? Thankfully there is a Docker setup for this, so let's jump back into Docker and get it up and running. Heading over to their GitHub (let's give them a star), if we scroll down we find all of the instructions we need. We can either run everything locally, deploying the LLM, the Ollama engine and the GUI in the same Docker stack, which I recommend just to keep things simple, or we can make use of the existing server we just set up in a separate Linux VM and host the GUI externally, pointing it at that server. All of those options are documented here: you can run everything together, or alternatively build the container and point it at a different server. I'm going to do everything together, but the process should be pretty similar either way. To get this up and running, the Docker image needs to be built locally, but thankfully that's not a complex process. The simplest way is to either clone the repository locally within Linux or download the files and copy them over to your machine. Over on the virtual machine I've cloned the repository and copied all the files into a folder called ollama, and if you've watched any of my previous videos, this is how I usually lay out my compose files. All the files you need are in there, so all you need to do is go into the docker-compose file and edit it if necessary. I'm not going to edit it, because I don't have a GPU and I'm happy with volume mounts, but you could switch to bind mounts and so on if you wanted. If we have a quick look through the file, we can see there are two services: ollama, which is what we installed previously through the command line, and the web UI, which sits on top of that and connects to it as a back end. On the ollama service you can uncomment the GPU section if you've got an Nvidia GPU, and it should automatically detect it and use it to accelerate the responses; other than that it's pretty straightforward, it just creates a volume called ollama. Looking at the web UI service, this is where the build happens: it builds the image, enables the API, and further down it connects to the Ollama base API, which is the container above. Once that's done, the service is accessible on the IP address of your virtual machine on port 3000. Obviously you could put Traefik in front of it and get SSL; if you don't know how to do that I've got a video on it, and you'd just need to add the Traefik labels here. I'm not going to focus on that in this video, because I want to get this up and running quickly.
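Here's a minimal sketch of what that compose file looks like; the service names, build context and environment variable are assumptions based on the ollama-webui project at the time, so check the repository for the exact file:

```yaml
# Sketch of an Ollama + web UI compose stack (not the exact file from the repo).
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama          # model storage
    restart: unless-stopped
    # Uncomment for an Nvidia GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  ollama-webui:
    build: .                          # image is built locally from the cloned repo
    ports:
      - "3000:8080"                   # web UI reachable on http://<vm-ip>:3000
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434/api   # the ollama service above
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama: {}
```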
So now if we hop back into our terminal and navigate to the folder, we should be able to run "sudo docker compose up -d" and get this deployed. The first time you run it, it's going to take a while, because it has to download all the images and go through the build process, but once it's completed you should get a message confirming that both containers have started. Now, with any luck, we can hop into the web browser and reach it. In this instance I've navigated to my Docker host's IP on port 3000 and I've been greeted with the dashboard, so fingers crossed, everything is looking good so far.

Now that it's working, how can we make use of it? To get started, hit the cog at the top. Once you click on that you're presented with the models page, and that's because we haven't enabled any models yet. Just like before, you need to go to the Ollama website and choose a model you wish to run; again I'm going to keep it simple and run the small one, because it's only 4 GB. I copy the model tag from the website, head back into the web UI, paste that value in, hit the download button, and it goes away and pulls the manifest, with a progress bar, so I'll see you on the other side once this completes. Now it has finished downloading, and you can see that here, so we're ready to get going: close this down and we can set this model as the default. To find your models, which is especially applicable if you've got more than one, hit the dropdown here; you can see Dolphin 2.2, select it and hit set as default.

Then we can ask a question: I don't know, tell me a fun fact about the Roman Empire, and let's see what it has to say. As with all AI, I really like that they've got this notification at the bottom, "LLMs can make mistakes", which is very important and definitely true. So there we've got a fun fact, but why don't we try something more related to coding? Let's say: write me a Kubernetes manifest file for Pi-hole. While it's generating that response, it's quite interesting to look at the system performance in your hypervisor: out of my 20 cores it's using about 50%, and we're already up at 20 GB of RAM, so you get a feel for just how hungry this thing is, bearing in mind this is a stripped-down 4 GB LLM. You can pretty much extrapolate that up to something like the 48 GB models. Hopping back in, let's see what it's done: it's created a Deployment, and that looks pretty good for a basic deployment. Obviously we'd want to ask it to create some PVCs, probably a Service and some default headers, but that looks pretty good, and the best thing of all is that this is all local to my environment; I haven't had to make any calls outside of my network, so all of my data stays with me.

Hopefully you now have everything you need to launch Ollama and use one of the many popular large language models on your own infrastructure. This is going to safeguard your privacy and hopefully give you much of the functionality of the big players out there. It's going to be interesting to see how your results compare to things like ChatGPT; a lot of these models aren't selling themselves as quite at the GPT-4 level yet, but who knows, is it just a matter of time before they get there? Will we see GPT-5 overtake them again? I don't know, but it's really exciting to see how this evolves. Let me know if this is something you're going to use in your home lab and drop a comment below. If you've liked this video please give it a thumbs up, hit that subscribe button, and I'll see you in the next one. Take care, everybody. [Music]
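For reference, a basic Pi-hole Deployment along the lines of what the model generated in the demo might look roughly like this; the image tag, ports and environment values are assumptions, not the model's exact output, and as noted above you would still want to add persistent volumes and a Service:

```yaml
# Rough sketch of a minimal Pi-hole Deployment (illustrative only).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pihole
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:latest
          ports:
            - containerPort: 80        # web admin interface
            - containerPort: 53        # DNS
              protocol: UDP
          env:
            - name: TZ
              value: "Europe/London"   # assumed timezone
```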
Info
Channel: Jim's Garage
Views: 15,290
Keywords: ai, ollama, llama2, mistral ai, mistral 7b, mistral 8x7b, uncensored ai, private ai, chatgpt, ai chat bot, homelab, docker, linux, open ai, gemini ai, artificial intelligence, artificial intelligence course
Id: 1XPG1DfrtN8
Length: 12min 45sec (765 seconds)
Published: Tue Dec 19 2023