Ollama on Linux: Easily Install Any LLM on Your Server

Captions
Ollama, the way I prefer to run large language models on my Apple silicon Mac, announced just yesterday that it now has a release that runs on Linux. That's great, because it means we can now run Ollama, with all the models it supports, on any cloud service provider we like. So in this video I'm going to show you how to set it up on my preferred cloud provider, which is DigitalOcean (not sponsored), and how to get to grips with it and start using it. With all that said, let's get on with things.

So, Ollama announced yesterday that it now runs on Linux, with WSL support and Nvidia GPU acceleration. In this demo I won't be using GPU acceleration, because the servers I'm using don't offer it, but you can use whichever provider you prefer and get the extra speed boost from a GPU.

Here I am on DigitalOcean. I'm going to create a new droplet, select London as my region because that's closest to me, and opt for a regular machine, but one with 8 GB of RAM. Having tried this on some of the smaller machines, I found it just isn't going to work at all, and I think 8 GB is probably the minimum that's workable. I'll add my SSH key, rename the droplet to "ollama", and create it.

Okay, great, that's been created. Let's see if we can SSH into it. Needless to say, I'm logged in as root. Okay, cool, we're in. The install is literally one command, just this curl one-liner. You should always inspect scripts you're about to run; I trust the Ollama folks, and there's a nice link there so you can inspect it yourself. Anyway, let's run that command.

Okay, it's telling us that no Nvidia GPU was detected, so it's going to run in CPU-only mode. So if you want to do something more than just a test, something more production-worthy, you probably want a machine with a GPU. But for now, let's just try and run something in Ollama. I'm going to try llama2-uncensored and see if I can get a model installed and running.

It's saying that Ollama isn't running and that we need to run "ollama serve" to start it. If I do that, is it now running? Yes, it runs now. So if you do "service ollama start" it runs, and then we can pull. I'm pulling down the llama2-uncensored model, and there are all these models available to us, a lot of different models we can run on any cloud service provider where we're running Linux, which is great. We're using Ubuntu in this case, and it seems to be installing correctly, so let's wait for the download to finish.

Okay, cool, the model's come down, so let's try and run it. Great, and we'll ask it a question: why is the sky blue? You can see it's taking a little while to respond; it's only an 8 GB machine and it's all running on the CPU, not the GPU, but we are getting a response: the sky is blue because of a combination of factors. That's really impressive. We've been able to install this and get it working in no time at all, with a very quick installation procedure, and we've pulled down the llama2-uncensored model, so we have everything it provides as well.
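For reference, the install and first-run steps above boil down to a handful of commands. This is just a sketch of them; the install URL is the one published on the Ollama site, so check it there (and read the script) before piping anything to a shell:

    # one-line Linux install (inspect the script first if you prefer)
    curl -fsSL https://ollama.com/install.sh | sh

    # start the server; the installer also registers a systemd service
    ollama serve                 # or: service ollama start

    # pull the model used in the video, then chat with it interactively
    ollama pull llama2-uncensored
    ollama run llama2-uncensored
    >>> Why is the sky blue?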
Like I said, we can do this on any cloud provider we like. So let's come out of this. The next thing is that we can actually use it as an API. If we scroll back through the output we got earlier, the server is running an API on localhost on port 11434. There's an example of using that API, I think, in the GitHub repo, so let's have a look at that.

If we now use this POST command... actually, that isn't going to work as-is, because the example uses the Llama 2 model, so let's jump back and change it to llama2-uncensored, because I don't want to pull down another model, and then we can see what the output from the API looks like. This is how Ollama works behind the scenes: each token is streamed back, like a real-time API.

At the moment, though, if we try to call that from my local machine — so if we open another terminal here and run the same command using the IP address DigitalOcean gave us for the droplet — it isn't going to work. We get "failed to connect to server", and that's because the server is only accepting connections from localhost. Now, there's a whole bunch of things you'd want to consider here, firewall rules and so on, but I'm going to open it up just to prove that we can access it from my Mac.

To do that we need to set an environment variable. I'm going to run "service ollama stop", and the environment variable, which I believe is OLLAMA_HOST... yes, OLLAMA_HOST. So if we set OLLAMA_HOST to 0.0.0.0, meaning accept connections from any IP address rather than just localhost, and keep it on 11434, the default port Ollama uses, then hopefully when we start it, it will accept connections. Let's see... failed to connect to server. Okay, I'll do it differently: I'm going to explicitly set OLLAMA_HOST and then run "ollama serve" after it. Cool, now we can see it's listening on that port, we've got a whole bunch of output, plus the warning about GPU support.

Let's see if the model now runs. Ah, actually that's not going to work because we're not using the uncensored model — let's switch to it. Okay, it's claiming that the model isn't downloaded. Let's open up a separate SSH connection and have a look: we haven't actually got that model stored, which is interesting, so let's pull it again. I'm wondering if the reason we had to pull it again is that the host was changed, and that affects how the models are stored in some way, so maybe they no longer persist — you can see we didn't have any models listed at all. Anyway, let's wait for this to come down and see if we can run it client-side.

Okay, cool, that's come down, we're still serving here, so let's see if it runs now. It's put out a load of output... still no response... okay, there's a response, there we go. So we can see these tokens coming down again, and now we're running this client-side, making a call to our server through the same Ollama interface. We have this standardised interface to a server-based model, and I think that's great, because it means we can install any model we want, really simply, with the same process for using it. The other thing we can do is use OLLAMA_HOST on the client side as well.
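Before moving over to the client, here's a rough sketch of the server-side steps just described: calling the generate endpoint on the default port 11434 from the droplet itself, then restarting the server bound to all interfaces so it accepts connections from elsewhere. The 0.0.0.0 binding is a demo-only shortcut; put firewall rules in front of it before doing anything serious:

    # on the droplet: call the local API (tokens stream back as JSON lines)
    curl http://localhost:11434/api/generate -d '{
      "model": "llama2-uncensored",
      "prompt": "Why is the sky blue?"
    }'

    # stop the service, then serve again listening on all interfaces
    service ollama stop
    OLLAMA_HOST=0.0.0.0:11434 ollama serve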
So, on the client side: I have Ollama installed on my client machine, and if I list what I've got installed, I do not have llama2-uncensored. But if I set OLLAMA_HOST to the IP address of the droplet — let me remember what it was, this one — and just run "ollama run llama2-uncensored", then we can see that API call happening there.

Let's ask it a question it wouldn't answer if I were using a standard model. I'm going to use the killers question I've used before: two killers are in a room, another killer enters the room and kills one of them, how many killers are now in the room? The default Llama 2 model tends not to answer this, because it treats it as too risqué, a question that doesn't fit within its boundaries. It's likely this model will get the answer wrong, but the fact that it answers at all is what we want.

It's saying there's no information given about how the second killer entered the room, so it's impossible to determine whether the new killer killed the first or the second person in the room, and therefore there could be one, two or three killers in the room depending on which scenario occurred. I don't think there could ever be three killers if one entered the room and killed one of the others — and it's not one or two, in fact, it's two. But the point is that we can call what we've just installed on our own server, using the open-source model we selected. I think this is great, and you should go off and try it. Let me know how you get on, and let me know if you run into the weird situation I had with the models needing to be downloaded twice. I'll speak to you soon in a new video. Bye for now.
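And this is roughly the client-side version of that last step: pointing a local Ollama install (or plain curl) at the droplet by setting OLLAMA_HOST to its public IP. The address below is just a placeholder for whatever DigitalOcean assigns you:

    # from the client machine, talk to the remote server instead of localhost
    OLLAMA_HOST=203.0.113.10:11434 ollama list
    OLLAMA_HOST=203.0.113.10:11434 ollama run llama2-uncensored

    # or hit the HTTP API on the droplet directly
    curl http://203.0.113.10:11434/api/generate -d '{
      "model": "llama2-uncensored",
      "prompt": "Two killers are in a room. Another killer enters and kills one of them. How many killers are in the room?"
    }'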
Info
Channel: Ian Wootten
Views: 12,219
Id: swNeoKGFkQM
Length: 12min 56sec (776 seconds)
Published: Wed Sep 27 2023