Getting Started with Ollama - the Docker of AI!!!

Video Statistics and Information

Captions
Hey, welcome back. As you know, we've done a few videos showing you how to get a large language model such as Mistral 7B or Llama 2 7B running on your local machine, and we've done that using tools such as llama.cpp, Node.js and even Python itself. Today I'm going to show you the simplest method of getting an LLM running on your machine, and that is through a tool called Ollama.

So what makes Ollama pretty special? It's not just that you can run LLMs really easily; I actually think it's starting to become the Docker, or Docker Hub, of large language models, and today I'm going to show you how to get started really quickly.

If you go to ollama.ai, you get the nice little welcome screen: get up and running with large language models locally. As I said, it allows you to run things like Llama 2, Code Llama and other models. If you want to see which models are available, just click on the Models tab and you'll see the whole list of models you can run on your machine with Ollama: everything from Llama 2 and Mistral to LLaVA, which is an open-source image model (very cool), Mixtral, Code Llama, Dolphin, a whole ton of them. This is what I mean about it becoming kind of like the Docker or Docker Hub of models: new models are being added all the time, and, following the sort of consistent command structure you have with Docker, you can get them running on your machine pretty simply too. There's even TinyLlama; I'm going to cover TinyLlama in another video at some point.

Now that we know which models are available, it's easy to get started: you just click the download button and it downloads to your machine. As it stands today it's only available for Mac and Linux, but as you can see, Windows is coming soon. If I click that download button, it downloads a nice little zip file. Once that's downloaded, I open up the zip file, Ollama becomes available, and I just need to add it to my Applications. You can see I've already got Ollama installed, so I'm just going to click Replace, and now if I open Applications you'll see Ollama is there. Click on that and there you go: Ollama appears at the top of your status bar.

If I want to get started super quickly with Ollama, which basically means download a model, run it, and ask it a question, I just need to type ollama run and the name of the model. In this case I want to run Llama 2 7B, so that's just llama2, and I run that. Because I don't have Llama 2 7B installed already, it does that Docker-style trick of pulling the manifest, so it's even using similar terminology to Docker, and then it downloads the model. Once it's downloaded and in the cache, it's super quick to launch afterwards.

All right, as you can see it's now downloaded. You're going to need a fairly hefty amount of disk space to run it, about 3.8 GB for this model, and a decent amount of RAM on your local machine, maybe about 16 GB. But as you can see, it's running on my machine and it has automatically launched an interactive prompt.
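To recap, the whole quick-start flow from the terminal is roughly this (llama2 is the model tag used in the video; the ~3.8 GB download only happens on the first run):

    # pull (first run only) and start an interactive chat with Llama 2 7B
    ollama run llama2

    # once the model loads you land at an interactive prompt, e.g.
    >>> Who is Ada Lovelace?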
So if I want to, I can just type in a message, something like "Who is Ada Lovelace?", and you'll see it comes back with an answer, and it's pretty fast. This is the standard output from Llama 2 7B, and it's a pretty good answer. I can ask it any question I want; I can ask it to do some basic maths, like "What is 2 + 2?", and it comes back with four. So this just works like any other kind of Llama model.

If you want more information on what you can do within the Ollama prompt, just type /? and it gives you a list of commands. So if I type something like /show, for example, I can get information about the model file, the parameters and so on. I'll do /show info, and you can see the family is llama, it's a 7B model, and it even includes its quantization level. If you're curious, quantization is a technique used to reduce the size of a model in memory so it can run on local consumer hardware; in fact I have a video on fine-tuning models, Llama 2 7B especially, using quantization. The nice thing about Ollama is that it's actually running llama.cpp underneath, so any models that are compatible with llama.cpp will run here just fine, because it's that same standardized file format.

If I want more detail, I can look at the model file, so I do /show modelfile. You can see I haven't set a system prompt or anything like that, but you can see some of the parameters, like the stop parameters, for instance. Finally, if I want to exit, I just type /bye and I'm back at my terminal.

If I want to see the other commands that are available, I can just type ollama on its own and it lists them. You can see there's a nice serve command, a create command for building my own models (I'll show you that a little later), ollama run, which we've used already, and ollama list, which tells you which models are available on your machine. Here you can see llama2:latest, which is what we just downloaded, and Mistral 7B as well, which I already had. So if I do ollama run with Mistral 7B, for example, now I'm talking to another model, and if I ask "Who is Ada Lovelace?" here, the output isn't coming from Llama 2 but from the Mistral 7B model.

One thing you probably noticed is that the response from Mistral came back a little slower than from Llama 2 7B. If you want to see how many tokens per second are actually coming out, you can use the /set verbose command to switch on verbose mode. Now when I ask "Who is Ada Lovelace?", it comes back with the same answer, but it also tells you how long the query took and what the tokens per second were; in this case it was running at around 16 tokens per second. It's just a little bit of extra debug information, and if I want to switch it off I can do /set quiet.

So let's exit out of here for a second, clear the screen, and do ollama list again.
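Here's a quick recap of the in-prompt commands used above (the >>> prompt is Ollama's interactive session; output will depend on the model you're running):

    ollama run mistral     # switch to the Mistral 7B model (pulls it first if needed)
    >>> /?                 # list the available in-prompt commands
    >>> /show info         # model family, parameter size, quantization level
    >>> /show modelfile    # the Modelfile the model was built from
    >>> /set verbose       # print timings and tokens-per-second after each reply
    >>> /set quiet         # turn that back off
    >>> /bye               # leave the prompt and return to the terminal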
As I said, if I wanted another model, I could just come back to the library we saw before, ollama.ai, click on Models, and if I fancied something like Code Llama, all I would need to do is an ollama run like I did before, or an ollama pull of codellama, and it would download just as it did before. In the interest of time I'm not going to do that; I'm just going to exit out of here.

As I said before, this should all have a very familiar Docker feel to you: the listing, this idea of layers and manifests, being able to do a run, having a hub, and so on. We can look at this a little further as well. If, from my home directory, I cd into .ollama, you can see everything that's under the hood for Ollama. If I look under the models folder, for example, you can see the registry: there's llama2, there's Mistral 7B. In fact, if we open this up in, say, VS Code for a second, it looks very similar to a kind of Docker file: it has this idea of layers, the same way Docker does, and each layer has its digest and its size, so you see things like the licence file, any templates, any parameters and so on. It composes up into this idea of a model file describing the model. If we close that for a second and come back to the directories, that was the registry describing the manifest of the model; if I go into blobs, for example, that's where the models themselves, or rather the layers of the models, are actually stored. You can guess that this file at 3.83 GB is probably Llama 2, and the one at 5.13 GB at the bottom is Mistral. These are all cacheable, so if you get yourself into a bit of a mess you can delete that folder and start again.

What's really cool about this is that because Ollama supports the same file format as llama.cpp, the GGUF file format, you could just download a GGUF file, or have your own local model file, and not have to download from the model directory at all; you could put something on your local machine and create your own model from there. I think that's really cool, and in fact, later this year, when we start building our own large language model on this channel, we'll probably use the same technique so that we can run it locally, do inference and make sure we have compatibility.
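For reference, here's roughly what that directory layout looks like on my macOS install (the exact sub-paths may differ slightly between Ollama versions, so treat this as a sketch):

    cd ~/.ollama
    ls models
    # blobs/      the model layers themselves (the ~3.83 GB and ~5.13 GB files)
    # manifests/  the registry manifests describing each model's layers

    # the manifest for llama2, for example, sits under the registry path
    ls models/manifests/registry.ollama.ai/library/llama2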
So what we're going to do now is have a little bit of fun and create our own model file, which is basically a customization of an existing model. What we'll do is put our own system prompt on it, and then it will appear in our Ollama models, so when we type ollama list, the model we create will actually be listed there. To do that, we're going to mess around with Llama 2 7B, so first I want to look at the model file for Llama 2. You can just type ollama show --modelfile llama2 and it gives you an idea of what a model file should look like. You'll see "Modelfile generated by ollama show; to build a new Modelfile based on this, replace the FROM line with: FROM llama2:latest". What's actually happening is that this model file is referring to that blobs folder I showed you, plus the digest. This is what I said before: if I grabbed a GGUF file, or had my own local model, I could place it there, or anywhere on my local machine really, and then use the FROM command, just as you would with Docker, to refer to it. For things like model development that's going to be super useful, but if I want to pull from the hub, just like Docker, I'll pull from llama2 and it comes from the hub. And as you can see here, with things like the template, I've got the ability to make customizations: I can set my own system prompt, or mess around with parameters, which could be things like a default temperature, token length, or, in this case, the actual stop words. So that's pretty cool.

What I'm now going to do is create my own custom model file where Llama 2 speaks like a pirate. To do that, I'm just going to copy all of this. You can see I'm in a fresh working directory at the moment; if I run ls, there's nothing in it, it's a clean directory. I'm going to create a new file called Modelfile; Modelfile is the default file name Ollama looks for when you're customizing a model. I've created that, and I'm going to open it up in Zed; I could open it in something like VS Code, but just for a bit of fun I'm going to use Zed instead. The Modelfile is completely blank; if I wanted to, I could paste everything in from the ollama show output we had. Remember the instructions it gave: to build a new Modelfile, replace the FROM line, so we'll do FROM llama2:latest, give it a bit of a comment, a nice little pirate llama. Because I'm doing something quite simple, I can get rid of all the templates and so on, since I don't need them, and I can just type a SYSTEM directive: you are a pirate, respond to everything in pirate speak. That's all I need to do to create my model file; it's literally as simple as a FROM llama2:latest and a nice little SYSTEM command that says you are a pirate, respond to everything in pirate speak.

Now, to create the model and have it appear in my model list, I just need to type ollama create and give my model a name; in this case I'm going to call it pirate-model. As you can see, it builds very quickly because, similar to Docker, all of those layers exist already: we're using Llama 2, which exists already, so it's just building on top of the layers it's got. If I do ollama list, you can see pirate-model is now appearing there, again very similar to Docker. Now, just like I could with Llama 2, I can do ollama run pirate-model and ask it a question such as "Who is Ada Lovelace?", and there you go: we get all the information about Ada Lovelace, but of course now it's in pirate speak.

Ollama also has the ability to push a model to the registry, so if I really want the world to have my little pirate model, I can push it up there. That's probably not so useful for something where I'm just setting system prompts, but you can imagine that in the future, when I start creating my own large language model here, it'll be pretty cool to put that up on the registry. There are other useful commands too, like remove, or copy if I want to copy an existing model. Now that I'm done with this, we can finally remove our pirate model: we'll do ollama rm pirate-model, and now it's gone, and if we do ollama list we're back to where we were before.
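Putting that together, the whole pirate-model exercise looks roughly like this (pirate-model is just my rendering of the name used in the video, and the -f flag is the explicit way of pointing ollama create at the Modelfile):

    # write the Modelfile (the default file name ollama create looks for)
    cat > Modelfile <<'EOF'
    # pirate llama
    FROM llama2:latest
    SYSTEM You are a pirate. Respond to everything in pirate speak.
    EOF

    # build the custom model; it reuses the existing llama2 layers, so this is fast
    ollama create pirate-model -f Modelfile

    ollama list              # pirate-model now shows up alongside llama2 and mistral
    ollama run pirate-model  # questions now get answered in pirate speak
    ollama rm pirate-model   # remove it again when you're done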
If I wanted to bring it back, I'd just do the ollama create again, then ollama list, and we're back to having our pirate model as before: ollama run pirate-model, "What is 2 + 2?", and we've got pirate-speaking maths.

Now, what's really cool, and there was a bit of a hint when we looked at the command list, is ollama serve. If I actually run ollama serve for a second, you'll see it's already running: you get an error because something is already listening on port 11434. That is exactly what you think it is: Ollama is actually hosted behind a web server, and I think it's FastAPI that's being run in the background. In fact, I have a video on how you can build your own FastAPI server that hosts a large language model, and I think in that video it was Llama 2 7B, but this saves you the hassle of building those things yourself; you can just let Ollama host it. They've even got a Docker image, so if you want to deploy Ollama with Docker and have it host things, you can do that.

So if I want to, I can talk to my pirate model over HTTP. I'll just do a curl, calling localhost on port 11434, the /api/chat endpoint, with the model specified as pirate-model, and then I pass through my question, which is "Who is Ada Lovelace?". You can see the response coming back; because it's streaming, which is pretty cool, it returns chunk after chunk after chunk, and you can see Ada Lovelace and so on, the same responses I got before, in pirate speak. So this is really useful if you want to host large language models behind a web server.

What's even cooler is that because you've now got a standardized web server for large language models, which hosts different model types, a lot of people have started to create libraries: there's a JavaScript library, there are Python libraries and so on, which let you interact with that Ollama server. And it doesn't just run on your local machine: because there's a Docker image, you can run it on something like Cloud Run, or on AWS, it doesn't really matter. It's really simple to get started. If I wanted to, I could do an npm create here, which scaffolds a brand-new Node.js application for this project. Then I can do npm install ollama, which installs the client library locally. We do a touch index.js, open it with code ., and paste in import ollama from 'ollama'. We're just going to await the response; rather than using llama2, we'll use my pirate model, and we'll ask it why the sky is blue. We modify our package.json to support modules, and that now allows us to run node index.js from the command line. And there you go: my Node.js library is actually talking to the backend server and it returned the answer in pirate speak. Technically, I think this would work with Bun as well, so if I do a bun run index.js, there you go, it works with Bun too.
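The curl call to the local server looks roughly like this; /api/chat is the endpoint mentioned above, and the response streams back as one JSON chunk per line:

    curl http://localhost:11434/api/chat -d '{
      "model": "pirate-model",
      "messages": [
        { "role": "user", "content": "Who is Ada Lovelace?" }
      ]
    }'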
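And here's a minimal sketch of the Node.js version, assuming the chat() call from the ollama npm package; npm init -y stands in for the interactive project setup shown in the video:

    npm init -y            # scaffold a new Node.js project
    npm install ollama     # the Ollama JavaScript client library

    # index.js -- note that package.json needs "type": "module" for the import syntax
    cat > index.js <<'EOF'
    import ollama from 'ollama'

    const response = await ollama.chat({
      model: 'pirate-model',
      messages: [{ role: 'user', content: 'Why is the sky blue?' }],
    })
    console.log(response.message.content)
    EOF

    node index.js          # or: bun run index.js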
So anyway, that's the end of this video. I think you'll find Ollama is just an awesome little application, really helpful for running large language models on your local machine. But I also think where they're going with this is much cooler: the ability to have a standardized web server, which we just saw, which you can push out using Docker, but also becoming the Docker of large language models, providing that model hub where you can download lots of different models from the directory in the same way as you can with Docker, and following a similar syntax. I really love what they're doing there. We are going to build upon Ollama in future videos, so it's really useful if you can get used to it. Anyway, I hope this video was useful, and I'll catch you in the next one.
Info
Channel: Chris Hay
Views: 8,851
Keywords: chris hay, chrishayuk, artificial intelligence, large language models, llama 2, how to install ollama, artificial intelligence explained, llama 2 local, mistral, mistral-7b, mistral ai, llama-2-7b, bun.js
Id: uAxvr-DrILY
Length: 18min 19sec (1099 seconds)
Published: Mon Jan 29 2024