Run a Private AI Model on Your Computer and on the Command Line with @Ollama

Captions
Let's keep going down the rabbit hole of models that you can run locally on your computer. We often think of AI as being something that has to happen out there, right? It takes a whole lot of GPUs, it takes a whole lot of processing, it takes a whole lot of money. We think of these super well-funded companies like OpenAI, or its competitor Anthropic, or Cohere, or AI21, or all these others, amassing those crazy amounts of money. But there's an alternative to this, which is the universe of open-source models available to us today. In a previous video I talked about how we could access those relatively easily in something called LM Studio, but I want to go down that rabbit hole just a little bit more here and talk about Ollama, which is a lower-level way of accessing local models using the same fundamental GGUF technology, and talk a little bit about how you can bring your own custom models to bear in there.

Now, the idea behind Ollama is that they are focused just on how to connect your local model with something you can run, and it uses a standard called GGUF, which basically means that they have a library of models they know have been translated into GGUF format that you can then run locally on your computer. So we have access to Llama 2, access to Mistral, and lots and lots of others, and as we'll see in a couple of minutes, you can actually tell Ollama to use any of these open-source models once you download them to your local machine.

The interface is not as friendly as it is in LM Studio, because it's all really working off of a command line. So let me bring up this command line over here, and we can do "ollama run llama2". Well, actually, what I can do first is just an "ollama list" to show which models I've actually downloaded. And, as always, use --help to find out how a given program wants to be run.

This is all available to me, by the way, because the very first thing you can do with Ollama is download it right from their homepage directly to your computer and install it. One of the very cool things is that once you've installed it, it actually installs a little server into memory, and that's the reason why you have this llama up here in your menu bar. Right now it's only for macOS; they say they're coming out with Windows relatively soon. So if you're on Windows, you want to be looking at LM Studio, which is Mac and Windows; if you are on macOS, you can use either of these. And it seems like Ollama also supports Linux on a first-class basis, whereas that's more beta over on LM Studio. I haven't actually experimented with either of these things on Linux just yet, although I think that gets to be a pretty cool way this thing can proceed.

So let's talk a little bit more about what we find here in the terminal. One of the most straightforward things we can do is just pull one of these models from the registry, and to do that you just need to know the name of the model you're pulling. That's where these model names come in, or you can reference a local one; we'll talk in just a minute about how you can create your own. Then we can say "ollama run llama2" and it will preload the thing into memory, and then we can start asking it questions.
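To make that concrete, here is a minimal sketch of the basic workflow described above, assuming Ollama has already been installed from its homepage (llama2 is just the model name used in the video):

    ollama --help        # see how the program wants to be run
    ollama pull llama2   # download a model from Ollama's registry
    ollama list          # show which models are on this machine
    ollama run llama2    # preload the model and start an interactive chat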
For example, we can ask it to write a good description for a YouTube video discussing local large language models, and it will just calculate and run it. This is all happening entirely on my computer; nothing is going up to the cloud, nothing is going over to OpenAI or to anyone else. And here it is, making me a whole bunch of description, which might be more than we really care for, but can be relatively useful. Now, the question is whether these model outputs are going to be relevant to you, and that's going to be a question of the usual things: figuring out which model you want to use, and now of course instead of just using OpenAI you can use any of these. And then you can say /bye and be done. So instead of having to send things over to OpenAI, it's all happening locally. You get to choose one of these models, or bring your own to the party, and really the only price that you pay for it is speed. These models are quite large, and this formatting allows them to run on CPUs instead of GPUs, but things are generally going to run more slowly than they do when they're running off in the cloud. That's probably fine; there's a lot we can do with that.

Now, I got pretty excited about this technology, and I got excited about it in a couple of different directions. The first is the ability to bring whatever model you want to bear. The shopping-list and search functionality sitting over in LM Studio is pretty great; it's a bit more limited over here, where they just have their known registry of models. But if we click into their GitHub page, you can scroll down and see that you can import your own model. Now, there are a couple of ways of doing this. You can import it directly from PyTorch, but the best way to do this, I think, is to have converted the thing into GGUF first, and that can be a whole different conversation. If you have a GGUF model that you might have downloaded from Hugging Face, or maybe copied over from the cache directory in LM Studio or wherever, you can tell Ollama you want to use it, and you do that by creating what they call a Modelfile.

Here's the way you do this. If I just say "ls", you can see I've got this model I previously created and fine-tuned, which I used as an example in a previous video as well. I can run "code Modelfile", which will create a file called Modelfile, and I can take their instructions, which I'm just going to move over to the side. I'm going to say, well, let's just add "FROM ./" and the filename. Now, that means "work from this particular directory," but that's okay; let's copy the path and paste it in, so from here it will use this particular model file. Great, save that. Now we can go back over to my terminal and, again following the instructions, say "ollama create yt-sample -f Modelfile", and it will build it out. It will basically tell Ollama to take its cues from this particular GGUF file, and it will simply call the result yt-sample. Cool. I cut out a few seconds there, but basically this thing took about 20 seconds to complete, and it creates and writes a layer. When you see things writing layers, you start to wonder whether this is using some Docker code behind the scenes. But leaving that issue aside, I can now do "ollama list" again.
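As a minimal sketch of that Modelfile workflow, assuming you already have a local GGUF file (the filename yt-sample.gguf here is a hypothetical stand-in for whatever you downloaded or converted):

    # write a Modelfile that points Ollama at the local GGUF file
    cat > Modelfile <<'EOF'
    FROM ./yt-sample.gguf
    EOF

    ollama create yt-sample -f Modelfile   # build the model from the Modelfile
    ollama list                            # yt-sample now shows up alongside llama2
    ollama run yt-sample                   # chat with the custom model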
And it shows me not only the Llama 2 that I previously downloaded but also my yt-sample. Now I can do "ollama run yt-sample". There we go; keep in mind you don't get autocomplete for these model names. And I can ask it, "why is the sky so blue?", and we'll see whether it tells us about Rayleigh scattering. Yeah, right, see? Now it talks about Rayleigh scattering. So this allows me, from the command line, to have all the chats I'd like to have. Now we can say "/?" to get the list of commands, and "/show" to see information about this particular model. "/show modelfile" gives the more specific information we have about the Modelfile, which is just the fact of what it came FROM, that is, what I built it from, and then what the template is, which is not very exciting. Okay, so you can go learn more and explore more with Ollama, all on the command line.

Now there's one more thing I want to show off, though. As part of this experimentation, I made a little utility that you can have access to as well if you have Node.js installed: something I'm calling "npx ollama-cli". If you have Node.js installed, you should have access to npx, and you can use the ollama-cli package on npmjs, which is actually something I created just over the course of this weekend. It also demonstrates how you can implement a little bit of code to make this thing even more useful from a command-line point of view. One of the things you probably noticed is that I'm able to do a chat from the command line, but chats from the command line are perhaps less friendly than what we could do over at LM Studio. So how can I make Ollama even more useful than LM Studio? That is where things get pretty cool, because there is a library that sits on top of Ollama, made by one of the Ollama maintainers, called ollama-node, which I can go click on here. The cool thing about ollama-node is that it provides a library to communicate with Ollama if you've installed it on your machine, i.e., if the little icon is up there in your menu bar, because then I can use its API, and I added this to make a little CLI for it. That way, instead of having to say "ollama run", I can just say "npx ollama-cli why is the sky blue". It actually has a couple of other features that are pretty handy, but the first and most important is that I can just run something on the command line and it will deliver the answer back to standard output. So from the point of view of creating a shell script or an automation, this just works, as long as Ollama has been installed on the machine. I'm not going to say it's going to win any land-speed records (it is running on your local machine, after all), but it's pretty cool that it's able to produce this. And that just went to standard output, which means I can feed it to the next element in my pipeline, or put it into a file using a redirect, or whatever else I want. From a shell-scripting point of view, for lightweight automation, it becomes pretty helpful. And by the way, if you want the full output, like finding out how long it took and what the temperature was, etc., you can get all of that by just asking for the JSON output.

But there's one more thing that it does. Most of these models think in terms of markdown, but they can do things like code generation. So I could do: npx ollama-cli "take the numbers 1, 2, 3 and format them as a JSON array". Now, ordinarily if I do this, it will do a little throat-clearing and explain a bit of the code, but it will create it in the form of markdown, with the little markdown indication of where the code starts and stops. Well, since we know it's going to be markdown, we know where the code is going to start and stop, so what we can do is introduce just a little bit of logic, which is exactly what I've done in here, adding a "code" option to it, and instead it will just output the JSON. Then I could pipe that to something like jq to go find out the first element in there, and it will tell me what the first item is. By querying the JSON, or whatever it might be, you can feed that into other inputs, or send it off as a payload to a web request, or whatever you're going to find most helpful.
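Pulling those pieces together, the shell usage might look like the sketch below. Keep in mind that ollama-cli is the author's own weekend project, and the video doesn't spell out its exact option names, so the --code flag and the quoting here are assumptions about the interface, not its documented API:

    npx ollama-cli "why is the sky blue" > answer.txt   # plain answer goes to standard
                                                        # output, so redirects just work
    # hypothetical "code" option that strips the markdown fences and emits only the code:
    npx ollama-cli --code "take the numbers 1, 2, 3 and format them as a JSON array" | jq '.[0]'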
The idea is that, in addition to spitting back JSON this way, it can spit back the code that gets created, and you can turn around and pipe that and say, okay, please run that in Python or Node or whatever. What this starts to do is expand our universe of what we can do with this technology beyond just making chatbots. We can create things that really automate for us, and rather than reaching out into the cloud and admitting the cloud into our machines, we're bringing this piece of code locally and running it without requiring any network connection. If I were to turn off my Wi-Fi entirely, this thing would continue to work and continue to give me pretty good insights, which I find just really remarkable.

This is all made possible by the efforts of this engineer and the team that he's been putting together, and by what was originally the GPT4All initiative; then for a while there was GGML, and now there's GGUF, which I think is the "GG Universal Format," the main way that we are formatting these models now. It's all made possible through a couple of projects this group has put together, one of which is called llama.cpp, which runs these models on ordinary CPUs. So what we are doing is taking models that were previously designed for Hugging Face and PyTorch, designed with Hugging Face Transformers in mind and meant to take advantage of big, heavy GPUs for doing all the training, and turning them into a format that is easily portable and just downloadable, just like we did here, that responds to some relatively standard types of inputs, and then we can run them locally on these much lower-powered machines. Right now it runs pretty well on both Intel machines and Apple silicon, and I'm really looking forward to what can happen next with it, because the ability to take these models and run them in a way that is under your control, making choices about your own model, is really where I think the future of all this is.
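For reference, that conversion into GGUF is typically done with the scripts that ship in the llama.cpp repository. The exact script and flag names have changed over time, so the invocation below is only an illustrative sketch of the idea, not a guaranteed recipe:

    # from a checkout of https://github.com/ggerganov/llama.cpp
    python convert.py ./path-to-hf-model --outfile my-model.gguf   # HF weights -> GGUF
    ./quantize my-model.gguf my-model.q4_0.gguf q4_0               # optional: shrink it for CPU use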
Okay, hopefully this has given you a little bit of an intro to Ollama. Hopefully it has given you the idea that you can bring whatever models and whatever use cases you think are going to be most helpful to you, and maybe some ideas of what you can do to extend it just a little bit further, making use of not just Ollama but maybe the ollama-cli, or other tooling that you might bring to bear, to make your own extraordinary applications that leverage this very cool AI technology. Certainly, AI is usually part of the toughest 5%, and that's the kind of thing that we're always working on over at State Change, for AI, no-code, and low-code projects, and for automations. If you've got questions about what we've been seeing here, please put them in the comments; we love to keep the conversation going. And if you've got particularly challenging things for your economically meaningful projects, please check us out over at State Change. I'll see you next time.
Info
Channel: State Change
Views: 191
Keywords: Descript
Id: jgI0EaKDmlc
Length: 16min 40sec (1000 seconds)
Published: Sun Dec 17 2023