Ollama and Python for Local AI LLM Systems (Ollama, Llama2, Python)

Captions
Welcome back. As you know, I am Eli the Computer Guy, and in today's class we're going to be learning how to use Ollama and Python to run large language models locally on our own systems.

I'm going to say something a bit controversial here: I'm not sure large language models are actually AI. Frankly, I'm not even sure what the hell AI is at this point. But what I can tell you is that large language models are incredibly valuable for creating decision-making systems. If you're getting user input and trying to trigger an event based off that input, large language models can be incredibly valuable for parsing the information — basically what's called natural language processing, where a user interacts with your system talking however they normally talk, and your system understands, again, what the hell they're asking for.

Now, there are a lot of APIs out there. There's OpenAI — I've done a lot of classes on OpenAI at this point — and there's Claude from Anthropic; that's an API too. But with the APIs, what happens is you install the module onto your computer, and then your computer has to have an internet connection to communicate with their servers: the request goes up, and you get a response back from their servers. There are any number of reasons why you may not want to use an API from somebody else's service. Maybe there are security issues you're concerned about, or maybe you simply don't have a stable internet connection, so being required to always make an API call could be a big problem.

The cool thing is that many companies have actually put out their models either open source or open-ish — warning, warning, Will Robinson: be careful with the licensing when you start using these — so that you can run these systems locally. They may not be quite as good as OpenAI or Claude, but they're actually pretty darn good. The problem you run into, though, is performance. I decided to go to Meta and try running Llama 2 myself: I installed Llama 2 per Meta's instructions onto a 2019 MacBook Pro, and I crap you not, each request took 10 minutes to run. I then put it on the M2 Max MacBook Pro I bought last year, and we got it down to 4 minutes. I was a bit frustrated, because it does work — it just doesn't work very well.

Anyways, the cool thing is I had heard about Ollama. Ollama is basically a framework you install onto your computer — Mac, Linux, or Windows — and then you can add large language models to Ollama, and they actually run really, really well. And the cool part is there's a Python module, so you can use Python to interact with Ollama. A lot of the functions and features you were getting out of something like OpenAI, you can now run entirely locally. That's why I'm going to show you how to use Ollama today: it's shockingly easy to use, gives you very good performance on whatever system you're using, and gives you the ability to choose among many different models you might want to test for whatever project you're trying to create.

Before we get into the class, let me give you a demonstration of how Ollama works. We've got our nice terminal here, the command line. I simply type in the command ollama, then run, and then I tell it what model I want it to use — so I'm going to say llama2 — and I'm going to hit enter.
It then loads Llama 2, and I actually get a llama shell. You've seen SQL shells, and you've seen Linux shells — well, now you have a llama shell, an AI shell. Here you can plug in messages. I can say something like "hello," and it gives me a nice little response. I can say "are birds real," and it gives me a very long response — we're not going to read all of it. By default, I will say, Llama 2 is very verbose; it just does not shut the hell up. But as you can see, it literally performs that quickly, even on something as random as "are birds real."

If we go over and take a look at the Python script, basically this is the query that's going to go to Llama 2, and down here I have a while True loop. When the script runs, it'll ask me a question — "How can I help you?" — give me a response, and just keep looping, so I can keep asking questions without having to rerun the script. If I click the Run button, it says "How can I help you?" I can say "hello," and it says "Hi there, how are you?" Cool. I can say "are birds real" — "Of course, birds are living, flying creatures" — and it gives me "How can I help you?" again. The thing here is that I'm modifying the query up here to say "answer in 25 or fewer words," since I don't want that whole verbose response. That's a kind of cool thing you can do with Python: you can add what are basically called prompt injections. So that's how you run Ollama from the shell and from a Python script, and with that, we'll dive into this class and show you how to use Ollama, because it is incredibly, incredibly easy to use.

The first thing we need to do is take a look at the Ollama website — literally, it's ollama.com. You just go there, and you should get a nice little llama picture looking at you. When you're going to download this and install it onto your computer, you just click on the Download button; they also have the blog, the GitHub, and the models. I already have it installed on my system, but basically you go down and click Download: it'll give you a download for macOS; if you go to Linux, it'll give you a curl command to install onto your system; and for Windows it's currently a preview version — depending on when you watch this, it'll probably be out of preview. Ollama is a pretty simple piece of software, so it should be pretty easy to use with whichever system you're using.

Then we can go up here to Models, and Models is where the real magic happens. Ollama is just a framework that allows you to run these models, and this is where you can come in and take a look at the metric crap ton of models that are currently available. Again, one thing I will warn you about: remember, we are technology professionals, and as technology professionals and decision makers it is as important to understand things like licensing as it is to understand the technical aspects of whatever we're doing. So be very careful about that. What you do in a lab environment is probably fine — what you do in your own home, nobody's going to know — but if you are testing and building with the idea that you're going to put this into a production environment, for goodness' sake make sure you understand the licenses for everything you're using and verify you're not violating them. It may not seem like a big deal now, but as far as regulations and auditing are concerned for corporations, there will be new rules that come down the pike. If you build things now that are within compliance, then two years from now, when your company is being audited, you get to go have a cup of coffee and read a book, because you're not too worried about it: you have all the paperwork, these models are open source or whatever, there's all the documentation. So do be careful with that.

But if you come here, you will see there are just a crap ton of different models. You have the Mistral model, DeepSeek — there are coder models, so DeepSeek Coder is a capable coding model trained on two trillion code and natural-language tokens — Orca, Dolphin Mistral, TinyDolphin (experimental, 1.1 billion parameters). One thing to be thinking about if you're considering one of these models is that, essentially, the smaller the model, the fewer resources it requires. A one-billion-parameter model should run on something like a Raspberry Pi, while Llama 2 has a 70-billion-parameter version for which you basically need at least 64 gigs of RAM. So if a model is running really, really slowly on your computer, simply by swapping the model out you might find one that runs better for you. Again, the more parameters there are, theoretically the better the model performs; but depending on your particular system, if you just need your computer to say hello or do some very basic automation tasks, a 1.1-billion-parameter model might be fine for you. A lot of these are also trained for things like coding and a whole bunch of other tasks, so you'll have to figure out what you care about. Note that you're not actually going to download from this page — you're going to pull from Ollama itself. So now that you know how to install Ollama and where to go to look at the different models, we're going to go back to the command prompt,
and the first thing you may want to do is pull a model down. Now, it is important to understand that when you first run a model, it will automatically pull that model for you; but just for peace-of-mind purposes, you may want to pull it down first — basically, all you're doing is downloading that model onto your computer. Again, something to think about: take a look at the sizes of these different models. The Llama 2 70-billion-parameter one is something like 36 gigs, so one thing you might want to look at is literally how physically big these things are; even if you have the storage space, it might take a long time to download.

If you want to pull a model, we're just at the command line here. We call ollama — Ollama has already been installed, you've run through the process — and then we simply do pull. The pull command downloads whatever model you tell it to. I don't have this one installed on my system yet, so we're going to download Phi. Phi is a Microsoft model that they allow you to download; it's a small one, so it shouldn't take too long. Then I hit enter, and we can see it's pulling the manifest and going through the whole download process. This is 1.6 gigs, so it's a small model — I think it's less than three billion parameters — so it won't take very long. As you can see, we're trucking along; we get to the 100% mark; it keeps going through these nice little screens we get in the tech world; and there we go: verifying, writing the manifest, removing unused layers, and we are back at the command prompt. That's all you have to do to pull one of these models down — and again, with one of those large models, you may want to pull the model down and go have some lunch.

Past that, I can simply type clear. Then, what happens if we want to run a model? We're going to do ollama, then run, and then I'm simply going to do llama2, so Llama 2 is the model we're going to be running. Then we hit enter. Depending on how large the model is and how many resources you have, this may be fast or slow — again, I have a less-than-one-year-old M2 Max MacBook Pro with 32 gigs of RAM, so this runs pretty well; your system might be slower. From this we simply have a nice little shell. You will see there's a /? hint — if you type /?, it will give you options for what to do: you can set session variables, show the model information, and do a lot of different things. One of the big things: always know how to get out of a shell — /bye is what gets you out of this one if you need it.

Past that, I can simply type something like "hello" down here — "Hello, it's nice to meet you, is there something I can do?" I can say "are birds real," and it goes through and gives me all this information about how birds are real. I can say "what is a doorbaker" — I don't know what a doorbaker is; give it something really random and see what the answer will be. There we go: "I believe you meant 'what is a doorbaker.' A doorbaker is a humorous term that does not have a specific meaning or definition." So there you go — it will actually tell you what a doorbaker is, if you care. You can sit here and keep plugging things in. And the cool part is, if you want to use a different model, you simply leave the shell with /bye.
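Pulled together, the terminal workflow from this demo looks roughly like this (model names are the ones listed on ollama.com; sizes are approximate):

```shell
# download a model ahead of time (Phi is about 1.6 GB)
ollama pull phi

# start an interactive shell with Llama 2
# (if the model isn't present yet, this pulls it automatically)
ollama run llama2

# inside the llama shell:
#   /?      list the available commands
#   /show   display model information
#   /bye    exit the shell
```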
Once you've left, you can do ollama run phi — right, that was the one we just pulled — and I can ask "are birds real" with Phi: "Yes, birds are very real." You see you get a different response back. This is going to be an important thing when you're using quote-unquote AI — when you're using large language models, let's just keep it at that. One of the important things to be thinking about is how the large language model actually responds. Does it quote-unquote talk in a way that you appreciate, or in a way your customers will appreciate? What is technically impressive can practically be a piece of garbage. One of the cool parts about this is you can go through and use all these different models, kind of like a taste test: what do your people like — the Coke of LLMs, or the Pepsi of LLMs? Hell, maybe your clients like the RC Cola of LLMs. It's so easy to try the different models, just to get that feel for how they operate, so you can decide which one is most appropriate for your particular environment.

Okay. Using Ollama at the shell level is a cool party trick, but that's not really a full-fledged product. You can sit there and test a whole bunch of different models to see what gives the best types of responses, but that's not something you're going to give to an end user, and probably not something you really want to use much at the end of the day. So what you're going to want to do is use Python. With Python you're able to interact with Ollama and then add on all the additional functionality you want. If you want to create a web app, you can use Bottle or Django or whatever else: have a little text input form, that form sends the query to the function you create for interacting with Ollama, the value comes back, and you can dump it into a database or whatever else. Python is what's going to make Ollama really, really cool.

As with everything in Python, you're going to need to install the package and then import the module. The module is ollama: if we go over to PyPI, our standard pip install ollama will install it onto your system, and the page gives you some basic usage and different ways to play with Ollama. One of the big things to realize, though, is that you actually need to be running Ollama — this might cause you a little bit of a problem. When you start Ollama on your system, it runs as a service, and basically what Python does is communicate with that service that's already running. So if you have not started Ollama yet with whatever model you want to use, the first thing you're going to have to do is start it. If we do /bye — let me clear the screen — we can do ollama run llama2, which starts Ollama with the Llama 2 model, and now we can go over to Python and start interacting with Ollama, because it's running. Again, if you're on a Mac, you'll see a little icon up top showing Ollama is running. If you run the Python script and it fails for weird reasons, most likely Ollama is not currently running.

If we go over, we can take a look at the script, and it's pretty simple. We import the ollama module. This looks a lot like interacting with OpenAI: response equals ollama.chat, and you plug in the model — llama2, phi, and so on. For the different Llamas there's a colon syntax: llama2:7b, llama2:13b, llama2:70b. If you've got a lot of resources, you can run 70b — I can run it on my MacBook Pro with 32 gigs of RAM, but it takes about two minutes to actually process a response. That's one thing you'll hear about with how many resources you need to run these different models: the stated requirements are really for running under load, as if you're going to put this into production. If you're just doing testing, and it doesn't really matter if it takes a minute to get a response back, you can use some of these very large models; they will just be very, very slow. For very basic testing, you can run your query, go get a cup of coffee, come back, and see the results, and that might be all you need to prove some kind of point to yourself.

Anyways: for the model, you just tell it which one you want. Then you have messages equals — role is user, and content is your query, "are birds real," since that's just the query I keep asking. Then we come down here: response brings back a value, and we print out response, message, content — and you'll notice, if you've been using OpenAI, that looks a lot like how OpenAI does it. If we run this, it'll take a second, and then we get everything here. Again, it is pretty verbose: "Of course birds are real. Birds are living creatures that belong to the class Aves and are characterized by their feathers, wings, and beaks," blah blah blah. Basically, if all you want to do is play with this, this is the basic script for getting Ollama to run in Python.
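A minimal version of that basic script might look like the following — a sketch, assuming the ollama package is installed and the Ollama service is running with llama2 pulled:

```python
def build_messages(query):
    # the chat API takes an OpenAI-style list of {"role", "content"} dicts
    return [{"role": "user", "content": query}]

def ask(query, model="llama2"):
    import ollama  # requires `pip install ollama` and the Ollama service running
    response = ollama.chat(model=model, messages=build_messages(query))
    # the reply text lives in response["message"]["content"],
    # much like OpenAI's response layout
    return response["message"]["content"]

# usage (with the service running):
#   print(ask("Are birds real?"))
```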
Past that, I created another script that gives us a little more utility. Basically, we have an input function — the input function is where you get to tippy-tap type your query out on the command line — and then a while True loop, so it gives us a response, and when we're done we can ask for another response, and another, and another. This makes it a lot easier to figure out how the system works. The other thing is I've added a prompt injection to say keep it below 25 words. Again, that's the kind of thing you're going to have to be thinking about when you use these AI systems: how to bias your AI in the direction you care about.

So with this, we import ollama as before, and then we import os — we're going to be interacting with the operating system a little, so we need the os module. Then we come down to os.system("clear"). What this does is clear out the command line, to make things a little cleaner and easier to use: when you run the script, it deletes everything on that command-line page to give you a nice clean interface. Then: while True, query equals input("How can I help you?"). It says "How can I help you?", you type something in, and when you hit enter we use that clear again to wipe the screen. Answer equals ask(query) — the ask function — and then we simply print out what the question was, and print out the answer, which comes back from submitting the query to the ask function.

If we come up here and take a look at the function itself, it's a very, very simple function, and again, this is really cool. Define ask: query comes in, and query is reassigned — this is where we do the prompt injection. We take the original query value that came in and, using an f-string, append "- answer in 25 or fewer words" to keep this damn thing kind of short. Then response equals ollama.chat; for the model I have llama2:7b — with something like Llama 2 there's 7b, 13b, and 70b, so I can say I want 7b. For messages, role is user and content is the query, so we're submitting the value of the query variable. The response comes back, I grab the message content from it, assign that to response, and return it — so I'm returning the result from Llama 2, and that's what gets printed out on the screen.
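Put together, the whole script described here might look like this — again a sketch, assuming the ollama package is installed and the service is running with llama2:7b pulled:

```python
import os

def inject_limit(query, max_words=25):
    # prompt injection: tack an instruction onto the user's query
    # so the verbose model keeps its answer short
    return f"{query} - answer in {max_words} or fewer words"

def ask(query):
    import ollama  # talks to the local Ollama service
    response = ollama.chat(
        model="llama2:7b",
        messages=[{"role": "user", "content": inject_limit(query)}],
    )
    return response["message"]["content"]

def main():
    os.system("clear")  # use "cls" on Windows
    while True:
        query = input("How can I help you? ")
        os.system("clear")
        print("Question:", query)
        print("Answer:", ask(query))

# to run: call main() - Ctrl-C stops the loop
```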
If I hit Run here, you can see it first loads up, then os.system("clear") clears the screen, so: "How can I help you?" I can simply type in "are birds real." Question: are birds real. "Of course, birds are living, breathing creatures with feathers," blah blah blah — it keeps it nice and reasonably short — and then gives us that prompt again. From here I can say "are birds alive" — I don't know what to ask it — "Of course, birds are living creatures; they breathe, eat, and have a circulatory system." "Can birds vote?" "Birds cannot vote, as they are not human beings." See, that's the cool thing about that while True loop: you can just sit there and continuously play with it. This can be very useful in a real project, when you really are trying to figure out real responses — you can keep hammering it with all these different requests, trying to figure out whether it gives you a bad response or whether all the responses look good, in a very, very simple system like this, before you implement it into a much larger application or package you might want to deploy.

Anyways, that's really all there is to using Python with Ollama. The big thing — the big thing — is to make sure Ollama is already running on your system. If you've quit Ollama in the past or whatever else, all of this will fail, because Ollama runs as a service and Python interacts with that service to get these results. So there you go: you've learned how to install Ollama, how to pull down whatever model you want to use, how to interact with Ollama at the shell level, and how to use Python to interact with Ollama to build the larger platforms or systems you want. I think this is incredibly valuable. One of the interesting things I've found with LLMs is that we keep hearing about the next ChatGPT version — you know, ChatGPT 3, and
then 4, and then 5, and then 28 — and I actually have a question, as a professional, as to how sophisticated an LLM really needs to be for the functionality most users need when they're interacting with a system. Not that they shouldn't make better LLMs — by all means — but it's kind of like 5G: 5G is fine, I just don't really care that much. They're already talking about 6G, and again, I will take 6G; I also don't really care that much. As the LLMs get more and more sophisticated, as they start having trillions and trillions of parameters, one of the interesting questions is: do you actually need an LLM that sophisticated, or would a much smaller, much faster LLM actually do whatever it is you need done?

One of the projects I've worked on in the past, and will be working on more in the future, is my little computer-vision robot. I think it uses what's called an AS 6 from GEEKOM as the main CPU, it uses OpenCV, I'm able to talk to it, it's able to talk to me — the whole nine yards. One of the interesting things about interacting with that robot is I need a way for it to understand what I'm asking for. I have a file with all the different functions — a time function, a hello function, a goodbye function, a what's-the-weather function, a tell-a-joke function — and what I want is to be able to talk, to speak to my robot, and have it determine which function I'm asking for without me having to hardcode trigger words. If I have to hardcode trigger words, that's just going to be a misery. I just want to talk to the computer and have it figure out what I'm looking for: if I say "bonjour," I want it to simply say hello back to me; if I say "do I need an umbrella," I want it to simply tell me whatever the hell the weather is.

That's one of the cool things you can do with LLMs. With that robot, what can happen is I use speech recognition — there's a Google speech-recognition service — so I talk to my robot and it turns my speech literally into ASCII text. That ASCII text can then be sent to the LLM along with my function file, and basically the entire query is: "What function is this person asking for?" All it does is return the function name, and once that name is returned, whatever function it corresponds to is fired off. That is something incredibly powerful — imagine using natural language processing in pretty normal systems that you or your users would be interacting with. At the same time, it's also not the most sophisticated thing in the world. I don't need my LLM to create new Shakespeare plays; I don't need it to explain quantum physics to me; I need it to say: if somebody says "bonjour," oh, they're asking for the hello function. And I think that's one of the things to be thinking about with these LLMs: there's so much focus on bigger, better, faster — because obviously there's a lot of money in that — but for a lot of real professionals, if you actually start looking at the smaller models, those might give you more value than you're expecting.

And what's great with Ollama is, again, you just install it and it works. Remember, I used Meta's instructions — I didn't use some back-alley instructions for installing Llama 2, I used Meta's own Llama 2 instructions — and it did in fact work; it just took anywhere between four and ten minutes to respond to a request, which is obviously not very useful. I have to say: I installed Ollama, and within ten minutes I was writing Python code and getting responses back.
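That function-routing idea can be sketched in a few lines. This is my own illustration, not code from the class — the function names and prompt wording are hypothetical, and it assumes the ollama package and a running service:

```python
# a hypothetical function file: names mapped to the actions the robot can take
FUNCTIONS = {
    "hello": lambda: "Hello back at you!",
    "goodbye": lambda: "See you later.",
    "weather": lambda: "Looks like rain - bring an umbrella.",
}

def build_router_prompt(utterance, names):
    # the entire query to the LLM: which function is the user asking for?
    return ("Which of these functions is the user asking for: "
            + ", ".join(names)
            + f"? Reply with only the function name. The user said: {utterance}")

def pick_function(reply, names):
    # models tend to pad their answers, so scan the reply for a known name
    reply = reply.lower()
    for name in names:
        if name in reply:
            return name
    return None

def route(utterance, model="llama2"):
    import ollama  # talks to the local Ollama service
    reply = ollama.chat(model=model, messages=[
        {"role": "user", "content": build_router_prompt(utterance, list(FUNCTIONS))}
    ])
    name = pick_function(reply["message"]["content"], list(FUNCTIONS))
    return FUNCTIONS[name]() if name else "Sorry, I didn't catch that."

# the idea: route("bonjour") should fire the hello function, and
# route("do I need an umbrella") should fire the weather function
```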
You can see the performance here. So again, whatever product you end up using at the end of the day, I really think Ollama is just an amazing, amazing tool for getting started with all of this kind of stuff. As always, I enjoyed teaching this particular class, and I look forward to seeing you in the next one.
Info
Channel: Eli the Computer Guy
Views: 5,193
Keywords: Eli, the, Computer, Guy, Repair, Networking, Tech, IT, Startup, Arduino, iot
Id: etHl_bEfQ0c
Length: 30min 10sec (1810 seconds)
Published: Thu Mar 28 2024