Okay. So I first introduced Ollama back at the start of October last year on the channel, and back then I was impressed with it, and it's just grown more and more since. So in this video, what I want to do is look at some of the latest things that they've added to Ollama, how you can use them, and how you can get started with them, and show you some setups that we can then use in videos going forward for running local open source LLMs on your computer, for doing RAG, for doing agents, and things like that.
There are three main things that I want to cover. The first is that they've added a whole set of Python and JavaScript libraries for Ollama. This allows you to do a lot of things without having to use a tool like LangChain or LlamaIndex, or some of the other ones out there, and it really makes it easy to write quick and dirty scripts that you can just run in the background to do a whole bunch of different things.
The second thing is the addition of vision models to Ollama. This really started late last year with some of the early LLaVA models that they added, but recently they've gone all in on adding a lot of support for vision models. I want to look at how we can use those both on the command line and through the API, and then perhaps later on we can look at how to use them through LangChain.
The next thing that I want to go through in some more depth is the OpenAI compatibility. This is becoming a standard thing. I may have mentioned LiteLLM in some of my previous videos; that's one of the tools we've been using to give open LLMs an API similar to OpenAI's. But Ollama has now integrated this into their own system, and it certainly makes it easier to go from model to model and benchmark them against each other, so we'll have a look at that as well.
And lastly, there are some other small things that they've added, like being able to save and load sessions with your models, which is a really nice thing if you're working on something and you want to save it so you can come back to it and try out different prompts in the future.
All right, let's jump in and have a look at these things one by one.
So one of the cool things they added towards the end of January was the addition of Python and JavaScript libraries for Ollama. Previously they had an endpoint that you could call, and you could use things like the Python requests library to do that, or LangChain, et cetera. But now that they're making their own libraries, a lot of that is just taken care of for you.
The cool thing is that they have both a Python version of this and a JavaScript version. The JavaScript version is just an NPM install and quite simple to use; I'm going to focus on the Python one here. You can see that the setup is pretty simple: we just pip install ollama. And then using it is very simple in both cases. We can see in this code up here that we just import ollama, put together a set of messages in a similar style to the OpenAI chat format, and then call the ollama.chat endpoint. In JavaScript, we've got a very similar kind of thing. For both of them, you specify your model and go from there.
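As a rough sketch of what that looks like in Python (assuming you have the Ollama server running locally and have already pulled a model such as llama2), it's only a few lines:

```python
import ollama

# Single chat call against a local model, using OpenAI-style messages
response = ollama.chat(
    model="llama2",  # any model you've pulled, e.g. "mistral"
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

print(response["message"]["content"])
```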
One of the things that's really nice here too is that, with the addition of the vision models, which I'll talk about in the next section, you can now basically automate calling these models. If you've got a folder full of images and you just want to write a quick description for each of them, you can automate this and just have it go through.
And this is one of the things I think is really interesting about this whole library: most people have been focused on using large language models purely as chatbots, things that you interact with in real time, going back and forth. I think people are seeing that one of the newer ways of using these things is to have them automating certain tasks, just running in the background, and this library really opens that up for the Ollama models.
And the cool thing is, don't forget, once we've got this going, we can use it with a LLaMA 2 model, with Mistral, with a Mixtral model, with a whole bunch of different models both big and small, and with a bunch of the code models and things like that. Just having this sitting there on your system, running in the background and processing stuff, I find to be a really useful tool for a lot of different tasks I'm trying to automate, whether that's combined with a scraper that goes and gets the latest news each day, brings it down, analyzes it, and just picks out the parts for me, or something that looks at every screenshot I take and catalogs them. All these tools are pretty easy to write once you've got this ability to easily access the local models on your machine, and they don't have to involve real-time interaction with you; they can just run like a cron job at various times throughout the day.
So let's have a little play with doing this in Python locally, and I can show you some examples. Using the Python library is pretty simple: we just import ollama, and then we can use ollama.chat, pass in the model that we want, and then we've got the standard messaging format, with a role of user and the content, as we go through. If you want to do something streaming, you can see here I'm bringing it in, passing in the user content, "Why is the sky blue?" (this is one of their examples), and then we can stream out, printing each of the parts as they stream back. You can see I'm using the Mistral model here, and then the LLaMA 2 model for the exact same thing.
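The streaming version is only slightly different; roughly speaking, following the pattern from their examples, you pass stream=True and iterate over the chunks:

```python
import ollama

# Stream the response token by token instead of waiting for the full answer
stream = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a partial message; print it as it arrives
    print(chunk["message"]["content"], end="", flush=True)
```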
So if we run it, let's see how long it actually takes to output the responses. It has to first load up the Mistral model, generate its response, then load up the LLaMA 2 model and generate the response for that. You'll see that it takes a little bit of time to actually load up each model, but once it's loaded, it streams quite quickly, and I'm not using a super powerful Mac here. Okay, the Mistral part is done, it's now loading up the LLaMA 2 part, and you'll see that it's now generating the LLaMA 2 answer there. So that's the simplest example.
Like I talked about, you can also use it to load up images and do a whole bunch of different tasks. One of the things that I've had some nice success with is having a Python file that scrapes something, brings it down, and then uses the model either for extracting information out of what it's scraped or for doing something like news summarization. In the past, you would have been tied to doing this with something like LangChain; now, if you've got your prompts, you can just come in here and set this up. And just like we set up the user role and content, we could also feed in more messages, we could have a system prompt in here, and all the normal stuff we'd expect.
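As a minimal sketch of that scraping-and-summarizing idea, with a system prompt included in the messages list (the scraped_article variable is just a stand-in; plug in whatever your scraper returns):

```python
import ollama

def summarize(text: str) -> str:
    """Summarize a block of scraped text with a local model."""
    response = ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": "You are a concise news analyst."},
            {"role": "user", "content": f"Summarize the key points of this article:\n\n{text}"},
        ],
    )
    return response["message"]["content"]

# scraped_article would come from whatever scraper you run each day
scraped_article = "...text pulled down by your scraper..."
print(summarize(scraped_article))
```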
Okay, so the next thing up is this whole addition of vision models and more support in Ollama for vision models. The main models here are the new LLaVA models, basically LLaVA 1.6. I haven't made a video about these yet; I'll possibly make some videos about using some of these models, because they're certainly getting very cool, and it's really quite amazing that we're seeing open source performance starting to get close to things like GPT-4V and Gemini Pro Vision. So they've added a few of these models, the 7B, the 13B, and the 34B LLaVA models, and they've added a few different ways to use them as well.
You've got the standard command line interface, where we can pass in a reference to the image and just pass in a prompt along with it. You can see here that you've just got ollama run llava, and you're basically saying "describe this image" and then passing in the name of the image that's in a local folder, and you'll get a description back from that.
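From memory of their examples, that command line usage looks roughly like the following, with the image path included right in the prompt (worth double-checking the exact form against the docs):

```
ollama run llava "Describe this image: ./screenshot.png"
```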
The way that I find more useful, though, is to actually use the libraries for this. This is a good case for automation: you can get a set of images, or local image paths, from a folder, something like your screenshots folder, have the model describe each of them, and save the descriptions along with the paths to a spreadsheet or something. Then any time you want to find something, it's there, and you could also turn this into a multimodal RAG kind of thing if you wanted to. It's quite an interesting use case.
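A rough sketch of that screenshot-cataloging idea with the Python library might look like this; the folder path and CSV output are just assumptions for illustration, and the key part is the images field on the message, which, as far as I recall, takes local paths or raw bytes:

```python
import csv
from pathlib import Path

import ollama

screenshot_dir = Path("~/Screenshots").expanduser()  # assumed location of your screenshots

with open("screenshot_catalog.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "description"])
    for image_path in sorted(screenshot_dir.glob("*.png")):
        response = ollama.chat(
            model="llava",
            messages=[{
                "role": "user",
                "content": "Describe this image in one or two sentences.",
                "images": [str(image_path)],  # local image path handed to the vision model
            }],
        )
        writer.writerow([str(image_path), response["message"]["content"]])
```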
Anyway, they've given you code for doing this in Python and in JavaScript, and it really makes it easy overall for things like image descriptions. You can also use it for things like text recognition. Surprisingly, these models are actually quite good at reading text that's in the image, so one of the things I find is that this is a great way to index a whole series of images very quickly based on the information in them. Previously we would have had to use GPT-4V to do this, and even before that it would have been harder still, needing some kind of captioning model. So those are the vision capabilities in Ollama; they've become useful for doing a bunch of different things.
The third major announcement came just last week, and this was the addition of OpenAI compatibility. Like I mentioned at the start, there have been frameworks out there for doing this, things like LiteLLM from BerriAI, I think it is, and other projects have integrated this kind of thing too; I think the Together AI API has as well. What this basically allows you to do is use the OpenAI library itself, or anything that's compatible with it, to access the Ollama models.
So if we look at the OpenAI Python library example, we can see it's using the exact same library as OpenAI; it's just pointing the base URL to a local path, which is where Ollama is running. We don't need an API key; you can basically put in anything for the API key there. And then the key thing is that you set the model. You can see from these examples that the model is LLaMA 2, and then it's got the exact same format as OpenAI uses for everything: you've got a system role with the content "You are a helpful assistant", then a user role, an assistant role, a user role, and so on as you go through.
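Put together, it looks something like this with the standard openai Python package pointed at the local Ollama server (the default port is 11434, and the /v1 path and dummy key follow their example):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama server instead of api.openai.com
    api_key="ollama",  # the client requires a key, but Ollama doesn't check it
)

response = client.chat.completions.create(
    model="llama2",  # whichever Ollama model you want to use
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

print(response.choices[0].message.content)
```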
So it makes it very easy to just code up whatever you want to use. And the cool thing is that it can be done in Python with the OpenAI Python library, or with the OpenAI JavaScript library; either way it's very straightforward. On top of the OpenAI libraries themselves, there are a lot of other libraries out there that use this format. They give an example here of the Vercel AI SDK: you could be using the Vercel SDK in JavaScript, but again you're just pointing the base URL to the local path where your Ollama server is running.
The other cool thing is that most tools out there already support the OpenAI APIs and models, so this makes it very easy to convert something like Autogen to start using an Ollama model. You can see here it's basically setting up the Autogen assistant agent and the user proxy agent, again passing in the base URL and the model that you're going to use. Then you can just run your code as if it's using the OpenAI APIs, but you're actually running everything locally on your device.
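As a hedged sketch of what that Autogen setup tends to look like (the exact config keys depend on your Autogen version; older releases used api_base rather than base_url, so treat this as an outline rather than the definitive setup):

```python
from autogen import AssistantAgent, UserProxyAgent

# Point Autogen at the local Ollama server via the OpenAI-compatible endpoint
config_list = [{
    "model": "mistral",                       # any model you've pulled into Ollama
    "base_url": "http://localhost:11434/v1",  # local Ollama server
    "api_key": "ollama",                      # placeholder; Ollama doesn't check it
}]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config=False)

user_proxy.initiate_chat(
    assistant,
    message="Outline a short blog post about running LLMs locally.",
)
```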
That doesn't mean, of course, that the model is going to be as good as the OpenAI models. You're going to have to try out different models, but you'll probably find that if you're using the Mistral or Mixtral models, you're going to get some good results, and there are probably a lot of tasks that used to require pinging the OpenAI API that you can suddenly do locally, and you can write your code to take advantage of that.
Finally, they do mention that they're working on adding more things to this. I think we'll probably see function calling come in the not too distant future. There's talk of maybe adding the embeddings support, which would be really cool if we could run some of the embedding models locally as well; then you could do a full RAG pipeline with everything in Ollama, perhaps even with everything quantized. That's certainly going to be interesting to try out. Also interesting is the possible addition of log probabilities. I haven't really gone into log probabilities much in my videos, but this becomes a really interesting thing that OpenAI added to their API responses not too long ago. Definitely worth checking out.
Okay, lastly, I'm going to show you some of the other updates that have been added. They've also added a bunch of things related to supporting different CPUs. Most of those changes most people are probably not going to notice: if it was working for you before, it's probably still going to be working for you, but if it wasn't working for you before, it might be working for you now.
Some other things they've added are some really nice commands just to make it easier to see what's going on. If we want to see the available commands, we can go through them, and then we can look at things like show, and one of those options is the model file. We can see the model file, and we can see what we've got set: where the actual model is, the template for it, the different stop parameters, and so on. We can set those parameters in here as well if we want to.
What we can also do is check what we're allowed to set: we can come in here and see that we can set a parameter, we can set a system string, and a whole bunch of other things. So let's set up a system string. Whenever I'm trying to test out a model, one of the first things I always do is come up with a system prompt that is kind of the opposite of what most people would want. So I'll say something like "You are a drunk assistant that slurs your words a lot and speaks rudely." You can see we've got that set, and we can now ask it "How are you today?", and sure enough we've got our drunk assistant.
This is the kind of thing I want to see in a model, so that I know it's actually paying attention to the system prompt; far too often people make slight changes to system prompts while wondering whether it's working or not. If you want to check what the system prompt is, you can show it, and sure enough, there's our system prompt in there.
So now that I've set the system prompt, I can ask it "How are you?" and we're getting an answer back. And if we want to save this, we can just use save and the model name that we want to call it; I'm going to call this one drunk. So we've created a new model, and now if we exit out of this, after you've saved it you can come back and get your session back just by loading up that model. You can use it, and it should have the system prompt and everything all set, just like before, and sure enough, we've got our drunk assistant responding again.
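Putting that whole flow together, the interactive session looks roughly like this (paraphrased from memory of the REPL commands, so check /? for the exact syntax on your version):

```
>>> /set system "You are a drunk assistant that slurs your words a lot and speaks rudely."
>>> How are you today?
(slurred, rude answer comes back)
>>> /show system
>>> /save drunk
>>> /bye

$ ollama run drunk
>>> How are you?
(same drunk persona, no setup needed)
```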
All these things just make it easier to test out models and make use of these features. So just to finish up, I would say that if you weren't using Ollama before, you should certainly check it out now. Certainly check out the libraries; I find them to be really useful. Some of the new commands for saving and loading models are really handy, and so is the vision stuff; the multimodal side is getting really interesting, and I'm planning a whole video just on VLMs and some of the things around them.
Anyway, as always, if you've got comments, please let me know what you're using Ollama for in the comments section below. If you found the video useful, please click like and subscribe; it really helps out. And I will talk to you in the next video. Bye for now.