Okay. So I first introduced Ollama back at the start of October last year on the channel, and back then I was impressed with it, and it's just grown more and more since. So in this video, what I want to do is look at some of the latest things that they've added to Ollama, how you can use them, and how you can get started with them, and show you some setups that we can then use in videos going forward for running local open source LLMs on your computer, for doing RAG, for doing agents, and things like that.
There are three main things that I want to cover. The first is that they've added a whole set of Python and JavaScript libraries for Ollama. This allows you to do a lot of things without having to use a tool like LangChain or LlamaIndex, or some of the other ones out there, and it really makes it easy to write quick and dirty scripts that you can just run in the background to do a whole bunch of different things.
The second thing is the addition of vision models to Ollama. This really started late last year with some of the early LLaVA models that they added, but recently they've gone all in on adding a lot of support for vision models. I want to look at how we can use those both on the command line and through the API, and then perhaps later on we can look at how to use them through LangChain.
The next thing that I want to go through in some more depth is the OpenAI compatibility. This is becoming a standard thing. I may have mentioned LiteLLM in some of my previous videos; that's one of the tools we've been using to give open LLMs an API similar to OpenAI's. But Ollama has now integrated this into their own system, and it certainly makes it easier to go from model to model and benchmark them against each other, so we'll have a look at that as well.
And lastly, there are some other small things that they've added, like being able to save and load sessions with your models, which is a really nice thing if you're working on something and you want to save it so you can come back to it and try out different prompts in the future.
All right, let's jump in and have a look at these things one by one.
So one of the cool things they added towards the end of January was the addition of Python and JavaScript libraries for Ollama. Previously they had an endpoint that you could call, and you could use things like the Python requests library to do that, or LangChain, et cetera. But now that they're making their own libraries, a lot of that is just taken care of for you.
The cool thing is that they have both a Python version of this and a JavaScript version. The JavaScript version is just an NPM install and quite simple to use; I'm going to focus on the Python one here. You can see that the setup is pretty simple: we just pip install ollama. And then using it is very simple in both cases. We can see in this code up here that we just import ollama, put together a set of messages in a similar style to the OpenAI chat format, and then call the ollama.chat endpoint. In JavaScript, we've got a very similar kind of thing. For both of them, you specify your model and go from there.
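As a rough sketch of what that looks like in Python (assuming you have the Ollama server running locally and have already pulled a model such as llama2), it's only a few lines:

```python
import ollama

# Single chat call against a local model, using OpenAI-style messages
response = ollama.chat(
    model="llama2",  # any model you've pulled, e.g. "mistral"
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

print(response["message"]["content"])
```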
One of the things that's really nice here too is that, with the addition of the vision models, which I'll talk about in the next section, you can now basically automate calling these models. If you've got a folder full of images and you just want to write a quick description for each of them, you can automate this and just have it go through.
And this is one of the things I think is really interesting about this whole library: most people have been focused on using large language models purely as chatbots, things that you interact with in real time, going back and forth. I think people are seeing that one of the newer ways of using these things is to have them automating certain tasks, just running in the background, and this library really opens that up for the Ollama models.
And the cool thing is, don't forget, once we've got this going, we can use it with a LLaMA 2 model, with Mistral, with a Mixtral model, with a whole bunch of different models both big and small, and with a bunch of the code models and things like that. Just having this sitting there on your system, running in the background and processing stuff, I find to be a really useful tool for a lot of different tasks I'm trying to automate, whether that's combined with a scraper that goes and gets the latest news each day, brings it down, analyzes it, and just picks out the parts for me, or something that looks at every screenshot I take and catalogs them. All these tools are pretty easy to write once you've got this ability to easily access the local models on your machine, and they don't have to involve real-time interaction with you; they can just run like a cron job at various times throughout the day.
So let's have a little play with doing this in Python locally, and I can show you some examples. Using the Python library is pretty simple: we just import ollama, and then we can use ollama.chat, pass in the model that we want, and then we've got the standard messaging format, with a role of user and the content, as we go through. If you want to do something streaming, you can see here I'm bringing it in, passing in the user content, "Why is the sky blue?" (this is one of their examples), and then we can stream out, printing each of the parts as they stream back. You can see I'm using the Mistral model here, and then the LLaMA 2 model for the exact same thing.
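The streaming version is only slightly different; roughly speaking, following the pattern from their examples, you pass stream=True and iterate over the chunks:

```python
import ollama

# Stream the response token by token instead of waiting for the full answer
stream = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a partial message; print it as it arrives
    print(chunk["message"]["content"], end="", flush=True)
```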
So if we run it, let's see how long it actually takes to output the responses. It has to first load up the Mistral model, generate its response, then load up the LLaMA 2 model and generate the response for that. You'll see that it takes a little bit of time to actually load up each model, but once it's loaded, it streams quite quickly, and I'm not using a super powerful Mac here. Okay, the Mistral part is done, it's now loading up the LLaMA 2 part, and you'll see that it's now generating the LLaMA 2 answer there. So that's the simplest example.
Like I talked about, you can also use it to load up images and do a whole bunch of different tasks. One of the things that I've had some nice success with is having a Python file that scrapes something, brings it down, and then uses the model either for extracting information out of what it's scraped or for doing something like news summarization. In the past, you would have been tied to doing this with something like LangChain; now, if you've got your prompts, you can just come in here and set this up. And just like we set up the user role and content, we could also feed in more messages, we could have a system prompt in here, and all the normal stuff we'd expect.
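As a minimal sketch of that scraping-and-summarizing idea, with a system prompt included in the messages list (the scraped_article variable is just a stand-in; plug in whatever your scraper returns):

```python
import ollama

def summarize(text: str) -> str:
    """Summarize a block of scraped text with a local model."""
    response = ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": "You are a concise news analyst."},
            {"role": "user", "content": f"Summarize the key points of this article:\n\n{text}"},
        ],
    )
    return response["message"]["content"]

# scraped_article would come from whatever scraper you run each day
scraped_article = "...text pulled down by your scraper..."
print(summarize(scraped_article))
```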
Okay, so the next thing up is this whole addition of vision models and more support in Ollama for vision models. The main models here are the new LLaVA models, basically LLaVA 1.6. I haven't made a video about these yet; I'll possibly make some videos about using some of these models, because they're certainly getting very cool, and it's really quite amazing that we're seeing open source performance starting to get close to things like GPT-4V and Gemini Pro Vision. So they've added a few of these models, the 7B, the 13B, and the 34B LLaVA models, and they've added a few different ways to use them as well.
You've got the standard command line interface, where we can pass in a reference to the image and just pass in a prompt along with it. You can see here that you've just got ollama run llava, and you're basically saying "describe this image" and then passing in the name of the image that's in a local folder, and you'll get a description back from that.
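From memory of their examples, that command line usage looks roughly like the following, with the image path included right in the prompt (worth double-checking the exact form against the docs):

```
ollama run llava "Describe this image: ./screenshot.png"
```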
The way that I find more useful, though, is to actually use the libraries for this. This is a good case for automation: you can get a set of images, or local image paths, from a folder, something like your screenshots folder, have the model describe each of them, and save the descriptions along with the paths to a spreadsheet or something. Then any time you want to find something, it's there, and you could also turn this into a multimodal RAG kind of thing if you wanted to. It's quite an interesting use case.
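A rough sketch of that screenshot-cataloging idea with the Python library might look like this; the folder path and CSV output are just assumptions for illustration, and the key part is the images field on the message, which, as far as I recall, takes local paths or raw bytes:

```python
import csv
from pathlib import Path

import ollama

screenshot_dir = Path("~/Screenshots").expanduser()  # assumed location of your screenshots

with open("screenshot_catalog.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "description"])
    for image_path in sorted(screenshot_dir.glob("*.png")):
        response = ollama.chat(
            model="llava",
            messages=[{
                "role": "user",
                "content": "Describe this image in one or two sentences.",
                "images": [str(image_path)],  # local image path handed to the vision model
            }],
        )
        writer.writerow([str(image_path), response["message"]["content"]])
```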
Anyway, they've given you code for doing this in Python and in JavaScript, and it really makes it easy overall for things like image descriptions. You can also use it for things like text recognition. Surprisingly, these models are actually quite good at reading text that's in the image, so one of the things I find is that this is a great way to index a whole series of images very quickly based on the information in them. Previously we would have had to use GPT-4V to do this, and even before that it would have been harder still, needing some kind of captioning model. So those are the vision capabilities in Ollama; they've become useful for doing a bunch of different things.
The third major announcement came just last week, and this was the addition of OpenAI compatibility. Like I mentioned at the start, there have been frameworks out there for doing this, things like LiteLLM from BerriAI, I think it is, and other projects have integrated this kind of thing too; I think the Together AI API has as well. What this basically allows you to do is use the OpenAI library itself, or anything that's compatible with it, to access the Ollama models.
So if we look at the OpenAI Python library example, we can see it's using the exact same library as OpenAI; it's just pointing the base URL to a local path, which is where Ollama is running. We don't need an API key; you can basically put in anything for the API key there. And then the key thing is that you set the model. You can see from these examples that the model is LLaMA 2, and then it's got the exact same format as OpenAI uses for everything: you've got a system role with the content "You are a helpful assistant", then a user role, an assistant role, a user role, and so on as you go through.
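Put together, it looks something like this with the standard openai Python package pointed at the local Ollama server (the default port is 11434, and the /v1 path and dummy key follow their example):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama server instead of api.openai.com
    api_key="ollama",  # the client requires a key, but Ollama doesn't check it
)

response = client.chat.completions.create(
    model="llama2",  # whichever Ollama model you want to use
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

print(response.choices[0].message.content)
```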
So it makes it very easy to just code up whatever you want to use. And the cool thing is that it can be done in Python with the OpenAI Python library, or with the OpenAI JavaScript library; either way it's very straightforward. On top of the OpenAI libraries themselves, there are a lot of other libraries out there that use this format. They give an example here of the Vercel AI SDK: you could be using the Vercel SDK in JavaScript, but again you're just pointing the base URL to the local path where your Ollama server is running.
The other cool thing is that most tools out there already support the OpenAI APIs and models, so this makes it very easy to convert something like Autogen to start using an Ollama model. You can see here it's basically setting up the Autogen assistant agent and the user proxy agent, again passing in the base URL and the model that you're going to use. Then you can just run your code as if it's using the OpenAI APIs, but you're actually running everything locally on your device.
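As a hedged sketch of what that Autogen setup tends to look like (the exact config keys depend on your Autogen version; older releases used api_base rather than base_url, so treat this as an outline rather than the definitive setup):

```python
from autogen import AssistantAgent, UserProxyAgent

# Point Autogen at the local Ollama server via the OpenAI-compatible endpoint
config_list = [{
    "model": "mistral",                       # any model you've pulled into Ollama
    "base_url": "http://localhost:11434/v1",  # local Ollama server
    "api_key": "ollama",                      # placeholder; Ollama doesn't check it
}]

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config=False)

user_proxy.initiate_chat(
    assistant,
    message="Outline a short blog post about running LLMs locally.",
)
```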
That doesn't mean, of course, that the model is going to be as good as the OpenAI models. You're going to have to try out different models, but you'll probably find that if you're using the Mistral or Mixtral models, you're going to get some good results, and there are probably a lot of tasks that used to require pinging the OpenAI API that you can suddenly do locally, and you can write your code to take advantage of that.
Finally, they do mention that they're working on adding more things to this. I think we'll probably see function calling come in the not too distant future. There's talk of maybe adding the embeddings support, which would be really cool if we could run some of the embedding models locally as well; then you could do a full RAG pipeline with everything in Ollama, perhaps even with everything quantized. That's certainly going to be interesting to try out. Also interesting is the possible addition of log probabilities. I haven't really gone into log probabilities much in my videos, but this becomes a really interesting thing that OpenAI added to their API responses not too long ago. Definitely worth checking out.
Okay, lastly, I'm going to show you some of the other updates that have been added. They've also added a bunch of things related to supporting different CPUs. Most of those changes most people are probably not going to notice: if it was working for you before, it's probably still going to be working for you, but if it wasn't working for you before, it might be working for you now.
Some other things they've added are some really nice commands just to make it easier to see what's going on. If we want to see the available commands, we can go through them, and then we can look at things like show, and one of those options is the model file. We can see the model file, and we can see what we've got set: where the actual model is, the template for it, the different stop parameters, and so on. We can set those parameters in here as well if we want to.
What we can also do is check what we're allowed to set: we can come in here and see that we can set a parameter, we can set a system string, and a whole bunch of other things. So let's set up a system string. Whenever I'm trying to test out a model, one of the first things I always do is come up with a system prompt that is kind of the opposite of what most people would want. So I'll say something like "You are a drunk assistant that slurs your words a lot and speaks rudely." You can see we've got that set, and we can now ask it "How are you today?", and sure enough we've got our drunk assistant.
This is the kind of thing I want to see in a model, so that I know it's actually paying attention to the system prompt; far too often people make slight changes to system prompts while wondering whether it's working or not. If you want to check what the system prompt is, you can show it, and sure enough, there's our system prompt in there.
So now that I've set the system prompt, I can ask it "How are you?" and we're getting an answer back. And if we want to save this, we can just use save and the model name that we want to call it; I'm going to call this one drunk. So we've created a new model, and now if we exit out of this, after you've saved it you can come back and get your session back just by loading up that model. You can use it, and it should have the system prompt and everything all set, just like before, and sure enough, we've got our drunk assistant responding again.
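Putting that whole flow together, the interactive session looks roughly like this (paraphrased from memory of the REPL commands, so check /? for the exact syntax on your version):

```
>>> /set system "You are a drunk assistant that slurs your words a lot and speaks rudely."
>>> How are you today?
(slurred, rude answer comes back)
>>> /show system
>>> /save drunk
>>> /bye

$ ollama run drunk
>>> How are you?
(same drunk persona, no setup needed)
```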
All these things just make it easier to test out models and make use of these features. So just to finish up, I would say that if you weren't using Ollama before, you should certainly check it out now. Certainly check out the libraries; I find them to be really useful. Some of the new commands for saving and loading models are really handy, and so is the vision stuff; the multimodal side is getting really interesting, and I'm planning a whole video just on VLMs and some of the things around them.
Anyway, as always, if you've got comments, please let me know what you're using Ollama for in the comments section below. If you found the video useful, please click like and subscribe; it really helps out. And I will talk to you in the next video. Bye for now.