Okay. So in this video, I want to go through how having Ollama models running locally on your computer allows you to run LangChain locally and use those models for different tasks. I'm going to look at some simple tasks first, like how we set it all up, and then finally I'll finish up with a task where we get it to do some scraping for us and extract some information, all using the LLaMA-2 model.
Okay. So in here I'm just using VS Code, and I've got a couple of Python files open. I've made a conda environment just to install LangChain; there's nothing special about it. You could certainly set that up yourself, or just use your local Python if you really wanted to. If I come into the terminal, I can have a look at the models I've got installed. So I've got my Hogwarts model from before, I've got the LLaMA-2 models, and I've got some other models, which I'll show you how to set up in the next video.
So first off, let's look at the simplest thing: how do we actually load the Ollama model? In this case, we're going to use the pre-made Ollama LLM in LangChain, and we can set that up with a streaming callback. We just instantiate our LLM, passing in the model that we want (in this case, LLaMA-2) and a streaming callback. Now I can just call the model locally from my Python code here.
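(For reference, the setup looks roughly like this. It's a minimal sketch, assuming the LangChain import paths from around the time Ollama support was added; newer releases may have moved these into langchain_community.)

```python
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Instantiate the pre-made Ollama LLM, pointing it at the local llama2 model
# and attaching a callback that streams tokens to stdout as they arrive.
llm = Ollama(
    model="llama2",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

# Calling the LLM sends the prompt to the locally running Ollama server.
llm("Why is the sky blue?")
```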
Okay. So now when I run the code, you'll see that, sure enough, it's triggering the Ollama model. It's doing this via an API. One of the things I didn't cover in the first video is that Ollama runs a local API that we can hit with LangChain, but we could also hit it directly from just about any client or user interface, et cetera. So something we'll perhaps look at in the future is building a Next.js app with LangChain and Ollama.
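(To illustrate that point, here's a rough sketch of hitting the Ollama API directly from Python, assuming Ollama is running with its default settings on port 11434.)

```python
import requests

# Ollama serves a local REST API (http://localhost:11434 by default), which is
# what LangChain talks to under the hood. We can call it directly as well.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,  # set to True to get tokens back as they're generated
    },
)
print(response.json()["response"])
```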
All right. So the next step up: I'm just going to make a basic chain here. I've got the same things as before, and don't forget, this is an LLM, so I can add in temperature, max tokens, and things like that if I wanted to. What I'm going to do is make a simple prompt template: just "give me five interesting facts about" whatever I insert in here. So rather than the Roman Empire, let's go for the moon. Okay, you can see that we can just set up our chain, run it, and print the result out as it comes back. In here, we can also pass in verbose equals true or not, if we want to see the streaming output. In this case, I'm going to turn off the callback manager, so we're not going to stream, and I'm going to set verbose to false.
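(As a rough sketch of what's on screen, the chain looks something like this; the exact prompt wording and the temperature value are my assumptions.)

```python
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# No streaming callback this time; since this is a normal LangChain LLM we can
# also tweak sampling settings here, e.g. temperature.
llm = Ollama(model="llama2", temperature=0.7)

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Give me five interesting facts about {topic}",
)

chain = LLMChain(llm=llm, prompt=prompt, verbose=False)

# Runs the whole chain and returns the final text once generation is done.
print(chain.run("the moon"))
```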
Now if I run it, you'll see that it takes a bit of time, because it's actually running and generating the data, but sure enough, once it's finished, it will print the data out. So this approach is probably more useful if you wanted to write the output to a file, et cetera. We can see there that it's gone and done it: we've got the five interesting facts about the moon, and it's gone and gotten those quite nicely.
All right, let's look at the last example. This is definitely a bit of a jump from the previous two. Now we're going to be doing some RAG, and we're going to do it using a web-based loader. So we're going to load a web page in, split it up, and put it into a Chroma DB, so you will need to have Chroma installed for this. We're also going to use argparse to pass in the URL we want it to get data back from. Let's have a look at the imports here.
We're bringing in the recursive character text splitter, we're bringing in the WebBaseLoader for loading web pages, we're bringing in Chroma, and then we've got some embeddings here. I haven't looked into it too much, but according to LangChain, Ollama has its own embeddings, and GPT4All has also made some quantized embeddings that you could use here. And we can see we're bringing in Ollama like normal.
So we're going to have our main function, which just looks for the URL argument. We're then going to load that URL, split the data up, and put it into a Chroma DB. After that, we're going to set up our LangChain prompt. Here is the prompt that we're using; we're pulling it in from LangChain Hub. You can certainly print it out, have a look at it, play with it yourself, and change it. Then we're setting up the retrieval QA chain, where we're going to pass in the LLM, our vector store (which is Chroma), and our prompt. And then I'm going to set up a question, "what are the latest headlines on" whichever site, pass that in, get the results out, and watch them stream out.
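(Pulling all of that together, the script looks roughly like the sketch below. It's not the exact file on screen: the hub prompt name rlm/rag-prompt-llama, the chunking parameters, and the choice of GPT4All embeddings are my assumptions, and the import paths match LangChain as it was around the time Ollama support landed.)

```python
import argparse

from langchain import hub
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import GPT4AllEmbeddings  # OllamaEmbeddings is another option
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma


def main():
    # Take the page to scrape as a command-line argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True, help="URL of the page to load and query")
    args = parser.parse_args()

    # Load the web page and split it into overlapping chunks.
    docs = WebBaseLoader(args.url).load()
    splits = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=100
    ).split_documents(docs)

    # Embed the chunks and store them in a local Chroma DB.
    vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())

    # Pull a ready-made RAG prompt from LangChain Hub and wire up the retrieval QA
    # chain with the local LLaMA-2 model and the Chroma retriever.
    prompt = hub.pull("rlm/rag-prompt-llama")
    llm = Ollama(
        model="llama2",
        callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": prompt},
    )

    question = f"What are the latest headlines on {args.url}?"
    print(qa_chain({"query": question})["result"])


if __name__ == "__main__":
    main()
```

(You'd then run it with something like `python rag_scrape.py --url https://techcrunch.com`; the filename here is just a placeholder.)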
So to set this up, when I actually run the file this time, I need to pass in an argument called URL, and I'm going to pass in TechCrunch here. That's going to kick everything off, and we can see that, okay, it's pulled in the TechCrunch page, it's loaded some documents, and it's loaded the LLaMA-2 model. Now it's going to go through and give us the headlines from there. So if we look at TechCrunch, we can see that these are the headlines that it's gotten out. I haven't had to write any parsing code to do this; the language model has done it itself.
You could imagine having lots of little tasks like this, where we could just send it off with a cron job or something during the day to bring back information and save it for us on our local drive. And you can see here that it's managed to get those top stories and list them out. Interestingly, sometimes it will list out five and sometimes it will list out ten; I didn't put anything in there about that, but you could certainly play with that yourself.
So this gives you an example of using LangChain to do simple tasks with a local LLM; you could do a variety of different tasks this way. If people are interested, we could look at setting up a full local RAG system for documents and things like that using this model. Certainly the LLaMA models are pretty decent at getting through this, and you can run Chroma locally, et cetera. Anyway, as always, if you found the video useful, please click like, and if you've got any questions, please put them in the comments below. I will talk to you in the next video. Bye for now.