Okay. So in this video, I want to go through how having Ollama models running locally on your computer allows you to run LangChain locally and use those models for different tasks. I'm going to look at some simple tasks first, like how we set it all up, and then finally I'll finish up with a task where we get it to do some scraping for us and extract some information, all using the LLaMA-2 model.
Okay. So in here I'm just using VS Code, and I've got a couple of Python files open. I've made a conda environment just to install LangChain; there's nothing special about it. You could certainly set that up yourself, or just use your local Python if you really wanted to. If I come into the terminal, I can have a look at the models I've got installed. So I've got my Hogwarts model from before, I've got the LLaMA-2 models, and I've got some other models, which I'll show you how to set up in the next video.
So first off, let's look at the simplest thing: how do we actually load the Ollama model? In this case, we're going to use the pre-made Ollama LLM in LangChain, and we can set that up with a streaming callback. We just instantiate our LLM, passing in the model that we want (in this case, LLaMA-2) and a streaming callback. Now I can just call the model locally from my Python code here.
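(For reference, the setup looks roughly like this. It's a minimal sketch, assuming the LangChain import paths from around the time Ollama support was added; newer releases may have moved these into langchain_community.)

```python
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Instantiate the pre-made Ollama LLM, pointing it at the local llama2 model
# and attaching a callback that streams tokens to stdout as they arrive.
llm = Ollama(
    model="llama2",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

# Calling the LLM sends the prompt to the locally running Ollama server.
llm("Why is the sky blue?")
```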
Okay. So now when I run the code, you'll see that, sure enough, it's triggering the Ollama model. It's doing this via an API. One of the things I didn't cover in the first video is that Ollama runs a local API that we can hit with LangChain, but we could also hit it directly from just about any client or user interface, et cetera. So something we'll perhaps look at in the future is building a Next.js app with LangChain and Ollama.
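(To illustrate that point, here's a rough sketch of hitting the Ollama API directly from Python, assuming Ollama is running with its default settings on port 11434.)

```python
import requests

# Ollama serves a local REST API (http://localhost:11434 by default), which is
# what LangChain talks to under the hood. We can call it directly as well.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,  # set to True to get tokens back as they're generated
    },
)
print(response.json()["response"])
```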
All right. So the next step up: I'm just going to make a basic chain here. I've got the same things as before, and don't forget, this is an LLM, so I can add in temperature, max tokens, and things like that if I wanted to. What I'm going to do is make a simple prompt template: just "give me five interesting facts about" whatever I insert in here. So rather than the Roman Empire, let's go for the moon. Okay, you can see that we can just set up our chain, run it, and print the result out as it comes back. In here, we can also pass in verbose equals true or not, if we want to see the streaming output. In this case, I'm going to turn off the callback manager, so we're not going to stream, and I'm going to set verbose to false.
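(As a rough sketch of what's on screen, the chain looks something like this; the exact prompt wording and the temperature value are my assumptions.)

```python
from langchain.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# No streaming callback this time; since this is a normal LangChain LLM we can
# also tweak sampling settings here, e.g. temperature.
llm = Ollama(model="llama2", temperature=0.7)

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Give me five interesting facts about {topic}",
)

chain = LLMChain(llm=llm, prompt=prompt, verbose=False)

# Runs the whole chain and returns the final text once generation is done.
print(chain.run("the moon"))
```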
Now if I run it, you'll see that it takes a bit of time, because it's actually running and generating the data, but sure enough, once it's finished, it will print the data out. So this approach is probably more useful if you wanted to write the output to a file, et cetera. We can see there that it's gone and done it: we've got the five interesting facts about the moon, and it's gone and gotten those quite nicely.
All right, let's look at the last example. This is definitely a bit of a jump from the previous two. Now we're going to be doing some RAG, and we're going to do it using a web-based loader. So we're going to load a web page in, split it up, and put it into a Chroma DB, so you will need to have Chroma installed for this. We're also going to use argparse to pass in the URL we want it to get data back from. Let's have a look at the imports here.
We're bringing in the recursive character text splitter, we're bringing in the WebBaseLoader for loading web pages, we're bringing in Chroma, and then we've got some embeddings here. I haven't looked into it too much, but according to LangChain, Ollama has its own embeddings, and GPT4All has also made some quantized embeddings that you could use here. And we can see we're bringing in Ollama like normal.
So we're going to have our main function, which just looks for the URL argument. We're then going to load that URL, split the data up, and put it into a Chroma DB. After that, we're going to set up our LangChain prompt. Here is the prompt that we're using; we're pulling it in from LangChain Hub. You can certainly print it out, have a look at it, play with it yourself, and change it. Then we're setting up the retrieval QA chain, where we're going to pass in the LLM, our vector store (which is Chroma), and our prompt. And then I'm going to set up a question, "what are the latest headlines on" whichever site, pass that in, get the results out, and watch them stream out.
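(Pulling all of that together, the script looks roughly like the sketch below. It's not the exact file on screen: the hub prompt name rlm/rag-prompt-llama, the chunking parameters, and the choice of GPT4All embeddings are my assumptions, and the import paths match LangChain as it was around the time Ollama support landed.)

```python
import argparse

from langchain import hub
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import GPT4AllEmbeddings  # OllamaEmbeddings is another option
from langchain.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma


def main():
    # Take the page to scrape as a command-line argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True, help="URL of the page to load and query")
    args = parser.parse_args()

    # Load the web page and split it into overlapping chunks.
    docs = WebBaseLoader(args.url).load()
    splits = RecursiveCharacterTextSplitter(
        chunk_size=1500, chunk_overlap=100
    ).split_documents(docs)

    # Embed the chunks and store them in a local Chroma DB.
    vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())

    # Pull a ready-made RAG prompt from LangChain Hub and wire up the retrieval QA
    # chain with the local LLaMA-2 model and the Chroma retriever.
    prompt = hub.pull("rlm/rag-prompt-llama")
    llm = Ollama(
        model="llama2",
        callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": prompt},
    )

    question = f"What are the latest headlines on {args.url}?"
    print(qa_chain({"query": question})["result"])


if __name__ == "__main__":
    main()
```

(You'd then run it with something like `python rag_scrape.py --url https://techcrunch.com`; the filename here is just a placeholder.)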
So to set this up, when I actually run the file this time, I need to pass in an argument called URL, and I'm going to pass in TechCrunch here. That's going to kick everything off, and we can see that, okay, it's pulled in the TechCrunch page, it's loaded some documents, and it's loaded the LLaMA-2 model. Now it's going to go through and give us the headlines from there. So if we look at TechCrunch, we can see that these are the headlines that it's gotten out. I haven't had to write any parsing code to do this; the language model has done it itself.
You could imagine having lots of little tasks like this, where we could just send it off with a cron job or something during the day to bring back information and save it for us on our local drive. And you can see here that it's managed to get those top stories and list them out. Interestingly, sometimes it will list out five and sometimes it will list out ten; I didn't put anything in there about that, but you could certainly play with that yourself.
So this gives you an example of using LangChain to do simple tasks with a local LLM; you could do a variety of different tasks this way. If people are interested, we could look at setting up a full local RAG system for documents and things like that using this model. Certainly the LLaMA models are pretty decent at getting through this, and you can run Chroma locally, et cetera. Anyway, as always, if you found the video useful, please click like, and if you've got any questions, please put them in the comments below. I will talk to you in the next video. Bye for now.