LangChain Retrieval QA with Instructor Embeddings & ChromaDB for PDFs

Video Statistics and Information

Captions
All right, in this video we're going to continue looking at the multi-doc retriever. We're still going to be using ChromaDB for our vector store, but the big thing we're adding this time is embeddings that actually run locally. To do this it's ideal to have a GPU; I've just got a T4 here, so nothing super powerful. You could run it on the CPU, it's just going to take a fair bit more time. You'll see that I'm bringing in the same stuff as before (some of which we actually don't need anymore), and the two new imports are the InstructorEmbedding package, which I'll talk about in a second, and the Hugging Face wrapper for using it.

Another difference in this one is that a lot of people have been asking about PDF files, multiple PDF files, so I've swapped out the text files for multiple PDFs. If we have a look in here, you'll see that I've just put in some papers from arXiv: ReAct, Toolformer, FlashAttention and ALiBi, so just some of the topics we've been looking at around large language models recently. The splitting and so on is all the same; we're just using the simple PyPDFLoader in this case to bring things in.

The next key thing is the embeddings, and there are two ways of doing them. You can use the normal Hugging Face embeddings, which use things like sentence-transformers, and there's a whole bunch of different models around that. They vary in quality, and which one works best will also depend a bit on your data. An example of a standard sentence-transformer is the one shown here, which used to be one of the top models for this. But in my testing I found that a newer model seems to do better, so I decided to go with that: the Instructor embeddings. I think these deserve a whole video to themselves to explain the paper, but the idea is that they're custom embeddings conditioned on an instruction describing what you're using them for. In this case we're using the XL variety.

We bring these into LangChain and run them locally, so it downloads the model and all the files for it. We're telling it here to put the model on the GPU, which is what device "cuda" means; if you want to run it on the CPU you can set device "cpu", but that's definitely going to make it a lot slower. By default these operate at a sequence length of 512, which is fine for the splitting we're doing at around a thousand characters. Once we've got the embeddings set up, we can move on to making the vector store, and we're not using OpenAI embeddings anymore.
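For reference, a minimal sketch of the loading, splitting and embedding steps described above might look like this in LangChain. The folder name "new_papers", the chunk sizes, and the use of DirectoryLoader are assumptions for illustration, not necessarily what the notebook uses; the Instructor embeddings also need the InstructorEmbedding and sentence-transformers packages installed.

```python
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Load every PDF in a local folder ("new_papers" is an assumed folder name)
loader = DirectoryLoader("./new_papers/", glob="./*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split into roughly 1,000-character chunks, as mentioned in the video
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Instructor-XL embeddings running locally on the GPU
# (requires the InstructorEmbedding and sentence-transformers packages)
instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl",
    model_kwargs={"device": "cuda"},  # set to "cpu" if no GPU is available (much slower)
)
```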
Once we've got the embeddings set up, we basically carry on with what we were doing before and set up our vector store, and here we're using ChromaDB for that. We set a persist directory and create the store from documents, passing in the instructor embeddings and the document chunks we've already extracted. This is exactly the same as the previous video; the only thing that's changed is that we're using the instructor embeddings. We can then do the same sorts of things, like making a retriever, and obviously this retriever is now using the new instructor embeddings to find the contexts that best match a query.

Next up we make a chain. Again, this is the same as before: we pass in the retriever, and that takes care of the vector store and the embeddings. I've also added a little bit of code to wrap the answers when we get them back.

Looking at the results, starting off with "What is flash attention?", it goes and gets the top three documents, and not surprisingly the document the embeddings have chosen as the closest match is the FlashAttention paper, so it gives us back a definition of FlashAttention. We can then ask about different parts of that answer: it mentioned "IO-aware", so I asked what that is, and it's able to find the answer from that same paper. It also mentioned tiling, and it can find an answer for that as well.

Then I asked some other questions just to see what's there. Asking "What is Toolformer?", we can check whether it returns the right thing, and sure enough we get back that Toolformer is a language model that learns in a self-supervised way. This is basically the model rewriting the output from the three retrieved contexts about Toolformer. We can ask it more questions, like what tools can be used with Toolformer, and it tells us search engines, calculators and translation systems via simple API calls. We can even ask about specific examples, so this is a good way, if you've skimmed a paper, to come back and ask it specific questions.

Interestingly, when we ask this question it's also pulling part of its answer from the augmenting-LLMs paper, which from memory is a survey paper, so it contains some material about Toolformer as well. It's decided that the top three contexts were one from the survey paper, one from the Toolformer paper itself, and then another from the survey paper. If we ask some questions about retrieval augmentation, the only paper we've got that relates to that is the augmenting-LLMs survey, and sure enough it's able to get those answers from it. If we ask about specifics, like the differences between REALM and RAG models, it's able to tell us those kinds of things as well.
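Putting the vector store, retriever and chain steps described above together, a rough sketch might look like the following. The persist directory "db", the k=3 retriever setting, and the process_llm_response helper name are assumptions for illustration, and an OPENAI_API_KEY is needed since the answering LLM here is still OpenAI.

```python
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
import textwrap

# Build and persist the Chroma vector store using the local Instructor
# embeddings instead of OpenAI embeddings ("db" is an assumed directory name)
persist_directory = "db"
vectordb = Chroma.from_documents(
    documents=texts,
    embedding=instructor_embeddings,
    persist_directory=persist_directory,
)

# Retriever that returns the top three most similar chunks for a query
retriever = vectordb.as_retriever(search_kwargs={"k": 3})

# Retrieval QA chain -- the answering LLM is still OpenAI here;
# only the embeddings run locally
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

# Small helper (hypothetical name) to wrap the answer text and show sources
def process_llm_response(llm_response):
    print(textwrap.fill(llm_response["result"], width=100))
    print("\nSources:")
    for doc in llm_response["source_documents"]:
        print(doc.metadata["source"])

process_llm_response(qa_chain("What is flash attention?"))
```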
So the idea here is that we're still using OpenAI for the actual language model part; in the next video we'll look at getting rid of that and running everything fully locally. But we're now using the instructor embeddings for the embedding part rather than OpenAI. The big advantage of this is that your data never has to all go up to OpenAI just to be embedded. Obviously the retrieved contexts still go up to OpenAI, so it's not that none of your data goes up, but it's not all going up in one shot just to do the embeddings, so you do get a little more privacy doing it this way. Of course this still isn't ideal if you want your data to never touch a server, so in the next video we'll look at using a local language model to do the replying part as well as the embedding part.

The rest of the notebook is the same: just going through deleting the ChromaDB database and bringing it back in, which is the same as what we looked at before. If you want to try out using OpenAI's GPT-3.5-turbo, you can do that here as well.

That's it for this notebook. As always, if you've got any questions please put them in the comments below, and if you found this useful please click like and subscribe. In the next video we'll look at using custom models for everything. Okay, I'll talk to you in the next video. Bye for now.
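As a rough sketch of those last steps, persisting the database, reloading it from disk, and optionally swapping in GPT-3.5-turbo as the answering model might look like this; the directory name and model settings are assumptions rather than the exact values in the notebook.

```python
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Persist the Chroma database to disk, then reload it from the same
# directory -- the same embedding function has to be supplied again
vectordb.persist()
vectordb = None
vectordb = Chroma(
    persist_directory="db",  # assumed directory name
    embedding_function=instructor_embeddings,
)

# Optionally use the chat model gpt-3.5-turbo as the answering LLM
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
)
```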
Info
Channel: Sam Witteveen
Views: 23,866
Keywords: OpenAI, LangChain, LangChain Tools, python, machine learning, natural language processing, nlp, langchain, langchain ai, langchain in python, gpt 3 open source, gpt 3.5, gpt 3, gpt 4, openai tutorial, prompt engineering, prompt engineering gpt 3, llm course, large language model, llm, gpt index, gpt 3 chatbot, langchain prompt, gpt 3 tutorial, gpt 3 tutorial python, gpt 3.5 python, gpt 3 explained, LangChain agents, pdf, chat pdf, vector stores
Id: cFCGUjc33aU
Length: 8min 57sec (537 seconds)
Published: Wed May 10 2023