All right. In this video, we're going to continue looking at the multi-doc retriever. We're still going to be using ChromaDB for our vector store, but the big thing we're going to add in this one is embeddings that are actually running locally. To do this, it's ideal to have a GPU. I've just got a T4 here, so not a super powerful GPU. You could run this on the CPU; it's just going to take a fair bit more time.
You'll see that I'm bringing in the same stuff as before; we actually don't need the OpenAI embeddings anymore. The two new things we're going to bring in are the Instructor embeddings, which I'll talk about in a sec, and the Hugging Face wrapper for using them here.
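Roughly, the imports would look something like this in a classic LangChain setup; treat the exact module paths as assumptions, since they've moved around between LangChain versions:

```python
# Sketch of the imports; exact module paths vary across LangChain versions.
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
```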
So another difference I made in this one: a lot of people have been asking about multiple PDF files, so I swapped out the text files for multiple PDF files here. And actually, if we have a look in here, you'll see that what I've done is just put in some papers. These are just some papers from arXiv, about ReAct, Toolformer, FlashAttention, ALiBi; just some stuff around the topics that we've been looking at in large language models recently. The splitting and stuff like that is all the same; basically, we're just bringing everything in, using the simple PyPDF loader in this case. And then the next key thing is we get to the embeddings.
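As a sketch of that loading and splitting step (the folder name and chunk overlap here are illustrative assumptions; the roughly thousand-character chunks match what's mentioned later in this video):

```python
# Load every PDF in a folder with the simple PyPDF loader, then split into
# ~1,000-character chunks. The "new_papers/" path and overlap are illustrative.
loader = DirectoryLoader("new_papers/", glob="./*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
```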
So there are two ways of doing the embeddings. You can use the normal Hugging Face embeddings, which use things like sentence transformers, and there's a whole bunch of different models around that. They vary in quality, and a little bit will also depend on your data as well, and which models match it. An example of a standard sentence transformer would be this one; it used to be one of the top models for doing this. But in my testing, I came across a newer model that seems to be doing better, so I decided to go with that.
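For the standard route, a minimal sketch would be something like this; the model name is an illustrative example of a popular sentence-transformers model, not necessarily the one shown on screen:

```python
# Standard sentence-transformer embeddings via Hugging Face, run locally.
# The model name below is an illustrative assumption.
hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
```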
The new model I'm going with is the Instructor embeddings. I think these deserve a whole video to themselves to explain the paper and so on; the idea is that these are custom embeddings, tailored to whatever it is you're using them for. In this case, though, we're just using the Instructor embeddings in the XL variety. So we bring these into LangChain, and you can see we're going to run them locally: it downloads the model and all the files for it, and we're telling it here to put it on the GPU, which is what device 'cuda' means. If you don't have a GPU, you could put device 'cpu', but that's definitely going to make it a lot slower. You'll see it basically loads these up and brings them in. By default, these operate at a sequence length of 512, which is fine for the splitting we're doing at around a thousand characters.
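A minimal sketch of that step, assuming the InstructorEmbedding package is installed (hkunlp/instructor-xl is the standard Hugging Face Hub name for the XL variety):

```python
# Load the XL Instructor model and run it locally on the GPU.
# Swap "cuda" for "cpu" if you have no GPU; it will be much slower.
instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl",
    model_kwargs={"device": "cuda"},
)
```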
Okay, once we've got the embeddings set up, we're then going to need to make our vector store. This is all exactly the same as the last video; we're basically just passing in the new embeddings, so we're not using the OpenAI embeddings anymore. Here we're using ChromaDB for the vector store: we persist to a directory, and we create it from documents, passing in the Instructor embeddings and the document texts that we've already got out. So this is exactly the same as the previous video; we haven't really changed anything. The only thing we're doing now is using these Instructor embeddings in there.
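A sketch of that step; the "db" directory name is an illustrative assumption:

```python
# Embed the chunks with the Instructor model and store them in ChromaDB,
# persisting to a local directory ("db" is an illustrative name).
persist_directory = "db"
vectordb = Chroma.from_documents(
    documents=texts,
    embedding=instructor_embeddings,
    persist_directory=persist_directory,
)
```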
We can now do the same sorts of things as before, like making a retriever. Obviously, this retriever is using our new embeddings, the Instructor embeddings, to actually find the various contexts that match a query.
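Something like this, where k=3 matches the three top documents retrieved in the examples below:

```python
# Expose the vector store as a retriever that returns the top 3 contexts.
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
```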
Next up, we need to make a chain. This is again the same as before; nothing really different in here. We're passing in the retriever, and that's going to take care of the vector store, the embeddings, those parts there. I've also added a little bit of code just to wrap the answers when we get them out.
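A sketch of the chain, still pointing at OpenAI for the language model part; the "stuff" chain type and the wrap_answer helper are illustrative assumptions, not necessarily the exact notebook code:

```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
import textwrap

# The LLM is still OpenAI; only the embeddings run locally.
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

# Illustrative helper to wrap long answers for readability.
def wrap_answer(response):
    return textwrap.fill(response["result"], width=100)
```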
We can see that, starting off, we ask "What is FlashAttention?". It goes and gets the three top documents, and in this case, not surprisingly, the document the embeddings have chosen as the closest match to what we want to know is the FlashAttention paper, this PDF here. So it basically gives us back a definition of FlashAttention. We can then ask about different parts of this: here it mentioned "IO-aware", so I wanted to ask what that is, and it's able to go through and find the answer, again from that same paper. It also mentioned tiling, and it can go through and find an answer for that as well.
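Putting the pieces together, a query from this section would run something like this, reusing the hypothetical wrap_answer helper from above:

```python
# Run an example query and see which PDFs the contexts came from.
response = qa_chain({"query": "What is FlashAttention?"})
print(wrap_answer(response))

for doc in response["source_documents"]:
    print(doc.metadata["source"])
```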
So then I thought, okay, let's ask some other questions just to see what's there. By asking "What is Toolformer?", we can see whether it's going to return the right thing, and sure enough, we're getting "Toolformer is a language model that learns in a self-supervised way". So this is basically just showing us the rewriting of the output from the three retrieved contexts about Toolformer. We can then ask it some more questions, like "What tools can be used with Toolformer?"
translation systems by a simple api calls. And then we can even ask it more, about
different examples and stuff so this is actually a good way to if you've gone
through and skimmed a paper and you want to actually ask some specific questions
you can get some things out of this. it's interesting when we ask it this
question though it's also getting it's answer from the augmenting llms paper. Which i think from memory also is
this is actually a survey paper so it contains some things about
ToolFormer in there as well. So it's basically gone and looked
and decided the top three contexts were from the survey paper
ToolFormer paper itself and then another one from the survey paper. if we ask it some questions
If we ask it some questions about retrieval augmentation, the only paper we've got that relates to this is the augmenting LLMs survey, and sure enough, it's able to pull some of those answers out. If we ask about specific differences between the REALM and RAG models, it's able to tell us those kinds of things too.
So the idea here is that we're still using OpenAI for the actual language model part; in the next video, we'll have a look at getting rid of that and running everything fully locally. But we're now using the Instructor embeddings for the embedding side, so we're not using OpenAI for that. The big advantage is that your data never has to go up to OpenAI all in one shot just to compute embeddings. Now, obviously, the contexts, as they come out, are still going up to OpenAI, so it's not that none of your data goes up; but the key thing is it's not putting all of your data up at once to do the embeddings. So you do get a little bit more privacy doing it this way.
Of course, this is still not ideal if we want our data to never touch a server, so in the next video, we'll look at using a local language model to do the replying part as well as the embedding part. Okay, the rest of the notebook is the same: just going through deleting the ChromaDB database and bringing it back in, the same as what we looked at before.
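From the previous video, that persist-and-reload pattern looks roughly like this, reusing the illustrative persist_directory from above:

```python
# Persist the database to disk, drop the in-memory handle, then reload it
# with the same local embedding function, as in the previous video.
vectordb.persist()
vectordb = None

vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=instructor_embeddings,
)
```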
If you want to try out using just the OpenAI GPT-3.5 Turbo model, you can do that here.
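That swap might look like this in a classic LangChain setup; an illustrative sketch, not the exact notebook cell:

```python
# Illustrative: swap in the chat model for the plain OpenAI LLM.
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
```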
That's it for this notebook. As always, if you've got any questions, please put them in the comments below, and if you found this useful, please click like and subscribe. In the next video, we will look at using custom models for everything. Okay, I will talk to you in the next video. Bye for now.