Using Langchain and Open Source Vector DB Chroma for Semantic Search with OpenAI's LLM | Code

Video Statistics and Information

Captions
Hi everyone. In the last video we looked at Chroma DB, which is an open-source embedding or vector database, an alternative to Pinecone, and we did some hands-on work: we saw how to install Chroma DB, how to index documents inside it, how to retrieve semantically similar documents, and how to use a custom embedding model, because Chroma DB ships its own default embedding model, so we checked how to do it with Sentence Transformers instead. You can go and check what we did with Chroma DB there. In this video we are going to integrate, or let's say use, Chroma DB inside the LangChain library. LangChain is one of the most popular libraries nowadays for developing LLM (large language model) applications, and we have covered LangChain in multiple videos: with Pinecone, the LangChain SQL agent, document question answering using LangChain, so you can check those videos if you want to refresh LangChain. For the purpose of this video I will explain whatever terms we come across. So what are we going to do here? First, we will index documents into Chroma DB using the LangChain utilities. Then we will retrieve the semantically similar documents for a specific query. Then we will see how to store, or persist, our Chroma DB so that you can load it and use it again somewhere else. Then we will see how to integrate Chroma DB with an LLM: we will look at OpenAI chat models like GPT-3.5 or GPT-4. Finally, LangChain has the concept of a chain, where you combine different components, so you can combine your Chroma DB, a prompt, and an LLM together and perform some action. We will see the question answering chain, which can answer from a list of documents used as context, and then another chain called RetrievalQA. So first let's install what we need: we need openai because we are going to use its LLM, we need langchain because that is what we are building with, we need sentence-transformers because for the Chroma DB embeddings we are going to use Sentence Transformers, which is an open-source embedding (I will also show the syntax for the OpenAI embedding), and finally we want to install chromadb. We need one more package called unstructured: unstructured is a library used by LangChain when you want to read files from a directory and process them, so let's install that as well. We also have a set of documents that we are going to use as part of this example; they are all related to pets, and they are the same documents we used in the previous tutorial, so here we will mostly focus on the LangChain and Chroma DB aspects of the tutorial.
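As a rough sketch, the install cell described above might look like this in a Colab notebook (package names only; pin versions as needed for your environment):

    !pip install openai langchain sentence-transformers chromadb unstructured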
Let me clear the output first, because I ran this earlier. Okay, so what do we want to do? We have documents, a list of documents, inside a particular directory. From the LangChain utilities we can use the document loaders, specifically the DirectoryLoader. The DirectoryLoader takes the directory path, and using that loader we can load the documents and simply get back a list of documents. A document here is a particular LangChain representation, and we will see in a moment what one document looks like. So let's load those documents; we are printing the length of the list, i.e. how many documents we got. You can see it is even downloading some NLTK data internally. We got five documents, because we actually have five documents. The next thing is that we want to store these documents in the vector DB, and most of the time we will be chunking them first, because documents can be big and we want to cut them into small chunks so that we can store them in the vector DB, retrieve the semantically similar chunks, and pass those to the LLM. LangChain again has a utility for this: one of the popular, recommended utilities for text splitting is called "recursively split by character", the recursive character text splitter. What it does is interesting: it takes a list of separators which it uses to split your text. First it tries to keep whole paragraphs intact, so it splits at the paragraph level; if keeping a paragraph would exceed your character limit, it splits on sentences instead, and eventually it falls back to splitting on words. So it tries to keep each text chunk as semantically coherent as possible. That is what we use here: the RecursiveCharacterTextSplitter. It takes the chunk size you want (we created a small function that takes the chunk size as a parameter) and a chunk overlap. Since the splitter might cut in the middle of related sentences, a few sentences could go into one chunk and the rest into the next chunk. To make sure the context is properly represented in at least one place, we add some chunk overlap: if the text is split somewhere, part of it goes into one chunk and some of that same part also goes into the next chunk, so there is an overlap and at least one chunk contains that sentence completely. You can specify the overlap in characters, maybe 100 or 200 characters. Then we print the number of resulting docs: we pass in the documents we got in the earlier step, those five documents, split them, and get back smaller docs, which you can call chunks. With the settings here we still get only five chunks, because our documents are small.
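A minimal sketch of the loading and splitting steps described above, assuming the pet articles sit in a folder named pets/ (the folder name, chunk size, and overlap are illustrative, and import paths may differ across LangChain versions):

    from langchain.document_loaders import DirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    directory = "pets/"  # illustrative path to the five pet documents

    def load_docs(directory):
        # Reads every file in the directory (uses the unstructured library internally)
        loader = DirectoryLoader(directory)
        return loader.load()

    documents = load_docs(directory)
    print(len(documents))  # 5 in the video

    def split_docs(documents, chunk_size=1000, chunk_overlap=100):
        # Splits on paragraphs, then sentences, then words, keeping chunks coherent
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap
        )
        return text_splitter.split_documents(documents)

    docs = split_docs(documents)
    print(len(docs))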
If I instead create chunks of only 100 characters, I get more chunks, 27 of them, but let's keep the bigger, 1000-character chunks. Now we have chunks that we can index into Chroma DB. To do that we need an embedding. From the LangChain embeddings we are going to load the Sentence Transformer embedding and use a particular model, all-MiniLM; you can specify any other model you want. I showed in the earlier video that we could even do this without passing a model, because Chroma DB uses a default model. Once we load this embedding it needs to download the model; it is small, I think less than 100 MB or so. It downloaded the Sentence Transformer model, so let's clear the output. The next thing is that, just like we imported embeddings, LangChain has vector stores, and it supports many of them; Chroma is one, Pinecone is there, and there are other vector stores as well. We import Chroma from LangChain's vector stores and then create the Chroma vector DB from the documents: there is a method Chroma.from_documents, to which we pass the list of chunks (the docs we created) along with the embedding, telling it to use this particular embedding to convert the chunks into vectors, and they get stored inside this DB. That's it; this is how you load documents using LangChain and index them into the DB. Now, once you have this DB, you don't see any folder or file appear for it, because the DB is created in memory: we haven't specified any persist directory where it should be stored, so it lives in RAM. Next we want to query this vector DB so that we can retrieve the relevant documents. Let's ask "What are the different kinds of pets people commonly own?", because we know we have one document which talks exactly about that. We specify our query, take the DB, and call its similarity_search method: vector_db.similarity_search, we pass our query and we get the matching documents back. You can also see what each document looks like: if you print one, you see the Document object I was talking about, the representation LangChain creates when you create the chunk. This Document object has page_content, which is the text of that chunk, and it also has metadata which tells you which file this particular chunk came from; that is really important if you want to go and verify the answer. So this is how you get the matching docs, and you can print their page_content, which is what we did.
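A sketch of the embedding, indexing, and search steps just described, assuming the all-MiniLM-L6-v2 Sentence Transformer model (the exact model name is an assumption; any sentence-transformers model should work):

    from langchain.embeddings import SentenceTransformerEmbeddings
    from langchain.vectorstores import Chroma

    # Open-source embedding model, downloaded on first use (well under 100 MB)
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    # Build an in-memory Chroma index from the chunks
    db = Chroma.from_documents(docs, embeddings)

    query = "What are the different kinds of pets people commonly own?"
    matching_docs = db.similarity_search(query)

    print(matching_docs[0].page_content)  # text of the best-matching chunk
    print(matching_docs[0].metadata)      # e.g. which source file the chunk came from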
Beyond this, there is another function which also returns the matching score for each document, and we can also specify how many chunks or docs we want to retrieve from the vector DB. Then we print the matching docs: you can see we got two documents, this is one and this is the other, each of them has page_content and metadata, but we also get something extra, the score, which represents how well the document matches the query. So this is how you search for semantically similar documents; this is what we did without LangChain in the previous video, and this is how it looks with LangChain. Next, how do we store this Chroma DB? When we create it with Chroma.from_documents, we can also specify one more parameter called persist_directory, and we can say we want to store it in, let's say, a folder called chroma_db, and then create the vector DB. Now if you call the persist method, you can see we got this chroma_db folder here, we got the index and some other files. That is how you save it, and you can load it again: say you have saved it somewhere, then you use the Chroma class again (the same Chroma we imported from the vector stores), and if you give it the directory path it will load your existing DB from that directory. Let's call it new_db; we load it, and now we can use this new DB to find our documents, and you can see we are able to get the document (by the way, we are printing only one document here). So this is how it works: while creating the Chroma DB you can specify the persist directory, and when you want to use that DB next time somewhere else, say you did your experiment here in Colab but now you want to use it on your server, it could be AWS EC2, or on your localhost, you load it like this in your program: you specify Chroma, the embedding function that was used (because it is an existing DB), and where exactly it is stored, so that it can load everything. Now for the next part. So far this is all only about getting similar docs, semantic search; there is no question answering, I am not getting an answer to my query, only the relevant documents. The next part is how we integrate with the LLM. As we saw earlier, we are going to use OpenAI for that purpose. LangChain has chat models; chat models are the ones that support conversation, like GPT-3.5 and GPT-4, and other LLM providers have them too. First we store our OpenAI key in an environment variable that LangChain is going to look for. Next we import our chat model and create an instance of the ChatOpenAI model, and one of the parameters we can pass is the model name; as I said, we could even pass GPT-4, but let's use GPT-3.5. That gives us the LLM instance. Now let's use this LLM instance to get our query answered directly, so think of it as document question answering.
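A sketch of the scored search, persistence, reload, and chat-model setup covered above, continuing from the db and embeddings objects in the previous sketch (the chroma_db folder name is illustrative, and the key below is a placeholder for your own OpenAI API key):

    import os
    from langchain.chat_models import ChatOpenAI

    # Retrieve documents together with their similarity scores, top 2 only
    docs_with_scores = db.similarity_search_with_score(query, k=2)

    # Persist the index to disk instead of keeping it only in memory
    persist_directory = "chroma_db"
    db = Chroma.from_documents(docs, embeddings, persist_directory=persist_directory)
    db.persist()

    # Later (e.g. on a server), reload the existing index with the same embedding function
    new_db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
    print(new_db.similarity_search(query, k=1))

    # Chat LLM; LangChain reads the key from the environment
    os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder
    llm = ChatOpenAI(model_name="gpt-3.5-turbo")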
You can read more about this in the LangChain question answering documentation; good documentation is available there. What we are going to use is something called the question answering chain, from LangChain's chains. So what is a chain in LangChain? A chain is nothing but a way to combine different components: the LLM is one component, the prompt is another, and you can combine those components to achieve something; that is what chaining those operations means. We are going to use one particular chain, the question answering chain, and we load it with load_qa_chain. It requires the LLM, which it will use to actually get the answer from the documents, and something called the chain type. Let me explain what the chain type means: when we pass the documents to the LLM, how do we want to generate the answer? Let's look at what "stuff" means; here is the link, you can go and see. Stuff simply means stuffing, filling something in: this is my query, and this is the list of documents I retrieved from the vector DB, and "stuff" means I simply put all of those documents inside the context and pass them along. This is the simple approach we always use: whenever we do question answering we just stuff our documents into the context so that the LLM can answer. But there are other, potentially better approaches too. For example, "refine": how does refine work? Again we have our query and a list of documents, the chunks, but it iterates over the documents. See what happens: it takes the first document plus the query and generates an answer using the model; then it takes that answer, takes the second document, and passes all the information together: the query, the intermediate answer (the answer obtained from the previous documents), and the current document as context. This is called refining: you generate an answer using every chunk sequentially and keep refining it. Theoretically this should give a better answer, but look at what is happening: we are making multiple calls to the LLM, sequentially. The first issue is cost: if you have five chunks, you call your model five times. The other issue is that it happens sequentially, so it takes more time to get the response. If you have a chatbot or a conversational interface, it might not be feasible to make multiple calls and keep your customer or user waiting for the answer. But you can definitely evaluate whether you could use this refine approach, which should give a better-quality answer. What we use here is "stuff": we simply pass all those two or three documents, whatever we have, directly inside the prompt. That is how we create the chain, so let's load it. Now we can answer using this chain. What does the chain require? You tell it which matching documents you have and the query that you want answered.
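A sketch of loading the question answering chain described above (chain_type="stuff" puts all retrieved chunks into a single prompt; "refine" is the sequential alternative just discussed):

    from langchain.chains.question_answering import load_qa_chain

    # "stuff": place every retrieved chunk directly in the context of one LLM call
    chain = load_qa_chain(llm, chain_type="stuff")

    # Alternative: refine the answer chunk by chunk (more calls, slower, often better)
    # refine_chain = load_qa_chain(llm, chain_type="refine")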
We can use the same function again, vector_db.similarity_search, give it the query, get the matching docs, and pass those in (I don't need this extra comma here). Let's see what we get: now we should get a direct answer, and indeed we got the answer. But what is happening under the hood? If you want to see what prompt it is using, then when you create the chain you can set one particular flag, verbose=True, so that you can see what is going on underneath. Let's run it now with verbose on. You can see the new chain got started and the prompt that was used: "Use the following pieces of context to answer the user's question. If you don't know the answer, just say that you don't know the answer." That is the instruction we are passing; then our documents go in as the context, and then comes the human part, our query. That is also why we are using a chat model: a chat model has this structure where there is a system role, where we put the instruction, and a human (user) role which asks something, and then you get the answer. That is what happens under the hood when you set verbose=True. You can even customize this prompt; I am not going to cover that in this tutorial, but you can search for how to customize it so you can put your own prompt there. Now let's look at RetrievalQA. As you could see, the QA chain requires the matching documents and the query. RetrievalQA is another chain which does the finding of matching documents as part of the chain itself. Let's see what we need: we import RetrievalQA from LangChain's chains. I think I have given the link so you can go and check; you can search for "RetrievalQA LangChain" and you will find a dedicated page where you can read more about it. Let's go back. We create the RetrievalQA object with a function called from_chain_type. Again we pass the LLM, we can pass whatever chain type we want, stuff or refine, but here we have one more argument called retriever. A retriever is nothing but the component or object which will retrieve the semantically similar documents from your vector DB, so we pass a retriever, and we can create one out of our vector DB: you take your vector DB "as a retriever" and you get a retriever object that can be used. Then you can directly ask the query; you don't need to pass the matching documents yourself every time. So this is another interface you have, where RetrievalQA takes a retriever as one of its inputs. You can play around with it; there are a lot of different chains and a lot of different syntax that even I cannot remember, I just check the documentation, so you can simply refer to this tutorial and then modify, try, and play around with it.
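A sketch of running the QA chain on retrieved documents and of the RetrievalQA chain that wraps the retrieval step itself (still using the db, llm, and query objects from the earlier sketches):

    from langchain.chains import RetrievalQA

    # Manually retrieve, then let the chain answer from those documents
    matching_docs = db.similarity_search(query)
    answer = chain.run(input_documents=matching_docs, question=query)
    print(answer)

    # To inspect the prompt under the hood, recreate the chain with verbose=True
    verbose_chain = load_qa_chain(llm, chain_type="stuff", verbose=True)
    verbose_chain.run(input_documents=matching_docs, question=query)

    # RetrievalQA performs the similarity search internally via a retriever
    retrieval_chain = RetrievalQA.from_chain_type(
        llm, chain_type="stuff", retriever=db.as_retriever()
    )
    print(retrieval_chain.run(query))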
So, to recap: we saw how to load the documents and how to create the chunks; make sure you explore the different text splitters available in LangChain, because depending on your requirements you might want to use a different splitter. One more thing: instead of the Sentence Transformer embedding, what if you want to use the OpenAI embedding? The syntax is pretty similar: from the LangChain embeddings, instead of SentenceTransformerEmbeddings you import OpenAIEmbeddings, pass the name of the model you want to use for the embedding, and it will work fine. But since we are dealing with an open-source vector DB, I thought of using the open-source embedding, and personally I didn't find much difference between the OpenAI embedding and the Sentence Transformer embedding; I usually use the Sentence Transformer embedding. Then we saw how to create the Chroma DB from the documents (which are nothing but the chunks), how to get the similar documents and even the documents with scores, how to persist the DB, then how to initialize the LLM, use the question answering chain, and eventually use RetrievalQA. That is what we covered. Let me know if you have any doubts in the comments section; maybe I will create a dedicated tutorial on stuff, refine, and map-reduce, the different ways you can pass documents to the LLM. I hope you found this video useful. Thank you.
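For the OpenAI embedding variant mentioned above, a sketch could look like the following (text-embedding-ada-002 is the usual default model name and is assumed here; it requires OPENAI_API_KEY in the environment and is billed per token):

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    # Drop-in replacement for the Sentence Transformer embedding used earlier
    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
    db = Chroma.from_documents(docs, embeddings)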
Info
Channel: Pradip Nichite
Views: 12,351
Keywords: langchain, chroma, chroma db, llms, semantic search, open ai, nlp, gpt, vector db, open source vector db
Id: 5NG8mefEsCU
Length: 22min 19sec (1339 seconds)
Published: Mon Jun 26 2023