LangChain vs. LlamaIndex - Which Framework to Use for RAG?

Captions
Hi, today we're going to explore LangChain and LlamaIndex, and especially how to perform retrieval augmented generation (RAG) with both frameworks. We'll cover the basics and, of course, the differences between the frameworks.

If you don't know what RAG is, here are the basics. When you want an LLM to generate answers about data it was not trained on, you can pass that data directly to the LLM in the prompt. But you probably don't want to pass your complete data set to the LLM, since first, it contains a lot of data unrelated to the answer, so-called noise, and second, many models simply cannot handle that amount of data; this is the so-called context window limitation. That's why you want to store your data in a database, query only the required data, and pass that small piece of data to the LLM.

So this is the exact workflow. You first have multiple data sources, say text files, JSON files, or websites, and you load that data into memory. You don't put the complete data set into the vector store; you first split the data into so-called chunks (in LlamaIndex they're called nodes), which are smaller pieces of a very large document. Then you create embeddings. Embeddings are vector representations of your text, of its semantic meaning, which let you compare the similarity of those vectors later, and you store the vectors inside the vector store. That's the indexing step. After indexing comes retrieval: you take a question, embed it, compare your question vector to the vectors inside the vector store, and retrieve the most similar documents. You take that small set of documents, put it into a prompt, and pass the complete prompt, documents included, to the LLM, which generates a final answer from that combination. That's the generation step in retrieval augmented generation.

OK, that's how it works in theory; let's now look at how it works with LlamaIndex and LangChain in practice. I'm in VS Code, and on the left you can see two files: the LangChain IPython notebook and the LlamaIndex IPython notebook. They perform similar tasks, yet differ in syntax and, a little bit, in approach. Then we've got a data directory containing a text.txt file with the text of a fictional restaurant in a Q&A format: "What makes your pizza unique?" followed by the answer. This is what we want our RAG system to answer; it's data the model was not trained on. Before we can start, let's install the packages. I've created a requirements.txt file, so you can run pip install -r requirements.txt, which installs LangChain and LlamaIndex into our virtual environment. After that, since we use OpenAI, we load the OPENAI_API_KEY environment variable: you put your key in there, and it's used in both the LlamaIndex and LangChain notebooks.
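The notebook cells themselves aren't reproduced in the captions, but a minimal sketch of this setup could look like the following; the requirements.txt contents and the use of python-dotenv are my assumptions, not shown in the video:

```python
# requirements.txt (assumed): langchain, langchain-openai, llama-index,
#                             chromadb, python-dotenv
# Install with: pip install -r requirements.txt

import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from a local .env file into the environment
assert os.environ.get("OPENAI_API_KEY"), "Put OPENAI_API_KEY into your .env file"
```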
OK, the first step is to load the data into memory. LangChain gives us a DirectoryLoader, and there are many other data loaders like text loaders, PDF loaders and so on, each specialized for one loading task. I use the DirectoryLoader, pass in the data path, and filter for .txt files only. After running it, let's look at the documents variable: we've got a list containing an instance of a Document class, which has a page_content attribute and a metadata attribute. Inside metadata we've got source, which is the path to the file the data was loaded from; this is stored automatically. We can access the page content by taking the first object of our list and accessing its page_content attribute. That's how you load data in LangChain.

In LlamaIndex it works pretty similarly: there's also a directory reader where you pass in the name of the directory and directly call its load_data method. Again we get a list of Document objects, but this Document class contains a bit more information: every document has its own ID, there's an (empty) embedding field, and inside the metadata attribute we've got the file path, file name, file type, size, creation date, last modified date and so on. The default simply carries a lot more information. The drawback is that the objects become a little larger, since more information is kept in memory, but in return you get more information.
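A sketch of this LangChain loading step as described; the import path and glob pattern follow current LangChain conventions and may differ by version:

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load only the .txt files from the data directory into memory
loader = DirectoryLoader("data", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

print(documents[0].page_content)  # the restaurant Q&A text
print(documents[0].metadata)      # {'source': 'data/text.txt'}
```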
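And the LlamaIndex counterpart, assuming the standard SimpleDirectoryReader is the directory reader used in the video:

```python
from llama_index.core import SimpleDirectoryReader

# Each file becomes a Document with a richer default metadata block
documents = SimpleDirectoryReader("data").load_data()

print(documents[0].doc_id)    # every document carries its own ID
print(documents[0].metadata)  # file_path, file_name, file_type, size, dates, ...
```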
Now we want to create multiple smaller documents, so-called chunks, from our larger document. Let's start with LangChain. There are multiple text splitters in LangChain's text splitter module, and we choose the CharacterTextSplitter, which we instantiate with a chunk size, a chunk overlap, and some other options. After creating the text splitter, we call its split_documents method and pass in our list of documents, which contains only a single document. Now we've got chunks, and as you can see, these are again Document instances, multiple of them now, each with page_content holding a piece of the Q&A, and the metadata is the same as before: it still points to the source text file.

In LlamaIndex it's a little different. You also import a text splitter and instantiate it with the same kinds of attributes as in LangChain, but then you create so-called nodes, which is a different class than the original Document class. Let's have a look at the nodes: we've got two nodes here, and importantly, they are TextNode instances, a different class from Document. In LangChain that's not the case: splitting yields Documents again, not a separate node class. So a node by definition contains data that was already split by a text splitter. What's very nice is that a node contains a lot of information, including relationships: under relationships we see the related node's information, and its node ID matches the ID of the other node, so we know these nodes originally belonged together. I think that's very valuable information for some advanced retrieval techniques.
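A sketch of the LangChain splitting step; the chunk size and overlap values here are illustrative, not taken from the video:

```python
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# The chunks are still Document instances, with the original metadata preserved
print(chunks[0].page_content)
print(chunks[0].metadata)  # {'source': 'data/text.txt'}
```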
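And a LlamaIndex sketch; the video doesn't name the splitter class on screen, so SentenceSplitter stands in here as one standard LlamaIndex splitter:

```python
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=500, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)

print(type(nodes[0]))          # TextNode, not Document
print(nodes[0].relationships)  # links to the source document and sibling nodes
```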
OK, let's go to the next step, which is indexing. To create an index we need an embedding function and a vector database. In our case we'll use Chroma as our vector database and OpenAI embeddings to create the embeddings from our chunks. We use the from_documents class method of the Chroma class and pass in the chunks, the data we want to embed, and the embedding function; this creates the index. Now that we've got our vector store, we can also turn it into a retriever with the as_retriever method, which provides a standard interface for retrieving data from the vector store. With the retriever you don't put data into the vector store; the vector store itself is responsible for that. To retrieve data you use the standardized retriever: it has a method called get_relevant_documents, where you pass in your question. The question is embedded, a cosine similarity search is run against the vectors inside the vector store, and by default you get the four most relevant documents back. That's how you do retrieval in LangChain.

Now let's do the same in LlamaIndex. Again we import a vector store, the ChromaVectorStore, but here the setup is a little more involved: we need a Chroma client, and we also need to create a collection, which we can name however we want. Then we use the ChromaVectorStore class and pass in the Chroma collection; this creates our vector store. But that's not enough: in LlamaIndex you also have to create a storage context, using the StorageContext class and its from_defaults class method, passing in the vector store. Now we can use another class called VectorStoreIndex, which has a from_documents class method, so it looks very similar to LangChain: we pass in our documents, the storage context, and the embedding model, and this creates our final index. Note that here I passed the documents, not the nodes. To pass nodes, i.e. the chunks, we do it a little differently: we instantiate VectorStoreIndex directly, passing the nodes as a named argument along with the storage context and the embedding model. In my opinion that's the preferred way to create this index. And just like in LangChain, we can call the as_retriever method to convert our index into a retriever and then run retrieve("How long does it take to prepare a pizza?"), which returns the most similar nodes. Since we only created two nodes, we simply get both of them back from the retriever.
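A sketch of the LangChain indexing and retrieval steps just described; import paths may differ by LangChain version:

```python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embed the chunks and store them in Chroma in a single step
vectorstore = Chroma.from_documents(chunks, embedding=OpenAIEmbeddings())

# Standard retrieval interface; returns the four most similar documents by default
retriever = vectorstore.as_retriever()
docs = retriever.get_relevant_documents("How long does it take to prepare a pizza?")
```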
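And a sketch of the more involved LlamaIndex setup; the collection name and the ephemeral Chroma client are assumptions:

```python
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Chroma needs a client and a collection before LlamaIndex can wrap it
client = chromadb.EphemeralClient()
collection = client.create_collection("restaurant")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index directly from the nodes (the "preferred way" from the video);
# VectorStoreIndex.from_documents(documents, ...) works for whole documents
index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(),
)

retriever = index.as_retriever()
results = retriever.retrieve("How long does it take to prepare a pizza?")
```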
OK, the next step in this process is to make an LLM call with the retrieved documents, so we need to create a chain. Let's first go to LangChain. In LangChain there are two ways to do this: there's the Chain interface, which is the legacy interface, and there's the newer LangChain Expression Language (LCEL), which is the preferred way to create a chain, so that's the way we'll use. The first step is to create a template, which is just a string with variables. This is a normal RAG template: it has a context variable for the documents retrieved from the vector store, and the question, which also has to be part of the template. After that we instantiate our prompt with the from_template method, which creates an instance of ChatPromptTemplate, and we create a chat model, which will be GPT-3.5 Turbo. In LCEL you use the pipe operator to pipe the output of the first step of the chain into the second, then into the third, and finally into the output parser. This is actually achieved with operator overloading, so there's a lot of complexity going on inside those little pipes, but as a user of LangChain you don't have to care about the implementation; you just use it. That's why you use a framework. What happens in this chain is: we take the question with an itemgetter and pass it to the retriever, which runs get_relevant_documents, and we store the resulting documents in a dictionary under the context key; the question itself is passed through without any processing under the question key. That dictionary is passed to the prompt, which contains the context and question variables, the complete prompt is passed to the model, and the model output is passed to the output parser. That's the chain we invoke: the invoke method takes a dictionary with a single key, question, and the question is "How long does it take to prepare a pizza?". All of the context is retrieved from our vector store, and the answer is that on average it takes 15 to 20 minutes. If we look at the source text, we indeed find "15 to 20 minutes" there, so the correct answer was retrieved from our vector store. That's how it's done in LangChain.

Now let's have a look at how this works in LlamaIndex. Again we've got our retriever, but in LlamaIndex it's a little different. We instantiate an LLM; in LangChain that was the ChatOpenAI class, but here it's just the OpenAI class, and we call the as_query_engine method, passing in the LLM. This adds a bit more abstraction on top of our chain; the LangChain Expression Language is a little more low-level. An alternative is to use the Settings object, a singleton, which I think is a very nice way to handle application-wide default values, a single source of truth: if you set the LLM on Settings, you always know which LLM will be used, and you don't have to pass it explicitly. I think that's very nice when your application grows bigger, but in this case we'll comment that out and just instantiate OpenAI, convert the index to a query engine with that single argument, and then query the query engine: "How long does it take to prepare a pizza?". Let's run that, and we get the response: on average it takes about 15 to 20 minutes. That's the correct answer; it works the same way as in LangChain, but in LangChain you have to know a little more about what's actually going on, since you construct it all on your own, while here a single method does it for you.

So what's the difference here? In the LangChain example we created a prompt on our own; in LlamaIndex we did not have to. What if we want a custom prompt? We can call the get_prompts method on the query engine and inspect the result: the prompt we care about belongs to the response synthesizer, and we can address it via its key and overwrite it. The first step is to create a new prompt using the PromptTemplate class. We'll make it a little different than before, so we can see that we actually overrode the prompt: we want the bot to always say "Hello my friend" at the beginning of an answer, followed again by the context and the query. Now we update the query engine's prompt with our custom prompt: we call the update_prompts method, passing in the key and our new prompt. If we inspect the prompts again, we can see the prompt is now of type custom. Let's run the query engine again, and we see the difference in behavior: "Hello my friend! On average it takes about 15 to 20 minutes." So that's it; this is how you perform retrieval with LlamaIndex.
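Putting the LCEL part together, a sketch of the chain as described; the exact template wording is an assumption, and retriever comes from the indexing sketch above:

```python
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(model="gpt-3.5-turbo")

# The pipe operator (operator overloading) wires the steps together:
# fetch context, fill the prompt, call the model, parse to a plain string
chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke({"question": "How long does it take to prepare a pizza?"})
# -> "On average it takes about 15 to 20 minutes."
```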
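A sketch of the LlamaIndex query engine, including the Settings alternative mentioned in the video:

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Option 1: pass the LLM explicitly to the query engine
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-3.5-turbo"))

# Option 2 (commented out in the video): set it once on the Settings
# singleton and every component picks it up as the default
# Settings.llm = OpenAI(model="gpt-3.5-turbo")
# query_engine = index.as_query_engine()

response = query_engine.query("How long does it take to prepare a pizza?")
print(response)  # "On average it takes about 15 to 20 minutes."
```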
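And a sketch of the prompt override; apart from the "Hello my friend" requirement, the template wording is an assumption, while response_synthesizer:text_qa_template is the standard LlamaIndex key for this prompt:

```python
from llama_index.core import PromptTemplate

# query_engine.get_prompts() lists the current prompts and their keys
new_prompt = PromptTemplate(
    "Always start your answer with 'Hello my friend!'\n"
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

query_engine.update_prompts({"response_synthesizer:text_qa_template": new_prompt})

print(query_engine.query("How long does it take to prepare a pizza?"))
# -> "Hello my friend! On average it takes about 15 to 20 minutes."
```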
The key takeaways for me: in my opinion, both frameworks work pretty similarly and are straightforward to use. One key difference is that the default usage of LangChain is more low-level, while LlamaIndex's is higher-level. LangChain used to have a high-level chain interface, but many people ended up fighting the framework, to customize prompts for example; conversely, LlamaIndex also provides a lower-level API if you want more control. In my opinion, LlamaIndex seems a little easier to learn, since LangChain is the bigger framework, but once you've learned one, you can easily switch to the other. OK, that's it, thanks for watching, see you, bye-bye!

Info
Channel: Coding Crashcourses
Views: 10,231
Keywords: langchain, llamaindex, rag, retrieval augmented generation
Id: xEgUC4bd_qI
Length: 16min 50sec (1010 seconds)
Published: Thu Mar 07 2024