Talk to Your Documents, Powered by Llama-Index

Video Statistics and Information

Captions
Today I'll show you how to implement a chat-with-your-documents system in just four lines of code. I recently started exploring LlamaIndex, which is an alternative to LangChain. Just like LangChain, LlamaIndex gives us the ability to build powerful applications based on large language models, including document Q&A, data-augmented chatbots, and knowledge agents. The great part is that you can connect different types of data sources, including structured, unstructured, and semi-structured data. So using LlamaIndex you can quickly build LLM-based applications, just as you can with LangChain.

The main reason I started exploring LlamaIndex was its ability to fine-tune embedding models to improve the performance of document Q&A systems built on large language models. That means you can not only fine-tune the large language model on your dataset, you can also fine-tune the embedding model, and that will improve the performance of your retrieval system. I'll cover this topic in a future video. In this first video, I'll show you how to build a chat-with-your-documents system using LlamaIndex.

You have probably seen this diagram before. There are different components. The first component is loading your documents. Then you divide your documents into smaller chunks with a predefined chunk size. For each of the chunks you compute embeddings, which are numerical representations of the text contained in a chunk. Then you create a semantic index; this is basically your vector store. After that, you can use this vector store to chat with your documents.

In order to chat with your documents, the system takes your question, computes an embedding for it with the same embedding model it used for the chunks, and then performs a semantic search on your knowledge base. As a result of this search, it returns a certain number of chunks that are relevant to your question; by default LlamaIndex returns only two chunks out of all the chunks available in the knowledge base. These returned chunks are used as the context for the LLM of your choosing. The LLM gets the question from the user along with the context that was retrieved by the embedding model during the semantic search, and it generates a response that is shown to the user. As you can see, this whole process can be divided into four different steps, and I'll show you how to implement it using just four lines of code with LlamaIndex.

Okay, so let's start looking at the code. I'm using Google Colab to run the notebook. First we install all the required packages: llama-index, then openai. For the initial example I will show you how to implement the document Q&A using the OpenAI LLM and OpenAI embeddings, but I'll also show you how to replace the OpenAI LLM with an open-source LLM from Hugging Face. For that we will need the transformers package, to access the Hugging Face LLMs, and the accelerate package, to speed up running local LLMs. For the initial example we will need an OpenAI API key in order to access OpenAI's embeddings and LLM, so you'll need to provide that.

Next we import the required packages. We are importing OpenAI from llama_index, and we are importing two auxiliary classes: one is VectorStoreIndex and the other is SimpleDirectoryReader; their use will become clearer when we look at the code. We are also importing some utilities just to properly format the output from the model.
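As a rough sketch of the setup just described: the package names come from the video, but the exact import paths assume a llama_index release from around the time of the video (roughly the 0.8.x line) and may differ in newer versions.

```python
# Install the required packages (run once in the notebook)
# !pip install llama-index openai transformers accelerate

import os
from IPython.display import Markdown, display  # used later to format the model's answer

# The OpenAI example needs an API key for both the embeddings and the LLM
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key

# Core LlamaIndex imports used throughout this walkthrough
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI
```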
Now, in order to chat with your documents, you first need access to those documents. The way I'm going to do it is to create a folder called data and upload this file into it. This specific essay is titled "What I Worked On", an essay written by Paul Graham of Y Combinator. We're going to be using it as an example dataset, but you can upload any type of document, for example text files, Word documents, or PDFs.

Going back to the architecture, first we need to load the documents. LlamaIndex provides a very simple way to do that: there is a class called SimpleDirectoryReader. You simply provide the name or path of the folder; in this case our folder is called data, so that's the name I provided. Then we call the load_data function on top of that, and that loads our documents. The great thing about this is that you can have different types of documents within the folder, and this class will decide which loader to use for each of them. We can also inspect the result: this is basically the text file, and the whole text file is loaded in here.

The next step in our architecture is to create a vector store. This involves dividing your documents into chunks, computing their embeddings, and then storing both the embeddings and the chunks in a vector store. Again, LlamaIndex makes this very easy: we use the VectorStoreIndex class, call from_documents, pass in our documents, and this creates a vector store index for us.

Okay, so we just created our index. Next we are going to look at how to customize different options, for example the chunk size and the type of embeddings, but before that we just want to ask questions. For that we need to create a query engine; I find this to be much more streamlined in LlamaIndex than in LangChain. You create a query engine on the index by calling as_query_engine. If you want to chat with your documents and have memory, you would use the as_chat_engine function instead, which enables memory; as a query engine you don't have the memory component.

In order for this query engine to produce a response to the user's question, it has to go through the process described earlier: compute embeddings for the question, do the semantic search, and use the returned chunks as context for the LLM along with the question to generate an answer. All of this is implemented in a single line of code: you take the query engine, call its query function, and pass your question to it. For example, the question we are asking is "What did the author do growing up?", and we get a response back.

The response you get has a lot of different components, but we're simply interested in the answer. I'll show you what all this text means in a bit, but first let me show you how to get the answer from the LLM. For that we use the display function from IPython, wrap the response in Markdown, and bold it. Here's the answer from the model: the author worked on writing and programming outside of school; before college, they wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. So this is the answer that was generated by the LLM using the context provided by the embedding model.
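Pulling the four steps together, here is a minimal sketch of the document Q&A flow just described. It assumes the essay has been saved into a local data/ folder and that the imports from the earlier snippet have already been run; as before, the API follows a 0.8.x-era llama_index release.

```python
# 1. Load every document found in the "data" folder (the reader picks a loader per file type)
documents = SimpleDirectoryReader("data").load_data()

# 2. Chunk the documents, compute embeddings, and build the in-memory vector store index
index = VectorStoreIndex.from_documents(documents)

# 3. Turn the index into a query engine (use index.as_chat_engine() instead if you want memory)
query_engine = index.as_query_engine()

# 4. Ask a question: embed it, retrieve the most relevant chunks, and let the LLM answer
response = query_engine.query("What did the author do growing up?")

# Show only the answer text, rendered in bold via IPython's Markdown display
display(Markdown(f"<b>{response}</b>"))
```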
As you can see, apart from the imports, we were able to build a document Q&A system using just four lines of code, and the implementation is very clean. Next I want to show you how to customize different parts of this diagram, for example how to define the chunk size and chunk overlap, or how to change the LLM. But before that, let's look at some very important concepts when it comes to vector stores.

Right now the index we created lives only in memory, but you can persist it by calling the persist function on the index. If we run this, by default it creates a folder called storage, and within it you will see multiple JSON files. This functionality is critical because it enables you to create the index once, store it on disk, and then reuse it in future runs. In order to load a vector store that was stored on disk, we use the StorageContext along with the load_index_from_storage function: you use a storage context to read the contents of the index, and then from that storage context you recreate the index, which you can then use exactly the way we did before. I hope this is clear.

Another question that comes to mind is: what exactly is in this vector store? To show you, I'm going to open all four files; the two important ones are the doc store and the vector store. The vector store, as you can see, contains the embeddings that were computed for each of the chunks, and depending on the embedding model you choose, the embedding vectors will have a different number of dimensions. Similar to the vector store for embeddings, there is a doc store; here you will see the different chunks from the document, so this is the text that was divided into the chunks. The third important file is the index store; it holds the hashes, or addresses, of the different chunks, and using this index store the system determines which embeddings belong to which chunk. So when you create a vector store, it stores both the embeddings that were computed and the original chunks from the documents. During the retrieval process, the LLM has access to both the embeddings and the chunks that are retrieved based on the embedding model. I hope this is clear.

Now let's look at some of the customizations you can make. The first one is how to change the LLM. In order to change the default values, we use the concept of a service context, which defines all the default values of different parameters within LlamaIndex. By default, I think the LLM that LlamaIndex uses is the text-davinci-003 model. If you want to change it to, let's say, GPT-3.5 Turbo, you use the OpenAI class within LlamaIndex to create an LLM, where you can define the temperature and the maximum number of tokens it will generate. Then you pass that LLM to the service context to change the defaults, so here we're changing the default LLM to GPT-3.5, pass the service context to the vector store index we just created, and then use the vector store index exactly the way we did above.
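Here is a hedged sketch pulling together the persistence and LLM-customization steps just described. It again assumes a 0.8.x-era llama_index API; the ./storage default and the ServiceContext pattern may differ in newer releases.

```python
from llama_index import ServiceContext, StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.llms import OpenAI

# Persist the in-memory index; by default this writes the JSON files
# (doc store, vector store, index store) into a ./storage folder
index.storage_context.persist()

# In a later run, rebuild the index from disk instead of re-computing the embeddings
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Swap the default LLM (text-davinci-003) for gpt-3.5-turbo via a service context
llm = OpenAI(model="gpt-3.5-turbo", temperature=0, max_tokens=256)
service_context = ServiceContext.from_defaults(llm=llm)

# Pass the service context when building the index so the new defaults are used
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
```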
Now if you want to use another LLM, such as, let's say, Google PaLM, you can simply import the PaLM LLM from the llama_index.llms module, pass that LLM to the service context, and that will change the default value. Then again, you pass the service context to the vector store index exactly as before, and it will start using the PaLM LLM. In this case you will need to set the API key for PaLM, but the process is very similar to what we did for the OpenAI model.

Another parameter that is very important for these document chatbots is the chunk size. To change the chunk size, we follow the same pattern: you simply define two other parameters, the first is chunk_size and the second is chunk_overlap. Here I'm using a thousand tokens, for example, with an overlap of 20 tokens. I would recommend everybody to look at the documentation to see what the default values are and how you can change them. Another way of doing this is to set the global service context; in that case you don't have to pass it to the vector store index that you create. You simply set the global service context using the service context you just created, and those values are used as the defaults in the rest of your code.

In this last example, I'll show you how to use an open-source LLM from Hugging Face. We import the HuggingFaceLLM class from llama_index.llms, and you can also import PromptTemplate from the prompts module; this is very similar to what you have probably seen in LangChain. Here is the system prompt, because some of the open-source LLMs need a system prompt; this particular one is written specifically for the StableLM models and encloses the user query within some special tokens. Under the hood this loads the model locally through the Transformers library. We create an LLM with different parameters: the context window, the maximum number of new tokens you want to generate, and other settings such as the temperature and whether you want to do sampling or not. Then you pass the model name as well as the tokenizer name, the system prompt if the model supports one, and the prompt template that wraps the user query. If you have multiple GPUs and want to use all of them, you can set the device map to auto. In certain cases there are specific tokens that are used as stopping IDs; you can pass those depending on the model you're using, because different models use different tokenizers. These are the different parameters you can set, and essentially this creates an LLM object. You can then pass that LLM to the service context we created, along with the chunk size we want to use; this sets the default LLM to this Hugging Face LLM, and then you can query your vector store and this LLM will be used for generating the responses.
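Here is a rough sketch of the Hugging Face setup just described. The StableLM checkpoint name, the prompt strings, and the stopping token IDs are illustrative assumptions based on common llama_index 0.8.x HuggingFaceLLM examples, not values confirmed in the video; it also assumes the documents variable from the earlier snippets is still in scope.

```python
from llama_index import ServiceContext, VectorStoreIndex, set_global_service_context
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# System prompt and query wrapper written for the StableLM tuned models;
# other models expect different special tokens, so adjust accordingly
system_prompt = "<|SYSTEM|>You are a helpful assistant that answers questions about the documents."
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,                  # tokens of context the model can attend to
    max_new_tokens=256,                   # cap on the number of generated tokens
    generate_kwargs={"temperature": 0.7, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    model_name="StabilityAI/stablelm-tuned-alpha-3b",    # hypothetical checkpoint choice
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",                    # spread the weights across all available GPUs
    stopping_ids=[50278, 50279, 50277, 1, 0],             # StableLM-specific stop tokens
)

# Make this LLM (plus a 1000-token chunk size with 20-token overlap) the global default
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=1000, chunk_overlap=20)
set_global_service_context(service_context)

# Rebuild the index and query it; responses are now generated by the local model
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What did the author do growing up?"))
```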
Now, this was a quick overview of how to build a document Q&A system using LlamaIndex. LlamaIndex is extremely powerful and has some really cool features; for example, one of the applications I'm currently looking at is how to fine-tune embedding models along with fine-tuning the LLM itself. This is going to be the first video in a series on LlamaIndex, and I will be creating more advanced tutorials on the different topics and features that LlamaIndex has. If you like what you see on the channel and would like to support it, check out my Patreon. If you want to build a Q&A system and would rather use LangChain instead of LlamaIndex, I recommend you check out this video. As always, if you found the content helpful, consider liking the video and subscribing to the channel. Thanks for watching, and see you in the next one.
Info
Channel: Prompt Engineering
Views: 78,124
Keywords: prompt engineering, Prompt Engineer, natural language processing, chatgpt for pdf files, ChatGPT for PDF, langchain openai, langchain in python, embeddings stable diffusion, Text Embeddings, long chain tutorial, vectorstorage, chroma, train gpt on documents, train gpt on your data, train openai, train openai model, train openai with own data, langchain tutorial, how to train gpt-3, embeddings, langchain ai, llama-index, llama-index ai, llama-index chatbot
Id: WL7V9JUy2sE
Length: 17min 31sec (1051 seconds)
Published: Sun Sep 24 2023