Build a PDF Document Question Answering System with Llama2, LlamaIndex

Video Statistics and Information

Captions
PDF, Excel, and Word documents are everywhere, and getting exact answers to the queries you have can be a difficult process. Well, what if I told you there is a solution out there? In today's video I'm going to unveil a superpower that you can use for asking questions of your PDFs and other documents in natural language. What will I be using in this video? Tools like LlamaIndex, the amazing open-source Llama 2 model, Apache Cassandra, Gradient's LLMs, and Python. So stick around and watch the video till the end to witness the magic. Let's begin.

Before I jump in and show you the actual implementation, I want to break this video down into the technologies and tools that I'll be using to create the solution. I'll kick-start the activity with a discussion of LlamaIndex. What exactly is LlamaIndex? LlamaIndex, which was previously known as GPT Index, is a data framework for large language model applications that provides tools to ingest, structure, and access private or domain-specific data. It's available in Python and supports a variety of data sources, including unstructured data like documents, PDFs, videos, and images. LlamaIndex provides a high-level API that makes it easy to get started with data ingestion and querying, as well as lower-level APIs that allow advanced users to customize and extend any module to fit their needs. In short, it gives you an entire framework with advanced retrieval and query interfaces over your data: you feed in an LLM input prompt and get back retrieved context and knowledge-augmented output.

Here is the block diagram of how LlamaIndex functions. You have documents like PDFs, Word files, and text files; all of that is converted into something called an index, and the index is stored in a vector database. Once a query arrives, a retriever fetches the relevant nodes for that query, the query engine decides which results to surface, and then you get the final response.

Let me explain this further. Think of a document, like a PDF, that is loaded into LlamaIndex. The document is parsed, or broken down, into chunks, and these chunks are called node objects. The node objects are split into overlapping chunks, and a semantic vector representation is computed for each chunk using an embedding model. The semantic vector representations are then stored in a data structure called a vector store, and the vector store is used to build an index. That is the entire indexing process that goes on behind the scenes when you start using LlamaIndex.

Once you have a query in place, something called a retriever comes in handy. The retriever is a tool for extracting and gathering relevant information based on a user's query, so it plays a vital role in surfacing the relevant answers. We've now looked at most of the modules, but one piece is still remaining: the query engine. The query engine is built on top of the index-creation module and the retriever module; it goes through the various results the retriever has extracted, selects the most probable output based on similarity, and gives out the response. So that is how LlamaIndex functions end to end.
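To make the chunking idea concrete, here is a minimal sketch of the ingestion step described above, written against the 0.8/0.9-era llama_index API used in this video (class names may differ in newer releases; the sample text and chunk sizes are made-up illustrations):

    from llama_index import Document
    from llama_index.node_parser import SimpleNodeParser

    # Wrap raw text (e.g. extracted from one PDF page) in a Document object
    doc = Document(text="Cassandra is a distributed storage system for managing structured data ...")

    # Split the document into overlapping chunks ("nodes"); each node later
    # gets an embedding vector that is stored in the vector store
    parser = SimpleNodeParser.from_defaults(chunk_size=256, chunk_overlap=20)
    nodes = parser.get_nodes_from_documents([doc])
    print(len(nodes), "nodes;", nodes[0].text[:60], "...")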
The next piece of discussion is the open-source large language model that I'll be using for this entire activity, which is Llama 2. Llama 2 has been open-sourced by Meta; it's an auto-regressive model, released on the 18th of July 2023 in partnership with Microsoft. Just to give you some context, Llama 2 has been pre-trained on two trillion tokens. We also had Llama 1, but the context length of Llama 2 is twice as large as Llama 1's. There are three model sizes: the smallest model with 7 billion parameters, then a 13-billion-parameter model, and then a 70-billion-parameter model. To highlight the improvements of Llama 2 over Llama 1: the first point is increased training data, as Llama 2 is trained on 40% more tokens compared to Llama 1; the second is a longer context length, with Llama 2 supporting a context of 4K tokens. The chat version of Llama 2 has also been fine-tuned and optimized for dialogue applications using RLHF. That's the uniqueness of Llama 2, and given that you can run this entire piece locally as well, it's the model I'm building the solution around, together with LlamaIndex and the vector database I'll show you next.

For the vector database, I'm using DataStax's implementation of Apache Cassandra. I've already discussed the advantages of Apache Cassandra in my previous video; I'll add the link to that video so you have a better understanding of why Apache Cassandra is so widely used. You can kick-start your Apache Cassandra journey on DataStax by going to astra.datastax.com. Once you reach the website and sign up, you will see a prompt to create a database. What you have to do is select a vector database (which is what we'll require for this activity), enter the database name and a keyspace name, select the cloud provider (I chose Google Cloud, but you can choose the other providers as well), and then select the region. Once you go through the entire process, you'll have an active vector database.

All you have to do in order to use the database is go to the Connect section. Once you reach it, there are two things you require. The first is the application token: select the Database Administrator role and click Generate Token, and it will generate a JSON file for you. The other piece you require is the secure connect bundle: for that, just click on the bundle link and it will download a ZIP file. Once this is up and running, you've done the heavy lifting with respect to the vector database.
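Before moving on, here is roughly what those two downloads are used for: a minimal connection sketch using the cassandra-driver package, mirroring the standard Astra DB snippet. The two file names are placeholders for the bundle and token JSON you just downloaded:

    import json
    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider

    # The secure connect bundle (the ZIP) tells the driver where the cluster is
    cloud_config = {"secure_connect_bundle": "secure-connect-vector-db.zip"}

    # The token JSON holds the clientId/secret pair used for authentication
    with open("vector_db-token.json") as f:
        secrets = json.load(f)

    auth_provider = PlainTextAuthProvider(secrets["clientId"], secrets["secret"])
    session = Cluster(cloud=cloud_config, auth_provider=auth_provider).connect()

    # Sanity check: ask the cluster for its release version
    row = session.execute("SELECT release_version FROM system.local").one()
    print(row[0] if row else "An error occurred.")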
Now, the other piece remaining is how you access the Llama 2 model, which is where I'll show you the next solution. There are many ways you can access Llama 2, but one solution that I really liked, and something I've started using on a daily basis, is Gradient's LLMs. Gradient makes it very easy for you to personalize and build on top of open-source LLMs through a simple fine-tuning and inference web API. There are various open-source models available, like BLOOM-560M, Llama 2, Code Llama, etc., as well as various embedding models, and Gradient also has integrations with LangChain and LlamaIndex, which is why I'm using Gradient to do the entire heavy lifting for us. In order to start using Gradient's LLMs, what you require is a Gradient access token and a Gradient workspace ID. Once you click on Create Workspace, it will create a workspace ID for you, and you can find the access token in the profile section once you create your profile. You'll paste both into environment variables, which I'll show in the coding section. This is everything you need before you start using the Gradient LLMs.

Now that all the heavy lifting is done, let's actually move to the implementation piece, where I'll show you how you can chat with your PDF using LlamaIndex, Astra DB, and Gradient's open-source models. Before I begin, some basics that I want to reiterate: I have already created a Google Colab session, and I'm currently using a CPU-based runtime. I've already uploaded two files: one is my secret JSON file, which is used to connect to the vector database, and the other is the secure connect ZIP file. I've also uploaded two PDF files in a folder called "documents": the first is the attention paper ("Attention Is All You Need", the paper that revolutionized the entire LLM ecosystem), and the other is the Apache Cassandra white paper. I'll ask questions of these two PDFs and show you the results.

So let's begin. I'll kick-start the activity with the installation: as you can see, I require llama-index, pypdf, cassandra-driver, and the other libraries listed, and before I can use them I have to install them, which I'll do now by running the cell. With the installation in place, let's move forward. I'll import the os and json modules, and, going back to the earlier part of the video where I said that Gradient's LLMs require the access token and the workspace ID, I have both stored in a neat new feature of Google Colab called Secrets. Both my secrets are saved there, so I don't have to reveal them while going through the code. I'll quickly run the cell: the Gradient access token and workspace ID are added as environment variables through the os module. Next, I import functions from llama_index and cassandra, and just to check which version of the Cassandra driver I'm using, I run another quick cell: I'm currently on cassandra-driver 3.28.0.
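For reference, the setup cells amount to a few lines like the following. The package list is my reading of the video, the versions are unpinned, and the Colab secret names and environment variable names are assumptions (the llama_index Gradient integrations look for GRADIENT_ACCESS_TOKEN and GRADIENT_WORKSPACE_ID, to the best of my knowledge):

    # In a Colab cell:
    # !pip install llama-index pypdf cassandra-driver gradientai

    import os
    from google.colab import userdata  # Colab's Secrets feature

    # Copy the two Gradient credentials from Colab Secrets into env vars,
    # where the llama_index Gradient integrations can pick them up
    os.environ["GRADIENT_ACCESS_TOKEN"] = userdata.get("GRADIENT_ACCESS_TOKEN")
    os.environ["GRADIENT_WORKSPACE_ID"] = userdata.get("GRADIENT_WORKSPACE_ID")

    import cassandra
    print(cassandra.__version__)  # the video shows 3.28.0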
Now, in order for you to ask questions of your PDF, the first thing you require is a vector database to store the indexes, which is why the first thing we'll do is connect to the vector database. This is readily available code from the Astra DB website: once you create the database, if you scroll down you'll see this snippet, which you can simply copy and paste (it's essentially the connection sketch shown earlier). All it does is extract the client ID and the client secret from the token JSON you supplied and authenticate with them; once that's done, it creates a session on the Apache Cassandra cluster and executes a simple command, SELECT release_version FROM system.local. If everything works fine, it prints the release version for you; if there are issues, it gives you an error message. Let's execute this cell. It gives me a proper release version, which means I have established a successful connection to my vector database, so we are good in that aspect. Let's move forward.

One of the reasons I prefer using Gradient's interface for accessing open-source models such as Llama 2 is that the abstraction they've built is simply commendable: you have to write very few lines of code, mostly one line, to create an LLM instance. I'll show that to you as well. I'll quickly unhide the cell; what you're seeing on screen is all the code you have to write to create an instance of the Llama 2 model. All you have to do is create an instance of the class GradientBaseModelLLM, supply the base model slug (which is like a unique identifier for this particular model), specify the maximum number of tokens, and save it into a variable called llm. That's all that's required.

The other piece we'll require in order to generate the index is embeddings, and here again I'll use Gradient's approach. All you have to do is call the GradientEmbedding function, pass in the access token, the workspace ID, and the Gradient model slug used for generating embeddings, and save everything into a variable called embed_model, which I'll quickly do as well.

So what we have so far is a valid connection to a vector database (in our case Apache Cassandra hosted on DataStax), an active embedding model, and an active large language model, Llama 2, via the Gradient APIs. What we now have to work on is the LlamaIndex part of the solution. From llama_index I import ServiceContext and call ServiceContext.from_defaults, specifying the LLM, the embedding model, and the chunk size. For index generation, the text that's part of the PDF is recursively split; once the splitting has happened, the chunks are converted into indexes, which is where this function comes in, and you have to supply both the large language model you're utilizing and the embedding model used for the embedding generation. I then set the global service context to this particular service context and quickly run the cell. Almost all of our heavy-lifting code is now done.
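Taken together, those three cells look roughly like the following sketch, using the 0.8/0.9-era llama_index integrations. The model slugs, token count, and chunk size are my best reading of the video, so treat them as assumptions and verify against Gradient's documentation:

    import os
    from llama_index import ServiceContext, set_global_service_context
    from llama_index.llms import GradientBaseModelLLM
    from llama_index.embeddings import GradientEmbedding

    # One line to get a Llama 2 instance served by Gradient (reads the
    # access token and workspace ID from the environment variables set earlier)
    llm = GradientBaseModelLLM(base_model_slug="llama2-7b-chat", max_tokens=400)

    # Embedding model used to turn document chunks into vectors
    embed_model = GradientEmbedding(
        gradient_access_token=os.environ["GRADIENT_ACCESS_TOKEN"],
        gradient_workspace_id=os.environ["GRADIENT_WORKSPACE_ID"],
        gradient_model_slug="bge-large",
    )

    # Bundle the LLM, embedding model, and chunk size into a service context
    service_context = ServiceContext.from_defaults(
        llm=llm, embed_model=embed_model, chunk_size=256
    )
    set_global_service_context(service_context)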
Now I'll move forward, load the PDFs, and point LlamaIndex to where my documents reside. I simply create a variable called documents, call the function SimpleDirectoryReader, passing in the directory, and then call load_data on it. Once all the data resides in documents, I want to find out how many documents we have generated in the process, so I'll quickly run the cell. The output tells me that 17 documents have been loaded: the attention paper had 11 pages and the Apache Cassandra white paper had 6 pages, and what this function does is split each PDF into single pages and save all of them as individual documents. That's how LlamaIndex works behind the scenes here.

If this idea is clear, we have the documents ready in the documents variable; all we have to do now is set up the index and the query engine, and then ask questions of our documents. Here is where we generate indexes for our documents and save them into our vector database: I call VectorStoreIndex.from_documents, passing in the documents and the service context. For this instance variable I've just created, I want a query engine, so I call the function as_query_engine and save the result into a variable called query_engine. With this heavy lifting done, I'll run the cell.
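Put together, the loading and querying piece looks roughly like this. Only the documents and the service context are passed in here, so this sketch builds LlamaIndex's default in-memory vector index; persisting the vectors into the Cassandra vector store would be an additional storage-context step that I'm leaving as an assumption rather than spelling out:

    from llama_index import SimpleDirectoryReader, VectorStoreIndex

    # Each PDF page becomes one Document (17 total in the video: 11 + 6)
    documents = SimpleDirectoryReader("documents").load_data()
    print("loaded", len(documents), "documents")

    # Chunk, embed, and index the documents, then expose a query engine
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    query_engine = index.as_query_engine()

    print(query_engine.query("What is Cassandra?"))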
Now is the point where you will see the magic happen. I have the query_engine instance with me; I call the query function and pass in a question. Let's start with something simple: "What is Cassandra?" Let's wait for the response. Here is the output: it says Cassandra is a distributed storage system, and this result is generated from the PDF. I'll ask the next question: "How does Facebook use Cassandra according to the paper?" Look at the output: according to the paper, one of the applications in the Facebook platform that uses Cassandra is for storing data, so it highlights the specific section describing how the platform uses Cassandra. Amazing, right? I have a PDF, and I don't have to go through the entire thing to find an answer; all I have to do is ask questions of my PDF in natural language.

Now I'll ask more questions, this time of the attention paper: "What is multi-head attention?" Here is the output: multi-head attention is a type of attention mechanism, and the rest of the answer follows. Imagine a use case where you have 10,000 documents: you just load them all in, and provided you pick a good, scalable vector database (which is why I've chosen Cassandra, for the scale you can achieve), you can then ask any number of questions of all the PDFs or documents you have. I can also ask "What is positional encoding?" and wait for the answer: it says positional encoding is a technique used in Transformers, and it gives me the exact result, sometimes even with a quotation pointing to where this appears in the PDF.

Overall, this is an amazing solution that uses the latest and greatest LLMs. This is what I wanted to demonstrate in today's video: the power of combining LlamaIndex, open-source LLMs via Gradient, Apache Cassandra on DataStax Astra DB, and Python to create an amazing end-to-end solution. The initial demo I showed you was created using Gradio; I've packaged the entire Python code that I've shown you into a Gradio-based application, and the code will be available in the description section of the video, so you can check that out as well. I hope you enjoyed this video, and if you like the content I create on my channel, it would be super motivating if you could press the subscribe button and the bell icon to be notified of more videos on data science and machine learning. Thank you so much for watching.
Info
Channel: Bhavesh Bhatt
Views: 169,565
Keywords: llamaindex, llama2, llms, apache cassandra, astra db, vector database, how to use llamaindex with astradb, generative question answering, llama2 question answering, build question answering system, automated question answering system, question answer system nlp, document question answering, gradient ai, saving embeddings in apache cassandra, question answer nlp
Id: pApPGFwbigI
Length: 19min 35sec (1175 seconds)
Published: Wed Nov 15 2023