End to End RAG LLM App Using LlamaIndex and OpenAI - Indexing and Querying Multiple PDFs

Video Statistics and Information

Captions
hello all, my name is Krish Naik and welcome to my YouTube channel. In this video we are going to create a RAG LLM app, that is, retrieval augmented generation, where we'll query multiple PDFs with the help of LlamaIndex and OpenAI. This is a project people have been requesting for the past couple of weeks, so I thought, why not build it for all of you. Since we have already started the LlamaIndex playlist, we'll go step by step and build more amazing, more complex, advanced RAG systems: we'll use vector databases and external databases, and we'll see how easily we can retrieve answers to our queries. We'll start completely from scratch. Please hit like on this video (let's keep the like target at 1000, as I say in every video), share it with as many friends as you can, implement things yourself, and tag me on LinkedIn, because I really want to see your implementations too.

Let's start from the basics. First I'll open my terminal, since we are specifically going to use LlamaIndex along with the OpenAI API. In the command prompt I'll run conda deactivate, and then begin by creating an environment. (There are different ways of creating environments; I've already covered them in a video on my channel.) Here I'll run conda create -p venv python=3.10 -y, which creates the venv environment, and then we activate it.

Now, what is the main task in this project? There is a data folder containing two PDFs: one is "Attention Is All You Need" and the other is yolo.pdf. We will read these PDFs, convert them into vectors, that is, into an index, and then retrieve the details for any query I ask with the help of LlamaIndex. That, in short, is a retrieval augmented generation LLM app. The environment has been created, so I'll run conda activate venv/ — and remember, I'm going to repeat this process for every project. Next, I've created a .env file, because I'm going to use my OpenAI API key. I won't show you the key, but the variable name you need to write is OPENAI_API_KEY, and its value is whatever key you get from OpenAI. Along with this you have the data folder, into which you can put any PDFs you want. The third thing is a requirements.txt, inside which I list three important libraries: llama-index, openai, and pypdf. Let me clear my screen and install them with pip install -r requirements.txt.
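The setup steps above can be sketched as a short shell session (a sketch of what the video describes; the environment name `venv` and Python 3.10 follow the video, and the `OPENAI_API_KEY` value is a placeholder you must replace with your own key):

```shell
# create and activate a local conda environment (as in the video)
conda create -p venv python=3.10 -y
conda activate venv/

# requirements.txt lists the three libraries used in this project:
#   llama-index
#   openai
#   pypdf
pip install -r requirements.txt

# extra libraries installed later in the video
pip install ipykernel python-dotenv

# .env file holding the OpenAI key (never commit this file)
echo 'OPENAI_API_KEY=<your-key-here>' > .env
```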
The installation now happens, because I need these three libraries together. As we go further with this playlist I'll come up with more amazing projects, where I'll create more advanced RAG systems, include databases, and use LangChain, so you'll be able to implement things in an amazing way. While the installation runs: I've also created a test.ipynb file. Initially I'll show you things this way so you get the crux, the idea; in upcoming projects we'll convert this into an end-to-end project with deployment. That is the pattern I want to follow, but the main thing is that I really want to show you how you can learn these things. And at the end of the day, please make sure you practice — unless you practice, you will not understand, so implement along with me. If I get any errors I'll solve them, and if you find any issues you can directly ping me.

The installation has finished. Since I'm using a Jupyter notebook, I'll clear the screen and install one more library, pip install ipykernel, because I need it to run my notebook. One more library I missed is python-dotenv, because I need load_dotenv to load my environment variables. So first I'll install ipykernel and the remaining libraries, and then we will implement everything step by step. I'll write out all the steps: first loading the PDFs, then seeing what text is inside them, how we convert that text into vectors, and then how we do the indexing on those vectors using LlamaIndex. Let me run pip install -r requirements.txt once more — done, perfect. I'll clear the screen, select my Python 3.10 kernel, and save. Now everything is set up and it's time to implement our solution.

The first thing is to import os, because I'll set up my environment variable for the OpenAI API key. Before that, I write from dotenv import load_dotenv and then call load_dotenv(); once I execute this, all the environment variables from my .env file are loaded. In the second step I write os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") — I read the key from the .env file with os.getenv and set it as an environment variable. I think you now have a complete idea of how we are calling the API key here; with this in place, any OpenAI call will be able to pick up the key and use it.

Now, what is our aim? My data folder may have any number of PDFs, and I have to pick up all of them. LlamaIndex has a lot of functionality that can load all these PDFs and convert them into vectors, into an index. So first I write from llama_index import VectorStoreIndex, SimpleDirectoryReader — two important imports. Since I want to read straight from the directory, I'll use SimpleDirectoryReader; VectorStoreIndex will be responsible for converting all the text into vectors and indexing those vectors. Next I write documents = SimpleDirectoryReader("data").load_data(), because all my PDFs are present inside the data folder in the current project location. (There are multiple ways to do this — you could also take a PDF, convert it into text, and then convert that into documents.) Once I execute it, it takes hardly 2 seconds, and it has read both the PDFs.
If I write documents, I can see all my data: in short, this documents list also carries the metadata — see attention.pdf, all the content inside that particular document is there. Perfect, that is my first step: I have created my documents. Now I will take all these documents and convert them into an index. For that I write index = VectorStoreIndex.from_documents(documents), using the VectorStoreIndex I already imported. In short, it takes whatever PDF content is inside the documents, converts it into vectors, and then indexes them. There is also one very useful parameter, show_progress=True; when I pass it, it shows each step and how much time generating the embeddings takes — hardly 3.9 seconds, and all the indexing is done. If I execute index, you can see it is a vector store index; in short, it has created the vector store index in the form of embeddings. (I'll show you how to save this index later.)

Once this is done, I can query any question directly against this index. For that I create a query engine — it's just like a search engine — with query_engine = index.as_query_engine(). There is also an as_chat_engine option, which I'll show in upcoming projects when we create more complex RAG systems; right now we'll focus on as_query_engine, and for now I won't pass it any parameters (later I'll add some). If you execute query_engine, you can see it is a retriever-based query engine, which means it is responsible for retrieving information from those indexes for any question I ask. Now, to execute a query, I use query_engine.query("what is Transformers?") — a topic that is obviously present in attention.pdf. I execute this and print the response; it takes some time, about 7 seconds, to pick the answer up from the index. The response says that the Transformer is a model architecture. To see it more clearly I write print(response), and this is the most amazing part: "the Transformer is a model architecture that relies entirely on attention mechanisms to draw global dependencies between input and output; it is used in sequence transduction tasks such as language translation". Let's try something else — "what is YOLO?" — since there are two PDFs. (I've developed a similar application with the help of LangChain, but I think this gives more seamless results.) Since the query engine is able to find the content in the index, it gives you the response; if it does not find it, I'll show you what happens too.
Again, here you can see the answer: "YOLO is a new approach that frames object detection as a regression problem; it uses a single neural network to predict bounding boxes" — so it's working absolutely fine, and you are able to get the response. Let me show you one more way to display this response in a much better format. Before print(response), I import one more utility: from llama_index.response.pprint_utils import pprint_response. (There are options to print the metadata, the response, and the source.) What does this function do? When you query the index you don't get just one result — you get multiple candidate responses, and the most suitable one is, you could say, the first item in the list. If I execute pprint_response(response), you can see the final response. I'll also pass one more parameter, show_source=True. Now I've got the sources as well, and you can see why this was selected as the final response: here is my final response, and along with it two more source nodes — one with 81% similarity ("You Only Look Once: real-time object detection", and so on), then the next one with other information. From all these candidates, the best response is the one selected, which is why it is shown at the top. Next, let me ask about "Attention Is All You Need".
Right — what I'm trying to say is that you will get multiple responses, but the best response gets selected, and it also shows you the similarity score. Let's execute it. With pprint_response and show_source=True, the source nodes will show you all the information about where the details are being picked up from. I'll make the output scrollable and scroll through it: here is my final, best response — "the paper Attention Is All You Need proposes a new network architecture called the Transformer" — along with the other source nodes that were retrieved but not chosen; one had a similarity of only 81%, another around 78%. The LLM will never be perfect, but the best candidate is the one that gets shown. The best thing is that I can still modify all of this however I want: right now I'm getting only two candidate responses — what if I want four, five, or six? Everything is possible. How? Let me show you. I write from llama_index.retrievers import VectorIndexRetriever. (Again guys, if you don't have an OpenAI API key, just put in $5 of credit — you can learn so much, and all this is possible because I'm using OpenAI. Yes, I will show you open-source LLM models too, but those also require a lot of setup.)

For this query engine, so far I have not given any parameters. So: VectorIndexRetriever is now imported. Along with it I write from llama_index.query_engine import RetrieverQueryEngine, because I'm going to rebuild my query engine with a lot of parameters, and from llama_index.indices.postprocessor import SimilarityPostprocessor. I'll tell you why I'm using these three imports. First, as I said, anything I create as an index becomes a retriever once I turn it into a query engine: the index you see above is a vector store, but the query engine over it is, in effect, a retriever. So I create a variable retriever = VectorIndexRetriever(index=index, similarity_top_k=4) — I provide the index I've created, and I change similarity_top_k because, let's say, I want four different results (by default it gives you two or three responses). Then I plug this retriever in somewhere: query_engine = RetrieverQueryEngine(retriever=retriever), with the retriever as the first parameter. Before, my query engine was just index.as_query_engine() — the index acting as its own retriever; this time I've created a separate retriever and then built a query engine on top of it. I execute this, ask the same question — "what is Attention Is All You Need" — and look at the response: now I've got four responses. There is my first response, then source node 1, source node 2, source node 3, and source node 4. That means we are now able to get four different responses out of the index, which is super important, and you can modify it based on the kind of results you want.

Now let me show you the other piece, the SimilarityPostprocessor, and what exactly it does. Let's talk about thresholds: there is a similarity of 81%, one of 78%, and so on. I can also say: only show me a response if its similarity is above 0.80. For that we use the similarity postprocessor: postprocessor = SimilarityPostprocessor(similarity_cutoff=0.80) — so I'm saying, above 0.80 only. The next parameter I add to the query engine is node_postprocessors=postprocessor. Let me execute — "tuple has no attribute callback_manager"? Let's see — ah, I have to give this in the form of a list, node_postprocessors=[postprocessor]; that was the problem.
So I can give multiple postprocessors in that list if I want. It has executed for "what is Attention Is All You Need", and now only results whose similarity score is above 0.80 get displayed: in the final response I'm getting just one response. This is really, really good.

Now, every query you run happens against the index, and right now the entire index is stored in memory. There may also be scenarios where I want to store this entire index on my hard disk, as persistent storage. Let's see how to do that — I'll paste this code, and it will be fun to walk through it. First, the imports: we use VectorStoreIndex, SimpleDirectoryReader, StorageContext, and load_index_from_storage. Then we define a persistent directory, which means I'm going to create a storage folder in the project, and all the index's parameters and weights will be stored inside that folder. Whenever I want to load the index again, instead of querying the index in memory, I will first load it from my hard disk and then query it, so my application can keep running. That is why we've written: if not os.path.exists(persist_dir), then load the entire data with SimpleDirectoryReader and create the index; once the index is created, store it with index.storage_context.persist(persist_dir=...), so whatever parameters are present inside the index get stored in that folder (you'll see multiple files get created). Otherwise, we load it back from that folder: we build a StorageContext pointing at the directory and load the whole thing with load_index_from_storage(storage_context), and finally I get my index. Then, again, index.as_query_engine(), "what are Transformers?", and I get the response.

Understand the simple mechanism here. This index is basically holding the vector embeddings, and the vector embeddings live in several different files — around four to five files with different parameters describing what the vectors look like. Right now all of that information is in memory, and we cannot keep it in memory: if my PDFs keep increasing, my memory will keep getting exhausted. So I take the index and store it on my system disk, and whenever I need it, I load it from there and then query it. That is exactly what this code does: if the directory does not exist, create the index and store it in that directory; otherwise, load it from that directory. Let's execute it: I get the entire response, and you can also see all the files that were created inside the storage folder. Let's look at each one.
default__vector_store.json holds the default embedding vectors — this is the main file created out of the index. The graph store has nothing stored in it right now; the doc store has the metadata information, with hash keys and so on; and the index store has the information mapping them to one another. In short, these files are mappings between each other, but the default vector store is the main embedding store that gets created. I hope you were able to understand this video — this is how you create a basic RAG system with the help of LlamaIndex and OpenAI. So yes, that was it from my side. I hope you liked this video. I'll see you all in the next one. Have a great day, thank you all, take care, bye-bye.
Info
Channel: Krish Naik
Views: 26,757
Keywords: yt:cc=on, rag llm, llamindex tutorials, openai tutorials, indexing and querying LLm app, Retrieval augmented generation LLm app, generative ai tutorials
Id: hH4WkgILUD4
Length: 27min 21sec (1641 seconds)
Published: Mon Jan 29 2024