Super Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloud

Captions
Hello guys, welcome back. In this video let's talk about LlamaParse from LlamaIndex. I hope you will agree with me that if the data you feed into the LLM is good, then you get good results: garbage in, garbage out, as we used to say in machine learning, applies to LLMs as well. I will briefly explain why LlamaParse is needed, then navigate you through its UI, and at the end I will implement it via Python code, first with the OpenAI models that LlamaIndex uses by default, and then with models from Ollama. Let's get started.

OK, so I'm now on the blog post, "Introducing LlamaCloud and LlamaParse". LlamaCloud launches with two key components: LlamaParse, and a managed ingestion and retrieval API. So you get the idea: LlamaParse integrates directly with LlamaIndex ingestion and retrieval to let you build retrieval over complex and semi-structured documents. As the post says, RAG is only as good as your data, which is the point I just mentioned: feed good data into the LLM and you get good output. They also explain why there is a need for this, and you might argue that many other platforms do parsing as well; they put the real issue nicely: getting started with the famous five-line starter example is easy, but building production-grade RAG remains a complex problem. The problems people face are that results aren't accurate enough, the number of parameters to tune is overwhelming, and PDFs are especially a problem, so they have started with PDFs.

Here is the comparison of LlamaParse versus PyPDF, which we were using in many cases. In these two comparison tables a red highlight means the question was answered incorrectly; with PyPDF many answers were in the red, and LlamaParse is trying to solve exactly that kind of problem. There is also Unstructured.IO, which does similar things and which I might explore in the future, but for now let's go with LlamaParse and see how it tries to achieve this (I will provide the link, by the way). In their example, the baseline of PyPDF plus naive RAG has a mean correctness score of 3.874, while LlamaParse with recursive retrieval reaches 4.27. Just to mention as well: the service is available in public preview mode, open to everyone but with a usage limit of 1,000 pages per day, and you will see how easily we can use it in code. If you need commercial use of LlamaParse, you can get in touch with them. And as I mentioned in the beginning, they currently only support PDFs with tables, but they say they are planning to expand to other popular document types like .pptx, .html, and .docx.

There is also a playground, but that is more for advanced users; I'm not going to talk about LlamaCloud here, just about LlamaParse. You can go through the how-it-works section and the frequently asked questions yourself. Now, how to get started? There is a link to the LlamaParse onboarding ("Welcome to LlamaParse, the first public-facing release in LlamaCloud"); you can follow all the steps mentioned there, but I will show you briefly. This is cloud.llamaindex.ai; I have already created an account there, and you need to create one first. This is how the UI looks: there is Parse and there is Index. If I go to Index, it asks you to sign up for the private preview and provide some information, but I'm not going with that right now; I'm going with Parse. Before implementing, there are two things to remember: you can use it with LlamaIndex, as shown here, or use it as a standalone API via cURL.

There is also a parsing preview where we can upload a PDF and see the result before putting it into our code. I will click it, go to my desktop, open my LlamaParse example folder, and upload the GPT4All paper PDF from the data directory. Here is the uploaded version: the usual things such as authors, abstract, and so on, plus a table and a small diagram, all parsed by LlamaParse. In the preview you can see how it is parsed: it is Markdown, with the title, the authors, and the abstract. It has parsed the PDF document into Markdown, which makes it easier for our LLMs to answer from; feeding this structured text to the LLM rather than raw, jumbled extractions makes a huge difference. Now our plan is to use the API key and run this locally. For that, click the API key icon and generate a key; I have already taken mine. Then go to the LlamaParse example GitHub repository (I will provide the link) and download it.
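The two keys can then go into a .env file at the repository root. A sketch with placeholder values (the variable names below are the ones the video's code reads from the environment; the values are obviously your own):

```shell
# .env -- do not commit this file
LLAMA_CLOUD_API_KEY=llx-...   # from the API key icon on cloud.llamaindex.ai
OPENAI_API_KEY=sk-...         # from the OpenAI website
```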
First, create a virtual environment, activate it, install the necessary packages, and provide the environment variables in the .env file. Now I will open the terminal; I have already cloned that repository, created a virtual environment, and installed the packages, so I will open the project in VS Code. First I will show you the pipeline without the parser, and then with the parser. I'm using the same GPT4All paper, and I will be asking some questions about it.

This is the no-parser version, the way the usual five-line pipeline works. What I'm doing here is just the normal Markdown helper to print the output nicely, and from llama_index.core I import VectorStoreIndex and SimpleDirectoryReader. I load the environment variables; by the way, we need the OpenAI key here. If I show you (I will delete these keys later), this is the LlamaCloud API key that I just showed you, and this is the OpenAI API key, which you can get from OpenAI's website. Once that is done, you can load them. Inside data/ I have the GPT4All paper, so I'm using SimpleDirectoryReader, and you can go step by step: first run the imports, then load the documents. If you want to see the documents, you can print them: as you can see, there are six different ones, and if you scroll, all the content is extracted by the default reader. Then we create the index, create the query engine, and ask a question: "Who are the authors of the paper?" I uncomment the first question and print the response; it uses the OpenAI model and, as you can see, it lists all the author names. For nicer output, render the response as Markdown.
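The no-parser baseline just described boils down to a few lines. A minimal sketch, assuming `llama-index` is installed, `OPENAI_API_KEY` is set in the environment, and the paper sits in a `data/` folder:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Read the PDF with the default reader -- no LlamaParse involved yet.
documents = SimpleDirectoryReader("data").load_data()

# Build an in-memory vector index and a query engine on top of it.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Uses the default OpenAI models for embeddings and answering.
response = query_engine.query("Who are the authors of the paper?")
print(response)
```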
This is how it works step by step, but if you want to run it in one go, uncomment the display call, go to the terminal, and run the no-parser script with python3; the same answer we got here will be printed there. I'm just using the Jupyter extension to show you the interactive way. And you can go ahead and ask any question; for example, the second one: "Where was the collected data loaded on?" I run the query and print it: it is loaded in Atlas, which you know if you have watched my previous video. You can ask the same questions from the terminal too. So that is how we do it with no parser; now, how to do this with the parser?

I will close this for now and open the parser-with-OpenAI file. The beginning is the same as I showed you before: I copy over the imports, and here I'm using the LlamaParse API key and the OpenAI API key. I run that, and here is the parser code (let me make this bigger). I import LlamaParse from the llama_parse package, provide the API key that I got from LlamaCloud, and say that I want the result type to be markdown; markdown and text are the result types available right now. With the API key already provided, I can run the parser, and if I inspect the parser object, you can see it printed, along with the API key.
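Putting the parser pieces together, a minimal sketch of the LlamaParse pipeline; it assumes `llama-parse` and `llama-index` are installed and that `LLAMA_CLOUD_API_KEY` and `OPENAI_API_KEY` are set in the environment:

```python
import os

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Parsing happens in LlamaCloud; the result comes back as Markdown.
parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],  # from cloud.llamaindex.ai
    result_type="markdown",  # "markdown" and "text" are available
)

# Route .pdf files through LlamaParse; other extensions use the defaults.
documents = SimpleDirectoryReader(
    "data", file_extractor={".pdf": parser}
).load_data()

index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("Where was the collected data loaded on?")
print(response)
```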
Then I read the PDF with SimpleDirectoryReader, passing the same data directory I was using before. That's all it does; I go to those two lines and run them with Shift+Enter, which creates the documents as before, and you can see it has started parsing the file under a job ID. If I print the documents and scroll through, the same information we saw in the UI is being extracted here. Now we create the normal index, then create the query engine as we did before, and ask the same simple question; it answers that the collected data was loaded into Atlas.

Now you might be thinking: the answer is the same, so what is the difference? The idea is that the parsing is now done in LlamaCloud instead of locally. If you compare the two versions: in the first one we simply call load_data on the directory reader and pass the documents in, whereas in this one (let me make it a bit bigger) we still use SimpleDirectoryReader and load the data, but we hand it a file_extractor, and that file_extractor comes from the parser. And you can ask complex questions if you want. For example, go to the GPT4All paper and pick something specific; let me ask one: what is the BoolQ value of this model? I copy that, go to the parser version (not the no-parser one), replace the query with "What is the BoolQ value of this model?", and run it.
Let me see whether it gets the answer. It says 73.4, and if I go into the paper and zoom in, it is indeed 73.4; it got the answer from the table. Now, one thing to compare: I copy the same question, go to the no-parser version, and paste it there; it should also produce an answer, but without the parser. When I run it, it errors first because I need to rerun the earlier cells, so let me import os and load everything again. Now I run it, and as you can see, it says the BoolQ value of GPT4All is 77.1. But the paper says 73.4, not 77.1, so the parser version gets the right answer and the no-parser version does not. This is exactly what I wanted to show you, because this is the benefit of using good data. I was experimenting with this, found it really helpful, and wanted to share it.

Now, in some cases you might even want to use local models. I also have a parser-with-Ollama file; I'm not going to go through all of it, but what I'm doing is using a model from Ollama instead of an OpenAI model. Everything up to the parsing step is the same as before: run it with Shift+Enter, use the parser (let me make this a bit smaller), and load the file; then I can run the documents cell. I'm showing each step, but one thing worth noting: instead of parsing on every run, you could persist the parsed documents to a folder and reuse the same documents each time. Here I'm parsing each time, but you can parse once and reuse the result.
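The persistence idea just mentioned (parse once, reuse the result) can be sketched with a small cache helper; `load_or_parse`, `cache_path`, and `parse_fn` are hypothetical names for illustration, not part of the example repository:

```python
import os
import pickle

def load_or_parse(cache_path, parse_fn):
    """Return cached parsed documents if present, otherwise parse and cache."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    documents = parse_fn()  # e.g. a call into the LlamaParse pipeline
    with open(cache_path, "wb") as f:
        pickle.dump(documents, f)
    return documents
```

On the first call `parse_fn` runs and its result is written to disk; later calls read the pickle instead of hitting the parsing service again.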
For example, if I print this document, it gives me the parsed document again; I shouldn't have to run the parsing again and again. And you can see the good thing here: it is Markdown. If you haven't watched my Ollama videos, I highly recommend watching them; they show how to install Ollama and everything else, but I'm not going into that here. What I'm doing is using Ollama: from llama_index.embeddings.ollama I import OllamaEmbedding, I use llama2, and I give the base URL where LlamaIndex will reach Ollama. When I run it, it first says the name Ollama is not defined because I hadn't run the import cell; I run that, import the Ollama LLM as well, and use llama2 for now.

Next come the settings. This part is necessary because by default LlamaIndex uses the OpenAI models, and now we want to tell LlamaIndex to use the local models instead. I set that and then pass the documents into VectorStoreIndex. Now it says the HTTP connection to localhost is refused, because what I did was not start Ollama. If I run `ollama list`, Ollama starts on my machine, and you can see I have llama2; that is the other thing I wanted to show, because installing Ollama alone is not enough, you also need to have it running. Now if I run the same line again, it shows no error, because my Ollama model is running locally and the VectorStoreIndex is talking to localhost:11434 under the hood. What it is doing now is creating the index from the parsed Markdown. After this, of course, we create the query engine as we did before; it is taking some time now, whereas with the OpenAI model it was quite fast. Then we can run it, and this last bit is just how to print the response as Markdown.
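The switch to local models described above looks roughly like this. A sketch assuming Ollama is running locally with the llama2 model pulled, and that the llama-index Ollama integrations (`llama-index-llms-ollama`, `llama-index-embeddings-ollama`) are installed:

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Tell LlamaIndex to use local Ollama models instead of the OpenAI defaults.
# 11434 is Ollama's default port; request_timeout is generous because local
# models answer more slowly than the hosted OpenAI ones.
Settings.llm = Ollama(
    model="llama2",
    base_url="http://localhost:11434",
    request_timeout=120.0,
)
Settings.embed_model = OllamaEmbedding(
    model_name="llama2",
    base_url="http://localhost:11434",
)

# With the Settings in place, building the index and querying run fully
# locally, e.g.:
#   index = VectorStoreIndex.from_documents(documents)
#   print(index.as_query_engine().query("Who are the authors of the paper?"))
```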
So I go ahead and ask the same question: who are the authors of the paper? It takes some time, but then: the authors of the paper are so-and-so. That's really good, but I want to test with the same query I used before, because I want to see whether this works: what is the BoolQ value of this model? If it answers this, I will be really impressed, because we are using the models locally; remember, the answer was 73.4. If a local model gets it, then there is, well, I can't say no difference, but it becomes quite acceptable to use local models too. It takes some time, as you can see, and then: it just provides random values. It cannot directly reference the number and say "this is the value"; it just returns random snippets from the PDF. But you get the idea. I have provided all the code here: how to use this with no parser, with the parser, and with Ollama.

I hope you now have an idea of how to use LlamaParse. I think this is a really good thing LlamaIndex is providing, because if in the future many other file formats are supported as well, and we can use them in this simple manner and get good answers out of them, why not give it a try? I just found it helpful and thought it would be helpful for you too. Let me know what you think in the comment section, or if you want me to create any new video, let me know there as well. Thank you for watching, and see you in the next video.
Info
Channel: Data Science Basics
Views: 2,110
Keywords: llm, chat, chat models, chain, agents, chat with any tabular data, create chart with llm, markdown, chat with your data, own chatgpt, rag, chat with pdf, llamaindex, what is llamaindex, ai, LLM, how to deploy rag application, github, create-llama cli tool, how to deploy rag app step by step, rag to prod, openai, AI, RAG, rag llm, rag ai, llm rag, langchain, llama, metaphor, rags with web search, gpts, opengpts, llamaindex in nutsell, What is llamaindex, GPTs, llamaparser, parse pdf, llamacloud
Id: wRMnHbiz5ck
Length: 20min 31sec (1231 seconds)
Published: Sat Mar 09 2024