RAG for Complex PDFs with LlamaParse and LlamaIndex v0.10

Captions
[Music] Hey Chris, did you hear about the new LlamaIndex v0.10 release and that new LlamaParse library? Yes, I sure did hear about that, Greg. It says it can actually parse embedded tables and figures — are you ready to check this out today and see if it does what it says on the tin? Absolutely, I can't wait to dig in. Well, let's get right into it; we'll see you back for the results and conclusions on exactly what this thing is doing for us.

Welcome, everybody. My name's Greg and that's Chris, AKA Dr. Greg and The Wiz. We're co-founders of AI Makerspace, and today we're going to look at one of the newest tools to hit the open-source AI LLM builders' market: LlamaParse. We want to take a close look and see if it actually improves on the RAG for complex PDFs that we've looked at previously, and that the entire industry, really all industries, will continue to look at. If questions pop up throughout today's demo, please drop them in the Slido link we'll throw into the chat.

There's been a lot of new stuff released from LlamaIndex, including LlamaParse but also a little more, so we're going to see if it all comes together to give us a superior, production-grade RAG experience. As with all sessions, let's align our aim for the hour: we'll do an overview of LlamaIndex v0.10, we'll look at LlamaParse performance on embedded tables and figures, which is what we set out to examine closely, and we'll see exactly how to build a query engine using LlamaParse over your documents that you can leverage in your RAG applications.

First, we'll check out LlamaIndex v0.10 and LlamaParse, and we'll review LlamaIndex RAG to contextualize things a little. A lot of the docs for these tools have been changing, and we want to keep you updated on how these tools and their capabilities are being communicated. We start with the new release, LlamaIndex v0.10. Along with the recently released LlamaCloud platform, this is the next step toward making LlamaIndex a real, next-generation, production-ready data framework for LLM applications — we see that key phrase again. Similar to other recent releases, LlamaIndex now has a core package containing the main abstractions we've talked about previously and that many of you are probably familiar with already, and the separation between those core constructs and the third-party integrations is the key distinction between core and LlamaHub. Additionally, the ServiceContext object, which you'll know if you've been building with LlamaIndex, has become cumbersome over time and increasingly difficult to use. It was meant to be an intermediate, user-facing layer for defining parameters, but it's no longer the best solution given the new core/LlamaHub split, so ServiceContext will no longer be part of your build once you upgrade. And of course the number of third-party integrations keeps growing — many hundreds at this point, which is really cool.
This diagram from their blog (link dropped in the chat) shows the core package underneath, with the integrations, all the LlamaPacks, and some experimental and fine-tuning work on top, so keep an eye out for that. The big takeaways from v0.10 are the ServiceContext removal and the core versus LlamaHub split.

Now let's talk about LlamaIndex in general. They updated all of their docs with v0.10, but they are still very much a data framework, which is unique in the industry, focused on helping you build LLM applications that benefit from "context augmentation." That's the big idea behind LlamaIndex, so let's demystify it for a second. Context augmentation means augmenting the prompt in the context window of the LLM. That's it. We're talking about RAG. Why RAG? Because we don't like confident responses that are false — those are hallucinations, fake news, and nobody likes them. We need to be able to fact-check with reference material that we add to our prompt, augment the prompt with it, and then generate better answers.

One way to think about RAG that we've been communicating to our audience, and that we encourage everybody to break down into its core component pieces, is dense vector retrieval plus in-context learning; that's where context augmentation comes in. Walking through it: when you ask a question, we send it to an embedding model, which creates a vector representation of the question after tokenization and embedding. We then look in our vector store, which is made up of our documents, for material similar to the question using a simple similarity metric. We set up a prompt template to augment the prompt context, something like "use the provided context to answer the user's query; don't answer if you don't know," and we put the similar material we found into the prompt's context window. That process is dense vector retrieval — sometimes it's a sparse-dense hybrid, which can be a bit more computationally efficient and comes out of the box with something like Pinecone, but in the general, naive case it's dense vector retrieval returning natural language for the prompt. As we set up the prompt template and give the model more information in context, that's the in-context learning piece — the big idea from the GPT-3 paper, "Language Models are Few-Shot Learners." Together, these two things are RAG; they are context augmentation. The important thing to note about this setup is that it's completely independent of the LLM you ultimately put it all into, which then provides the response at the end. We're augmenting the context window; we're augmenting the prompt.

Why are we doing this? Because when we prototype, we want to go through the same industry-standard process everybody goes through: start with prompting, move to RAG, and then generally think about fine-tuning. It's not always linear, but often it is.
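To make the "dense vector retrieval plus in-context learning" framing concrete, here is a minimal sketch in Python using the OpenAI SDK and NumPy. The two documents and the prompt wording are illustrative placeholders rather than the exact template used in the walkthrough, and a real system would use a vector database instead of an in-memory list.

```python
# Minimal RAG sketch: dense vector retrieval + in-context learning.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

docs = [
    "The query engine is the core retrieval-and-answer construct in LlamaIndex.",
    "LlamaParse returns parsed PDF content as markdown or plain text.",
]

def embed(texts):
    # OpenAI embeddings are unit-normalized, so dot product == cosine similarity.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(docs)

def retrieve(question, k=1):
    q_vec = embed([question])[0]
    scores = doc_vectors @ q_vec
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "What formats does LlamaParse return?"
context = "\n".join(retrieve(question))

# In-context learning: the retrieved reference material is stuffed into the prompt.
prompt = (
    "Use the provided context to answer the user's query. "
    "If you don't know, say you don't know.\n\n"
    f"Context:\n{context}\n\nQuery: {question}"
)
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```

Note how the retrieval step and the generation step are cleanly separated, which is exactly the point made above: you can swap the LLM at the end without touching anything upstream.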
This mental model from OpenAI is to start with prompt engineering; you can think about optimizing the context (what the model needs to know) through RAG, and optimizing the LLM (how the model needs to act) through fine-tuning. Eventually you'll probably do both, fine-tuning both your embedding model and your chat model as you try to reach human-level performance in your application — there's an example of this from OpenAI DevDay. Generally we're seeing an order of operations where RAG comes before fine-tuning: it's usually a cheaper first step, it's easier to update with the latest information, and it gives us fact-checkability, which is all super dope. So the takeaway, again: RAG is context augmentation, and it's all independent of the LLM.

The cool thing about LlamaIndex as a data framework is that LlamaIndex, and RAG in general, pose no restriction on how you use the LLMs — you can of course go on to fine-tuning, and you can feed different things into different LLMs. It's really focused on the data piece, on being data-centric, because the data-centric paradigm hasn't gone anywhere, and as we heard in LlamaIndex's v0.10 release, RAG is only as good as your data. I like the mental model they provided in that blog of the RAG data stack, which is different from classic ETL: we load the language data, we process it, we embed it, and then we set up our vector DB. What's interesting is that as we load and process, we're chunking and tokenizing, deciding on chunk sizes, figuring out whether we need metadata or hierarchy, and deciding how to think about the data, short form or long form. For embeddings there are obviously many different models, and you can fine-tune them as well. And the actual vector database setup, or the setup of many indices and many vector databases and how to move between them, is an art unto itself, especially depending on your organization's data. So in contrast to classic ETL, all the decisions we take in the RAG data stack affect the ultimate application.

The key pain points for people building today — again from LlamaIndex, and something we generally agree with and hear from folks in the marketplace — are these. Results are often not right, not accurate, not good enough. There are too many things to think about, from chunk sizing to hybrid retrieval to which model to use to whether to fine-tune; it's a lot to deal with. And everybody has ridiculous amounts of portable document format (PDF) files sitting around that they'd love to use, and PDFs are famously hard for humans to deal with and have been challenging for LLMs as well. That's where we are today as we tackle LlamaParse: not accurate, too many parameters, and PDFs. The data-syncing issue — keeping live data in sync as it changes and updates — is a separate issue we're not covering today.

Let's talk about LlamaIndex RAG tooling for a second. We've covered this previously, and we'll link a few of those events.
Essentially, there's a way to ingest data (data connectors), a way to structure the data (indices, with the vector database being the simplest type of index), and then different engines. We're going to build a query engine today, and as many of you have heard us say before, query engines are to LlamaIndex as chains are to LangChain; the query engine is really at the heart of LlamaIndex. We're also seeing a chat engine emerge from LlamaIndex, which is cool and dovetails directly into the idea of agents — the data framework as data agents — because with a chat interface you can go back and forth as you engage and interact with your applications through different reasoning and decision-making cycles. If you'd like to know more about these constructs and dig deeper into the core LlamaIndex package that came together in v0.10, definitely check out our previous events. But that's enough background for today; we're here to talk about LlamaParse.

LlamaParse is in public preview mode, and we want to understand exactly how well it's doing: what it does well, what it doesn't do so well, and what to expect in the future. At the highest level, LlamaParse is a proprietary parsing algorithm for documents that contain embedded objects. We read that it had embedded table and figure capability, so we wanted to check that out. It also lets us build retrieval over more complex documents — semi-structured, meaning tabular data, and unstructured, meaning language data — all in the spirit of moving toward production-grade context augmentation. LlamaParse is built on top of recursive retrieval algorithms and work LlamaIndex has done previously, so it's a natural next step: parse out tables and text in markdown format, because they've already built a lot of tools that integrate very well with markdown. Again, the point is that we can build more complex RAG systems over more complex data.

The flagship example from their release is the Apple 10-K filings, where they compared LlamaParse against PyPDF, and also against PyMuPDF, Textract, and PDFMiner. In their figure, red marks where information was not extracted well, and there was a lot of red. PyPDF was the least red among those baselines, making it the second best, and there are still a few red pieces in the LlamaParse result — you probably have to squint to see them. So it's not perfect today, but it is an improvement over the standard, and that's pretty cool.

Which brings us to our testing. We wanted to see whether this works on the classics, so we picked up an NVIDIA 10-K filing. We were also very interested in whether it could handle infographics and more complex figures embedded in PDF documents. At first I was throwing out a couple of image ideas, but images aren't really the same as PDFs, so alongside the NVIDIA filing we found a great document, related in many meta ways to what we're doing now: "AI and the Future of Teaching and Learning" from the Office of Educational Technology, May 2023.
It's a long PDF, 70-plus pages, and the NVIDIA 10-K filing is 90-plus pages, so these are long, chunky documents with lots of infographic-style figures. We were wondering: can it extract the text, can it extract the numbers, what's going on?

Drumroll, please — here's what we found. We'll walk you through how we got here, but to give you the conclusions up front: parsing speed was very inconsistent, and especially with the recursive retriever we built, which is the one they recommend, it sometimes took minutes to run requests. The tabular extraction was very good — when it worked, it worked very, very well, so that was the shining highlight. But there was no figure extraction. We thought we had read otherwise, but when we double-clicked into the release blogs, we saw that they are still building out better support for figures and for other document types. That's the natural progression; once you get into figure space you're really getting into image space, and I can imagine how challenging that problem is. It's clearly the kind of feedback they're getting from lots of folks, and I'm sure many of you are interested in the day we can do figure extraction, but that day is still not today.

How did we figure this out? Let's go through it specifically, and then we'll see how LlamaParse works on the back end. We used simple models: OpenAI's text-embedding-3-small (the latest from them, but the small one) and OpenAI's GPT-3.5 Turbo. We built a recursive query engine, per their recommendation, and used the BAAI bge-reranker-large. It's the classic two-step: do embedding-based retrieval to get the docs, then rerank them. So: the NVIDIA 10-K filing, the Department of Education report, OpenAI models, the recommended recursive retriever, LlamaParse, and LlamaIndex v0.10. With that, I'm going to kick it over to the Wiz to show you exactly how this looks in code and give you more nuanced vibes about what you might expect in your application. Wiz, over to you, man.

Thank you, Greg. Okay, we're going to drop this notebook into the chat so you can follow along. We'll start with the straightforward portion, getting LlamaParse to work, and then move on to creating the retrieval pipelines — those query engines Greg was describing. First things first: the LlamaParse release comes along with LlamaCloud. LlamaCloud is more than just LlamaParse, but for right now that's what we're going to leverage it for. LlamaParse is exactly as described: a proprietary algorithm behind an API, so we don't have access to exactly what's happening behind the scenes; we can only infer what might be happening. The idea is that it's an API that accepts PDFs and returns documents, and it can return those documents in multiple formats, one of which is markdown.
The power of having it return markdown is that markdown helps us capture structural relationships within our documents, and we can then use LlamaIndex's markdown node parser to really understand what's going on in them. We also, of course, have v0.10, which is huge. It's basically the same kind of thing we've seen recently from other libraries, including LangChain: things were getting bloated, with lots of possibilities glutting up the core library, so they were split. Effectively, LlamaHub is now the source of truth for everything community and integration, and llama-index core focuses on just what LlamaIndex is supposed to do, which is awesome.

So how do we use the tool? First of all, we need some dependencies, so we're going to grab llama-parse. Then we need a LlamaCloud API key: when you arrive on the LlamaCloud page, go to the API key resource in the bottom left, generate a new key, give it a name, and store it somewhere safe — as easy as it gets. You'll also notice you get quite a few pages per day with the PDF parser, and you'll notice that this is a PDF parser: right now it only works with PDFs, no other file types are accepted, and when we look through the code we can see there are only two return types, plain text or markdown. But you get 10,000 pages per day, which is pretty awesome.

Back in the notebook, once we have our API key we provide it here. We'll also be using OpenAI, so we slap our OpenAI key in as well, and then we run the classic cheat code for Google Colab: we'll be using asynchronous functions, so we need to apply nest_asyncio as boilerplate if you're running this in a notebook. And yes, to the question in chat: because it only accepts PDFs, unless your file is a PDF you'll have to find a way to get it into one. Luckily, a lot of files are already PDFs or are easily converted, and it should be a lower burden to convert a file to PDF than to convert from PDF — which is the problem this tool is solving.

The next thing we do is initialize LlamaParse, and it couldn't be easier. We set up our object and say we want markdown, because we're very keen on that structural relationship; we really want to know what the structured data is saying and to interface with it in a way that preserves the structure. The plain-text option doesn't really help us do that, whereas markdown gives us notation we can use with the markdown node parser to understand structural relationships. The verbose flag can be True or False, depending on how much output you want to read, and there are a number of supported languages.
We're going to use English in this example. Then there's the number of workers: we'll set two because we're parsing two files. You can have up to 10 workers at a time, so you can work in batched sets of 10. Once that's done, we upload some files to our Colab instance. Pay close attention to what you save these as, because that's the filename you'll need to send to LlamaParse. If I look in my files here, you can see I've done this process twice, so I have two different versions of the files, and I need to make sure I point at the correct file when I send it. If you've named your file something else, take the name right from here for this to work, especially if you're following along in the future. We do the same thing for the AI report Greg alluded to earlier, the artificial intelligence in education report — a pretty cool document with sweet graphs and sweet figures; we'll see how well LlamaParse stacks up.

The next part is the actual parsing. This part is totally opaque to us: we send the file to an endpoint, and at some point we get back a response, which is a set of documents that can be parsed easily through LlamaIndex since they're tightly integrated. You don't have to use LlamaIndex — the documents are just markdown, so you can use whatever you'd like past this step — but LlamaIndex is obviously paying special attention to its own ecosystem, so that's where we'll stay today. Again, only PDF files. This is also a very inconsistent process, I've found: sometimes it takes a very long time, sometimes it doesn't. Once you've parsed a file, there's likely some kind of backend caching, because each subsequent or repeated attempt to parse the same file is very quick, but that first time is very inconsistent. The AI report took quite a long time to finish, whereas the NVIDIA 10-K filing took much less time, so your mileage may vary. I would not build this into a latency-critical application at this time, but for offline or batch processing it seems super dope. Classic pattern: we start a job, and at some point it returns documents, which is a list of objects. We can take a peek at them and see that this markdown most assuredly preserves some structure — there's no doubt that this is a table structure in markdown, which is very important and desired, and that's awesome to see. We can also look at our AI report, which has, I believe, literally zero tables (maybe a couple of very simple ones at the end), and you can see it still correctly produces markdown, which is important; the idea of the markdown being preserved is huge.
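For reference, here is a minimal sketch of that setup and parse step, assuming the two PDFs have already been uploaded to the Colab instance under the hypothetical names shown and that a LlamaCloud key is on hand. Parameter names follow the llama-parse package at the time of writing.

```python
# nest_asyncio lets the async parser run inside a notebook's existing event loop.
import nest_asyncio
nest_asyncio.apply()

from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-...",        # key generated on cloud.llamaindex.ai
    result_type="markdown",   # "markdown" or "text"
    language="en",
    num_workers=2,            # one per file we're sending
    verbose=True,
)

# Hypothetical filenames for the two documents used in the walkthrough.
documents = parser.load_data(["./nvidia_10k.pdf", "./ai_report.pdf"])

# Peek at the markdown — embedded tables should appear as markdown tables.
print(documents[0].text[:1000])
```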
Matt in the chat points out that for earnings filings we can also get HTML or CSV, and that's absolutely true — if that's the information you're looking for, that's probably going to be best. But if you're looking for a combination of that semantic information and the actual structured data, I do think this is an excellent resource, as it is for reports that don't come with those formats, of which there are unfortunately quite a lot. So we can see it does the thing: it gives you markdown, and the markdown can be leveraged to understand something about the structure of the document. That's great.

Now let's build a query engine to see if this is actually useful — yes, it's markdown, but can it be leveraged usefully? The first thing to talk about is LlamaIndex v0.10. If you're used to LlamaIndex and you've watched some of our previous events, you'll remember me talking about ServiceContext, global context, setting context, context, context. Gone. It's all gone — context is dead, we're all about Settings now. There's still the idea of a global settings object we can set, so we can fall back on user-defined defaults, but we no longer have a ServiceContext that we need to pass around and manage; instead we do what you'd normally expect and pass things into their constructors. This is probably my favorite change from v0.10; normalizing this library to the rest of the ecosystem feels really good. So in Settings we set our base LLM to GPT-3.5 Turbo and our OpenAI embedding model to text-embedding-3-small, the successor to Ada — it's just as good or better than Ada and costs less, which is why we're using the small model today. Note that because we're using GPT-3.5 Turbo, we're really not relying on the LLM's ability to understand the structure; we're relying on the retrieval process's ability to correctly represent that structure. If we used GPT-4 here, someone could argue "well, GPT-4 is just really good at this," so we're sticking with GPT-3.5.

Then we use the markdown element node parser. This is the thing that makes sure we squeeze as much juice as we can from these markdown files; it's specifically built to parse markdown elements, and their docs are pretty good on this. The idea is simply that it helps us break the markdown into its constituent parts, which is what allows us to distinguish the structured versus unstructured nature of the data. We want both: semantic information to answer semantic questions, and structured data to answer the questions that rely on context contained within tables or figures. All we have to do is run it.
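Here is a minimal sketch of that configuration using the post-v0.10 import paths (the integration packages llama-index-llms-openai and llama-index-embeddings-openai are assumed to be installed, and `documents` is the output of the LlamaParse step above).

```python
from llama_index.core import Settings
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Global defaults replace the old ServiceContext.
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# The element parser uses an LLM to identify and summarize table elements
# found in the markdown returned by LlamaParse.
node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo"),
    num_workers=8,
)

nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
```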
When we run get_nodes_from_documents, you'll notice it fails pretty frequently on some nodes. That does not mean the nodes aren't created, and it does not mean the total process fails; it just means that sometimes it isn't able to understand the markdown it received from the LlamaParse endpoint, so we see some errors, and it's not a big deal. As Greg showed, we're dealing with documents that are quite long in both cases, so a few misses are totally expected — this is a preview, their first shot on goal. It is worth noting, though, that when you do miss, it means we're not fully capturing the structure of our data, which could lead to some potential issues. But missing only sometimes is obviously much better than missing all the time or missing a lot, which is the case for some of the other methods we've seen in the past, so this is still an improvement even if there are kinks to be worked out.

Once we have our nodes, we grab our nodes and objects so we can create our vector store index; the index is built from those base nodes plus those objects. As for page-number info, that's just metadata: we have it via the nodes because we know which page we're on, so it's already captured. There's another question in the chat I'll answer since we're all together right now: is there an easy way to review which nodes failed? Yes, absolutely — it tells you which node had the failure, so you can go in and check that the data is in the format you want.

Okay, so we've got our nodes parsed, and we accept that a couple of them didn't work out; hey, it happens, this is new technology. Once we've got our vector store — our index — set up, we create our recursive query engine. We're going to use a reranking process, the flag embedding reranker, powered by bge-reranker-large. We have to install some requirements: the flag-embedding-reranker postprocessor and FlagEmbedding from their repo. Once those two requirements are installed, we can initialize the reranker. For those of you who aren't sure what a reranker is: when we get a bunch of things that are likely related to our query, we have the chance to reorder that list using a more compute-intensive or time-consuming process that's more accurate. You can think of it as casting a wide net very quickly, then slowly looking through what we've got in the net and taking our time to reorder the list. In this case we go from 15 retrieved contexts down to the top five of those 15; even if the process takes a millisecond per item, we're only talking about 15 milliseconds, which is not bad (not great, let's be real, but not bad). That's the idea of a reranker. For similarity top-k we just grab the top 15 results and then rerank them — that's it. So we've set up our retrieval, we've set up our query engine, we've parsed our documents, and they're all in the index.
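A minimal sketch of that index and query engine, assuming the extra reranker dependencies have been installed (roughly, `pip install llama-index-postprocessor-flag-embedding-reranker` plus FlagEmbedding from its GitHub repo). The recursive behavior comes from the `objects` placed in the index: retrieving a table-summary node pulls in the corresponding full table.

```python
from llama_index.core import VectorStoreIndex
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Base text nodes plus the table objects produced by the markdown element parser.
index = VectorStoreIndex(nodes=base_nodes + objects)

reranker = FlagEmbeddingReranker(
    model="BAAI/bge-reranker-large",
    top_n=5,  # keep the best 5 of the retrieved candidates
)

query_engine = index.as_query_engine(
    similarity_top_k=15,             # cast a wide net first...
    node_postprocessors=[reranker],  # ...then rerank down to top_n
    verbose=True,
)
```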
Now let's do the thing. We can ask questions like "Who is the Executive VP of Operations and how old are they?" We're using the recursive retrieval engine here, so there are a number of requests that work through these nodes, which is pretty dope — you can see we've got that little table, and you love to see that. Eventually we wind up with this response: Debora Shoquist is the Executive Vice President of Operations, and she is 69 years old. That's exactly right, and this information isn't mentioned anywhere else in the document outside that table, which is huge. If you're running this in a notebook on a CPU instance, you'll notice it takes a very long time: the BGE reranker is a pretty beefy model, so on CPU this query will be slow. On a GPU-accelerated instance — which you can select through Runtime, then Change runtime type, then picking a GPU — it's a lot faster. So keep in mind the slowness is not representative of LlamaIndex or this technology in general; it basically depends on which resources you select. That's not true of the actual LlamaParse endpoint, but it is true of this retrieval process.

We can also ask questions like "What is the gross carrying amount of total amortizable intangible assets for January 29, 2023?" What a mouthful. We're able to extract that it's $3,539 million, which is exactly what we see in the filing — exactly correct, and it sits right next to information that would be wrong. The power of this approach is immediately apparent if you're working with structured data: this process lets us very faithfully pull out the correct piece of information in context, and again, this is not information that's available just through text in the document; you have to come to the table to get it. So that's great.

Let's try it on the AI education report, which doesn't have a lot of tables but does have a lot of graphs and figures, and ask about those. All of this is just setting up the same retrieval process over the AI report. When we query it with "How many AI publications on pattern recognition were there in 2020?" we get the response that there were 30.07 AI publications on pattern recognition in 2020, which is definitely wrong — it should be in the mid-50s. What's interesting, though, is that despite 30.07 not being mentioned anywhere else in the document, 30.07 is associated with this figure. So while it didn't retrieve the correct context and didn't answer the question correctly, it is at least able to parse some information out of the figure, even if it's not there yet. For me, that's the signal that things are going in the right direction even if we haven't literally arrived: we're landing in the right country, maybe not in the right province yet, but the right country, and that's good to see.
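Before moving on, here is what issuing those queries looks like in code, with the question wording approximated. The same pattern applies to a second engine built the same way over the AI education report.

```python
# Table-backed questions against the NVIDIA 10-K engine.
response = query_engine.query(
    "Who is the Executive VP of Operations and how old are they?"
)
print(response)

response = query_engine.query(
    "What is the gross carrying amount of total amortizable intangible assets "
    "as of January 29, 2023?"
)
print(response)
```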
We ask another question, one that should be a little simpler: can you describe what figure 14 is related to? We get the response that it's related to the long tail of learner variability in the context of AI education, and it goes on and on. This is not true — that's figure 13, and figure 14 is what we see here, which is unrelated to the response. So again, when it comes to figures, the more pictorial representations and graphs, there's still work to be done, which is expected, and I think that's communicated clearly enough in their blog content. We'd still like to see it reach the point where it understands more pictorial representations of data; they're clearly working on it, but we're not quite there yet. So I would really view this as a tool for extracting structured, tabular data rather than for understanding images and the like. With that, we'll send you back to Greg to close us out and take us to Q&A.

That was rocking, Chris, thanks so much, man. That was LlamaParse, everybody — that's the current state of affairs. We'll conclude for the day that, out of the box, LlamaParse is a great place to start, especially in lieu of a custom in-house solution. You probably don't want to build it into a latency-critical application today, but it has very, very nice tabular extraction, specifically for PDFs. And that "too many parameters to tune" issue they're trying to address — they are addressing it, although it's a black box to us, a proprietary solution. It's not open source, but it is easier to use, as is the nature of these things as they progress, so if you're getting started and not really trying to mess with parameters, it might be a good off-the-shelf solution for you. It could definitely be a lot faster, we'd love to see figures and the pictorial-representation stuff, and we'd love to see other data types; of course, they're working on all of it. A data framework trying to lead the way in the industry has a lot of work to do and a lot of good stuff ahead of it, so we look forward to continuing to check out the latest and greatest from LlamaIndex as it's released.

With that, yes, we do have a Slido — go ahead and ask your questions directly in it. I'll ask the Wiz to come back up for Q&A, and we'll keep this link on the screen; you can also throw questions in the YouTube chat.

So, Wiz, a little clarification on how this RAG data stack is more dynamic and more complicated than ETL: "why don't ETL decisions affect the application?" This came up when I was talking about the language from the LlamaIndex blog materials, where they said loading, processing, embedding, and creating the vector DB are a situation where every decision directly affects accuracy, in contrast to the classic ETL stack. Can you explain how you think about this, and whether that's the right way to interpret it? I'd say this is a difficult one to answer cleanly, because I believe deeply in my soul that ETL decisions do affect the application.
I think it's pretty clear there are different axes on which they might impact things. If we're talking about performance- or latency-related decisions, it ultimately matters less, because we'll wind up with a pile of data and can do the other stuff from there. But how we actually transform the data, especially, can have significant impacts on performance, on the ability to retrieve, and so on. So I'd say it's very much still worth paying attention to and very much still a key part of the plan. Thank you for that question — it was anonymous, but fact-checking the specific wording is very helpful for us, and I'm sure for LlamaIndex as well.

Okay, Viges asks: what is the difference between LlamaParse and multimodal models? Can multimodal models also do this kind of OCR, including tables? I think we sort of answered this throughout: as far as we can tell it's not really doing OCR, but we have no line of sight, so the difference between LlamaParse and a multimodal model isn't well understood. They could be using a multimodal model on the back end; it seems unlikely. The issue is that we're not quite sure what they wanted it to do or what they did to create it. At this point, though, it seems closer to a PyPDF or PyMuPDF style tool than to a full multimodal model — but that's speculation; there are no facts that would lead me there.

We'll keep this discussion going — keep the questions coming, we'd love to continue the conversation. Islam says: in the AI ed use case it retrieved the correct table/image but not the correct info; do you think we could pass the image to GPT-4 Vision, or just the image, for chat Q&A? Yeah, absolutely. If we can get to the node that has the image, and we can associate that node with, literally, a .png or something like that in whatever vector store we're using, then we can build that logic in: when there's an image, send it to some kind of process that helps us understand what the image is about. I think that's a good thought.
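As a hedged sketch of that idea — not something LlamaParse or LlamaIndex does for you today — suppose a retrieved node carried a path to the figure image in its metadata (the "image_path" key here is hypothetical). You could then route the question plus the image to a vision-capable model:

```python
import base64
from openai import OpenAI

client = OpenAI()

def describe_figure(image_path: str, question: str) -> str:
    # Encode the figure as a data URL and ask a vision model about it.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision-capable model at time of recording
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return resp.choices[0].message.content

# Example routing: if a node's metadata points at a figure, ask the vision model.
# answer = describe_figure(node.metadata["image_path"], "How many AI publications on pattern recognition were there in 2020?")
```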
Okay, cool, let's go to this question from Cena Aizi: can you explain your decision to use the recursive query engine as opposed to, say, a hybrid retrieval BM25 retriever? Recursive does what we want here very well because we're dealing with a structured piece of information: we want to laser in on a particular part of the document and make sure we capture the full table. BM25 and these combined sparse search methods are still useful, but with the recursive approach we can more generally guarantee that we get access to the full table somewhere in our context, and that's what we need. And as Stephen noted earlier, because we got that expanded relevant context thanks to the way we set up the recursive retriever, it was actually able to understand that the 3,539 figure was meant to be in millions — a piece of context it might not have retrieved or seen otherwise — which is why I think the recursive approach is very helpful for this specific use case.

I want to combine two questions, both anonymous: how does LlamaParse for PDFs compare to Unstructured.io, which was our last event, and have you done a comparison with other open-source parsers like LLM Sherpa? Maybe share your deeper perspective on this. The real benefit of LlamaParse, for me, is that it's integrated into the LlamaIndex ecosystem. There are certainly other, more engineering-heavy ways to approach these problems, and custom-built solutions that are better for particular use cases, but I'm happy to say I agree with the benchmarks they've released: LlamaParse does feel better at preserving structural relationships than, say, the Python packages that are meant to do this. That might not be true forever, and the comment might not age well, but as of the time of recording I think it's pretty good. The main benefit is that it's baked into the ecosystem; otherwise there are lots of options you can explore that will be useful to you.

Now a question that stacks on some of the ideas we heard earlier (GPT-4 Vision, can we do this, can we do that): how do you chunk PDFs containing tabular data? And, to contextualize it with another comment: charts are the weak link — what's happening out there to take a chart and convert it into a kind of reverse prompt that unwinds the data so the model can understand it? Can you talk about the best way to chunk PDFs, and what the space looks like as far as you can see? I think it comes down to wanting our tables preserved as one whole chunk; that's just tremendously useful. LLMs are reasonably good at reasoning about charts and tables, even GPT-3.5 Turbo. Ultimately, the way I think about it is that charts, tables, and these figure-based elements are their own nodes, our text is nodes around them, and they're connected via some kind of hierarchical metadata: these are the figures from this section, along with these passages. In terms of the prompt, you just feed the table in — if you take a markdown table and show it to GPT-3.5 Turbo, it's going to be pretty good at answering questions about it (see the small sketch below). If you need aggregate information or information derived from the chart, you can hook it up to things like code executors, or processes similar to GPT-4's Code Interpreter, where we load it into some Python structure and do operations on it. Beyond that, we just want to make sure tables and figures are their own unit, treated as such, and that the relationship between them and the surrounding text is clear; having them be their own thing is, by itself, very useful.
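A tiny illustration of that point — the table values below are made up for the example, not taken from the filing:

```python
from openai import OpenAI

client = OpenAI()

table = """| Segment  | FY Revenue |
|----------|------------|
| Compute  | $15.0B     |
| Graphics | $11.9B     |"""

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Using this table:\n{table}\n\nWhich segment had higher revenue?",
    }],
)
print(resp.choices[0].message.content)
```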
And that's the idea: once you can get it into a markdown format, you can work with it in a lot of different ways. Okay, stacking on that a little — I know LlamaParse is still a black box for us, but another question asks how effective it is at maintaining the structure of tables, things like row span and column span. It's pretty good, but it's just a markdown table: it's good at preserving the structure of the table, but you would not be able to recreate the table as it appears in the PDF from the markdown you receive — zero percent chance if it was formatted in a specific way. Things like borderless styling, left- or right-oriented layout, colors: you lose all of it. It's just going to show you that there's a table, that the table has values, and what those values are, with zero information about how it's visually presented — which is by design, frankly. It's not meant to reconstitute the table; it's meant to extract it so we can turn it into something like a CSV and work with it.

Okay, the great questions keep coming in; let's rapid-fire these, Chris. Can we use LlamaIndex for retrieval and connect it to LangChain — is there any merit to this? LangChain can understand markdown just fine, as can anything that can convert markdown into some other useful format, or chunk and convert it. LlamaParse is a great tool because it just spits out a markdown or text file, so you can integrate it into a lot of other pipelines. Is recursive retrieval similar to a parent-child retriever? Kind of, yes. And the last question, to keep up with the times, from Tariq in the YouTube chat: is it even worth implementing RAG with these large-context-window models like Gemini, or should we just throw it out the window? Yes, absolutely, it is still worth implementing RAG, period. We could talk about all of the axes that make this true: cost, effort, memory, latency, accuracy, confabulation, hallucination. RAG is still a hugely powerful component because it lets us do what we want to do, which is answer specific questions about specific things in specific documents. The large context window is amazing, and it's going to help us do a lot of really awesome things, but for right now those things just don't push out RAG.

And that reminds me of a good spot to wrap on: this idea of context augmentation from LlamaIndex seems to be telling us that perhaps there are other patterns beyond RAG that they have in mind for context augmentation in the future. So stay tuned for what happens next; the industry will continue to evolve, and we'll keep you up to date and up to speed on the latest and greatest. Wiz, thank you for your Q&A wisdom, and thanks, everybody, for joining today. If you're interested in learning about when and how to do fine-tuning, once you're actually done with RAG, that's what we're covering next week, Wednesday, live, same time. And of course, please like, subscribe, and ring that bell to stay up with all events as they drop live, or the ones that we upload.
If you are seriously ready to accelerate your LLM application development, check out our AI Engineering Bootcamp, our industry-leading, cohort-based, live online course where you can fill all your skills gaps, from building to operating to improving LLM applications in production. If you enjoyed engaging with us in the chat, or even just watching it, go ahead and join our Discord; we can keep the conversation going there and get you down the path to building, shipping, and sharing with us. If you're not really someone who wants to engage but you just want to tinker on your own, we've got a few resources to share: one is our Awesome AIM Index, which gives you direct access to the code from all of our events, and the other is our recently open-sourced LLM Ops: LLMs in Production cohort 1 materials, including the entire GitHub repo — check that out if you're trying to get up to speed. That's pre-v0.10 material from 2023, and we look forward to open-sourcing more as we move into the future. Finally, any feedback you can provide is great, either through Luma or through our feedback form. As always, thank you so much, everybody. Until next time, keep building, shipping, and sharing, and we'll do the same. We'll see you all real soon; have a great week.
Info
Channel: AI Makerspace
Views: 12,230
Id: 7qsxz2rURG4
Length: 61min 8sec (3668 seconds)
Published: Fri Mar 01 2024