Advanced RAG with Llama 3 in Langchain | Chat with PDF using Free Embeddings, Reranker & LlamaParse

Video Statistics and Information

Captions
In this video you're going to learn how to build an advanced retrieval augmented generation (RAG) system to chat with a custom PDF file. We're going to be using Llama 3 and other open models to create our RAG application. Let's get started!

If you want to follow along, there is a complete text tutorial available for MLExpert Pro subscribers, in the bootcamp under the advanced track "RAG with Llama 3 in LangChain". There you can find the full tutorial with explanations for each part, along with the source code required to reproduce the results, so please consider subscribing to MLExpert Pro. Thank you!

While there are a lot of approaches to building advanced RAG applications, in this video I'm going to show you how to use open models and LangChain to build your own. We're going to be using Llama 3 along with other models for embeddings and reranking, and we're going to wrap everything in a LangChain chain in order to chat with a custom PDF file.

Here is an overview of what we're going to cover. First, we're going to look at the essential architecture and its main components. Then I'm going to show you how to build the knowledge base, which is one of those components. After that we'll take a brief look at the reranker we're going to be using, and finally we're going to wrap everything in a question answering chain from LangChain and instantiate our LLM, which is going to be Llama 3 served via the Groq API.

Our advanced RAG application is going to be built from three main components: the knowledge base, the reranker, and the large language model. The knowledge base itself contains a couple of subcomponents. The first of those is the parser, whose job is to extract structured information from a PDF file.
In our case, this structured text is then processed by a text splitter, which creates chunks of the text. Those chunks are converted into embedding vectors using an embedding model, and the vectors are stored in a vector database. The vector database we're going to use is also responsible for finding similar or relevant documents based on the user's query. These documents are then passed to the reranker, whose job is to do pairwise ranking: it filters out irrelevant documents and sorts the more relevant ones to the top. The remaining documents are passed to the large language model along with a custom prompt that we're going to create, and the result is processed by our LangChain chain and returned to the user.

The largest component in our RAG application is the knowledge base. Its main job is to extract structured data, from a PDF in our case, split that text into chunks, create embedding vectors for those chunks with an embedding model, and save them into a vector database. Then comes the reranking step, which gives a better ordering of the relevant documents returned by the vector database for a user query. This should surface more relevant results and filter out irrelevant ones. It is a very important component: if you don't get relevant results after the reranker, your LLM essentially won't be able to respond with sufficiently good information. The final component is the chain, a question answering chain using Llama 3 in LangChain.
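The data flow just described can be sketched schematically in plain Python. Every function below is a dummy stand-in for the real component named in the video (LlamaParse, the splitter, FastEmbed, Qdrant, FlashRank, Llama 3), so treat this as an outline of the architecture rather than working RAG code:

```python
# Schematic RAG pipeline: each function is a stand-in for a real component.

def parse(pdf_path: str) -> str:              # LlamaParse stand-in
    return "Meta reported revenue of $36,455M. Costs were $22,637M."

def split(text: str, size: int = 30) -> list[str]:   # text splitter stand-in
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Vector DB stand-in: rank chunks by naive word overlap with the query.
    score = lambda c: len(set(query.lower().split()) & set(c.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def rerank(query: str, docs: list[str]) -> list[str]:  # reranker stand-in
    return docs  # a real reranker would re-score each pair and reorder

def answer(query: str, docs: list[str]) -> str:        # LLM stand-in
    return f"Answer to {query!r} based on {len(docs)} documents."

chunks = split(parse("meta-earnings.pdf"))
docs = rerank("What was the revenue?", retrieve("What was the revenue?", chunks))
print(answer("What was the revenue?", docs))
```

The point of the sketch is only the shape of the flow: parse, split, retrieve, rerank, answer.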
This chain is the user-facing part of your application, and it is the combination of all the components: the knowledge base, the reranker, and the large language model, Llama 3 in our case. Of course, this requires some prompt engineering. There are a lot of templates and default prompts for these chains, but I'm going to be using a slightly tweaked version for our RAG application.

To build the application I'm going to be using a Google Colab notebook, and in it I'm going to install all of the required dependencies. We're going to be using LangChain, so I'm going to install that, along with langchain-groq, since we are going to be calling Llama 3 via the Groq API. Then llama-parse to parse the document, qdrant-client since we're going to be using Qdrant as the vector database, the unstructured markdown loader, fastembed for the embeddings, and flashrank for the reranker model. We have a lot of imports here, and pretty much the only API keys we're going to need are for Groq and for LlamaParse, since LlamaParse is not open source.

The document we're going to be using is provided on my personal Google Drive. It is Meta's press release reporting first quarter results, and the important thing about this document is that it is pretty complex: it has text formatting, a lot of tables, and a lot of structured text. We will see how well LlamaParse is able to parse this type of document. While LangChain allows you to use a lot of different PDF loaders, I've found that LlamaParse provides pretty much the best results for more complex PDFs. In our case we're working with a financial document, and in those you often see tables, figures, and very intricate formatting of the text: bullet points, tables within tables, text within tables, and so on.
Those kinds of structures need careful handling, and LlamaParse does a great job with that. To use LlamaParse on our document, you essentially instantiate a LlamaParse client and pass in your API key. I'm specifying that I want a markdown document as the response, and I'm also passing some parsing instructions: a very basic prompt saying that I want the parser to extract well-structured text. We give it a max timeout for the API, and then I'm calling load_data, which is essentially an asynchronous task, passing in the PDF file that I showed you previously. This takes roughly two to three seconds to parse the complete document.

Since this is just a single document, I'll show you the response. You can see that this is indeed a very nicely formatted markdown file: we have the P&L table, which is great, bullet points, etc. The response here is not displayed as well as it looks in the original markdown file; the markdown renderer within Google Colab is a bit buggy, and the actual output is much better than what you can see here. So I'm pretty happy with the results from LlamaParse. The final step here is to save the markdown output as a file in the data directory, since we're going to be using it in a bit.

Next, I'm going to show you how to create vector embeddings for the document. The first part is to load the markdown document using the unstructured markdown loader, and then I'm creating a recursive character text splitter. This will produce chunks of 2,048 characters, split with an overlap of 128 characters.
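To make the chunking step concrete, here is a minimal character-level splitter with overlap in plain Python. It is a simplification: LangChain's RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences before falling back to raw characters, which this sketch does not do.

```python
def split_with_overlap(text: str, chunk_size: int = 2048, overlap: int = 128) -> list[str]:
    """Naive fixed-size splitter: each chunk starts chunk_size - overlap
    characters after the previous one, so consecutive chunks share
    `overlap` characters of context."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 5000
chunks = split_with_overlap(doc)
print(len(chunks), len(chunks[0]))   # number of chunks, size of the first chunk
```

The overlap means a sentence cut off at a chunk boundary still appears whole in the next chunk, which helps retrieval.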
This splits our document into 10 smaller chunks, or subdocuments if you will. Here is the first chunk; you can see it is essentially 2,048 characters long.

For the embeddings, I'm going to be using FastEmbed. The model I chose is pretty much specific to the English language (I believe there is a multilingual version as well), and I'm using the base model. Roughly speaking, it is a very optimized model, only about 220 megabytes, and it appears to do a great job on the embedding side, at least according to the embeddings leaderboard I've seen on Hugging Face. FastEmbed actually uses an ONNX-optimized version of the model, and I've run this notebook on CPU: it works very fast, taking roughly one to two seconds to create the embeddings, which is very fast for such an embedding model, and I'm happy with that.

From here I'm using the Qdrant vector database. At least the client for it is open source and available on GitHub; they have a cloud-hosted version as well, but in our case I'm just storing the database in a local path. You also need to provide a collection name, and I'm specifying "document_embeddings". Then I'm passing in the documents and the embeddings, where the embeddings argument is just the embedding model itself. Next, I query the database, and you'll see that with these 10 documents it takes roughly 500 milliseconds to get the similar documents. "What is the most important innovation from Meta?" was the query that I tried out, and here you can see the responses.
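Under the hood, a vector similarity search like the one just performed boils down to comparing the query embedding against each stored vector. Here is a minimal in-memory version using cosine similarity (my assumption about the metric; Qdrant supports several distance functions), with tiny hand-made vectors standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def similarity_search(query_vec, store, k=2):
    """store: list of (chunk_text, embedding) pairs. Returns top-k (text, score)."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

store = [("chunk about revenue", [1.0, 0.0]),
         ("chunk about metaverse", [0.0, 1.0]),
         ("chunk about costs", [0.9, 0.1])]
results = similarity_search([1.0, 0.0], store)
print(results[0])   # the most similar chunk, with its score
```

For normalized embeddings the score lands between 0 and 1, which matches the scores shown in the notebook: closer to 1 means a closer match.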
I've trimmed the output to the first 256 characters, and you can see that each response contains a score: essentially, the higher the score, the more relevant the document. I believe the similarity search is probably using cosine similarity under the hood; I'm not really sure about the Qdrant implementation, but probably something like that. These types of measures give you a score between zero and one, and if the score is close to one you have a pretty much perfect match, or at least much higher similarity compared to lower scores.

Next, I'm going to convert this database of embeddings into a retriever, which we're going to use in the LangChain chain itself. The keyword argument I'm specifying here says that I want it to return the top five responses. Again, you can take a look at the documents; note that with this retriever invocation you don't get any scoring.

Next I'm adding the reranker. As I stated previously, we're going to be using FlashRank, an open source library that provides a lot of rerankers, and here I'm going to use the ms-marco reranker, which is only about 22 megabytes and very fast as well. The way this works is essentially as a wrapper, or a decorator, on top of the base retriever: you pass in the reranker and the base retriever, and invoking this you now get reranked documents. You can see that it takes roughly three seconds to get the response for this query, and these are the results. The reranked documents now contain a relevance score in their metadata, which is great.

Next we're going to look at the large language model. In our case we're going to be using the Llama 3 70-billion-parameter model, served via the Groq API.
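The reranking step can be pictured as a second scoring pass: each (query, document) pair is scored jointly, low-scoring documents are dropped, and the rest are sorted best-first. Below is a toy stand-in where the "model" is just word overlap; the real FlashRank ms-marco model is a neural cross-encoder, so this only illustrates the wrapper mechanics, not the quality of its scores:

```python
def pair_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query words found in the doc.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query: str, docs: list[str], threshold: float = 0.0) -> list[str]:
    """Score every (query, doc) pair, drop low-scoring docs, sort best-first."""
    scored = [(doc, pair_score(query, doc)) for doc in docs]
    kept = [(d, s) for d, s in scored if s > threshold]
    return [d for d, _ in sorted(kept, key=lambda p: p[1], reverse=True)]

docs = ["Meta revenue grew 27% year over year",
        "Unrelated boilerplate about cookies",
        "Quarterly revenue was $36,455 million"]
print(rerank("what was the quarterly revenue", docs))
```

Note how the irrelevant document is filtered out entirely, which is exactly the behavior we want before stuffing documents into the LLM prompt.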
Here is pretty much the custom prompt that I'm going to provide to our chain. It takes the context, which comes from the LangChain relevant documents, and the question provided by the user. You can have a look at it; it's pretty standard. I want the model to look at the context that I'm providing and give helpful information if applicable, and I don't want the model to hallucinate. Hallucination is a big problem with these types of RAG applications, and you can't really be sure whether or not the model is hallucinating, but these kinds of prompts are somewhat helpful in mitigating the risk. Just be aware of it.

Finally, we're going to create the chain. For the chain we're using the large language model, the reranker on top of the retriever, and the prompt. I want it to be verbose, and the chain type is "stuff": this tells the LangChain library that we want everything, including the relevant documents, to be put directly into the prompt. I'm going to show you what this is doing under the hood, thanks to verbose being set to true.

Here is the first question that I'm passing into the question answering chain: "What is the most significant innovation from Meta?" From here you can see that the chain builds the prompt as I described, with the context being the top three documents that we're providing. Based on that, the response from Llama 3, which by the way took roughly seven seconds, was: "The most significant innovation from Meta is the new version of Meta AI with Llama 3, which is mentioned in the quote from Mark Zuckerberg. Additionally, the press release highlights Meta's progress on building the metaverse." Let's go to the original PDF.
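The "stuff" chain type just described does nothing more exotic than concatenating the retrieved documents into the prompt template before calling the LLM. A minimal sketch follows; the template wording is my paraphrase of the kind of tweaked prompt used in the video, not the exact text from the tutorial:

```python
# "stuff" chain: put every retrieved document verbatim into one prompt.
TEMPLATE = """Use the following context to answer the question.
If the answer is not in the context, say you don't know; do not make things up.

Context:
{context}

Question: {question}
Answer:"""

def build_stuff_prompt(docs: list[str], question: str) -> str:
    context = "\n\n".join(docs)   # all documents stuffed in, unchanged
    return TEMPLATE.format(context=context, question=question)

prompt = build_stuff_prompt(
    ["Revenue was $36,455 million, up 27% year over year."],
    "What is the revenue for 2024?",
)
print(prompt)
```

The tradeoff of "stuff" is simplicity versus context length: it only works while the reranked documents fit in the model's context window.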
Here, right at the start, you'll see: "It's been a good start to the year. The new version of Meta AI with Llama 3 is another step towards building the world's leading AI", and then "steady progress building the metaverse" as well. So the model appears to be picking up what is in the document very well.

Then, with verbosity turned off: "What is the revenue for 2024 and the percentage change?" Here I'm essentially asking the model to extract data from the table. The revenue is 36,455; let's see, 36,455... is this in millions? Yes, it says in millions. So it even picked up that the figure is in millions, and the year-over-year change is 27%, which is correct as well. Great.

Then: "What is the revenue for 2023?" The answer, 28,645, is correct again, and it also mentions the increase as additional information, so it is providing relevant extra context thanks to the prompt I gave it. That makes the answers a bit more verbose, of course, but in our case I just want to check whether or not the model is hallucinating data.

"How much is the revenue minus the costs and expenses for 2024? Calculate the answer." So this is essentially the revenue number minus the costs number, and let's see: yes, it appears to be correct. Now pretty much the same question for 2023, and here it appears to make a mistake: it should be 28,645 minus roughly 21 thousand, and the result should be 7,227. So it actually made a mistake here; this one it got wrong.

"What is the expected revenue for the second quarter of 2024?" The answer: the expected revenue for the second quarter of 2024 is in the range of 36 to 39 billion. Let's see if we can verify this in the document.
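The subtraction the model got wrong is easy to verify yourself. The worked check below uses the figures quoted in the video (revenue 36,455 for Q1 2024; revenue 28,645 and operating income 7,227 for Q1 2023, all in millions); the two costs-and-expenses figures, 22,637 and 21,418, are my reading of the press release, since the narration only quotes them roughly:

```python
# Income from operations = revenue - costs and expenses (all in $ millions).
revenue_2024, costs_2024 = 36_455, 22_637
revenue_2023, costs_2023 = 28_645, 21_418

print(revenue_2024 - costs_2024)   # Q1 2024 operating income
print(revenue_2023 - costs_2023)   # Q1 2023 operating income: 7,227, as in the video
```

Arithmetic like this is a known weak spot for LLMs, so double-checking extracted numbers against the source table is always worthwhile.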
Yes: "expected to be in the range of 36 to 39 billion", so the quote is correct. "What is the overall outlook for Q1 2024?" The answer: "The overall outlook is positive. According to Mark Zuckerberg, it's been a good start to the year. The company reports strong financial results, with revenue increasing by 27% year over year to $36.46 billion; net income has also increased by 117% to $12.37 billion." This again seems very reasonable, and chatting with the document appears to be working quite well.

So this is it for this video. We've seen how you can build an advanced RAG application to chat with a custom PDF file: how to build your own knowledge base, how to use a reranker, and how to use Llama 3 to wrap everything into a LangChain chain. Thanks for watching, guys! Please like, share, and subscribe, and join the Discord channel linked down in the description. I'll see you in the next one. Bye!
Info
Channel: Venelin Valkov
Views: 5,601
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning
Id: HkG06wBbTPM
Length: 22min 9sec (1329 seconds)
Published: Sun May 12 2024