How to Create Robust RAG

Captions
Hello guys, welcome back. In this video let's talk about how we can make our RAG application better. I have created many videos in the past, but those showed simple RAG applications without tweaking the components inside them. A quick TL;DR for those of you expecting a code implementation: there is no code implementation in this video. In the first half I'm sharing my ideas on how you can make a simple RAG application better, and in the second half I will point you towards the links and documents where you can get more information to make your RAG application even better. Let's get started.

I hope you have already seen this picture in many places; I have shared it many times in my videos as well. Let's go step by step through what you can do to make your RAG application better.

First things first, let's start in the left corner of the diagram. The example here is a PDF, but you can take this as any document. You have a document, you need to extract the information from it (in this case, the pages of the PDF), and you need to split it into different chunks. Before splitting, though, the first question is how to read that particular document. In my RAG videos where I showed PDF examples, many of you asked how to deal with text files, CSVs, and so on. Those are the things you need to explore based on your use case: which loader you can use to extract the information out of that particular document. That is the first step to investigate, because creating a good application for production is not as easy as what I show in a simple RAG video; I'm just showing you how to do things, and you need to do your bit of research to make it robust.

After the extracting part comes the chunking part, the overlapping windows in the diagram. You need to split the document into chunks, and, taking LangChain as the example, there are different ways to do that. You can play around with the numbers, and that again is your choice: sometimes a chunk size of 500 with a chunk overlap of 20 works better for an application, for others it can be 2,000 and 100, and so on. You need to experiment with these numbers to find the best chunk size for your document.
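As a minimal sketch of how these first two steps might look in LangChain (the file path and the chunk numbers are placeholder choices to experiment with, assuming the langchain-community and langchain-text-splitters packages are installed):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the document; swap in TextLoader, CSVLoader, etc. for other formats.
loader = PyPDFLoader("my_document.pdf")  # placeholder path
pages = loader.load()

# Split into overlapping chunks; tune these numbers per use case.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # try 500, 1000, 2000, ...
    chunk_overlap=20,  # try 20, 50, 100, ...
)
chunks = splitter.split_documents(pages)
print(f"{len(pages)} pages -> {len(chunks)} chunks")
```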
After that comes the embedding part; let me highlight that part now. There are many embedding models, some open source and some closed source, so you also need to experiment with different embedding models to see which one performs best. The embeddings are the main part of a RAG application: you have text or a document and you want to convert it into vectors so that the machine understands it and we can perform semantic search over it. The better the embedding model, the better the vector embeddings. The models also differ in their token limits, context sizes, and dimensions: ada from OpenAI, for example, produces 1,536 dimensions, while other models have different sizes. In the later half of the video I will show you different embedding models and how you can choose between them, but these are the things you need to consider. I raise this question because in my previous video about nomic-embed-text you were asking why it is or isn't better than others, why it is so fast, and so on. For those things you need to do a little research yourself: go and compare two models on your own and see the difference, instead of only listening to others. Doing it yourself makes you better, let's put it that way.

Once that is done comes the vector storing part; let me highlight this part of the diagram. Once you compute the embeddings, those vector embeddings need to be stored somewhere, and there are many vector databases out there. You can even work without a database: you just build a semantic index and use it. As written in the diagram, you store the embeddings in a semantic index, and that becomes your knowledge base. The examples here are Pinecone, FAISS, and Chroma DB; this is an older diagram I reused, and now there is also Qdrant and many others. I might create a video about Qdrant in the future, because that is the one many people are talking about right now. Here too you need to experiment with different vector databases. One thing to notice is that different vector databases use different algorithms and have their own unique ways of storing the embeddings. As I have mentioned, what is stored is the embeddings plus the document chunks, meaning they store the context along with the vectors, and the metadata (for example the document ID and the index), so that you can search across the embeddings already stored in the vector database. So choosing the right embedding model as well as the right vector database is also something you need to consider.

Now let us go to the user part. You are the user, let's say, and you ask a question. That question goes through the same embedding API you used in the beginning, it gets embedded, and with that vector you go into the vector database and search. That is where the indexing helps: when you have millions of vector embeddings, it's difficult to search through them directly, so you have an index, and different vector databases do their own tricks behind the scenes for fast querying and for providing the right answer. Whatever you search in the vector database, you get the top-k retrieval, say the top 3 or the top 10. And now the last piece: from the top retrieved vectors you get the context, and that is what you pass to the LLM, which then synthesizes the answer for us. So the main parts here are the vector database and the embedding model and how they are utilized; there are different pieces here and there, and I hope you now have a little more idea of how to get into this.
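To make the embedding, storing, and retrieval steps concrete, here is a minimal sketch using OpenAI embeddings with Chroma through LangChain (the model name, the question, and k are assumptions for illustration; `chunks` is the list from the splitting sketch above, and OPENAI_API_KEY must be set in the environment):

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Embed the chunks and build the knowledge base (semantic index).
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)

# A user question goes through the same embedding model,
# and the index returns the top-k most similar chunks.
question = "What does this document say about pricing?"  # placeholder
top_docs = vectorstore.similarity_search(question, k=3)

# The retrieved chunks become the context that is passed to the LLM.
context = "\n\n".join(doc.page_content for doc in top_docs)
```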
Now for the second half of the video, let me go to my GitHub. I created a video before about LLM resources, and this is the llm-resources GitHub repository. I'm expanding this repository as I progress, and I think I will be putting more effort into it, because there is some really good content here. I just want to point you to the "making RAG work properly" part, which covers the things I just explained. You need to clean the documents and choose the right parsing. There is LlamaParse, which I created a video about earlier; you can refer to it if you are wondering which document loader to use, since LlamaParse takes in your files and gives you back a clean document. There is also Unstructured, which likewise helps you clean your documents. Then come better chunking strategies, choosing the right embedding model, choosing the right vector store, passing instructions, reranking, and choosing the right large language model. I haven't covered that last point yet, but which LLM you pass the context to also matters. People are comparing, say, GPT-4 with some open-source models, which is not a fair comparison at the moment; yes, there are open-source models, but we still cannot compare those two directly. Different models keep appearing on the market, though, so you can test between them, and I will show you a link for how to do that. Here are some of the links; I highly suggest you go through them, and if you want to know when I add more things, you can star or watch the repository.

Let me show you all those things now. First there is this chunk visualizer on Hugging Face Spaces, as you can see. If you don't know what a chunk visualizer is or how chunking works in LangChain, this is a LangChain example again: this is the document you pass, and there is a character text splitter or a recursive character text splitter; let's pick the character text splitter. As you can see, this is how the document gets split into different chunks: there is a chunk length of 200, which you can play around with, and an overlap of 10. If you hover on top of a chunk, it shows you which part of the document was split into it, so you get the idea of how chunking works in LangChain. These things seem simple, but knowing these kinds of basics helps you do better chunking, let's say.
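If you want to inspect chunk boundaries locally rather than in the hosted Space, a small sketch like this prints each chunk so you can see the overlaps (the sample text and the numbers are arbitrary):

```python
from langchain_text_splitters import CharacterTextSplitter

text = "Your long document text goes here. " * 40  # arbitrary sample

splitter = CharacterTextSplitter(
    separator=" ",
    chunk_size=200,
    chunk_overlap=10,
)

for i, chunk in enumerate(splitter.split_text(text)):
    # The tail of one chunk should reappear at the head of the next.
    print(f"chunk {i} ({len(chunk)} chars): ...{chunk[-30:]}")
```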
Let me go back. Next there is the tokenizer from OpenAI, so there are different tools already available for you. Why I want to show you this is because I hope you have heard about tokenization: how many tokens a word becomes, and so on. Let me type, say, "OpenAI is not open anymore, 2024." As you can see, I passed this sentence and it has 11 tokens and 33 characters. But count the words: we have one, two, three, four, five, six words, yet 11 tokens and 33 characters. It might be confusing in the beginning how this works. As you can see here, "Open" is one token, "AI" is a different token, the same with "anymore", and the comma is its own token. Even the whole year 2024 is not one token: "202" is one token and "4" is another, and so on. A helpful rule of thumb is that one token generally corresponds to approximately four characters of text, and 100 tokens is approximately 75 words. If you haven't been through this visualizer, it's a good place to go, play around, and see how the tokens are created.
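You can reproduce these counts locally with OpenAI's tiktoken library; a minimal sketch (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family, and the exact token count can differ between encodings):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

sentence = "OpenAI is not open anymore, 2024."
tokens = enc.encode(sentence)

print(len(sentence), "characters")  # 33
print(len(tokens), "tokens")
# Show how the sentence splits, e.g. '202' and '4' as separate tokens.
print([enc.decode([t]) for t in tokens])
```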
Then next there is the embeddings part; I'm going step by step, the same way I explained the RAG pipeline. Let's say you want to choose the best embedding model. There is the MTEB leaderboard on Hugging Face Spaces, and from there you can choose the best embedding model: the size of the embedding models, the dimensions I was referring to before, the maximum number of tokens, all the different properties are shown there. If you go there, you can see text-embedding-3-large from OpenAI, and even better models higher on the leaderboard, such as E5-Mistral-7B-Instruct and SFR-Embedding-Mistral, and this keeps changing. That is the overall ranking, but you can also go by your specific use case: let's say you are doing reranking, then as you can see there is a reranking tab where you can choose those models, try them, and see which one performs best.

Then, what is a vector database and how does it work? There is a really good blog post from Pinecone that explains this, by Roie Schwaber-Cohen (I hope I'm pronouncing that correctly), a developer advocate at Pinecone. He explains really well what a vector database is and how it works, with a real example; you can go there if you are new to vector databases. By the same author there is also "Chunking Strategies for LLM Applications", which explains how you can approach chunking. Having this kind of understanding helps a lot.

Then next is the Open LLM Leaderboard; you can go ahead and see it on Hugging Face. For some reason it is taking some time to load, but you get the idea: from there you can see the open LLM models on the leaderboard and play around with them.

The next one is the most important one: Chatbot Arena, benchmarking LLMs in the wild. There are many things here. There is the Arena Battle, where you ask a question without knowing which models are answering, and only after giving a thumbs up or thumbs down do you find out which model it was. Let me just ask "what is OpenAI", just a random example. I'm not going to read the whole text it produces right now, but you can see it provides an answer while I don't know which model it is. Once both answers are in, it asks: A is better, B is better, it's a tie, or both are bad. Let's say I pick "B is better". When I do that, it reveals the models: this one is Claude 3 Sonnet and this one is GPT-4-1106. I didn't read the answers carefully, by the way; I just wanted to show you how you can judge which model performs better without your subconscious mind knowing which model you are using. This helps a lot in finding out how a model actually behaves.

That is one example. If you already know the models you want to compare, there is the side-by-side mode: just choose one model, choose the other, write the prompt, and see the answers. You can even do a direct chat there, and there is a vision direct chat where you can try image models. Then there is the leaderboard; this one is the leaderboard for LLMs (before this I showed you the one for embedding models and then the open LLMs). There are around 73 models, with GPT-4-1106-preview on top, GPT-4-0125-preview second, Claude 3 Opus up there as well, and this keeps changing. The way to keep track is to bookmark these pages, come back, check which model performs best, and use that model in your application; that's what you can do.

Now there are other important documents and blog posts. This blog post, which uses LlamaIndex code behind the scenes, is by Wenqi Glantz, and it's a golden blog post, let's say: "12 RAG Pain Points and Proposed Solutions". There was a paper called "Seven Failure Points When Engineering a Retrieval Augmented Generation System", and based on that, the post provides solutions for not just seven but twelve different pain points, and shows how you can address them using LlamaIndex. Just go through this document; it's a really, really good one if you want to create solid RAG applications. Going through it: there is missing content, missed top-ranked documents, not in context, not extracted, wrong format, incorrect specificity, incomplete answers, data ingestion scalability, structured data QA, data extraction from complex PDFs, fallback models, and LLM security. Just by going through the headings you know these are the things we care about when we create our RAG applications, and there is already a solution for each, with code provided. I might even create videos on some of the topics here, because the main way you learn is by doing. So go through the post, implement it yourself using the LlamaIndex code, and see whether your RAG application performs better.

The next one is "Optimizing RAG with Hybrid Search and Reranking". This topic is also already covered in the 12 pain points post, but I liked these three blog posts because they explain things really well: this one, "Improving RAG Performance with Knowledge Graphs", and "Enhancing RAG with a Multi-Agent System". If I click the first one, it's a blog post on VectorHub: "Optimizing RAG with Hybrid Search and Reranking". I hope you know about reranking, but just go through it; it explains with diagrams how it works, and there are LangChain examples with it (a small reranking sketch follows below). What more do you need, when someone explains things with the code as an example? So thank you to all the people who write blog posts with clear explanations as well as code.
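To make the reranking step concrete, here is a minimal sketch (not from the video) that rescores retrieved chunks with a cross-encoder from the sentence-transformers library; the model name is one common choice, and `top_docs` comes from the retrieval sketch earlier:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores (query, document) pairs jointly: slower than
# pure embedding search, but usually more accurate for ordering.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

question = "What does this document say about pricing?"  # placeholder
candidates = [doc.page_content for doc in top_docs]      # from retrieval

scores = reranker.predict([(question, text) for text in candidates])

# Keep the highest-scoring chunks as the final context for the LLM.
reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
```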
Then there is "Improving RAG Performance with Knowledge Graphs", so just go through that one too, and "Enhancing RAG with a Multi-Agent System", so there are agents you can bring in as well; as I said, the code is there too, and it uses AutoGen agents.

So yes, this is just a simple video I wanted to create. I think it already went over 20 minutes, or approximately 20 minutes, but it doesn't matter how long it is; I hope you now have some understanding of how to make your RAG applications better. Creating a simple RAG application is just copying code from someone, running it, and saying, voilà, the code runs, you have a RAG application. Then you start asking why it doesn't perform better. It doesn't perform better because a plain vanilla RAG application has so many things that need to be tweaked. It's similar to training a machine learning model: if you just use the default parameters, it doesn't perform well for every application, and you need to modify the parameters based on your application. Likewise with LLMs: to get the best answers for your application you need to tweak the different parameters and find the pain points, the things that are keeping your RAG application from performing better. So follow some of these approaches, but don't try to implement all the solutions at once, because then you won't know which change caused which effect. Go step by step: start with the first part, say chunking, use some of the existing chunking providers, and so on.

I hope you now have an idea of how to improve things. If you like these kinds of videos, please let me know in the comment section, and I will create more of these knowledge-sharing videos in the future. If you liked the video, give it a thumbs up, and if you haven't subscribed yet, please do so; it motivates me to create good content in the future. Thank you for watching, and see you in the next video.
Info
Channel: Data Science Basics
Views: 1,407
Keywords: llm, chat, chat models, chain, agents, chat with any tabular data, create chart with llm, markdown, chat with your data, rag, chat with pdf, llamaindex, what is llamaindex, ai, LLM, rag to prod, openai, AI, RAG, rag llm, rag ai, llm rag, langchain, llama, metaphor, rags with web search, gpts, opengpts, llamaindex in nutsell, What is llamaindex, GPTs, llamaparser, parse pdf, llamacloud, groq, qdrant cloud, mixtral, document parsing, parsing, chainlit, rag in action, better rag, how to make rag better
Id: wIXdGief8hc
Length: 21min 20sec (1280 seconds)
Published: Fri Mar 22 2024