Build your own RAG (retrieval augmented generation) AI Chatbot using Python | Simple walkthrough

Video Statistics and Information

Captions
So today's video is about retrieval augmented generation, or RAG. RAG has been very popular lately, and people are implementing it in their own NLP studies, in their own chatbots, or in any task where they really want to leverage the power of large language models. What is RAG about, and why do people really need it? RAG comes into play when you want to make your large language model relevant, so that it gives much more relevant responses to your queries; when you want to keep it up to date with the latest happenings; when you want it to be aware of your external databases or knowledge bases; and when you want to overcome hallucination and increase response quality. RAG is super powerful for all of that, and of course that's one of the reasons it's such a trending topic: almost every enterprise would love to have RAG in its products.

But this whole retrieval augmented generation idea is not new; it has been around for a few years. If you look at the paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", published in May 2020 by researchers from Meta (a big shout-out to them for this amazing paper), you'll see the concept comes from there.

Let's look at the idea of RAG in a very simplified form. One part is generation, which I would say is the easiest part to understand, because we already use ChatGPT: we pass a query and the large language model generates a response. That's the generation part; a query goes to the large language model (I'm not going through what is happening under the hood) and it creates a response.

Now, the "R" in RAG stands for retrieval. Retrieval means we're fetching some kind of context, some passages or similar sentences, from somewhere. This retrieval can be from your external database, from your indexed data; by that I mean when you embed all the pages of a PDF, or a whole chunk of a .txt file, all of it gets indexed in vector stores. So basically we retrieve from vector databases, which is common practice because vector databases are pretty powerful for natural language processing; vector databases such as Pinecone or Weaviate are super powerful for that. We index our data into a vector database and retrieve from there. And what do we retrieve? We use another layer, which we call similarity search: we retrieve the few sections most similar and relevant to the end user's query. That's the second part: we know generation, and we now know, in a broad sense, what retrieval is.

The next part is augmentation. What does augmentation mean? It means enriching something. What are we augmenting here? We're augmenting the user query which we send to our large language model; basically, the prompt we have been passing to the model all this time. And how are we augmenting it? By adding a few extra pieces: the similarity search results which we already found via retrieval, a system prompt, and any extra context which we as end users would like to add.
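To make this flow concrete, here is a minimal Python sketch of the three steps; this is not the video's code, and the rag_answer, vectordb, and llm names are hypothetical stand-ins for a vector store and a language model client:

    # Minimal RAG flow sketch (hypothetical helpers, not the actual app code).
    def rag_answer(query, vectordb, llm):
        # Retrieval: find the chunks most similar to the user's query.
        chunks = vectordb.similarity_search(query, k=3)
        # Augmentation: enrich the prompt with the retrieved context.
        context = "\n".join(chunk.page_content for chunk in chunks)
        prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
        # Generation: the large language model answers the augmented prompt.
        return llm(prompt)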
All of those pieces get added to our prompt. We no longer have a single-line prompt; we're passing the prompt with extra context and an extra system prompt. This powerful prompt, built via retrieval and then augmented, is now passed to the large language model, and the response that comes back will be much more powerful and much more relevant to the end user's query, because we already passed a very relevant prompt. The response won't hallucinate; it gives a much better answer to the query. That's why RAG has become so powerful for any chatbot, especially any production-level chatbot, which you want to create or develop.

This is RAG in overview. RAG can be much more complicated; there are a lot of approaches, and LangChain and LlamaIndex handle it very nicely. I have a lot of videos out there with LangChain, and I'm also trying to make videos with LlamaIndex; those two frameworks are really harnessing the power of RAG, and maybe we'll look into them in future videos. But I think it's key to understand these three core concepts: the generation of responses, the retrieval part, and the augmentation part. That's why today's video is more about the code base, the few lines of code we need for these main aspects. We will see the chatbot in action, and I will also point you to the GitHub link and to the blog post I wrote a few days back; if you go through it, I hope the whole perspective of RAG will become much clearer.

Let's do that. As we discussed, our RAG setup needs a knowledge base, right? In our case the external knowledge base is nothing but PDF files. You can see there is a browse button which allows us to upload PDF files via st.file_uploader; it accepts multiple files. After uploading, the app goes through all the PDF files (that's why I use this for loop) and creates a vector DB. As I mentioned before, our RAG setup needs a vector database, which is super important, because that's where the retrieval happens. To have this vector database we need to store, or index, the documents. This is the function which indexes them and creates the vector database; but as you can see, get_index_for_pdf is handled by another function, so let me take you there first.

This is the brain of the app, as I call it. In this particular GitHub repository you'll find the entire code base, including brain.py. There you'll mainly find the logic to parse the PDF files: it parses the entire PDF, converts the text to documents (this whole thing is done with LangChain's RecursiveCharacterTextSplitter), then it applies the text splitter and creates the documents, and then it converts the docs to an index. This step is crucial, because here we use faiss-cpu, which you'll see in requirements.txt along with tiktoken; these are very important for indexing any file. Finally we get the index of the file, and once we have the index, we create the vector database.
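Based on that description, a minimal sketch of the indexing step might look like the following; it assumes the LangChain and pypdf APIs as they were around late 2023, and the function body, chunk sizes, and the OPENAI_API_KEY secret name are my assumptions, since the repository's brain.py may differ:

    import streamlit as st
    from pypdf import PdfReader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import FAISS  # backed by the faiss-cpu package

    def get_index_for_pdf(pdf_files, openai_api_key):
        # Parse every uploaded PDF page by page, keeping filename/page metadata.
        texts, metadatas = [], []
        for pdf in pdf_files:
            reader = PdfReader(pdf)
            for page_num, page in enumerate(reader.pages, start=1):
                texts.append(page.extract_text() or "")
                metadatas.append({"filename": pdf.name, "page": page_num})
        # Split the raw text into overlapping chunks and wrap them as documents.
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        docs = splitter.create_documents(texts, metadatas=metadatas)
        # Embed the chunks and index them in a FAISS vector store.
        embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
        return FAISS.from_documents(docs, embeddings)

    # st.file_uploader accepts multiple PDFs; index them once per session.
    pdf_files = st.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)
    if pdf_files and "vectordb" not in st.session_state:
        st.session_state["vectordb"] = get_index_for_pdf(pdf_files, st.secrets["OPENAI_API_KEY"])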
So let me come back to our app: here we create the vector database; that's the whole plan, and that's why, whenever there is a PDF file, we create the vector database. Super. Let me upload a PDF file so you can see whether it works: I browse and upload a single file, the LangChain Wikipedia PDF, and we'll try with this.

The next part is about prompt augmentation. By that we mean we will have a prompt template, and we will have a lot of context from the retrieval: the vector database will be searched for similar chunks, and then the prompt will be augmented with them. So we need to lay the foundation for that. We have a prompt template along these lines: you are a helpful assistant who answers the user's question from the multiple contexts given to you; keep your answer short and to the point; the evidence is the PDF extract with its metadata. Because we are passing the PDF file and the PDF file name, those are the metadata we pass along. The template also says: carefully focus on the metadata, especially the filename and page number, and whenever answering, make sure to cite the filename and page number. So this is how I wrote the prompt; that's part of the prompt augmentation, and I'm already laying the foundation for it. It's still not over; we'll come to that as well.

Then there are a few things about Streamlit chat elements, which I discussed in my other videos, so I won't go into them much; it's all about appending messages to create a chat-style UI. That's why I'm using this for loop to redraw the conversation every time, and session state to store each prompt and answer, pair after pair, so that it reads like a chat. There is also a place where you can ask a question using the chat input; this particular code block creates the box you see that says "ask anything", and if I change that to "ask me about the PDF", you'll see it change accordingly.

Now, if there is a question, we first check whether our vector database has already been created; that's why we store it in session state, so we don't have to create a vector database every time. Embedding and indexing cost money (that's how the embedding model works), and we don't want to vectorize on every run; that's why it's important to keep the index for the whole session, or to store it in a persistent vector database. Right now we're not using a hosted vector database; we're just keeping it in the session so we don't re-index every time. And if there is no vector database, we simply stop.

Cool. The important part is here: the part where we perform the similarity search. What I will do is comment out the rest and run just this part. I am doing all of this in Databutton; I have a couple of other videos which use the Streamlit UI there, and it really helps to build everything on top of it, with nice separation of libraries, jobs, and storage; you can see out here how I install the libraries. So we'll just search.
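A sketch of that chat loop and the retrieval step could look like this; the prompt wording follows the template described above but is paraphrased, and the variable names and exact structure are assumptions rather than the repository's code:

    # Prompt template: context plus citation instructions (wording paraphrased).
    prompt_template = (
        "You are a helpful assistant who answers the user's question from the "
        "multiple contexts given to you. Keep your answer short and to the point. "
        "The evidence is the PDF extract with its metadata; carefully focus on "
        "the metadata, especially the filename and page number, and cite them "
        "in your answer.\n\nThe PDF extract:\n{pdf_extract}"
    )

    # Replay the stored conversation so each Streamlit rerun still looks like a chat.
    for msg in st.session_state.get("messages", []):
        with st.chat_message(msg["role"]):
            st.write(msg["content"])

    question = st.chat_input("Ask me about the PDF")
    if question:
        vectordb = st.session_state.get("vectordb")
        if vectordb is None:
            # No index yet: ask for a PDF and stop this rerun.
            st.error("Please upload a PDF first.")
            st.stop()
        # Retrieval: similarity search against the FAISS index.
        search_results = vectordb.similarity_search(question, k=3)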
Now, say we have uploaded the LangChain Wikipedia PDF and it just got indexed, and I ask "what is LangChain?". You can see what it gives us: the relevant chunks from the similarity search. It took a few contexts from this document about LangChain and returned the relevant parts; here is one context, and here is another, so these are the two relevant passages it found. What are we doing? We are searching for relevance, and these are the search results. So what was the idea here? We get the relevant chunks and we augment the prompt. Once we have this part, we concatenate the whole thing; that's why I wrote in the blog: once we get the similarity search results, we concatenate them and send them as context to the prompt. In other words, we augment the prompt: we update the prompt template with the PDF extracts we just retrieved, plus the template itself. So what we are really doing is performing the augmentation part of our RAG: we're augmenting the prompt, the key part, with all the relevant content from the document plus the prompt template. Once we've done that, the rest is much more straightforward; this is exactly where the augmentation of the prompt takes place.

The next part is passing this prompt to our large language model, our pre-trained model; that's where we use OpenAI. This part adds the user question, creates a placeholder for the message, streams the response into it, and finally appends it to session state. We use the ChatCompletion.create method with the gpt-3.5-turbo model and pass the augmented prompt as the message; passing the prompt to GPT-3.5 is done in these few lines of code. And we expect a streaming response, because we set stream to true, so it streams the answer token by token; whatever response comes back, we write the result into session state to keep the chat-like user interface. Very straightforward.
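Putting the augmentation and generation together, a sketch using the pre-1.0 openai library (which is where the ChatCompletion.create method mentioned above lives) might look like this; the variable names carry over from the sketches above, and the "messages" session key and secret name are assumptions:

    import openai

    openai.api_key = st.secrets["OPENAI_API_KEY"]  # assumed secret name

    # Augmentation: concatenate the retrieved chunks into the template.
    pdf_extract = "\n".join(doc.page_content for doc in search_results)
    system_prompt = prompt_template.format(pdf_extract=pdf_extract)

    # Generation: stream the answer from gpt-3.5-turbo token by token.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        stream=True,
    )
    placeholder = st.empty()
    result = ""
    for chunk in response:
        # Each streamed chunk may carry the next piece of text in its delta.
        result += chunk["choices"][0]["delta"].get("content", "")
        placeholder.write(result)

    # Persist the exchange in session state for the chat-style history.
    st.session_state.setdefault("messages", []).extend([
        {"role": "user", "content": question},
        {"role": "assistant", "content": result},
    ])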
Let's try it out. I will ask "who is the founder of LangChain?" and we'll see what happens under the hood. First it looks for the relevant chunks; as we said, it searches the vector DB. It doesn't create a vector DB every time (we're using session state for that); once the vector DB is there, it retrieves the similarity search results as a list of documents, it augments the prompt, filling in this part of the prompt template, and we give the model the prompt with the whole context. Finally it gives a correct, very relevant answer.

Let's try a tougher question, maybe, I don't know if it's tough enough: "how much did LangChain raise?" or something like that; I think the Wikipedia article has that particular context, so let's see if it can answer. You see: LangChain was funded with 20 million dollars from Sequoia Capital, and here is the part of the document from which it got that relevance. So it always tries to extract the relevant content from the vector database. It starts with a similarity search, where we use FAISS (the faiss-cpu package) to get the similar chunks; the similarity search results are then passed into our prompt, so we are augmenting the prompt; and then, using a model like GPT-3.5, we generate our response. The generation stays the same as in ChatGPT; what has been enhanced is the relevance, thanks to our external dependency, the external knowledge base. It has enhanced the prompt, augmented it with that context, and as a result the whole response that is generated is much more reliable.

That's why RAG is so powerful: for your next AI product or your next enterprise product, RAG is a must. Maybe you have used it in different ways, or under different terminology, but this retrieval and augmentation part is what I wanted to highlight, along with the final generation part, which we see all the time.

I hope you guys liked this video. The GitHub code for this particular RAG chatbot will be in the description, as well as the blog post where I covered each and every part and compared it with a simple chatbot; by the way, that comparison is super useful for understanding why a simple chatbot won't work and why a RAG-enhanced chatbot is more powerful. I also wrote about how I started to build this whole app; everything is in the blog post in a much more elaborate way, with references included, so please check it out. Let me know what you guys think, and if you like the video, please like and share it, leave a comment, and I will be covering more videos about this. Cheers!
Info
Channel: Avra
Views: 15,710
Keywords: chatbot python, gpt 4, gpt 3, gpt 3 api, gpt3 turbo, openai, openai gpt 4, openai api, langchain, gpt chatbot, streamlit, streamlit tutorial, streamlit python, chatgpt python, chatgpt 4, chatgpt api python, chatgpt, chatbot gpt, chatbot, chatbot prompts, chatbot tutorial, gpt 4 demo, rag, retrieval augmented generation, RAG chatbot, rag finetuning, langchain tutorials, langchain rag, machine learning, ai, artificial intelligence, natural language processing, semantic search
Id: Yh1GEWqgkt0
Length: 16min 41sec (1001 seconds)
Published: Fri Nov 03 2023