Full-stack AI chatbot on custom data with Next.js, LangChain, Pinecone and OpenAI

Captions
Hi everyone! In this video we are going to create a chatbot that answers questions based on our custom data. Let's take a look at what we are going to build. As an immigrant in the Netherlands, I have decided to build my chatbot on data about immigration from the official website. This is the website I'm using as my custom knowledge base: I scraped all the information from the English version of this website, made a custom knowledge base from it, and then created a chatbot that uses this data to answer questions.

It suggests asking any legal questions about immigration, so let's try to ask something. For example, I can ask: what documents do I need for a partner visa? It should understand that I'm talking about the Netherlands, because that is the only information it has. Awesome, it answered the question and listed the documents that I need for a partner visa in the Netherlands, and it also provided the sources it used to answer the question. If I go to these links, I'm directed to the official website and the information it retrieved: I should set my nationality here to learn more, and the same with the frequently asked questions pages, although this time they're slightly different: one is "Service and contact", and before it was just "Frequently asked questions".

Now let's try a follow-up question, for example: can I get it if my partner has a student visa? I intentionally said "it" instead of "partner visa", because I want to test whether it understands that I'm still talking about the partner visa. Let's try this question. Yes, it actually understood that I'm still talking about the partner visa, and it says that no, it's not possible to obtain a partner visa if the partner has a student visa, and it again gives me document sources to read to learn more. That's what we are going to build today, and I'm going to share the code, including the front-end
part, so your bot can be as pretty as mine.

To make this chatbot we are going to use LangChain, Pinecone as the vector store database, and OpenAI as the large language model. Let's first discuss Pinecone and vector store databases: what are these? In order for our chatbot to have custom knowledge, we have to provide it, and we do this by providing chunks of context information directly with the prompt. So instead of retraining the large language model on a custom knowledge base, which is also possible but more difficult and has its own limitations, we are going to use context information. Together with the user prompt, for example "what documents do I need for a partner visa", we also provide context information about this partner visa. First, when we get the user prompt, we go to our vector store database, where our knowledge lives, and we search for relevant information: we search for "residence permit for partner" and any other chunks of information that mention the partner visa. We get this information from the vector store and then send it, together with the user prompt, to the large language model. The large language model now has that context and is able to answer the question based on it. That's how we use a vector store in our chatbot.

Now let's discuss how to build this vector store. In this video we are not going into the details, because I already explained it in the previous video. If you don't know how to build a vector store, please refer to that video, where I explained how to do it in detail and also provided the code, so you can repeat it yourself. I'll just briefly explain the concept: we take the website and scrape it, so we get all the information from it (to do this we also use LangChain), and then we split it into small chunks of text. These chunks of text we are
going to store in the vector store and then retrieve when necessary: when I ask a question about the partner visa, I search for chunks of text that are similar to my query.

You can use different types of data to create a vector store. In my case I was using a website, but you can also use PDF documents and many other things. If we go to the LangChain documentation, under document loaders we can see there are a lot of sources you can use to create your vector store: I was using website data, but you can also use a Microsoft Word document, Google Drive, Microsoft Excel, and many other things. Just find the type of data you are interested in, and the documentation shows how to create a vector store from it, or refer to my previous video where I explained step by step how to do that.

Now let's go back to Pinecone, because we need it later. Pinecone is the place where I store my vector store. There are many other service providers you can use, or you can even host it yourself; in my case I'm using Pinecone.

Now let's finally go to the code. Here's our code, and in this video I'm going to walk through the different parts of this project. If you want a step-by-step tutorial, please let me know by filling in the form, so I know there are enough people interested; then I will create a course where I go step by step through creating different types of vector stores and then building this chatbot. I already mentioned this course in the previous video, but we didn't get enough people, so if you are interested, please let me know, and if there are enough people, I will of course make it.

Let's review what we have here. For this project I'm using the new Next.js, and to be honest I don't know how the new Next.js is better than the previous version; in my personal opinion I prefer
the previous version, but just in case they deprecate the old versions of Next.js, I've decided to use the latest one.

In the page component we have our chatbot: a simple form that maps over the messages. It has the messages from the AI assistant and the messages from the user, then a loading state, the input itself, and the send button, so nothing complex here. When we send a message, we invoke the handleClick function; let's see what it does. First we check that the message exists; if it does, we combine it with the message history we already have. Here we initialize the history, and the first message is already set up, so when we open the chatbot this first message is already there; it has a role and content, and that's it. Then we combine it with the new message, which has the role "user", because that's what we sent, together with the message from the form. Then we call the API, sending the query and history, and receive the response.

Let's go now to the API file to see what we have there. This is what we call in handleClick: first we receive the question and history from what we sent (we sent query and history there, and we receive them here). We don't need this console.log, so let's comment it out. Then we call our chain. How does it work? I have a separate file, chain, where I initialize it, and here I just call it.

Let's go to the LangChain documentation to understand what is happening here. I also want to refer to the documentation so you understand better how it works and can customize it to your needs. I'm using the ConversationalRetrievalQAChain for this: it answers questions based on a custom knowledge base. In the docs they have an example where they build everything from scratch: they build a vector store, initialize it, and then they call it in the chain.
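The front-end flow just described (check the message, merge it into the history, post the query and history to the API route, append the answer) can be sketched roughly like this in TypeScript. All names here, including the `/api/chat` route and the `Message` shape, are my own assumptions for illustration, not necessarily the exact code from the video:

```typescript
// Assumed message shape: role plus content, with optional source links
// attached to assistant answers.
type Message = { role: "assistant" | "user"; content: string; links?: string[] };

// Pure helper: append the new user message to the existing history,
// ignoring empty input (the "check if the message exists" step).
function appendUserMessage(history: Message[], query: string): Message[] {
  if (!query.trim()) return history; // nothing to send
  return [...history, { role: "user", content: query }];
}

// Hypothetical handleClick mirroring the flow in the video: send the
// query and history to the API route, then append the assistant's answer.
async function handleClick(history: Message[], query: string): Promise<Message[]> {
  const updated = appendUserMessage(history, query);
  if (updated === history) return history; // empty input, no request made
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, history: updated }),
  });
  const answer: Message = await res.json(); // { role, content, links }
  return [...updated, answer];
}
```

In a real page component the returned array would go into React state, alongside the loading flag the video mentions.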
Our vector store is already in Pinecone, so we don't need that part (and if you don't know how to create a vector store, please refer to the previous video, where I explained in detail how to do it). We just have to create our chain and point it at our vector store. So we call our API, and our API calls the chain, providing the question and chat history; that is what is happening here. We will see how we create the chain in a moment; it's in a separate file. In this API call we just call the chain we already have, providing the question and chat history, exactly like they do in the documentation.

When we receive the answer, we get a result with the text itself (the answer) and with metadata, and one of these metadata fields is sourceDocuments. Let me show you what is in the result. Let's ask our chatbot something else, for example: who can get a highly skilled migrant visa? It answered the question, so let's see what we have in the terminal; I'll make it bigger. This is the last question we sent, and the output is quite messy, as you can see, so let's go to the top. Here's our answer ("those who want to work...") under text, and then we have sourceDocuments. Remember, I explained that we send relevant chunks of data as context to the large language model; the result returns those chunks as several documents. Every chunk of data is called a document, and it has the content itself. As you can see, my content is quite messy, with a lot of "\n" characters, because I just scraped the data from the website and didn't clean it. Then it has metadata, and that's what we are interested in. In my metadata the source is a link, because I was using a website; if you were using PDFs, for example,
then the source would be something else, so just console.log what you have under sourceDocuments and under metadata and see what's there. In my case I want to extract links from these sourceDocuments, and that's what I'm doing here. Then I return the answer in the message format: we have the role, in this case "assistant", we have the content, which is result.text, and we have the links that we extracted. We return it back, and when we get the response we update our history and set loading to false. That's basically all we are doing here.

Let's now see what we have in the chain. Here we just call it, as I explained earlier, and now let's see what is inside. The chain is located under utils. Here we are initializing the chain, and again, it is all described in the LangChain documentation, where they also initialize it: they initialize a vector store (in their case a different one; in my case I'm using Pinecone) and then they initialize the chain. That's what I'm doing here: first I choose the model (and I believe, yes, they also set up the model first), in our case OpenAI. Then we get the vector store from Pinecone; all the details are again in the LangChain documentation, where you can go to the Pinecone integration and see how to get this vector store. We initialize this Pinecone vector store and initialize the chain with the model and the vector store as the retriever, and in our case we want to return source documents, which is why we set returnSourceDocuments to true. That's also what is written in the docs: they initialize the chain, they use the OpenAI model, they use the vector store as the retriever, and in our case we also return our source documents.

Let's now see how we initialize the Pinecone store, because here again we just call it. We initialize it in the Pinecone client file, also under utils. To get our Pinecone index, we have to provide all the necessary
information, like the Pinecone environment name, the Pinecone API key, and the index name. Here in the chain we are also using the index name, and that's what you have to provide in the environment file. I'm providing you with a .env.example file, where you have to fill in all your keys from Pinecone and from OpenAI. When you have filled in your keys, you rename the file: you delete the ".example" part (I already have a .env, that's why it doesn't let me do that here), and then you have your keys.

How to find the keys: we have our Pinecone index, in my case called "ind", so under the Pinecone index name I put "ind". Then we have the API key and the environment name: the environment goes here, and the API key goes here. Same with the OpenAI API key: you go to platform.openai.com, open your account menu, click "View API keys", create a new API key, give it a name (in my case "demo"), create the secret key, copy it, and done; now I have my demo key and I put it here. That's all you have to do; just don't forget to rename the file back to .env.

Now let's recap what we have. In .env we set our API keys, environment, and index name for Pinecone. In the Pinecone client we initialize Pinecone; again, you can refer to the documentation to see how to initialize it, and we return the Pinecone client that we use. Then in our chain we create a vector store with this Pinecone index: we call Pinecone and find our index, we create our vector store, and we create our ConversationalRetrievalQAChain with the model from OpenAI and with the vector store, asking it to return source documents. This chain we use in our API: there we call the chain, provide the question and chat history, and then return the response
that we then get in our page (or, in the previous Next.js, index.tsx). And that's it. In this file we have a few more small functions that I'm just using to make things look nicer, but nothing functional or related to LangChain or large language models. I will provide the link to the GitHub repository in the description of this video.

In the next video I'm planning to iterate on vector stores: we are going to create more vector stores in Pinecone using different types of data. In the previous video I already explained how to create a vector store in Pinecone using website data, and in the next video I'm going to explain how to create vector stores from other types of data. So if you want to learn more about vector stores and artificial intelligence in general, subscribe to this channel, and if this video was helpful, don't forget to click the like button.

If you want a step-by-step tutorial, please let me know by filling in the form that I will also leave in the description, and if we have enough people, I will create a detailed tutorial. In that tutorial I'm going to increase the complexity: I'm going to use streaming (here we just showed a loading state and waited for the answer), and we will also use a prompt template that lets us influence how our chatbot answers questions. Currently that is all handled by LangChain, but with a prompt template we will have much more control over our chatbot: for example, we can set the tone in which we want it to answer, we can set the format (say you don't want plain text but for some reason want JSON that you map later), or we can limit it to answering only particular types of questions. That is what will be in the tutorial; if you are interested, please fill in the form, and I'll see you next time!
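One concrete step from the walkthrough above, extracting the source links from the returned sourceDocuments, can be sketched as a small pure helper. The document shape here is an assumption based on what the video shows in the terminal (a pageContent string plus a metadata object whose source is a URL for website-based stores), not an exact LangChain type:

```typescript
// Assumed shape of one entry in result.sourceDocuments: the chunk text
// plus metadata; for a website-based vector store, metadata.source is a URL.
type SourceDocument = {
  pageContent: string;
  metadata: { source?: string; [key: string]: unknown };
};

// Collect the unique source links to show under the chatbot's answer.
function extractLinks(sourceDocuments: SourceDocument[]): string[] {
  const links = sourceDocuments
    .map((doc) => doc.metadata.source)
    .filter((s): s is string => typeof s === "string");
  return Array.from(new Set(links)); // de-duplicate, preserving order
}
```

The API route would then return `{ role: "assistant", content: result.text, links: extractLinks(result.sourceDocuments) }`, matching the message format described earlier.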
Info
Channel: Irina Nik
Views: 7,110
Keywords: UX/UI design, User experience, Design, Design course
Id: AMc2A5Abj3M
Length: 23min 29sec (1409 seconds)
Published: Fri Aug 18 2023