Chat with any CSV or Excel using langchain and Open AI | English

Video Statistics and Information

Captions
Hey, hello everyone, welcome back to my channel, welcome back to Local Lo Baat. For a change, this video is in English, because a lot of my friends in Europe and Indonesia have requested content in English, and since this video is about OpenAI and large language models, we'll do it in English. I'll continue making videos in Hindi as well, but this one specifically will be in English.

In this video we're going to do something similar to the previous one, where we created a chatbot you can use to chat with any PDF. This time we'll create a chatbot you can use to chat with any CSV or Excel file. One important question is why you'd want to do this. One reason: if you're into stock trading, buying stocks or mutual funds, you might want to download company data and look at fund or company performance over the last three or five years, and that data usually comes as CSV or Excel files. Similarly, a lot of people track their finances in an Excel or CSV file, so you could upload your financials to this chatbot and ask questions like "What was my spend in the last month?" or "How much did I spend on groceries in September?". Many friends I know also do their entire itinerary and trip planning in an Excel or CSV file, so you could upload that file to the chatbot and ask, "Where are we going on day two of our Vietnam trip?". All of these things become possible with this chatbot.

One important point: we are going to build this chatbot using TypeScript/JavaScript, and also using LangChain. What is LangChain? I'll tell you in detail.
Let me start by sharing my screen with all of you. So here we are. LangChain is basically a framework that provides a layer of abstraction for us. For example, if you're building a CSV chatbot, the very first thing you need is a CSV parser: some way to convert the file into text. And because a CSV can be very large, you might want to break it down into smaller parts as well. After that you'd want to save it in some database, and then you'd also need to connect it to a large language model. LangChain provides abstractions, basically simple functions or wrappers, to do all of these things easily. If I had to write all of this manually it would be very difficult, so we'll use LangChain, which hides the work behind wrappers and makes our life easier.

So, without further ado, let's start coding. I've already set up a repo, which I'll share in the description of this video, and the first thing we have to do is install LangChain in it, so I'll quickly do that. While that installs: what our chatbot is going to do is take this CSV, which is data on the S&P 500 companies, in simpler terms the dividend yield, EBITDA, and price-to-sales of those companies. We'll upload the CSV to our chatbot and then ask questions about it. The process looks like this: we load the CSV file via a document loader, and once the file is loaded, we use something called a text splitter to split it into smaller chunks. Let's focus on that first section: we are going to use a document loader.
The loader is provided by LangChain. Very simple: we go to the LangChain documentation, and the first thing I'd recommend everybody do is learn how to use documentation really well. There's a search box, so we search for "document loaders". Under the data-connection section, on the left-hand side, there's a section for integrations, and within integrations there are file loaders. The kind of file we're dealing with today is a CSV file, so I click on CSV, and the first thing it asks me to do is install a package. That package helps parse the CSV; LangChain uses it internally, which is why it asks us to install it. I'll copy this into my terminal. I think LangChain itself is still installing, so while that happens, back to our diagram: after uploading the CSV file we split it into smaller chunks. The reason is that if your CSV is, say, 10,000 rows long, then in order to store it and read through it you'd want to break it into chunks of, say, 1,000 rows each, because searching becomes easier. I'll explain later exactly how, but we want smaller chunks, and we'll create them with something called a text splitter.

First, we install the CSV package; LangChain is already installed, so we install this one too. Done. Next, because our diagram calls for a document loader, and as I said LangChain provides wrappers on top of everything, every document loader has a load function available. So we first import the CSVLoader in our code. I've already initialized an app.ts file.
So in app.ts I import my CSVLoader. As per the documentation, I need to initialize it with the path to my document, so I copy that over. The path points inside the documents directory, where I've already saved the CSV, so it's the documents directory plus the file's name. Finally, I need to load it, so I write await loader.load(), which loads my CSV into docs. To see the result, I console.log(docs), clear the terminal, and run npm run dev to check we're correct so far. It has loaded the data: you can see all the documents here, CenterPoint has come through, CenturyLink under Telecommunication Services has come through. So far, so good.

The next step in the diagram is splitting into chunks. Back to the documentation: we're done with document loaders, so we close that and go to document transformers, where we find text splitters and, specifically, "recursively split by character". As before, we import the function LangChain provides and initialize it, let's say with a chunk size of 1,000 and a chunk overlap of 200. Chunk size, in very simple terms: if you have a 10,000-row CSV, you break it into chunks of 1,000 each. It's slightly more complicated than that, but for simplicity, say every thousand rows is a separate chunk. The overlap means two adjacent chunks can share some content, for example a chunk can overlap with the rows of the next one. With those options we initialize the splitter.
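As a rough mental model of what loader.load() returns: each CSV row becomes one document whose text lists "column: value" pairs. Here is a minimal pure-TypeScript sketch of that shape; the function, the metadata layout, and the sample data are illustrative, not LangChain's actual implementation (which handles quoting and escaping via a real CSV parser):

```typescript
// Illustrative sketch: turn a CSV string into per-row "documents",
// mimicking the shape a CSV document loader produces.
interface Doc {
  pageContent: string;
  metadata: { line: number };
}

function csvToDocs(csv: string): Doc[] {
  const [header, ...rows] = csv.trim().split("\n");
  const cols = header.split(",");
  return rows.map((row, i) => ({
    // One "column: value" line per cell.
    pageContent: row
      .split(",")
      .map((value, j) => `${cols[j]}: ${value}`)
      .join("\n"),
    metadata: { line: i + 2 }, // 1-based line number in the file
  }));
}

const docs = csvToDocs("Symbol,EBITDA\nAAP,853000000\nCTL,9497000000");
console.log(docs[0].pageContent);
// Symbol: AAP
// EBITDA: 853000000
```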
Then we actually call the splitter and split our documents. We name the result splittedDocs and call splitDocuments, passing it the documents we loaded earlier. At this point our documents have been split and the chunks are created.

Now we go to the next step of the diagram: once the chunks are created, we call the embeddings API to create embeddings. What are embeddings? It's a more complicated concept, but in simpler terms: location data is usually a latitude/longitude coordinate pair, and embeddings are coordinates in the same way, except instead of just x and y there are many dimensions. They are points that can be plotted in a vector space. If this were a 2D graph, one embedding would be here, another there, a third somewhere else, and we search by how far one embedding is from another. Say this embedding is for "cheese", this one for "butter", this one for the color "red", and this one for "milk". When I search for "milk", the closest results will be "cheese" and "butter", not "red", because cheese and butter are more closely linked to milk, so their embeddings in the vector space sit closer together. Real embeddings have many more than three coordinates; I've simplified for the explanation. Once we have the embeddings, we will save them in a vector database.
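That idea of "closeness" can be made concrete with cosine similarity, one common distance measure over embeddings. A toy sketch, with made-up three-dimensional vectors standing in for real embeddings (which have on the order of a thousand dimensions):

```typescript
// Toy illustration of embedding distance: vectors pointing in a similar
// direction represent related concepts. The 3-D vectors are invented
// for the example; real embedding APIs return much longer vectors.
type Vec = number[];

function cosineSimilarity(a: Vec, b: Vec): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: Vec) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Hypothetical embeddings: dairy products cluster together,
// the color "red" points elsewhere.
const milk: Vec = [0.9, 0.8, 0.1];
const cheese: Vec = [0.85, 0.75, 0.2];
const red: Vec = [0.1, 0.2, 0.95];

console.log(cosineSimilarity(milk, cheese) > cosineSimilarity(milk, red)); // true
```

A vector store's similarity search is essentially this comparison run against every stored chunk, returning the nearest ones.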
There are a lot of vector databases available; we'll talk about them, but first let's create the embeddings. Back in the documentation, we close document transformers and go to text embedding models under integrations. You can see many embedding models are available; we're going to choose the one ChatGPT uses, i.e. OpenAI. So we click on OpenAI and, again, the very first thing we do is import the embeddings class. Then we initialize it: let's remove the batch size from the example, and our API key will be read from an environment variable, so we reference that here. I think up to this point we're clear: we can create our embeddings.

Now we need to save them in a vector store, and there are many vector stores available: there's an in-memory store, AnalyticDB, and one of the most famous, Pinecone. In this video, though, we're going to use a vector store called HNSWLib, because it's an in-memory vector store: I don't have to set up any cloud database or anything like that, so it's easy to get started. First we install the library, and while that happens we look at how to store embeddings in the vector store. It's a simple process: we import the library and then, not fromTexts but fromDocuments, we pass our documents and our OpenAI embeddings, and the vector store is created. Up to this point our data is stored in the vector store; our embeddings are now inside the store.
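Putting the ingestion steps described so far into one place, here is a minimal sketch. The import paths follow the 2023-era langchain JS package layout (newer releases moved these into separate @langchain/* packages), the file path documents/sp500.csv is a placeholder for the actual file name, and running it requires an OPENAI_API_KEY in the environment:

```typescript
// Ingestion sketch: CSV -> documents -> chunks -> embeddings -> vector store.
// (Top-level await assumes an ES-module context.)
import { CSVLoader } from "langchain/document_loaders/fs/csv";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

// One document per CSV row ("documents/sp500.csv" is a placeholder path).
const loader = new CSVLoader("documents/sp500.csv");
const docs = await loader.load();

// Break the documents into overlapping chunks so similarity search
// operates on small pieces rather than the whole file.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splittedDocs = await splitter.splitDocuments(docs);

// Embed every chunk via the OpenAI embeddings API (key read from the
// OPENAI_API_KEY environment variable) and index it in HNSWLib,
// an in-memory store that needs no external database.
const vectorStore = await HNSWLib.fromDocuments(
  splittedDocs,
  new OpenAIEmbeddings(),
);
```

Because HNSWLib lives in memory, the index disappears when the process exits; the store also offers save/load methods if you want it to survive restarts.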
The next thing we need to know is how to ask this chatbot a question. For that we look at this diagram. When a user asks a question, very similar to how we converted our data into embeddings and saved them, every question is also converted into an embedding, using the same OpenAI embeddings API. LangChain does it behind the scenes using wrappers: it calls the API, creates an embedding, and then, based on a similarity search (a semantic search), goes to the vector database and returns the matching documents for you. Which basically means: if you ask, say, "give me the financials for Reliance", then for "Reliance" it could be Reliance Jio, Reliance Electric, Reliance Fiber, any of them, so it returns several matching documents, which are the chunks we created. Finally, to extract a proper answer out of these multiple chunks, we pass them to something called an LLM, a large language model. If you have used ChatGPT, that's basically GPT-3.5 Turbo, or GPT-4 can also be used; Hugging Face provides a lot of large language models, and Facebook has its own as well. In this example we're going to use GPT-3.5 Turbo. We pass all these chunks to the language model, which turns them into an answer and sends it to the user.

How are we going to do that? Very simple. Back in the library, the first thing we need is something called chains, and we'll use the popular RetrievalQA chain, so we import it here. Because we also need a large language model, we import OpenAI as well, since that class gives us GPT-3.5 Turbo. First we create the model: new OpenAI(), with the model name set to GPT-3.5; I think that's the name, let me verify... yes, it's "gpt-3.5-turbo".
Okay, so now my model, my large language model, is initialized. Next I need some way to extract the embeddings from my vector store, so I need a vector store retriever; going back to the docs, it's created like this, so I create the retriever. Finally, I create a chain that connects my model with the vector store retriever. I've already imported the chain, so I say RetrievalQAChain.fromLLM, passing it my model and the vector store retriever. That creates the chain, and now I can ask it any question.

Let me go back to my CSV file and pick a question, say: "What is the EBITDA for Advance Auto Parts?" The answer comes from await chain.call() with the query, where the query is my question, and then I console.log both the question and the answer. Let's try to run this... and we hit an error saying the OpenAI key was not found. The reason is that I'm using an environment variable: I've put my key inside a .env file, but I need to load a package so that the environment variable gets fed into this file. That library is called dotenv, so let's go to the dotenv package and install it.
Then we import it with import * as dotenv from "dotenv" and ask it to load all the environment variables defined inside the .env file. So now, to recap: I've created a QA chain, asked the question, and requested the answer. We run it again; it takes a second, and hopefully it works... "What is the EBITDA for Advance Auto Parts?", and it has returned exactly that: the EBITDA for Advance Auto Parts, which is correct. We can verify it against the CSV.

Okay, perfect. Let's ask one more question: what is the dividend yield of the same company? (Afterwards I'll also show you a UI that uses this chatbot.) So we add "What is the dividend yield for Advance Auto Parts?", pass the question, and console.log both the question and the answer. Run it again; it takes a few seconds to execute. The first answer is correct again, which is nice, and the dividend yield has also come out, so we have both answers. I think this completes the chatbot part. To summarize: first we use a CSVLoader to load our CSV document; then we split it into chunks using a RecursiveCharacterTextSplitter; then we call the OpenAI embeddings API to create embeddings and store them in the vector store; finally, for question answering, we initialize our OpenAI model with gpt-3.5-turbo, create a retriever to get the embeddings out of the vector store, connect the model to the retriever via our RetrievalQA chain, and use that chain to ask questions and get answers.

Now I'll show you a sample UI where we upload our CSV file: the file is here, we upload it, click the upload button, and I'm going to ask the same questions.
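The question-answering side can be sketched like this, again with 2023-era langchain JS import paths. The vectorStore here stands for the HNSWLib store built during ingestion (declared rather than constructed, to keep the sketch focused), and the question string matches the one used in the video:

```typescript
// QA sketch: .env -> LLM -> retriever -> RetrievalQA chain.
// (Top-level await assumes an ES-module context.)
import * as dotenv from "dotenv";
import { OpenAI } from "langchain/llms/openai";
import { RetrievalQAChain } from "langchain/chains";
import { HNSWLib } from "langchain/vectorstores/hnswlib";

dotenv.config(); // makes OPENAI_API_KEY from the .env file visible

// The vector store is the one populated during ingestion.
declare const vectorStore: HNSWLib;

// gpt-3.5-turbo as the LLM, as in the video.
const model = new OpenAI({ modelName: "gpt-3.5-turbo" });

// Wire the LLM to a retriever over the vector store: the chain embeds the
// query, fetches the nearest chunks, and has the LLM compose an answer.
const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());

const question = "What is the EBITDA for Advance Auto Parts?";
const res = await chain.call({ query: question });
console.log(question, res.text);
```

The chain.call result carries the generated answer in its text field, which is what gets logged alongside the question.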
In the demo UI the upload succeeds, and now we ask the same question as before: "What was the EBITDA for Advance Auto Parts?" Let's see if the answer comes out correct... perfect, the $853-something figure has come out. Let's ask our next question too, the dividend yield, and hopefully that also comes out correct. Perfect.

Overall, if you liked the video, please do subscribe to the channel. If you want to learn how to apply for jobs in Europe, there are videos I've made on that; if you want to see what life in Europe is like, there are videos on that too; and if you want a similar video where I created a chatbot to talk to a PDF, that's the one you should refer to. I hope you liked the content, and I'll be making many more such videos in the future. Thank you.
Info
Channel: local lo baat
Views: 743
Keywords: chatgpt, csv reader, chatbot, llm, gpt3.5, langchain
Id: X_v837zhxfE
Length: 23min 39sec (1419 seconds)
Published: Mon Oct 02 2023