💬 OpenAI Chat with Excel CSV using LangChain

Captions
You can chat with a CSV file or an Excel file using LangChain and OpenAI, and I'm going to teach you how to do that in this video. First, I'd like to quickly show you a demo.

This demo uses a file called Pokemon, which has a lot of Pokémon-related information. It looks like this: it has Bulbasaur and all these Pokémon names. I'm not sure about most of them; probably the only one I know is Pikachu, so I'm going to look for Pikachu. Pikachu is Electric, it has a total of 320, and it has all these other details for each column.

So the first thing I'm going to ask the system is "What is Pikachu stats?" This is the question I've asked the CSV using OpenAI and LangChain, and now we'll see the response. The response says Pikachu has a total stat of 320. Let's verify: Pikachu has a total stat of 320, cool, that's correct. Then an HP of 35 and an attack of 55: 35 and 55. It goes on with a lot more information, and we can assume it is probably correct, because the speed of 90 matches the speed of 90 in the file, and we have 1 and False for generation and legendary. That means the system we have built here is not hallucinating; it is answering exactly from the input CSV. So you can chat with the input CSV using LangChain and OpenAI, and how to build that is exactly what this tutorial is about.

This code is not my original code. I came across it on Twitter, in a tweet that went viral, as you can see, and I was interested in replicating the result for my own purposes. The code uses LangChain, which is an amazing library if you want to work with large language models, together with the king of large language models at this point, OpenAI. So we're going to use LangChain and OpenAI to build a system where we can chat with a CSV file, and with an Excel file too: if you have an Excel file, the easiest way is to convert it to CSV and use it with this system. The code that
I'm going to share will be in the YouTube description, so you don't have to take any notes while you watch. All you have to do is watch the video till the end, so you know exactly what we are doing at every step.

The first thing you need to do is install LangChain and OpenAI. ChromaDB is also a dependency: I got an error that ChromaDB was not available, so I added it to my pip install. So it is pip install in quiet mode (-q) with langchain, openai and chromadb.

Once the installation is finished, import the required libraries. From langchain.document_loaders we import CSVLoader, which is going to help us load the CSV file. Then we need to create a vector store index, so we import VectorstoreIndexCreator. Then we need a question-answering retrieval system, so we import RetrievalQA, which is one of the chains in LangChain. Most importantly, we need the OpenAI model itself to drive all of this, so we import OpenAI from langchain.llms.

To use OpenAI, we need to provide an OpenAI API key, so we import os (the operating system module) so that we can set an environment variable. Go to this link and create a new OpenAI API key if you don't have one yet, paste the key here, and run this cell. The cell adds the OpenAI API key to your environment variables, so the key doesn't have to appear anywhere else in your code. Make sure you never share your API key with anybody.

After this, we download the CSV file. I found the CSV available in a GitHub gist, so we go directly to the gist and download the CSV file into our Google Colab environment with wget. As you can see, I'm using the bang (!), which runs a shell command, followed by wget and the CSV file's URL, and the CSV file is downloaded.
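The API-key cell can also be written so that the key is never typed into the notebook itself. A minimal sketch, assuming the key lives in the OPENAI_API_KEY environment variable (the variable the OpenAI client reads by default); ensure_openai_key and mask_key are hypothetical helper names, not part of LangChain or OpenAI. If the variable is missing, the helper falls back to a hidden getpass prompt.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting for it if missing.

    Hypothetical helper: getpass does not echo the input, so the key is not
    saved in the notebook source the way a pasted string literal would be.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = getpass(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} is required")
        # Store it in the process environment so libraries that read
        # OPENAI_API_KEY (such as the OpenAI client) can find it.
        os.environ[env_var] = key
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters, for safe printing."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling print(mask_key(ensure_openai_key())) confirms which key is active without showing it in full. Note that os.environ only lasts for the current process, so the cell still has to run once per session.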
The file is about 43 KB. After downloading the CSV file, we need to load the document. At this point the CSV file is only present in our local environment, which in my case is Google Colab (if you're running this on your own computer, it is on your computer), but it is still not loaded into the current session. For that, we use CSVLoader and give it the file path, pokemon.csv in my case. The CSV is in my working folder, wherever this code is running, so I'm just giving a relative path. If you are running this on your local computer, give the absolute path so that the current session does not get confused: for example, if your CSV is somewhere like C:\My Documents\, give the full path to wherever that location is. If you are running this on Google Colab, the relative path should work fine. So: loader = CSVLoader(file_path="pokemon.csv"), with the file name in quotes. At this point we have created the loader for our document.

Now we use this loader object to create an index, and the type of index we are creating is a vector store index. So we take the VectorstoreIndexCreator we just imported and create an object called index_creator, and using that, we create another object called docsearch with index_creator.from_loaders, passing in a list containing the loader. If you have multiple CSVs, you can add multiple loaders to that list. Now we have the docsearch object. From the output, it looks like DuckDB is being used in the backend; that comes from Chroma, the vector store that VectorstoreIndexCreator uses by default to hold the embeddings, which is why ChromaDB had to be installed.
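To make the loading step concrete, here is a dependency-free sketch of roughly what CSVLoader produces: one document per CSV row, whose text is that row's "column: value" pairs, one per line, plus metadata recording the source file and row number. This mirrors LangChain's CSVLoader as I understand its default behaviour, but it is a stand-in written with Python's csv module, not LangChain's code; the Document dataclass and load_csv_rows function are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, TextIO, Union


@dataclass
class Document:
    """Minimal stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, Union[str, int]] = field(default_factory=dict)


def load_csv_rows(handle: TextIO, source: str) -> List[Document]:
    """Turn each CSV row into one Document made of 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(handle)):
        lines = [
            f"{key.strip()}: {(value or '').strip()}"
            for key, value in row.items()
            if key is not None  # skip overflow cells that have no header
        ]
        docs.append(Document("\n".join(lines), {"source": source, "row": i}))
    return docs


if __name__ == "__main__":
    # Two rows with values quoted in the video; the real file has more columns.
    sample = io.StringIO("Name,HP,Attack\nPikachu,35,55\nBulbasaur,45,49\n")
    for doc in load_csv_rows(sample, source="pokemon.csv"):
        print(doc.metadata, doc.page_content, sep="\n")
```

Because every row becomes its own small document, the index searches over individual Pokémon rather than over the table as a whole.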
At this point, we have our OpenAI environment set up with the API key, we have downloaded and loaded the input CSV file, and we have created the index. Now all we have to do is create a question-answering chain using that index. RetrievalQA is a question-answering chain from LangChain. We use RetrievalQA and specify that the large language model is OpenAI, the chain type is "stuff", and the retriever is docsearch.vectorstore.as_retriever(). The input key is "question", because we're building a question-answering system. Once the chain is created, all we have to do is write a query and send it to the chain; the chain returns a response, and we print the response.

Let me quickly give you a run-through of this code, then we will see a couple more examples, and I'm also going to highlight a problem I found.

First, we install the required libraries: LangChain, OpenAI and ChromaDB. Then we import the required libraries and set the OpenAI API key environment variable. If you're working on your local machine, you only have to install the libraries once rather than every time you run the code; on Google Colab, you have to repeat these steps every time you open a new session. Next, download the CSV file, or, if you already have the CSV file, give the right path to it here, and create the loader object. Then create the index using VectorstoreIndexCreator, and then create a RetrievalQA chain with LangChain. When you create the RetrievalQA chain, the most important thing is to specify that you're using the OpenAI large language model and that the index you just created is your retriever.
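A RetrievalQA chain with chain type "stuff" works in two steps: the retriever picks the k documents most similar to the question, and the "stuff" strategy pastes all of them into a single prompt for the LLM. The sketch below imitates both steps without any library. It swaps OpenAI embeddings and Chroma for a toy bag-of-words cosine similarity, and the prompt template is a simplified one of my own, not LangChain's exact text; vectorize, retrieve and stuff_prompt are hypothetical names. The default k=4 is an assumption chosen to match what I understand LangChain's retriever returns by default.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def vectorize(text: str) -> Counter:
    """Lower-case word counts: a toy stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: Sequence[str], k: int = 4) -> List[str]:
    """Return the k documents most similar to the question (the retriever)."""
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def stuff_prompt(question: str, context_docs: Sequence[str]) -> str:
    """'Stuff' every retrieved document into a single prompt string."""
    context = "\n\n".join(context_docs)
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    rows = [
        "Name: Pikachu\nHP: 35\nAttack: 55",
        "Name: Bulbasaur\nHP: 45\nAttack: 49",
        "Name: Hoothoot\nHP: 60\nAttack: 30",
    ]
    question = "What is Pikachu stats?"
    print(stuff_prompt(question, retrieve(question, rows, k=1)))
```

Because only the top k rows ever reach the model, a question such as "which Pokémon has the lowest HP?" is answered from a handful of rows rather than from the whole file, which is one plausible reason aggregate questions are less reliable than lookups of a single Pokémon.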
After that, you define a query, which is the question you want to ask the CSV file, send it to the chain to get the response, and finally print the response. In fact, I can print response["result"], which prints only the answer itself rather than the whole response including the question.

Now let's ask this CSV file another question. Let me pick another Pokémon: let's say Hoothoot. I go back and ask "What is Hoothoot stats?", send the question to the chain, get the response back and print the result. It says Hoothoot has 60 HP, 30 attack and 30 defense. Let's go and check: if Hoothoot has different values, that would be a hallucination. Oh no, it is actually right: 60, 30 and 30 defense. Is the 60 the HP? Counting the columns, first, second, third, fourth, fifth, sixth: yes, correct. I'm so sorry, AI, don't hold it against me, because you were right. So we have 60 HP, 30 attack, 30 defense, and all the other information about Hoothoot.

Let's ask the same question for Bulbasaur, the first one, which I liked many years back when I watched Pokémon. It says Bulbasaur has 45 HP, 49 attack and 49 defense. Checking the file: 45 HP, 49 attack, 49 defense, so it's working fine.

Now, can I ask a slightly more sophisticated question, like "What is the most powerful Pokémon in terms of HP?" What I'm expecting is the Pokémon with the highest HP. It says Kyurem Black has the highest HP stat. Let me search for Kyurem Black: it has an HP of 125, so it looks like it has done a really good job. The next thing I ask for is the least powerful Pokémon in terms of HP. It says Lileep has the lowest HP, with 66.
Where is Lileep? 66: is that really the lowest? I can see a lower HP for other Pokémon too. This is the part I wanted to highlight: the system is not completely robust. For certain questions it does really well, but not for all of them.

For example, there is nothing related to age in this dataset, so I can ask "What is the least powerful Pokémon in terms of age?", and it answers anyway, with Pikachu, because it's first generation. Then let me ask "What is the average age?", and it actually answers that too. So you can see it is hallucinating: there is nothing related to age here, but it still answers. The point I'm trying to make is that this system is good as long as you add a final layer on top of it that can validate whether it is hallucinating, because the dataset we have given it has nothing to do with age. In fact, we can ask "What are the columns in the document?", and it tells us all the columns: name, type 1, type 2, total, HP, attack, defense, special attack, special defense, speed, generation and legendary. Oh, I think when I asked about age, it probably assumed generation. Let me look at Pikachu. Maybe it used generation; I'm not sure. But my point is that this system actually hallucinates a lot, so if you're going to use it, you need some kind of validation layer that checks that the answer coming out of it is not just a random hallucination. Let me ask another question: "Do you have a column called age?" The response: no, there is no column called age. This is good, so you need to implement some kind of validation like this for the questions users send in.
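One way to build the validation layer described above, sketched with the standard library only: before trusting an answer, check that the column the question is about actually exists in the CSV header, and for "highest" or "lowest" questions compute the true extreme from the data and compare it with the name the model gave. This is an illustrative sketch, not part of the original code; read_rows, resolve_column and check_extreme_claim are hypothetical names, the name column is assumed to be called "Name", and the five sample rows use only HP values read out in the video, not the full file.

```python
import csv
import io
from typing import Dict, List, Optional

Row = Dict[str, str]


def read_rows(text: str) -> List[Row]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_column(rows: List[Row], name: str) -> Optional[str]:
    """Map a user-supplied column name to the real header, ignoring case."""
    if not rows:
        return None
    wanted = name.strip().lower()
    for header in rows[0]:
        if header is not None and header.strip().lower() == wanted:
            return header
    return None


def check_extreme_claim(rows: List[Row], column: str, claimed_name: str,
                        highest: bool = True, name_column: str = "Name") -> bool:
    """True only if claimed_name really holds the highest/lowest value in column."""
    header = resolve_column(rows, column)
    if header is None:
        return False  # e.g. "age": no such column, so any answer is unsupported
    values = [(row[name_column], float(row[header]))
              for row in rows if (row.get(header) or "").strip()]
    if not values:
        return False
    numbers = [v for _, v in values]
    target = max(numbers) if highest else min(numbers)
    # Accept the claim if the named row ties for the extreme value.
    return any(name == claimed_name and v == target for name, v in values)


if __name__ == "__main__":
    # Hypothetical sample using HP values read out in the video, not the full file.
    sample = read_rows("Name,HP\nPikachu,35\nBulbasaur,45\nHoothoot,60\n"
                       "Kyurem Black,125\nLileep,66\n")
    print(resolve_column(sample, "age"))                                # None
    print(check_extreme_claim(sample, "HP", "Lileep", highest=False))   # False
```

On this five-row sample the check rejects the Lileep answer because Pikachu's 35 is lower. Run over every row of pokemon.csv, the same function would show whether the chain's "highest HP" and "lowest HP" answers hold for the whole file, which is a stronger check than looking up the named row alone.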
Other than that, in a lot of other tests that I did, the system seems to work quite well. So why did I make this video if I think the system is not perfect? First, I wanted to give you the chance to play around with it, because it is quite exciting: a lot of enterprise data in a lot of companies still lives in CSV files or in tabular data from SQL, so if you want to ask questions of that data, this is a great solution. But with enterprise data you don't want hallucinations in your answers, so consider this more of a prototype or a learning exercise, where you build a chat system for CSV and Excel files (like I said, you can convert Excel to CSV and start using it) with LangChain and OpenAI. Do not trust everything that comes out of the system; make sure you have proper validation and proper checks in place.

Apart from that, this is an amazing system. We have come so far, especially with LangChain and OpenAI, in how we can interact with the kinds of documents we work with, and I'm really excited about the future. I'll link this notebook as a file in the GitHub repository so you can start using it directly. The only other thing you need is an OpenAI API key, which you can get from this link. I hope this tutorial was helpful in learning how to write Python code that lets you chat with CSV and Excel files using LangChain and OpenAI. If you have any questions, let me know in the comment section; otherwise, see you in another video. Happy programming.
Info
Channel: 1littlecoder
Views: 24,837
Id: nr-mDSi9LxA
Length: 15min 3sec (903 seconds)
Published: Sun Apr 02 2023