LangChain & GPT 4 For Data Analysis: The Pandas Dataframe Agent

Video Statistics and Information

Captions
LangChain is a framework for building applications powered by language models. It aims to help developers do two things: the first is to connect language models to new sources of data, and the second is to make language models agentic, to let them take actions. In this video we're going to use the Python version of the framework, and we're going to explore the pandas dataframe agent to try to understand what the future of data analysis might hold. To do that, we're going to use a LangChain wrapper around GPT-4 to analyze and extract insights from data in a pandas dataframe with thousands of rows. The data we're going to analyze is an e-commerce dataset with customer orders, and I'm going to show you how we can use these tools to seriously speed up the data analysis process.

To get started, we only need to pip install pandas, langchain and python-dotenv, and then we drop the API key from OpenAI into an environment file. So the first thing I'm going to do is load the API key using load_dotenv and find_dotenv. Then I'm going to import pandas, then I'm going to import ChatOpenAI from langchain.chat_models, and I'm going to import create_pandas_dataframe_agent from langchain.agents. We're going to use this as a wrapper around GPT-4, which is a chat model.

All right, so let's have a look at the raw data we want to analyze. We have a CSV file with 13,300-something orders, with an order ID, a timestamp for the order, a subtotal price and a customer ID, and we're just going to use pandas to import this CSV file into a pandas dataframe. And here we have it: 13,339 orders. Of course, we wouldn't be able to paste a file of this size into the ChatGPT prompt.

Next up, we're going to instantiate the chat model, GPT-4, and we're going to instantiate the agent. Note that the agent takes the chat model as an input, as well as the dataframe. And this is all there is to it. Now we're actually ready to have the agent start analyzing the data.

So we're going to start off with a simple question. We're going to ask: what is the total revenue generated from all the orders? We can see that a chain has now been executed. This is going to take a few seconds, and here we have the total returned by the agent. So this wasn't a problem. We even get the pandas code that the agent used.

So let's try a slightly harder question: what is the average order value? This is a metric that any e-commerce business wants to track. And again, no problem, we get the result. It takes the mean of the subtotal price column.

So let's try to increase the level of difficulty once again. Let's ask what the repeat order rate is. This involves finding both the total number of customers and the total number of customers that have made a repeat order. And this time it fails. Let's have a look at the error. All right, so we see this with language models sometimes, that we get an output that is non-actionable. I'm not going to dig into the error right now, so I'm just going to write "what is the overall repeat order rate", and sometimes that's all it takes to have the language model reevaluate and return what you want. All right, so it gives us the total number of orders and then the number of repeat orders, and then divides those two numbers. I guess technically you could call that a repeat order rate, but it's not really what we're after. We usually want the fraction of customers that have made a repeat order, so I'm just going to give that input to the model. And now we see that it counts unique customers and then the number of customers that have made a repeat order, and then it calculates the repeat order rate by dividing the two numbers, like we usually define it in e-commerce.

All right, so in the last example I'm going to try to go even harder. Let's try to have the language model do an RFM segmentation of the customers. And again we see that we get the reasoning behind the approach, and it seems like the agent actually knows what to do, and it actually creates an RFM dataframe based on the order data. I think that's really impressive. So this is without a doubt going to speed up data analysis work going forward.

Before I end this video, I just want to make a few comments on where I see this going in the next three to six months. What we can do now is have an agent analyze data, in this case the pandas dataframe agent, by asking specific questions, and it will try to return an answer to the question you're asking. But what we really want is to be able to work with the agent. So not only do we want the agent to return an answer, we also want it to return the dataframe that comes out of the analysis, so that you, as a data analyst or a data scientist, can continue the work. Then you might ask the agent again at some point about something new. What we really want is to establish a feedback loop.

Another thing we're going to see is the agent referencing external web pages. Think about when you write code: you usually look up documentation online. We can have LangChain do that. The framework already includes a Python requests tool that can have an agent extract information from a web page, and now it's only a matter of building this into the data analysis process.

All right, that's it for now. If you enjoyed this video, like and subscribe. Thanks for watching.
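For reference, the four metrics the agent is asked for in the captions can be checked by hand in plain pandas. The following is only a minimal sketch, not the code the agent produced in the video: the six-row dataframe is made-up stand-in data, and the column names (order_id, created_at, subtotal_price, customer_id) are assumptions based on the fields mentioned in the captions.

```python
import pandas as pd

# Made-up stand-in for the orders CSV; the column names are assumptions.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4, 5, 6],
        "created_at": pd.to_datetime(
            [
                "2023-01-05",
                "2023-01-20",
                "2023-02-01",
                "2023-02-10",
                "2023-03-01",
                "2023-03-15",
            ]
        ),
        "subtotal_price": [20.0, 35.0, 50.0, 15.0, 40.0, 60.0],
        "customer_id": ["a", "b", "a", "c", "b", "a"],
    }
)

# Total revenue: sum of the subtotal price over all orders.
total_revenue = orders["subtotal_price"].sum()

# Average order value: mean of the subtotal price column.
average_order_value = orders["subtotal_price"].mean()

# Repeat order rate as defined in the captions: the fraction of
# customers who have placed more than one order.
orders_per_customer = orders.groupby("customer_id")["order_id"].count()
repeat_customers = int((orders_per_customer > 1).sum())
repeat_order_rate = repeat_customers / orders_per_customer.size

# RFM table: recency (days since the last order, measured from the most
# recent order in the data), frequency (order count), monetary (total spend).
snapshot = orders["created_at"].max()
rfm = orders.groupby("customer_id").agg(
    recency_days=("created_at", lambda ts: (snapshot - ts.max()).days),
    frequency=("order_id", "count"),
    monetary=("subtotal_price", "sum"),
)

print(total_revenue, average_order_value, repeat_order_rate)
print(rfm)
```

Dividing repeat orders by total orders, as the agent first did, gives a different number from dividing repeat customers by unique customers, which is why the question had to be restated.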
Info
Channel: Rabbitmetrics
Views: 38,420
Keywords: langchain, langchain chatgpt, langchain tutorial, langchain pandas, gpt 4, large language models, data analysis, data analyst, python pandas, python pandas dataframe, data science python, langchain in python, langchain ai, data analysis with python
Id: rFQ5Kmkd4jc
Length: 5min 52sec (352 seconds)
Published: Mon Apr 17 2023