PandasAI - Data Analysis Made Easy (Powered by OpenAI)

Video Statistics and Information

Captions
This video is sponsored by Brilliant. Recently a new library for data science was released. Its name is PandasAI, and PandasAI is a Python library that integrates artificial intelligence capabilities into pandas, making data frames conversational. This means that you can talk with your data frame and get answers quickly by giving only simple prompts, and you can even generate plots automatically with prompts. In this video I'll show you how to work with PandasAI, so let's get started.

Okay, here I have a Jupyter notebook of a PandasAI demo that you can get from the official GitHub repository. Here you can see how to get started with PandasAI, and you can see some simple examples, but I'm not gonna use this one. I'm gonna use a real data set that I was working with in another tutorial. This data set is going to help us see the strong points and the weak points of this library, because the library is currently not perfect. It has some issues, and we'll see how it works. So I used this template to get started with this tutorial: I copied some lines of code and pasted them into my Jupyter notebook.

First we have to install PandasAI, so I'm going to run this one. I already have this library, so I'm going to continue with the next step. This is the template we have to use to start with PandasAI: first we have to import pandas, then we have to import PandasAI, and then we have to import OpenAI, because I'm going to use the OpenAI API key to work with PandasAI. So we have to use these three libraries. Then I'm going to use this data frame that has some sales data from a supermarket, and I'm going to choose these three columns. I'm going to quickly show you how this data frame looks. Here's the data frame: we have three columns, gender, product line, and the total. So basically this shows how much females and males spend on each product line, and this is the data set we're going to work with.

Okay, now to continue with this tutorial, we have to get our
OpenAI API key, and this one we get from the OpenAI website. We have to go to this website, which I'm gonna leave in the description, click on "View API keys", and create a new secret key. Then we have to instantiate an OpenAI object, which is going to be the LLM, and also a PandasAI object, and here we're using the LLM from OpenAI. So we have our PandasAI object, and with this we can start talking with our data frame. I'm going to run this, and now I'm going to use the PandasAI object. Here I write pandas_ai and then I use .run. The first argument we have to pass is df and the second is the prompt parameter, so I write prompt, and here we write any query we have.

For example, let's see the data frame again. Here's the data frame, and now let's ask PandasAI which unique products are in the product line column. The only thing we have to do is go to the prompt and type our question, so here I write "which products are in product line". Product line is my column and I want to see the unique products. If I run this we should get our answer as if we were working with ChatGPT. Let's give it a second, and we have the answer: the product line includes this product, this other product, and, well, we get all the products. To verify this is correct, I'm gonna use the .unique function and see which products are here. We see that health and beauty is here, electronic accessories, and the rest of the products. So this is correct: we got the correct answer, and we got it only by asking a simple question.

Now let's continue by asking a more complex question to see the capabilities of PandasAI. In this case I'm going to use the run function again and pass in the df data frame I have, and in the prompt I'm going to ask PandasAI to calculate the total spent by each gender. As you might remember, we have this data frame, and I want to see how much females and males spent
in total. If I want to know this with PandasAI, we can ask a simple question like "calculate the total spent by each gender". This is our prompt, and now I'm going to run it. Now we have the answer: we got the total spent by females and the total spent by males, and if you use groupby and the sum function you can see that this answer is correct. But the thing is, when I tried to make a plot using PandasAI, I didn't get the right answer. To show you this better, I'm going to ask PandasAI to make a plot with this data. Here is my prompt: "plot a bar plot that shows the total spent by each gender". In this case the question is basically the same as the previous one. We already know the amount spent by each gender, but now we want to see it in a bar plot. It should get it right, but in previous tests I did, it didn't. Let's see if we get the right answer now.

Here we have the bar plot. As you can see, we have female and male, and we have a title, "Total spent by gender". For male we get a value close to this amount, 155, but for female we get a value close to 600, which is inaccurate because here it's 167. It doesn't make any sense. I mean, PandasAI could get these amounts right, but the moment we asked it to make a bar plot, it didn't get it right; the amounts are not correct. In the previous tests I did, I found that if you give PandasAI the data frame with the calculation already made, it can give you the right answer, but if it has to make that calculation on its own and then make the plot, it doesn't get the answer right. I'm going to show you what I'm talking about in the next example, where we're gonna make a pivot table using PandasAI.

Okay, in this example I want to try to create a pivot table with the following prompt: "calculate the total spent on each product line by both the male and female gender". With this I'm trying to generate a pivot table that shows how much both
male and female spent on each product line. I'm gonna run this and see the results. Well, PandasAI gave us the answer, but it gave it to us in the form of text, and we didn't get the actual pivot table. That isn't bad, but if we want to keep working with the data we need a pivot table. For example, I was planning to make a bar plot with a pivot table, but if I only get this text I won't be able to do that. So this is one of the drawbacks of PandasAI: it's more for talking with a data frame, and when it comes to generating something like a pivot table or a plot, it doesn't always get it right or doesn't give the result you expect. But anyway, PandasAI still got the numbers right. To verify this I use the pivot_table function, and with this I can get the pivot table that I was expecting. We see that this 33,170 is correct, it's here, and with this we can verify that the numbers are right, although we didn't actually get a pivot table.

But something cool you can do is use this pivot table data frame that I created manually to generate a plot because, as I mentioned before, if you give PandasAI a data frame that is already processed (this data frame actually has the pivot table I was looking for), PandasAI can generate the plot. To show you this better, I'm going to use the PandasAI object again, but in this case I'm not gonna pass in the df that we started working with. Instead I pass in the pivot table, which is another data frame that we created and that only contains the pivot table we wanted. Then I write prompt, and here I'm gonna ask PandasAI to make a bar plot that shows how much money each gender spends on each product line. Now if I run this we should get the bar plot with the values we wanted. Here's the bar plot. In this case the female and male bars are separate, and we see, for example, that for female we have a peak at approximately 33 and the lowest point is at 17, and if we go to our pivot
table we see that this data is right. So we can verify again that PandasAI can automatically generate your plot if you give it the data frame with the data already processed. With this we can see that PandasAI is a good tool to use together with pandas, but it's not a replacement for pandas, because it sometimes makes mistakes and you still need to know how to write code to guide PandasAI. That means you still need to learn how to code if you want to become a data analyst or a data scientist.

Speaking of that, a good app I used to learn data science is brilliant.org, which is the sponsor of this video. Brilliant is the best way to learn data science interactively. It has thousands of lessons, from foundational and advanced math to data science, with new lessons added monthly. Something I like about Brilliant is that it helps you learn how to think. For example, in the computer science course you'll learn the fundamentals interactively. This will help you develop your analytical thinking, which is better than just memorizing a bunch of formulas or equations, and it's necessary if you want to safeguard your career against artificial intelligence. Remember that AI tools will get better and better at writing code, but if you develop your analytical thinking you'll always be one step ahead. So don't wait and start today. To try everything Brilliant has to offer free for a full 30 days, visit brilliant.org/ThePyCoach. The first 200 of you will get 20% off Brilliant's annual premium subscription.

Okay, that's it for this video. I'm gonna leave the link to the GitHub repository in the description below so you can open this Colab notebook and play with PandasAI. Let me know in the comment section what you think about this library. All right, that's it for this video. I'll see you in the next one.
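
The manual checks mentioned in the video (the unique function, groupby with sum, and pivot_table) can be sketched in plain pandas roughly as follows. This is only an illustration under assumptions: the column names "Gender", "Product line", and "Total" and every value in the data frame are made up for the example and are not the video's actual supermarket data.

```python
import pandas as pd

# Hypothetical stand-in for the supermarket sales data frame.
# Column names and all values are assumptions made for this sketch.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Product line": [
        "Health and beauty", "Health and beauty",
        "Electronic accessories", "Electronic accessories",
        "Health and beauty", "Electronic accessories",
    ],
    "Total": [100.0, 50.0, 200.0, 150.0, 25.0, 75.0],
})

# Check 1: which unique products appear in the product line column.
unique_products = df["Product line"].unique()

# Check 2: total spent by each gender.
total_by_gender = df.groupby("Gender")["Total"].sum()

# Check 3: total spent on each product line, split by gender.
pivot = df.pivot_table(
    index="Product line", columns="Gender", values="Total", aggfunc="sum"
)

print(unique_products)
print(total_by_gender)
print(pivot)
```

A table like `pivot` is the kind of already-processed data frame that, according to the video, PandasAI plotted correctly when it was passed to the run function in place of the raw df.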
Info
Channel: The PyCoach
Views: 16,898
Id: BtmMNZLxbuI
Length: 11min 14sec (674 seconds)
Published: Tue May 09 2023