Trying out Code Interpreter for ChatGPT

Captions
[Music] Hi everyone, I'm Ben. Welcome back to the Data Literacy video channel. In this tool tutorial, I'm going to walk you through how to use Code Interpreter, a new alpha plugin for ChatGPT by OpenAI. Now, I just got access to this plugin recently, and people are still on the waitlist for it, but I want to show you how we can use it to analyze data. So, without further ado, let's dive right in.

OK, so here we are in ChatGPT. As you can see, I've selected this model up here called Code Interpreter Alpha. What is this model? It's an experimental model that can use Python and handles uploads and downloads. Let's try it out. We're going to give it a little test flight using, no pun intended, air traffic passenger statistics from data.gov. This is an open data CSV file that gives us passenger counts into and out of San Francisco International Airport for a little over, I guess, 15 years, maybe something close to that. You can see I just downloaded the CSV. Here's what it looks like if you open it up in Excel: the data goes back all the way to 2005, up until last February. There we go, that's the data we're talking about. It gives us the airline; whether it's domestic or international; the region the flight came from or went to; whether people got on the plane, off the plane, or just stayed on it; whether it's low fare or not; the terminal; the boarding area; and then, crucially, column L, the passenger count: how many people were on all these flights that month according to these attributes. OK, so there it is. It's not a particularly dirty data set, and we're not going to do anything considerably advanced in terms of the analytics. Let's just give the tool the ability to play with the data and see what we find.

So how do we begin? Well, we click this icon at the left of the "Send a message" prompt line. You can see it's a little file icon with a plus in it. As soon as you click on that, you're opening up a file folder here, and you can select your CSV, which I've done, and then open it 
in. Immediately, ChatGPT recognizes the file name and says, "Great, it looks like you've uploaded a file named air trafficpassengerstatistic.csv. Let's take a look at the contents." Then it opens up this working window here, and if you drop it down you can see that it's writing Python: it's importing pandas, loading the CSV file into a pandas DataFrame, and displaying the first few rows. There we go, this is the outcome. It's now giving me this response that tells us what the file includes: it lists the column headers and then infers some information about those columns and what data they contain. Note that I did not give it metadata, and it did not refer back to the source on data.gov. It guessed what these columns were all about, which of course might be susceptible to error. Of note, for example, it said the activity period is likely the month, and it noted the YYYYMM format. I didn't tell it that. All it did was take a look at this column and make that guess. It happens to be correct, but we don't know if it always will be. At the end it says, "Please let me know if there's anything specific."

Now, I've never really done this, and I don't know exactly how to prompt this particular plugin to analyze the data, so let me just describe it the way I would have thought about it, and let's start with something very general and generic. Let's just say: "Can you tell me descriptive statistics of the data and also create some basic charts and graphs that help me visualize what is in it and any patterns?" It's a very, very non-specific request. Let's see what the heck it's going to do. Well, it starts by importing matplotlib.pyplot, and then it sets up some variables. It's going to create a bar chart with the number of passengers per operating airline, filtered to the top 10. It's going to make a bar chart showing the number of 
passengers by region. OK, so it's starting off with a lot of bar charts. There's even going to be a pie chart here showing the activity type code. And then it's going to output all of that with the descriptive statistics. Now we see that it starts to create that output. We see a pretty good usage here of part-to-whole data in the form of a three-slice pie chart. After it creates these charts, I see a lot of United flights and a lot of US-based flights. Then it gives me some descriptive statistics, telling me the period, the passenger count, how much it varies, and what the mean and standard deviation are of those passenger count rows. And then it gives me a description of the charts and what they're telling me: a bar chart showing the top 10 operating airlines by passenger count, and so on. It also saves this conversation over here as "Air Traffic Data Analysis." OK, so there we have it.

Let's get a bit more specific. What if we take this one right here and modify it? Maybe I want to pivot this 90 degrees to show the bars as horizontal. Let's try some different modifications as a next step. So: "Can you modify the top 10 operating airlines bar chart for me? I'd like to see the bars horizontal. Make them gray for me, please" (as a Canadian, I'll put the "please" in there) "and can you add data labels to the end of each bar, making sure to add thousands separators?" Let's see if it's going to make a modified bar chart for me. Well, it starts off by declaring that's what it's going to do. Then we can see it going in here creating our axis labels and our title. Now it's adding this for loop that, I think, essentially adds the thousands separator. It's inverting the axis, and here's what we get, with my data labels over here. OK, so 
that looks pretty good, but is it correct? Is this accurate? That's a key question for LLMs, isn't it? We're not exactly sure yet if it's right. We know there are maybe problems with its ability to do math: it's not great at numbers, but it is great at code. So let's see if these numbers come out true. I've loaded the data set in Tableau. We're going to take Operating Airline, take Passenger Count, and sort it. We're going to add Passenger Count to the label field, and now let's compare. So 350 million, or rather 350,170,993: that's the same number we get over here for United Airlines at the top. There's another United Airlines line here for a different time period, I guess before July 1st, 2013. That number comes out to 210 million 727 thousand, which is the exact same number we get when we analyze the data ourselves with Tableau. 118 million, 110 million, 94 million here; 118 million, 110 million, 94 million there. It looks like it's checking out. So this is certainly a lot better than what I saw in some other attempts that didn't include this code interpreter, because again, it's not doing the numbers itself. It's creating code to do that, so the analysis is being done within Python with our data fully uploaded.

Now, what if I wanted to combine these? This happens all the time in the analysis workflow, doesn't it? We see things like this, we get into the middle of it, and we realize, wait a minute, maybe I want to see these combined. Can we do that? Let's try it. I'm going to say: "Can you combine the two levels of Operating Airline that both include United Airlines? I'd like to see those levels of the variable combined so that it's just shown as United Airlines." OK, let's see if it can figure that out. I don't think my request is super articulate or eloquent. I'm not 100% confident that I've clearly explained what I want it to do, but 
let's see what it does. What I find interesting is that it starts by combining at the very beginning, at the top here. What is it doing? It's combining United Airlines using essentially this if statement: if it finds "United Airlines" in the value, include it, otherwise don't. After it does that, what does it do? It recalculates the top 10, and then it goes and makes the bar chart. Let's see what we get here. OK, so now it has both United Airlines together, with 560 million, up top here. Let's see what happens in Tableau if we combine these two together. Do we get 560 million as well? Let's sort it. 560 million, yes we do. OK, so it did that math correctly. I find that fascinating. It did that modification, it cleaned up the data and combined them in that way, here on the fly, using a very awkwardly worded command.

Maybe I only want to see this for domestic, not all. Let's just say: "Can you redo the bar chart, this time only showing the top 10 airlines filtered by domestic flights, not international?" Now, I didn't specify that Geo Summary is the name of that variable. I didn't say anything about Geo Summary. What do I mean by Geo Summary? It's right here: Geo Summary, Domestic or International. But watch, it's smart enough to go in there and say, OK, we need to filter by Geo Summary, only where it's Domestic. I didn't say Geo Summary; I didn't tell it that. It was able to recognize and parse that out and then filter on the variable that contains it. So now we're looking at the top 10 operating airlines for domestic only, still combining United. It didn't break those out; it retained that. We can see here, if we come up here and filter by Domestic, that we're talking about 452 million for the combined United. Yes indeed, that's what it is. So it's getting the math right here. Pretty interesting.

Now, what if I'm going to do another analysis? What if I just want to 
say, "Great, thank you" (sorry, I have to be polite), "can you show me how passenger counts have changed over time?" OK, let's break now from that previous line of analysis. We're giving it a totally different question that has nothing to do, necessarily, with this line of inquiry. Can it get it right? Can it break away and do a totally different kind of analysis? It's interesting: it's inferring some formats here on the activity period. Then it goes about creating time series data and displaying it as a time series plot in Python. And here we go. Now, of course, we know what happened here, don't we? This data plummets right around 2020. We know exactly what happened. It almost goes all the way to zero, doesn't it? But it doesn't really do anything. We see the seasonal pattern, we see the huge drop-off, and then it tells us a little bit about that. Not only that, it actually goes a step beyond. First it says there appears to be a seasonal pattern. Then it says, "I also notice there's a significant drop-off, most likely due to the COVID-19 pandemic." Wow. I mean, wow. It's finding seasonality, it's finding major outliers, and it's doing research. Well, not necessarily research per se. It isn't that it's researching anything; it's that its large language model, built from training data up to September 2021, is enough for it to go in and find the reason for that big drop in passenger count in 2020.
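
The data steps described so far (combining the two United labels, filtering to domestic, ranking airlines, and turning the YYYYMM activity period into a time series) can be sketched in pandas roughly as follows. This is a minimal sketch, not the code ChatGPT generated in the video. The exact column spellings ("Activity Period", "Operating Airline", "GEO Summary", "Passenger Count"), the second United label, and every number in the sample rows are assumptions made for illustration, and the combining step uses a vectorized substring match instead of the if statement seen on screen.

```python
import pandas as pd

# Made-up rows for illustration only. Column names and the
# "United Airlines - Pre 07/01/2013" label are assumptions about the CSV.
sample = pd.DataFrame({
    "Activity Period": [201306, 201306, 201912, 201912, 202004, 202004],
    "Operating Airline": [
        "United Airlines - Pre 07/01/2013", "Alaska Airlines",
        "United Airlines", "Air Canada",
        "United Airlines", "Alaska Airlines",
    ],
    "GEO Summary": ["Domestic", "Domestic", "Domestic",
                    "International", "Domestic", "Domestic"],
    "Passenger Count": [500, 200, 1000, 300, 100, 50],
})


def combine_united(df):
    """Relabel every airline value containing 'United Airlines' as one level."""
    out = df.copy()
    is_united = out["Operating Airline"].str.contains("United Airlines", regex=False)
    out.loc[is_united, "Operating Airline"] = "United Airlines"
    return out


def top_airlines(df, n=10, geo=None):
    """Total passengers per airline, optionally filtered by GEO Summary."""
    if geo is not None:
        df = df[df["GEO Summary"] == geo]
    totals = df.groupby("Operating Airline")["Passenger Count"].sum()
    return totals.sort_values(ascending=False).head(n)


def monthly_totals(df):
    """Parse YYYYMM integers into dates and sum passengers per month."""
    months = pd.to_datetime(df["Activity Period"].astype(str), format="%Y%m")
    return df.groupby(months)["Passenger Count"].sum().sort_index()


combined = combine_united(sample)
print(top_airlines(combined, geo="Domestic"))
print(monthly_totals(sample))
```

Running the same aggregation in a second tool, as the video does with Tableau, remains the check on whether totals like these are right.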
So this, to me, is why this is totally groundbreaking. I have never seen anything like this. I've been in business intelligence for over a decade, I worked for Tableau for many, many years, I teach people all around the world how to analyze data, and I have never seen a tool like this. I can just interact with it in everyday human language, using the power of NLP, using the power of these transformers put together to create, after some training, some amazing amounts of insight based on a massive corpus of text. And now, critically, it's using this Code Interpreter plugin that's able to import the data and create Python code: it takes my everyday language, converts that into code to respond to it, and creates an output. I can modify charts and graphs, I can tweak them, I can modify the data, even combining values, I can show labels. I've just never seen anything like this. Every step of the way it created all this code, every step of the way it output the data, and I'm blown away.

Let's try one last thing. Let's start with something simple: "Can you output the line chart as a PDF along with a title and a description of the key insight?" Again, I don't think the insight here is particularly groundbreaking. Everybody knows what happened to travel in 2020, so I'm not sitting here saying that's some kind of unexpected insight. I'm just saying we can do some basic descriptive analytics with remarkable ease: creating code, modifying code, importing data. Let's see if we can actually output the data as well, into a file perhaps. I don't know, I've never done it. I tried it before and it was getting hung up and had some issues, but it seems to be telling me that it's done now, so let's see what it gives us. You can see that it has generated a PDF file, and it's even giving me a hyperlink to it. Let's click on it and see what happens. So here we go, I get a little drop-down here. We're going to open it 
up, and there's our chart. We maybe want to format this differently, but it tells me at the bottom here basically what's happening. I think I would need to go through and modify this, but I get my output in PDF form. I'm blown away.

I was having this conversation on Twitter today with someone I really respect in the data visualization world. His name's Nick Desbarats. He's at, I think, practicalreporting.com. Let me make sure I get the website right. Yeah, practicalreporting.com, exactly. He's a fabulous author and trainer, and a great conversationalist in the data world. I love interacting with him: really refreshing, interesting thoughts, inclusive. Anyways, his take on this was: well, probably problematic and error-prone, not quite groundbreaking. And I agree with him 100% that this is going to be error-prone. You're going to have to cross-check everything. You're going to have to be really careful. I think it's groundbreaking, though, and that's the part I think we disagree on, because, like I said, I've just never seen anything where you can have a conversation with your data like this, involving creation of code on the fly, modifications to the structure of the data itself, modifying charts, making inferences or assumptions about the meaning of the data (the way it inferred that the plummet there was due to COVID-19 and described it), inputting, outputting, just the whole workflow. To me, this makes basic data analysis, anyway, something anyone can do. Anyone. And I think that was the promise of SQL back in the day, in the 1970s, when Donald Chamberlin and his partner decided to create that query language that would be close to everyday language. I know it was the goal of the founders of Tableau when they were 
hoping to create a tool that anyone could play with to visualize their data. Those tools did a lot. Let's be honest, we can't take anything away from them, but they didn't truly achieve this self-service, this ability for anyone to interact with data. I think what I've just seen here comes closer to being able to do that. It really does. So we'll see where it goes.

I think the other side of this coin is that this isn't just remarkable and amazing functionality. Look, it's terrifying. To think that a tool like this, in alpha mode, can do this kind of work: what else can it do? How else is it going to evolve and grow? Are we dealing with a situation, very soon, in which we have superintelligence on this planet like we've never seen before? What does that mean for the human species? There are very real concerns about that. I don't mean to be Chicken Little, but I think we need to take this seriously. I think we need to have real conversations, real quick, about what we are doing with this technology. The cat's out of the bag, Pandora's box is open, and it's going to be really hard to put it back in. So we have to figure out some guardrails pretty quick, because I'll tell you, what I've seen so far, to me, yeah, it's groundbreaking. It's beyond that, it's mind-blowing.

We're going to see where it goes. We've probably got some more attempts to play around with this. I'm going to try some new tools and play around with them. There's another one here I want to try out next, where you can browse the internet using ChatGPT. So much more functionality seems to be coming out every week, or sometimes even every day. Remarkable stuff.

All right everyone, take care. I hope you enjoyed this little tutorial. I think you should be able to, I'm guessing, get access relatively soon. We'll see how that goes. Stay tuned and follow the Data Literacy channels; we'll keep you up to speed. If you really 
want to get up to speed with the basics, we have a course called ChatGPT Basics. We launched it, let's see, a little over a month ago. It feels like it was already six months or a year ago in technology time these days. ChatGPT Basics walks you through some of the important warnings and caveats associated with the use of this new technology, and also gives you some context on how to use it. We keep updating it with different ways to use ChatGPT or ChatGPT clones like GPT4All. There's a new one that came out recently that you can use, not based on GPT but based on LLaMA, the Meta family of models. So anyway, we're going to continue updating it. I feel like I update it every few days now: I go in there, modify it, and add something new. So I'll keep doing that. All right everyone, that's all for now. Take care, talk to you soon. Bye.
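
For reference, the chart edits and PDF export requested in the walkthrough (horizontal gray bars, data labels with thousands separators, a title, a short description, saved as a PDF) can be sketched with matplotlib as below. This is a hedged illustration rather than the code shown in the video: the function name `save_bar_chart_pdf`, the output file name, and the totals are all hypothetical, and it exports a bar chart where the video exported the line chart.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt


def save_bar_chart_pdf(totals, path, title, description):
    """Draw a horizontal gray bar chart with labeled bars and save it as a PDF."""
    names = list(totals.keys())
    values = list(totals.values())

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(names, values, color="gray")
    ax.invert_yaxis()  # first entry on top, so sorted totals read top-down
    ax.set_xlabel("Passenger count")
    ax.set_title(title)

    # Data label at the end of each bar, with a thousands separator.
    for position, value in enumerate(values):
        ax.text(value, position, f" {value:,}", va="center")

    # Short description of the key insight below the plot area.
    fig.subplots_adjust(bottom=0.2)
    fig.text(0.5, 0.04, description, ha="center")
    fig.savefig(path, format="pdf")
    plt.close(fig)
    return path


# Made-up totals for illustration; not figures from the SFO data set.
example_totals = {"Airline A": 1_600_000, "Airline B": 300_000, "Airline C": 250_000}
pdf_path = save_bar_chart_pdf(
    example_totals,
    os.path.join(tempfile.gettempdir(), "top_airlines_sketch.pdf"),
    title="Top operating airlines by passenger count (illustrative)",
    description="Made-up numbers; shows layout only.",
)
print(pdf_path)
```

The `f"{value:,}"` format spec is what produces labels such as 350,170,993, which is the job the for loop in the generated code appeared to be doing.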
Info
Channel: Data Literacy
Views: 39,410
Keywords: data literacy, data, data visualization, data science, data analytics, data analysis, data analyst, data literacy crash course, data literacy basics, data literate, what is data, ben jones, big data, analytics, data visualization software, data cleansing, data handling, data wrangling
Id: b9hSCuFGNRU
Length: 19min 52sec (1192 seconds)
Published: Mon May 01 2023