ChatGPT for Data Analysts | Best Use Cases + Analyzing a Dataset

Captions
What's going on, everybody? Welcome back to another video. Today I'm going to tell you my favorite ways to use ChatGPT for data analysis, and then we're going to actually analyze some data. [Music]

Now, ChatGPT has taken the world by storm. It is pretty stinking impressive. I've been using it myself and I've been pretty blown away with some of the results. If you have not tried it yet, I highly recommend trying it yourself. What we're going to be doing today is I'm going to walk you through four of the top things that I've been using ChatGPT for in actual data analysis, and then we're going to take a small data set, ask it some questions, prompt it to do a few things, and see how it responds. I'm going to tell you right at the beginning: it's pretty incredible, and this is just ChatGPT 3.5. In future iterations, when they make improvements, I probably will be making more videos on the new functionality and enhanced abilities that ChatGPT has. So let's take a look at my favorite ways that I've been using ChatGPT as a data analyst.

All right, so I've got my computer in front of me. If you have not seen this before or don't know what this is, this is the interface for using ChatGPT. On the left-hand side we have these discussions; these are previous ones from when I was just testing things out. I've gotten rid of a lot of them because I've done this a lot. And this is the actual interface. This is what it shows when it's just prompting you: hey, here are some ideas for things you can do. Down here is where you actually write what you want. We're going to be feeding ChatGPT something called a prompt, basically telling it or asking it to do something.

One of the first use cases for ChatGPT is that it can explain code really well. So I'm going to ask it to explain some Python code and we'll see what it says. Let's go ahead and say "explain this python code". I'm going to come right up
here. This is just my Jupyter notebook. I'm literally just going to take this; it's from a project that I created, a unit of measurement converter. I haven't given it any information besides that. It's literally just the code, and it's going to have to take the code that I've written and try to explain it to me. All we're going to do is hit enter. Right now I'm working with the completely free version, so everything that's about to generate is based off the free version. I'm not paying for any of this. I don't really believe in paying.

But let's take a look at what it's explaining. I'm not going to read all of it, but it says this code is a simple unit converter, converting between inches, feet, and yards. It says the user is prompted to enter the unit of measurement to convert, and the code converts the user's input. It says the code uses the lower() function to convert the user's input to lowercase so the inputs are case-insensitive. And that's 100% right. If we go back to the code, it's literally asking the user to input a unit of measurement and what they want to convert it to, and then this logic just converts it. It may look like a lot of code if you don't know Python, but this is actually fairly simple, and ChatGPT does an extraordinary job explaining it in layman's terms.

I've been using ChatGPT to explain code to me because some people send me code and I just don't understand it a hundred percent. I'm like, okay, I get the gist of it, but when I ask it to explain it, sometimes it'll go into even more detail. It's really, really useful, especially if you're first starting out and you're not fully understanding some code or a script that you're getting or trying to write. Now, that's just Python code. You could use it for SQL, for R, for any other programming language, or even something like an Excel function. It really could be whatever you want explained; you just put it in here
and it'll explain it to you.

The next thing that it can do is actually generate code. I'm going to ask it to generate some Python code as well as some SQL code, and we'll see what it gives us. I'm going to come right down here to the prompt and say "write a python script to scrape data from Twitter without my IP address being blocked", and let's see what it comes up with. Generating code is really amazing because oftentimes it'll give you somewhat boilerplate responses, but it also gives some context. What it's explaining right now is that you really shouldn't do that, or here are some reasons why you may not want to do this, and now it's going to actually generate the text. It even has a "copy code" button up here, so you can just copy this code and you're good to go. Right here you can see your consumer key and your consumer secret. This is using the Twitter API, so it's saying you need to use the Twitter API, but I could say "not using the Twitter API" and it would generate an entirely different response without it. So it's pretty incredible, and it's still going.

This is a lot of information, and even if you don't use any APIs, or you're not used to things like that, or you don't know Python at all, this is fairly straightforward. I'm going to scroll up real quickly. This isn't anything super advanced; this is something that I could probably Google and get pretty quickly. But it's generating it based off my request that I don't want my IP address to be blocked, so it's giving me some options to actually do that, which I think is just fantastic. It even explains at the bottom how this helps you avoid your IP address being blocked by Twitter. So again, pretty impressive stuff.

But let's go ahead and try it with SQL code. We're going to come down here and say: write a MySQL stored procedure that
automatically imports data from a CSV at a specific path. Let's see how it writes this. I've worked a lot with this kind of thing: data collection, importing data automatically. When a CSV gets dropped into a folder, you want to automatically take that data into the database. Super, super common. So let's see how it writes this.

Now, I just got a network error. Basically ChatGPT is overwhelmed and it couldn't finish my response. This is happening a lot with ChatGPT, and I want to explain it. They now have a paid version; it's like 20 bucks a month, or the more advanced version, which is 42 a month. This won't happen if you pay for it, but if you don't, which most people don't want to do, you're going to encounter these types of errors. So I'm just going to refresh. It even says "we're experiencing exceptionally high demand". That's completely true, but I don't want to pay for it, so hopefully it'll generate the entire thing without me having to pay, because I really don't want to.

Let's see what it's writing. It's creating the stored procedure and it's using this import CSV; it's giving our file path and telling it to put the data in this table right here. So it's literally looking for a CSV, and then it writes the code to load that CSV. That's what it's doing right here. You could use this, but don't just copy and paste it: you have to input some information, and you have to have certain settings set up in MySQL as well. Then it explains it a little bit for you, and it's pretty great. I've used some variation of this, and each time I run it, it gives me a different version. Sometimes it's more dynamic, sometimes it's more hard-coded. So actually knowing SQL when it's generating code is really important. I wanted to warn some people out there, because a lot of people are like, okay, you
know, I'm just learning SQL, I should just use this to learn SQL. And I'm like, I don't think that's a good idea, because if you do not know this code already and it tells you to do something, you may get a worse option or a not-perfect option, and you need to know it. Just looking at this, it's pretty good; it's pretty similar to something I've worked with. But in previous runs it's given me different options that I didn't like as much. So be careful when it's generating code; oftentimes you'll need to tweak it or change it. I have found that ChatGPT is pretty good, but sometimes it just gets things wrong or gives me bad options where I'm like, this is not good code, or it's not perfect code, and I have to tweak it quite a bit to get it to do what I want.

The next thing that I really like ChatGPT for is writing comments for my code. Let's take this for example. Let's say I have this code. I'm going to say "can you please", I like to be polite, "can you please write comments for this SQL code". I'm just going to paste the code in and ask it to write comments for it. If you haven't written any code you may not know what that means, but comments are just added information for somebody coming in and looking at your code, or for you in six months when you come back to it, so you know exactly what it's doing. It's going to describe every part of this code for you, which can save you a ton of time. Commenting your own code can take a while, and you can just go in here and add a little more information if you want to, but this is a fantastic starting place for comments, which I personally sometimes forget. I've been using this for even more complex code. This is fairly simple code to comment; I've given it more complex code and it's done a pretty good job. There are times where it gets it kind of wrong, but this is pretty impressive for actually generating
comments for code. It's saying exactly what each step is doing.

The next use case that I've been using ChatGPT for is creating data dictionaries. If you don't know what a data dictionary is, it's basically a table or something that describes what data is in your data set. Whether it's in a CSV or a SQL Server or wherever that data sits, it's good to have a data dictionary so you know exactly what data you have. So I'm going to say "can you give me a data dictionary for this data set", and I'm just going to paste this in here. I just copied this from Excel, that's all I did, and I'm going to hit enter. Now it's starting to generate this, and it's going to give me a table: the column name, the type of data in it, and then an actual description. I haven't given it any previous information; it's doing this purely based off context. It's taking this name, State, and saying, okay, the description must be the state name. It's seeing total population, so that must be the total population. It's even taking in things like comparison operators: for these rates.age columns it's saying a rate for age less than 18.
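As a rough illustration of what a data dictionary holds (column name, data type, description), here is a minimal Python sketch that builds the skeleton of one from CSV text. The sample columns and values, and the helper names `infer_type` and `build_data_dictionary`, are assumptions made up for this example; they are not the data set or the output shown in the video. The description field is deliberately left blank, since that is the part a person (or ChatGPT) fills in from context.

```python
import csv
import io

# Hypothetical sample data; these columns and values are illustrative only.
SAMPLE_CSV = """State,Total Population,Rate Age < 18
Alabama,4833722,1.7
Alaska,735132,2.1
"""

def infer_type(values):
    """Guess a simple type label from a column's string values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"

def build_data_dictionary(csv_text):
    """Return one entry per column: name, inferred type, example value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    dictionary = []
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        dictionary.append({
            "column": column,
            "type": infer_type(values),
            "example": values[0],
            "description": "",  # to be written by a person or an assistant
        })
    return dictionary

data_dictionary = build_data_dictionary(SAMPLE_CSV)
for entry in data_dictionary:
    print(entry)
```

A skeleton like this only captures what can be read mechanically from the data; the descriptions still depend on context such as the column names.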
Now, this is fairly simple. I'm not saying it's giving you anything advanced; it's not giving you a ton of information beyond what it's reading in the column name and in the data. But if you give it a more complex data set, sometimes it even looks at the data that's in there and says, okay, these are state names but they're labeled something different. Say the column name is just "one two three"; it will tell you this "one two three" column is state name. So it even takes that into consideration. It's pretty wild. It is really, really impressive. Again, I'm just giving you one or two examples of each of these, otherwise this video would be an hour long.

The next thing that I've been using ChatGPT for is optimizing my code. Let's go right over here. This is an actual SQL query that I have written. I'm going to copy this. It's doing what I would consider, let's take a look at it, a fairly simple window function as a subquery, with a CASE statement that uses that information. Let's see how it would optimize this code: "can you please optimize this MySQL code", and I'm going to give it the code. What it's going to do is take it in and rewrite it to probably run faster, to actually optimize that code. If you don't know what query optimization is, it's basically where you're changing either a query or the database so that it runs faster. What it's actually doing here is creating an index. This is a way that you can optimize code, although for this code that's not exactly what I was wanting. I wanted it to optimize the query, not create an index, which does speed it up, that is a fact. So now I'm going to say "can you optimize this in just the query itself". It's going to remember the previous thing that I asked it to do, and it's going to understand by context what I'm asking, because I don't have to input my query again. It's going to
know. Let's see exactly what happens. I've tried this twice and it gave me different answers each time. Last time it gave me what I hope it's going to give me now; this time it gave me an index, which it didn't give me before. So it gives you different things. It's really fascinating how this works. What it's giving me right here is the actual output that I was hoping it'd give you, although this is a learning opportunity; that's just how ChatGPT works. This right here is what I thought it would give you instead of the index. What it did is, instead of having this part of the code select everything, it is now selecting just the columns that we need. So now it's just selecting employee ID, department, and salary instead of all of the columns, as well as this window function right here. It's even going to tell you why it does that: by using a subquery to only retrieve the columns that are actually needed in the final result, you reduce the amount of data that needs to be processed, which speeds up the query.

So those are all ways that I've been using ChatGPT. It's been pretty amazing. I will give the disclaimer, though, that I don't get great results every time. Oftentimes I have to reword things or ask it to generate another response, and that happens a lot. But when I do get what I'm looking for, it's pretty impressive. I'm usually like, wow, how did it do that? It's pretty cool.

Now what I want to do is take a small data set and do some data analysis, to show you some of the things that it can do and some of the ways that I would use it. I'm just going to pull a data set from Kaggle and use that. Keep it pretty simple, and not use any real data with information I shouldn't be putting into ChatGPT. So let's go down here. In this Excel file, and we'll be using this one later, I have this data set right here. Now, this is going to run really slowly
because this is a lot of data. And I say a lot of data; it's only 30 rows, but ChatGPT does not take in this data well. So let's get rid of this, and all I'm going to say is "you are a data analyst". I'm giving it some context: that's who you are, that's what you do, and here's what I want you to do. "Can you please give me some insights and recommendations on this data set", and I'm going to paste the data in here. When I hit enter, it's going to analyze this really quickly. If you ask it to do other things, which we're going to do in just a little bit, it may take quite a while, so I want to give it a second to write everything.

Basically what it's saying is that to provide insights and recommendations, we want to take some initial steps, and these are pretty common things that you'll want to do with data: data cleaning, exploratory data analysis, product analysis, geographical analysis, and time series analysis. Let's take a look at this data really quickly. We have an order ID, the actual product, how many were ordered, the price, the order date, and where it was purchased from. Based on just being given the data set, it says you can offer discounts or promotions on the least frequently ordered products to increase their sales. What we could do is take this and say "can you tell us more about this". I'm just going to ask it to explain a little more. What it told us is you can offer discounts, and here's why you'd want to do it: you'll attract more customers. It even tells us how to identify these products. So I'm going to ask it "can you identify these products for us", and I'll say "and output the SQL code as well". If we go back up, it's generating the code. It says to identify the least frequently ordered products you can use the following SQL code, and it literally writes the code out for you with the actual column names that it says you should be using. That's the part that
always, I just think, is really impressive. This kind of code is actually fairly simple and fairly generalized, but it makes it not so generalized because it's using your data. That's the part that always blows my mind: it's using your actual column names, generating code with your column names. That is the part that's usually pretty impressive to me. All this is doing, and it even says so, is using a product count: an aggregate function on the orders table that counts the number of times each product appears and selects the top 10 least frequently ordered products. Again, pretty simple, but we've asked it to do a lot of contextual things: take a look at this data set, tell us how to offer discounts, and write the code to identify what we should be offering discounts on. It's pretty cool.

Now let's say we didn't want to use SQL, because that's what we're using right now. I'm going to say "can you take our data set and put it into a pandas data frame in Python". I'm going to see if it remembers what our data set is. Sometimes it does, sometimes it doesn't, depending on how far back it is in the conversation. This time it looks like it is working. What it's actually doing is creating each row of that data and inputting it as data. Sometimes it's going to error out; it takes too much processing power. We have like 30 rows and it's only on row three, so I'm going to let this run for a little bit and we'll see if it actually works.

So it's looking like it errored out. It just stopped. This is, I would say, somewhat the downfall of ChatGPT right now: the servers are just overloaded. There are millions and millions of people trying to use it at this very moment, so oftentimes with these large data sets it just doesn't work. In the future I
think there will be add-ons for things like Excel and MySQL and all these different tools, where it'll be integrated and process data much faster, but right now it doesn't. So we're just going to refresh and keep going from here. We'll go right back up here, and this is our previous data set, and we'll scroll all the way to the bottom. So it saved our conversation, thank goodness. It just blanked out, and luckily it saved us; that was a good thing. Now, because I refreshed it, let's see if it still remembers. Again, we're trying to analyze data and get some information out of here, but these things happen, so I'm just going with the flow.

What I'm going to say is "what products in our data set made us the most profit". Let's see if it remembers. Now it's saying that to identify these products you need to calculate the total revenue. This part is really cool, because based off our prompt saying we want profit, it's saying we need to calculate the profit first. It's using pandas, but we can ask it to write this in MySQL. It's going to take the quantity ordered times the price of each one, and this is our potential profit. It's even giving us comments. It's really going above and beyond. If you were to integrate this into something like Python or any other tool, this could be fantastic. This looks like pretty standard code. It's even grouping it and giving us the sum, so if we took this and put it in Python with our data set in there, it would probably give us pretty close to the right answer.

Now I'm going to ask it to write this in MySQL as well. I'm going to say "can you write this in MySQL", and there we go. It wrote it in MySQL as well, and this looks pretty simple and very straightforward. It looks
correct to me. Now, one thing I want to say: we're analyzing this data and prompting it, but I'm just giving some examples. We're not going to do a full analysis; I could do that in a whole other video. What I'm doing is giving examples of how you can ask it to analyze this data for you.

What we're going to do next is ask it to categorize some information for us. I'm going to say "can you categorize the products that made us a lot of money versus a little bit of money in MySQL". So we're asking it to categorize this data for us: if a product made us a lot of money, we want it to say it made us a lot of money; if it's a little money, we want it to tell us it made us a little bit of money. Let's see how it does this. Again, these are all pretty generalized, open-ended questions, and we're going to see how it interprets that open-ended question. It's literally writing a CASE statement saying if it's over a thousand it's a lot of money, and if it's less than 500 it's a little bit of money. This is really interesting. It's not something that I'd ever write in a real SQL query, because it's leaving that 500 to a thousand range somewhere in there and saying it's somewhere in between. I would never actually do this in real code, but that's how it's interpreting it and writing it. So this would be a mistake to me; it's a small thing that I would go back and change and customize. But we could give it more specific prompts. You could say: if they made us a profit of more than a thousand dollars, give it this label; if the profit was less than this number, give it that one. We could be even more specific and it would do it.

Now, if you remember from our Excel file, we have this address column, and what we want to do is break it out by street, city, and state, as well as zip code,
so I'm going to ask it to do that, because breaking out a column like this is something that I've done a ton in SQL, Python, and other programming languages, so you can group on it and clean up that data better. In its current state it's not very usable. I'm going to say "can you break out the last column into street, city, state and zip code columns", and we'll say "in MySQL". You can do the exact same thing for Python or whatever. But let's see how it writes this. I posted this on Instagram the other day, and before, it was using SUBSTRING_INDEX. Let's see if that's how it does it again. It looks like it is, and this is almost exactly how I would write this. This one really impressed me while I was writing this code to test for this video, because with SUBSTRING_INDEX, where I'd be using SUBSTRING and LOCATE, it's not super straightforward, it's not super easy, but it does it. Very, very impressive in my opinion how it's able to write this. And if I'm just glancing at this, the code looks correct; I could run this and it would take that column and break it out into those four columns for us.

This is just in a SELECT statement, so, and let's try it, I could literally just tell it "can you create new columns for each of those". Again, I'm being vague with "each of those"; I'm just testing it a little bit while we're in this video so you can see how impressive this is. So now it's going to create new columns, city, state, street, and zip code, and then it's going to tell us how we can do that. It's just really, really cool. Now it's writing the code to use that SUBSTRING_INDEX expression which we generated up here, which looked correct, and it's telling us in these UPDATE statements how to apply that to the columns we created. With this code, all you have to do is
copy and paste this. Again, it's blowing me away a little bit every time I run it. It's kind of like, how does that work? On the back end, what code are they using to generate this? It's just really impressive. You can ask it to convert it to a different programming language or whatever you want and it would do it.

But what we're going to do now is something a little bit different. I forgot to create this beforehand, so I'm just going to type first_name, and I'm going to give it all types of different inputs. I'm going to say sam, and then Cal Lee, and then Josh123. I'm going to take this and ask it to clean this data for us. It's just a small example: "can you clean this new data set", and I'm going to give it to it and see what it does. If I need to be more specific I will, but let's see how it takes in that data and standardizes it. It even says right here we're going to use UPPER to standardize the text, and to remove numbers from the characters you can use REGEXP_REPLACE, which is 100% accurate. That's probably what I would do if I had a large data set with data like this in it; I would be using those things to clean it up. So this code right here is going to take REGEXP_REPLACE and replace all those numbers and basically just keep the letters, and then it's going to put it all in uppercase. I don't want it in uppercase, so I'm going to say "I don't want it in uppercase, I want proper case, could you please write that code as well" and see what it does. I hope it makes it proper case, because uppercase is not always what I like seeing visually. Let's see what it does. So it's basically going to use INITCAP and make it proper case instead of all upper or lower or something like that. And now what I'm going to do is
I'm going to say "can you put that cleaned data in a table for me", and it's going to generate this and create a table for you to copy and paste. It's actually creating the table, and that's not what I wanted, if I'm being honest. I wanted it to create just a list for me to copy and paste. So while it's generating, I'm going to say "can you put this in a list so I can copy and paste this into Excel". Let's see if it does better this time, because I didn't want it to create the table in SQL, but that's the context it was understanding. And now it's doing it in Python. Again, not exactly what I wanted, but it's doing something, right? It's definitely working hard to understand what I'm trying to tell it. I just want it to give me something I can copy and paste, and it's not doing that exactly. I could take this if I wanted to and break it out, but it's not doing exactly what I wanted. That happens with ChatGPT.

The very last thing I'm going to do, and I'm just going to go back here, is this: I have the second data set, and it's a lot of data, so I'm just going to take like the first two columns and ask it to basically explain what this data set is. I'm just going to say "can you please explain what is in this data set". I'm being specifically vague for a reason, because you can be much more specific. You can say "explain to me what these columns are and how they correlate to each other"; you can ask it to explain almost anything. But it's basically going to give us some context, an overview of what kind of data is in here, much like a data dictionary would. It's kind of doing the same thing. Now, this is going to take a long time, because there are a lot of columns; there's like 30 or 40.
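The name-cleaning steps described above (drop the digits, convert to proper case, and return a plain list that pastes into one Excel column) can be sketched in Python as follows. This is a minimal illustration, not the SQL that ChatGPT generated in the video; the function name `clean_first_name` and the sample inputs are assumptions chosen for the example.

```python
import re

def clean_first_name(raw):
    """Keep only letters and spaces, trim whitespace, and apply proper case."""
    letters_only = re.sub(r"[^A-Za-z ]", "", raw)
    return letters_only.strip().title()

# Hypothetical messy inputs, similar in spirit to the ones typed in the video.
raw_names = ["sam", "JOSH123", "  aLEx "]
cleaned = [clean_first_name(name) for name in raw_names]

# One value per line pastes cleanly into a single spreadsheet column.
print("\n".join(cleaned))
```

Here `re.sub` plays the role of the regex replacement step and `str.title()` plays the role of the proper-case step; names with apostrophes or hyphens would need a more careful rule than this sketch uses.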
But it's doing its thing. It's going to keep going until it explains all of them, unless it errors out because the system's overwhelmed. While this is running: these are some ways that you can actually analyze data. I'm giving you a lot of examples because I want you to go and test this out. There's no other word to use other than it's really impressive. I'll have an entire other video on this, in which I'm going to be talking about how it'll help with data analysis, how it'll impact the data analyst job market in the future, and some predictions that I have about its capabilities. There's just a lot going on in this world. I'm kind of giving some filler for my next video here, but Google's coming out with its own version. I believe it's called Sparrow; it's using the LaMDA model that they've built, and I'm super excited for that one, so I probably will do some comparisons between those as well. This type of technology is going to change things. I don't know how exactly; I have some predictions and some information that I'll be providing. This is just incredible. It can analyze data like this on these small data sets, but if it's incorporated into things like Azure, with Microsoft, which is doing some big things, and it can analyze huge, massive data sets a lot faster, a lot of things could change.

So in this video I showed you a lot of my favorite things that I've been using ChatGPT for. I showed you some of the functionality as well, and then we even analyzed some data, just poking around, asking it questions, seeing how it interpreted them, and asking it to write code in Python and SQL. So I gave you a few different variations and flavors. It wasn't a full analysis; this is just like
an example. In another video I might take an entire real data set and see if I can get ChatGPT to analyze it well, but not in this video. I hope that this was helpful and exciting for you. This stuff is super exciting to me. I find it incredibly interesting and I think it's really impressive, so go try it out yourself and see if you like it. I will be making more videos on stuff like this, because I personally think it's just incredible. With that being said, thank you for watching this video. I hope you learned something and I hope this piqued your curiosity. Be sure to like and subscribe, and I'll see you in the next video. [Music]
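The per-product calculation and the "a lot of money" versus "a little bit of money" categorization discussed in the captions can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the order rows, the function names, and the label for the middle band are made up for the example, and only the 1000 and 500 thresholds come from the video. Note that quantity times price is revenue rather than profit, since no cost data is involved, and that the 500 to 1000 range the speaker flags is given an explicit label here.

```python
from collections import defaultdict

# Hypothetical order rows: (product, quantity_ordered, price_each).
orders = [
    ("Laptop", 2, 600.00),
    ("Monitor", 3, 250.00),
    ("USB Cable", 10, 12.00),
]

def revenue_by_product(rows):
    """Sum quantity * price per product (revenue, not true profit)."""
    totals = defaultdict(float)
    for product, quantity, price in rows:
        totals[product] += quantity * price
    return dict(totals)

def categorize(revenue, high=1000.0, low=500.0):
    """Label a revenue figure; the middle band gets its own explicit label."""
    if revenue > high:
        return "a lot of money"
    if revenue < low:
        return "a little bit of money"
    return "somewhere in between"

totals = revenue_by_product(orders)
labels = {product: categorize(amount) for product, amount in totals.items()}

# Print products from highest to lowest revenue with their labels.
for product in sorted(totals, key=totals.get, reverse=True):
    print(product, totals[product], labels[product])
```

Making the middle band explicit is a design choice for the sketch; a two-label scheme with a single cutoff would work equally well if that is what the analysis calls for.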
Info
Channel: Alex The Analyst
Views: 359,867
Keywords: Data Analyst, Data Analyst job, Data Analyst Career, Data Analytics, Alex The Analyst, Chat gpt, Chatgpt, Chatgpt for data analysis, data analyst using chatgpt, chatgpt 3.5, chatgpt for data analysts, data analyst chatgpt, chatgpt data analyst
Id: C75TROiiEa0
Length: 31min 7sec (1867 seconds)
Published: Tue Feb 14 2023