From Spreadsheet to Data-Driven Report Using ChatGPT and No-Code Viz

Captions
Hi, I'm Elijah from Noteable. We're so excited to see how people have been using our new ChatGPT plugin, but if you're only using ChatGPT, you're missing out on some incredible features that our platform provides. Today I'm going to show you how to use the built-in no-code data visualization in our Noteable notebooks, along with ChatGPT, to dramatically improve your data-driven work.

I'm going to work with this spreadsheet, based on a survey of managers asking them how much they made, where they work, what kind of bonuses they've received, how long they've worked, and so on. It's a really interesting data set, but it's rather messy. So let's jump into ChatGPT with the Noteable plugin enabled. I'm going to tell it to clean up that data: for instance, change the string values for how long people have worked into numbers, try to reconcile the different ways people have indicated the country they work in, do the same thing for the states they work in and for their job titles, and then group the various job categories into industry categories.

The plugin will create a notebook, load that data straight from Google Sheets (which for me is a really great time saver), and begin to work through the tasks I've asked it to do. We can already see it change the ranges given for "How many years of professional work experience do you have?" into numbers, which will be easier for us to plot when we do our analysis. It'll use a Python library called pycountry to try to reconcile the different ways people refer to countries. To do that it has to install pycountry, so you can see it doing a pip install of that library, and then it applies it to the data set.

You can see that every time it comes back in ChatGPT, because this is such a wide table, it gives us a rather cumbersome response. This is one way Noteable is really going to help you: we've got a great interactive table that shows distributions, lets you filter dynamically within the table, and is a lot easier to read than what you see in the ChatGPT window.

Then you'll notice that it stops. This happens with ChatGPT all the time: when you've asked it to do a number of complex tasks, eventually it runs out of juice, and all you have to do in those cases is ask it to please continue. So now it moves on to the job title column to try to reconcile job titles, and it tells you that's a really challenging task, so all it's going to do is make them all lowercase and remove any leading or trailing whitespace, which makes those job titles a little easier to navigate. It also lets you know that there are other ways you could, say, normalize job titles using machine learning, and if you wanted to, you could dive into that with ChatGPT; it's pretty good at it.

The final task I asked it to do was to take all of these industries and create some kind of higher-level category for them. It writes a bit of logic for that and builds out a top-level industry category, which is great when you're working with data like this: when you've got huge numbers of categories, you want some way to roll them up into larger super-categories.

Now, when I look at the data, I notice it still has some problems with the different ways of referring to countries. The United States, for example, shows up in uppercase and lowercase, and as "USA", "US", "United States", or "U.S.". So I ask whether it can further refine the standardization of the country column, and what it does there is create a custom function that simply tries to do a string replace.

One of the things I've noticed with this data set is that the column names are way too long for me to use as axis labels in charts or in the controls for selecting the data I want to work with. So I ask it to shorten those column names. That's just a convenience for me when using the UI, and it does a pretty good job of turning, say, "What industry do you work in?" into "Industry", or "If 'Other,' please indicate the currency" into "Other currency", and so on.

And here's where we get to do some more exciting work. Every time ChatGPT has shown me the data within the notebook, it has used the pandas head() method. When you see code that says df.head(), it's saying "just show me the very top of the data set", which means only the first five rows. df stands for DataFrame; that's our table loaded into Python. So instead of df.head(), I create another cell below it and just type df, which displays the full DataFrame. This is 29,000 rows.

A Noteable notebook has a built-in no-code data visualization tool that we call DEX. DEX lets you do classic data visualization, such as bar charts, line charts, or pie charts, but it also has a built-in feature called Data Prism. Data Prism looks at your entire table of data and tries to find some interesting views into it. So let's take a look at the views Data Prism comes back with. I've got a multi-axis bar chart, a scatter plot, a treemap, a violin chart, and a few more charts underneath.

What immediately jumps out at me is an enormous outlier in additional compensation: somebody in this data set reported receiving additional compensation of 120 million dollars, and that's blowing out the scale for the entire data set. I also notice that age is being treated as a dimension, and so is annual salary. So I realize I need to go back and tell it, "Hey, annual salary should be a number," and it easily transforms annual salary into a number so it's available for me to visualize. The first time, there's some kind of error under the hood; I just tell it to regenerate the response, it runs the command fine, and now we have a numerical value for annual salary.

So I tell it there's an outlier, an annual salary of 120 million dollars, and to pull that out. Then I realize ChatGPT isn't the only one who makes mistakes; I made one here too. It was actually additional compensation. So I just tell it, "Hey, pull out additional compensation where the value equals 120 million," and it does that fine.

Now we go back into the notebook. I run df again in a cell, and I get back a much more interesting set of charts. My multi-axis bar chart, which plots average experience by annual compensation, shows some more interesting patterns, and my scatter plot, violin charts, and so on are all much more readable. Data Prism shows all of these individual charts, and if I want to, I can zoom into one of them. So I can jump into the scatter plot and adjust its settings: see if there's a different metric I want to plot, or turn on marginal graphics to see where the data points are densest. I can also adjust the circle size and turn it into a graduated symbol plot.

Now I understand the data better using what's built into the notebook, DEX with its Data Prism and individual chart builder features, and I can bring that knowledge back into ChatGPT and use it to steer the analysis I want to do. So I ask ChatGPT to tell me about the trends and anomalies in the data, and it groups the data into the industry super-categories I had already asked it for, so we're looking at education, finance, government, manufacturing, and so on, with summary statistics such as the average compensation and the 75th percentile. What's great about our Noteable notebook is that this summary is also a DataFrame, so it's another data set: I can use Data Prism on it to see its trends visually and very quickly, and check whether anything really jumps out at me.

I go back into ChatGPT, where it has suggested some areas of analysis we could pursue. I tell it that sounds great, and I make sure to let it know that I want explanations and charts as it runs this analysis. It comes back with some box plots and an explanation of what those box plots mean. Then I tell it to continue, but to do the rest of the analysis on the full data set rather than just the summarized one. ChatGPT then writes code to create another scatter plot, colors it, and adds an explanation of what it shows. Finally, I ask it to include the learnings from this exploration as a cell at the bottom of the notebook.

At the end, I have an entire notebook of analysis. I've got interactive data visualization in there, I've got explanations from ChatGPT, and I've used the built-in functionality in Noteable to improve my analysis and to shape how I direct ChatGPT in building the data-driven report.

Now, no-code data visualization isn't the only thing Noteable notebooks have. We've got a whole host of other features that we'll be showing off in the coming weeks, but I wanted to take you through this process because I want you to be empowered not just by our plugin but by the full Noteable experience, so you can find ways to optimize and improve your workflows.
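The video doesn't show the exact code ChatGPT generated for turning the work-experience ranges into numbers or for tidying job titles. Below is a minimal plain-Python sketch of both steps. It assumes range labels written like "5-7 years" and maps each two-number range to its midpoint, an assumption of this sketch rather than necessarily what the plugin chose. The function names are hypothetical.

```python
import re
from typing import Optional


def experience_to_years(label: str) -> Optional[float]:
    """Turn a range label such as '5-7 years' into one plottable number.

    Assumption (hypothetical rule): a two-number range becomes its midpoint,
    and an open-ended label ('1 year or less', '41 years or more') becomes
    its single bound.
    """
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    if len(numbers) == 2:
        return (numbers[0] + numbers[1]) / 2
    if len(numbers) == 1:
        return float(numbers[0])
    return None  # unparseable labels stay missing rather than guessed


def normalize_title(title: str) -> str:
    """Light-touch job-title cleanup described in the video: trim and lowercase."""
    return title.strip().lower()
```

In a notebook these would typically be applied column by column with pandas, for example `df["experience"].map(experience_to_years, na_action="ignore")`, where the column name is hypothetical and `na_action="ignore"` passes missing cells through instead of calling the function on them.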
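In the video, pycountry handles a first pass at the country column, and ChatGPT later adds a custom string-replace function for the spellings that slip through (case variants, "USA", "U.S." and so on). Here is a hypothetical sketch of that second step only: a hand-written alias table keyed on a lightly normalized string. The alias entries and function names are illustrative assumptions, not values taken from the survey or from ChatGPT's output.

```python
from typing import Dict

# Hypothetical alias table: normalized spelling -> canonical country name.
# Keys are lowercase, with whitespace collapsed and trailing dots removed.
COUNTRY_ALIASES: Dict[str, str] = {
    "us": "United States",
    "u.s": "United States",
    "usa": "United States",
    "u.s.a": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "u.k": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def _alias_key(raw: str) -> str:
    """Lowercase, collapse internal whitespace, and drop trailing dots ('U.S. ' -> 'u.s')."""
    return " ".join(raw.lower().split()).rstrip(".")


def normalize_country(raw: str) -> str:
    """Return the canonical name for a known alias, otherwise the trimmed input."""
    return COUNTRY_ALIASES.get(_alias_key(raw), raw.strip())
```

Unknown values come back unchanged (apart from trimming), so it is easy to list whatever is still inconsistent and extend the table.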
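Rolling many free-text industry labels up into a few super-categories can be done with a simple keyword table. This is a hypothetical sketch, not the logic ChatGPT produced in the video. The category names echo the ones mentioned in the walkthrough (education, finance, government, manufacturing), but the keyword lists, the extra "Technology" bucket, and the function name are assumptions.

```python
from typing import Dict, List

# Hypothetical keyword table: super-category -> substrings that map into it.
# Dicts keep insertion order, so earlier categories win ties.
SUPER_CATEGORIES: Dict[str, List[str]] = {
    "Education": ["education", "academia", "school", "university"],
    "Finance": ["bank", "finance", "insurance", "accounting"],
    "Government": ["government", "public administration"],
    "Manufacturing": ["manufacturing"],
    "Technology": ["computing", "tech", "software"],
}


def industry_category(industry: str) -> str:
    """Roll a free-text industry label up into a super-category, else 'Other'."""
    text = industry.lower()
    for category, keywords in SUPER_CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"
```

Matching is by substring and the first matching category wins, so the order of the table matters and short keywords can produce false matches; it is worth reviewing the rows that land in "Other" before trusting the rollup.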
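The later steps in the walkthrough (shortening column names, making annual salary numeric, dropping the 120-million additional-compensation outlier, and summarizing by industry super-category with the mean and 75th percentile) map onto a few pandas calls. This is a sketch under assumptions: the column names, the toy data, and the `summarize_by_category` helper are all hypothetical, invented for illustration rather than taken from the survey or from ChatGPT's code.

```python
import pandas as pd


def summarize_by_category(df: pd.DataFrame, outlier: float = 120_000_000) -> pd.DataFrame:
    """Clean two numeric columns, drop one known outlier, summarize per category.

    Assumed (hypothetical) columns: 'category', 'annual_salary', 'additional_comp'.
    """
    df = df.copy()
    # Salaries stored as text like "55,000" become numbers; junk becomes NaN.
    df["annual_salary"] = pd.to_numeric(
        df["annual_salary"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    df["additional_comp"] = pd.to_numeric(df["additional_comp"], errors="coerce")
    # Drop the single reported value that blows out every chart's scale.
    # Rows with missing additional_comp are kept (NaN != outlier is True).
    df = df[df["additional_comp"] != outlier]
    grouped = df.groupby("category")["annual_salary"]
    return pd.DataFrame({"mean": grouped.mean(), "p75": grouped.quantile(0.75)})


# Toy data; the long headers here are made up for illustration.
raw = pd.DataFrame({
    "Industry super-category": ["Finance", "Finance", "Education", "Education"],
    "What is your annual salary?": ["100,000", "80000", "50,000", "70000"],
    "How much additional compensation did you get?": [5000, 120_000_000, None, 1000],
})
# Shorter names are easier to use as axis labels and in chart controls.
df = raw.rename(columns={
    "Industry super-category": "category",
    "What is your annual salary?": "annual_salary",
    "How much additional compensation did you get?": "additional_comp",
})
summary = summarize_by_category(df)
print(summary)
```

Using `errors="coerce"` turns unparseable salary strings into `NaN` instead of raising, so one bad cell doesn't stop the whole conversion. Filtering on an exact value removes only that one known outlier; a rule-based cutoff (for example, a high percentile) may suit other data better.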
Info
Channel: Noteable
Views: 9,278
Id: Myeay1SQ2zE
Length: 10min 43sec (643 seconds)
Published: Fri May 26 2023