Get your data into ChatGPT: CSV, JSON, Databases & more

Captions
Hi, I'm Elijah at Noteable, and today I'm going to talk to you about getting your data into ChatGPT. The number one thing I hear when people ask about ChatGPT, whether or not they're using the Noteable plugin, is: how do I get my data into ChatGPT? So today we're going to cover seven different ways you can get your data into ChatGPT, whether it's Excel, a database, CSV, raw text, or JSON. In past videos I've shown you how to get at data by pointing at the URL of a dataset; today we're going to use a Noteable notebook to load your data and then access it through ChatGPT.

First off, we're going to look at Excel data. I open up a folder where I've got an Excel spreadsheet, drag it into my Noteable project, and you'll see it pop up there: ruralatlasdata24.xlsx. Then I go into ChatGPT, make sure I've got the Noteable plugin installed, and tell it to use the notebook I'm in. You don't have to specify a notebook for Noteable to work; it can create a new notebook, or if you've already got one created you can point it directly at that notebook. I tell it to load ruralatlasdata24.xlsx and ask for charts and analysis of one of the tabs in the data. It loads the data using pandas to import the Excel file, specifies the sheet name, and shows me the first five rows of the dataset so it can better understand what the data looks like. After that it accesses the People tab, comes back to tell you how many columns and rows are in the data, gives you some understanding of what the columns represent, and then jumps right into making charts from that data. Now, I can go into the Excel spreadsheet, look at the column names, and specify which columns I want to look at. I could have just told it to give me an overview of the entire dataset, but in this case I pick out three columns: net migration 2020-2021, population change rate 2020-2021, and the percentage of people under 18. It starts doing some analysis and, sure enough, comes back with distributions of the columns I've specified. That's as easy as it gets to load your Excel data into ChatGPT using Noteable notebooks.

Now, loading CSV data in works the same as any other kind of dataset. I do the same thing: I've uploaded bloodpressure.csv into my project, I tell ChatGPT that the data is stored locally (sometimes it gets that on its own, sometimes you have to specify it), and I just give it the name of the file. It loads it right up and pulls the first five rows to get an understanding of what the data represents. What I love about this is that, based on the name of the CSV and the column names, even though a column just says bp_before, it understands that the column is blood pressure before treatment, that bp_after is blood pressure after treatment, that agegrp means the age group of the patient, and so on. It does some descriptive statistics on the data, telling me how many records are in the dataset, how many different categories there are in the sex and age group columns, and the most frequent values in those columns. Then it builds a different kind of data visualization for the distribution than it did before, drops it into my notebook, and shows me box plots of blood pressure before and after treatment, assuming that's the kind of information I'm trying to understand here. Then it splits the data out by age group and gives me some explanation in ChatGPT of what it's found. I tell it explicitly to include the learnings as a cell in the notebook, because I love having this dynamic document afterward that I can keep working with after my ChatGPT session is done, so it's good to include that information directly in the notebook. And there we have it: a fully interactive, data-driven document, all from a few simple prompts.
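To make that concrete, here's a minimal sketch of the kind of pandas code the plugin generates for these two examples. The file names come from the video; the sheet name and the Excel column names are assumptions standing in for the real RuralAtlasData headers.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Excel: load one tab of the workbook and peek at the first five rows
people = pd.read_excel("ruralatlasdata24.xlsx", sheet_name="People")  # sheet name assumed
print(people.shape)
print(people.head())

# Plot distributions for a few columns of interest (column names are assumptions)
cols = ["NetMigration2021", "PopChangeRate2021", "PctUnder18"]
people[cols].hist(bins=30, figsize=(12, 4), layout=(1, 3))
plt.tight_layout()
plt.show()

# CSV: same pattern, plus descriptive statistics and grouped box plots
bp = pd.read_csv("bloodpressure.csv")
print(bp.describe(include="all"))
bp.boxplot(column=["bp_before", "bp_after"], by="agegrp")
plt.show()
```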
Now, sometimes our data isn't structured, and we want to load things like raw text files. So I download an entire raw-text copy of Hamlet, upload it into the project, tell ChatGPT which notebook I want to work with, give it a short description of what this data is in case it can't figure that out, and say: load this raw text and do some NLP, natural language processing, on it. At first it tells me the NLTK toolkit isn't installed, so it can't do NLP without that toolkit, and it shows me how to install it. But instead of entering that code myself, I just tell ChatGPT to install it and continue with the analysis, and sure enough, it installs NLTK and does some analysis of word frequencies, named entity recognition, and sentiment. At the end of it I say, please show me some charts of the results, because I want to see data visualizations; I'm the data visualization guy, and that's something that appeals to me. The funny thing is that ChatGPT comes back and says data visualization doesn't really work in some of these cases, and explains why, which I think is an exciting feature of using ChatGPT for your data work: it's not always about getting the answer, it's about getting educated along the way. But it still produces a sentiment-score bar chart showing the distribution of sentiment, whether negative, positive, neutral, or a combination of sentiments. And that's how you load raw text into ChatGPT using a Noteable notebook.
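Here's a rough sketch of the kind of NLTK analysis described above, assuming the text was saved as hamlet.txt, and leaving out named entity recognition for brevity:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from nltk.sentiment import SentimentIntensityAnalyzer

# One-time downloads (the install step ChatGPT offered to run for me)
nltk.download("punkt")
nltk.download("vader_lexicon")

text = open("hamlet.txt", encoding="utf-8").read()

# Word frequencies over alphabetic tokens only
tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
print(FreqDist(tokens).most_common(20))

# VADER sentiment per line, then the distribution of labels for the bar chart
sia = SentimentIntensityAnalyzer()
scores = [sia.polarity_scores(line)["compound"]
          for line in text.splitlines() if line.strip()]
labels = ["positive" if s > 0.05 else "negative" if s < -0.05 else "neutral"
          for s in scores]
print({lab: labels.count(lab) for lab in set(labels)})
```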
The next one I'm going for is a JSON file. JSON is a nested data format that's popular for storing data, and it can sometimes be very hard to parse because there can be a lot of nesting in there. I've loaded this JSON file into the project and pointed ChatGPT at it, saying: give me an analysis and chart the data so that I can understand it better. It gets to work and, as you should be used to by now, comes back with the first few rows of the data so it can better understand it. I can go over to the notebook and look at the data there instead of in that raw table view, using Noteable's built-in data table viewer, which is more interactive and easier to use. Then it asks what kind of analysis I want it to do, and in this case I don't specify columns or a kind of analysis; I literally just tell it to go to town. It digs into the data and starts charting the number of contests per year, because this happens to be a dataset about contests, then plots the number of contests per month to get a better understanding of the distribution of the data. Then it runs into a bit of an issue: it says there was an error trying to extract the nested JSON data from one of the columns, and the error indicates it's not in the expected format. The great thing about ChatGPT with the Noteable plugin is that it doesn't just stop there. It keeps working to better understand that data, and it parses it, even though it comes back a few times saying it's not quite what was expected. Then it shows you the nested data structure within the JSON, so that you can understand the nested structures within your JSON, and plots those. Now I've got a notebook that has that code in it, along with the analysis and the whole process needed for digging into a nested JSON structure. So I can either continue working with ChatGPT, or I can jump into the notebook and, now that I understand how that code is written, change some of the variables so it points at the parts of the data I want to work on. And that's it for loading JSON data into ChatGPT using a Noteable notebook.
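A minimal sketch of the flattening step ChatGPT worked its way toward; the file name and the field names here are hypothetical, since the video doesn't show the actual schema:

```python
import json
import pandas as pd
import matplotlib.pyplot as plt

with open("contests.json", encoding="utf-8") as f:
    raw = json.load(f)

# Flatten one level of nesting; nested dicts become dotted column names
df = pd.json_normalize(raw)
print(df.head())

# Explode a nested list-of-dicts column into its own flat table
# ("entries", "contest_id", and "year" are assumed field names)
entries = pd.json_normalize(raw, record_path="entries",
                            meta=["contest_id", "year"])

# Chart contests per year, as in the video
df["year"].value_counts().sort_index().plot(kind="bar", title="Contests per year")
plt.show()
```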
But it wouldn't be enough if we only loaded static files. That's great for doing some analysis, but when we're really working with data, we're going to work with databases. So how do we connect ChatGPT to a database like Postgres? The answer is that you go into your Noteable notebook, which already has data source support, and add your Postgres database using the data connections tab. When I create a new data connection, I select the kind of data source I want to work with, and you'll notice we support a number of different data sources: Amazon Athena, CockroachDB, MySQL, Redshift, along with Postgres and a few more. In this case I say I want a Postgres database and give it the connection details. I'm not going to show you the username and password prompt, but I had to fill those in. Once the connection is made, it says we have to restart your kernel. Your kernel is what's doing all your computation under the hood, and once you have a new data connection, the kernel has to restart to be aware of it, so we just tell it: sure, restart. Then I tell it to refresh the schema, so that it can connect to the database and understand the schema and the tables inside it. Once that's done it says schema refresh successful, and I can jump into my ChatGPT interface, tell it again which notebook I want to use, and say I want to look at the tables in a particular database in this schema. I don't have to name the particular database or the particular tables; I could just let it look at the schema and work with it interactively. But in this case I know which database I want to look at, lake_summit, and I want it to tell me about the tables in that database and then go through and do analysis and charts on them. It comes back and tells me there are two tables, and gives me details on what they contain: the weather table has 19 different columns, covering wind direction, humidity, yearly rain, and so on, and the water level table has only two columns. Once it understands that data structure, it can proceed with the analysis. The first thing it does is plot the water level over time, and we can see some interesting patterns in how that water level is stored in the database; maybe those are data issues, or maybe they reflect periods of drought, and we can go explore that in more detail. It then begins to analyze the weather table, and since it takes a little time to access the data, it lets you know it's still running the cell. Eventually it comes back, says here's what's in that table, and does an analysis of that one too. Then it asks whether you want it to go on: is there anything specific you want it to look at in this Postgres database? Since there isn't, I just go and take a look at this great notebook that's connected to the database. As you scroll through the notebook, you can again see the code it used to access the data, how it's plotting that data, and the way it uses not just Python cells but SQL cells to select from the database tables. That gives you the power to go write your own SQL queries, or your own code, or just adjust things a little to get at the data you want and analyze it the way you need to.
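Inside Noteable, a SQL cell handles this natively; as a plain-Python equivalent, a sketch along these lines would do the same thing. The database and table names follow the video's narration; the credentials are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials; Noteable stores these in the data connection instead
engine = create_engine("postgresql://USER:PASSWORD@HOST:5432/lake_summit")

# The two tables described in the video (names assumed from the narration)
water = pd.read_sql("SELECT * FROM water_level ORDER BY 1", engine)
weather = pd.read_sql("SELECT * FROM weather", engine)

# Plot water level over time, as ChatGPT did first
ax = water.plot(x=water.columns[0], y=water.columns[1],
                title="Water level over time")
ax.figure.savefig("water_level.png")
```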
So let's say you want to work in the modern data stack, and you've got a lot of data in a Snowflake data source. We can support that too. You can connect to Snowflake from your notebook just like we did with the Postgres connection: we put in the name and the account details and so on (again, I'm not going to show you the password or the account details), and we connect to a Snowflake sample dataset. Just like with the Postgres database, I restart the kernel and refresh the schema, and once that schema is refreshed, I can jump into ChatGPT, and querying Snowflake is as easy as querying a CSV or an Excel spreadsheet or a Postgres database. I just tell it: use this notebook and examine the Snowflake database schema TPCH_SF10; give me charts and analysis. This time it comes back and lets me know that I'm connected to several data sources, the Postgres connection and the Snowflake connection, and also that we can query CSVs and dataframes using DuckDB SQL if we want to. It gets into the schema I asked for and lets me know it has tables like customer, nation, orders, and so on, and then it gets their structure. The code it tries initially doesn't work, so it lets you know it didn't seem to work quite right and that it's going to try a different approach to access that schema and those tables. The second approach works: it comes back and tells me what's in the customer table in this schema, and then what's in the line item table. At that point I tell it: stop generating, you don't have to keep going; just jump into the customer table and give me an analysis of that table, because I don't want it to go through all the tables in the schema, and that's not necessary for what I'm trying to demonstrate today. It loads the data in the customer table, plots that data, explains what's going on in the plots, and then plots the data another way to try to show an interesting distribution. And as with every one of these data connections I've been demonstrating, at the end you don't just have an answer in the ChatGPT window; you have an entire dynamic document, a computational notebook that has all your code, your text, and those charts. You can go back to it, schedule it as a job, comment on it, share it with your stakeholders and your collaborators, and build your data analysis and data science collaboratively within a dynamic document instead of just leaving with a static answer.

For our last example, I'm going to show how to connect to a BigQuery database. As with the Postgres database and the Snowflake data, you create a new data connection, this time a BigQuery connection; you give it the name and the credentials and so on, create it, restart your kernel so that the connection is registered with the kernel, and refresh the schema. Once you've refreshed the schema, back in ChatGPT, we tell it to use a particular notebook, query a particular dataset, and give me some data analysis of what's in that dataset. You'll notice that in this case I actually told it to limit the columns to particular columns I specify, require that they have non-null results, and only show me 20,000 rows. You can include all of that in your description in ChatGPT: the prompt you build doesn't have to be as simple as the prompts I've shown today; it can be deeply informed by your domain knowledge about the data. It comes back and gives me a chart of service requests by neighborhood. I tell it this is rather busy, can you just show me the most popular service requests, and it does that easily, explaining along the way exactly what it's doing as it updates the visualizations. And again, at the end of this I don't just have static images and static text in ChatGPT's window; I have a dynamic, data-driven document that I can go extend and share with folks, publish certain parts of, and use all of the collaborative, rich functionality that's within a Noteable notebook.

So I hope you found that valuable. If you want to connect your data to ChatGPT, I think I've given you a few different ways you can do that. And if you've stayed to the end and you're wondering, hey, where's Google Sheets in all this, take a look at our other video, where we talk about moving from a spreadsheet to a data-driven document that connects directly to Google Sheets and follows a similar process, using the same ChatGPT interface and the same kinds of prompts to get at your data.
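As a closing aside: outside of Noteable's managed connections, the plain-Python equivalents for those two warehouse examples would look roughly like the sketch below, assuming the official connector packages. The Snowflake sample schema is the one named in the video; the BigQuery dataset here is a stand-in public 311 dataset, since the video doesn't name the one it queried.

```python
# Snowflake: query the TPCH_SF10 sample schema mentioned in the video
import snowflake.connector

conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT", user="YOUR_USER", password="YOUR_PASSWORD",
    database="SNOWFLAKE_SAMPLE_DATA", schema="TPCH_SF10",
)
cur = conn.cursor()
cur.execute("SELECT C_MKTSEGMENT, COUNT(*) FROM CUSTOMER GROUP BY 1")
for segment, n in cur.fetchall():
    print(segment, n)

# BigQuery: the official client; this public dataset is an illustrative stand-in
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials
sql = """
    SELECT complaint_type, COUNT(*) AS n
    FROM `bigquery-public-data.new_york_311.311_service_requests`
    WHERE complaint_type IS NOT NULL
    GROUP BY complaint_type
    ORDER BY n DESC
    LIMIT 20
"""
top_requests = client.query(sql).to_dataframe()
print(top_requests)
```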
Info
Channel: Noteable
Views: 29,392
Id: VV8dvip6N3s
Length: 18min 4sec (1084 seconds)
Published: Fri Jun 02 2023