How I use ChatGPT Code Interpreter as a Data Analyst

Video Statistics and Information

Captions
has rolled out something really, really cool, and you might ask yourself: so what? Well, if you stay here for a few minutes, you will know why you absolutely need to know how to use this tool. It's called Code Interpreter, and it's already live for all ChatGPT Plus users. And don't let the name Code Interpreter mislead you: this isn't just about code. This bad boy can do a whole lot more.

So guys, welcome back to my channel. If we haven't met before, I'm Laura. I'm a data analytics lead working in London, and in today's video I want to show you the eight ways ChatGPT Code Interpreter can completely revolutionize the way you deal with data and pretty much make everyone here a data analyst. I will also cover the huge limitations of this tool that not many people talk about, so make sure to stick around till the end.

So first of all, how do you get started with this crazy tool? First off, as we said, you need ChatGPT Plus. Once you've got that sorted, look for the three dots at the corner and head into Settings. Under the beta section you will find the new Code Interpreter: flip that switch to green and you're all set. Don't worry if you can't see it even if you've got the paid version; just give it a few days. Remember, this is still the rollout phase. And when you're ready to go, hit New Chat, select GPT-4, and make sure you've got Code Interpreter checked on. Now let's see what we can do with it.

Okay, so the first case will be a game changer for anyone preparing for data science interviews. You know how these interviews go, right? They often give you a data set, throw in a few questions, and ask you to come up with a business case study. It can be quite time consuming and stressful, especially when you are racing against the clock. But guess what: Code Interpreter is here to save the day.

Okay, so here I'm opening the ChatGPT Plus chat, and this is the new feature: you see this plus button where I can actually import any type of file. So I import here a file that is called data.csv, and now I'm basically
asking questions that I might have received during my data interview that are related to the data set. These questions might be something like: what are the top five revenue-generating products (because we are working now with an e-commerce sales data set), which countries bring in the highest revenue, how would you segment our customer base, can you predict next quarter's revenue, and so on. If you give it a few seconds, the chat will actually provide you with a very comprehensive answer. It will first of all check the data set and give information about what it contains, and as you can see, it's breaking down each of the questions and giving quite a comprehensive explanation for all of them.

And so now what I can do is go back to the answer, and if I click on Show Work, I can actually see what the chat is doing in the background. This, for example, is the Python code that the chat created to answer a specific question, and you have that for all the questions as well. So yeah, just crazy, and so easy to see here how this has the potential to change data interviews forever, because in just a few seconds I already have the answers to my data interview questions.

So we all know the saying "garbage in, garbage out". In data science, quality input is everything, and that's why we spend so much time cleaning our data sets. But what if I told you that Code Interpreter can handle this task without any effort, freeing up your time for more exciting aspects of data analytics? And again, let me show you an example. We'll use the famous Titanic data set from Kaggle, which is well known in the data science community for its need for cleaning, particularly dealing with the missing values.

Okay, so here what I've done is upload the data set, which is called train.csv, and I'm asking a few questions in the prompt. I'm asking: assess the data set and provide me a summary; perform necessary data cleaning to handle missing values
and any outliers; save the clean data to a new CSV file; and then provide me a summary of the cleaning process: what changes you made, why they were made, and the state of the data set after cleaning. And so I run this one, and as you can see, another very detailed answer. We can now check the summary at the end of the cleaning process. It's saying that it dealt with the missing values, so for example the Age column was filled with the median age in case of missing values. It also dealt with the outliers, for example in the Age column: the outliers were not removed, but their impact was reduced by using a robust scaler. So again, a very advanced method to deal with outliers.

Now, exploratory data analysis, or EDA, is like shaking hands with your data. It's a crucial first step where you get to know your data set, its structure, its variables, and perhaps even uncover a few hidden secrets. And with Code Interpreter, again, it's as simple as typing in a few questions. A great data set for exploratory data analysis is the red wine quality data set, which again I found on Kaggle. This data set contains different chemical properties of red wines, like acidity, sugar, pH, etc., and here's how you might use Code Interpreter for exploratory data analysis on this data set. After uploading the data, in my prompt I write: load the red wine quality data set; show a summary of the data set, including column names, data types, and a count of missing values for each column; provide descriptive statistics for all numerical variables; and show me the correlation matrix for this data set. Again I run this prompt, and as you can see, it gives me everything that I asked for. Plus, at the end, it builds this very nice correlation matrix that again would take quite a lot of time to do otherwise.

And now, have you ever found yourself looking at a chunk of code, scratching your head, trying to figure out what the code actually does? We've all been there, and
again, Code Interpreter here can be of massive support. Imagine having a personal guide that walks you through the code line by line, explains the purpose of each function, unravels the logic behind those nasty loops, and even comments on the efficiency of the code. This is exactly what Code Interpreter can do for us. So let's go for another example using this Python file that I found on GitHub. Here I input the Python file, and then my prompt is to download and load the Python file, explain the code in the file, breaking down its main tasks, and provide at the end any suggestions to improve the code's efficiency. And here again I have a very good answer with all the details about this code. It's actually saying that the script is fairly straightforward and efficient for its purpose, and it gives me a few suggestions on how to improve it. Here I can also ask it to create comprehensive documentation for the code, which is a task that is often very time consuming and therefore overlooked by data professionals.

Okay, so this one will blow your mind. I found this data set on Kaggle that basically gives you all the geocoding information for all cities in the US, mainly the name of the city, the state, the latitude and longitude, and then I asked the chat to use this data set to create an interactive map. Even though the chat could not plot the map in the chat itself (and I'm not sure if this is only a limitation of the beta version), it gave me the Python code, which I ran in my Jupyter notebook environment, and this is the result: a fully functioning HTML site. So here, if I run the code, I see this really beautiful map with all the cities in the US. I can search for a region, and then, as you can see, I can zoom in and see all the details, for example here in San Francisco. Then I can zoom in even more and zoom out on different areas. And again, this took me just a few seconds to create.

Now, another super cool use case is
to ask the chat to create data visualizations directly in ChatGPT itself. Here I uploaded a data set about the Bitcoin price over time and asked it to give me interesting insights and also visualize them. So again, I upload the Bitcoin stock market analysis CSV file, and then I'm asking it to analyze the data and visualize the five most interesting insights. After this is completed, here are some of the charts. They look a bit weird, but overall this is a very good analysis that it generated in a few seconds. Now what I can do is save these visualizations as image files if I'm interested, and I can also ask it to personalize the chart that was created. Here, for example, I'm asking it to change the colors of this chart using only different shades of blue. And of course the use cases are endless, as I can ask the chat to build different types of analyses and models, like sentiment analysis, time series analysis, machine learning models, 3D charts or images, and so on.

This one again is mind-blowing. I gave the chat a data set about Netflix movies and TV series, and this is what I asked: based on this Netflix movies and TV series data set, create a PDF presentation giving me the five most interesting insights you can find. The PDF should have the title of the presentation on the first page and one insight per page from page 2 onwards, and I want the question you're trying to answer in bold, then the summary of the insight, and then the data visualization showing the insight.

Okay, so let's open the PDF file now and see how it looks. As I asked, the first page is just the title of the presentation, and then on the second page: how are movies and TV shows distributed on Netflix? I see this bar chart and also the insight summary on the same page, exactly as I asked. Then, if I scroll down in the PDF file, I see another bar chart with which countries produce the most content, and again the insight at the end. And then on the next page I see: how has the
amount of content released changed over time? And I see again a nice line chart with the summary of the insight at the end.

Okay, so let's go for the next use case. You probably know that performing analytics on a structured data set like an Excel or CSV file is one thing, but doing the same with unstructured data like Word or PDF documents is a completely different story, and sometimes it's just not possible to work with those types of files. So here I tried asking ChatGPT to create a clean data set from a PDF file that again I found online. The PDF shows a list of invoices, and again the result is incredible. I had to spend a bit more time on this one to give more details on the information inside the PDF, but this was the end result: a clean CSV file created from a PDF. And now obviously I can go on and create insights based on this.

Now, the last use case is something that unfortunately I cannot show you, as ChatGPT, at the time I'm recording this video, has restricted access to the web feature. But when it becomes available again, you can ask the chat to visualize data from publicly available sources. This means that you don't even have to spend time finding the data; the chat will do it for you, which is again a very powerful feature, and that's why I wanted to include it as well.

Now, apart from these amazing use cases, it is also very important to point out the limitations of this tool, and I have a list of four to go through, which I found online while doing my research. The first one is limited access to document content. Due to token constraints, Code Interpreter can only use part of the provided documents, and this limitation means that for longer documents the model has to guess the content and interpret from the available context. So the accuracy and understanding of the document may be compromised, of course. The second one is the lack of persistence for files and generated code. Code Interpreter does not retain files, links, or code
blocks beyond a single session. As a result, if the session times out or is terminated, any files, links, or code blocks generated during the session may no longer be accessible, and users would need to re-upload or re-enter the relevant information. Number three is the absence of dedicated business intelligence tools like Tableau. While it can interpret and execute code related to data analytics and visualization, it may not offer the comprehensive features and capabilities of more specialized BI tools. And the last one is data security concerns. While Code Interpreter is a handy tool, it's not a secure platform for handling sensitive data, and this applies both to your personal data and to the data of your company, in case you are thinking of using it at work. On this point, bear in mind that there was a recent data breach involving ChatGPT, which just demonstrates that this tool is still very far from being a safe environment.

And this is it for today's video. This is what I managed to do with Code Interpreter, and I'm sure this is only the start of something that can truly revolutionize the data science and data analytics world as we know it today. Guys, I'm really curious to see what your thoughts are about Code Interpreter in the comments down below, and if you have any questions, leave them down there as well. As always, if you found this video helpful and you want to stay up to date with the latest on data science and analytics, make sure to subscribe to my channel. I will leave here on the screen some other videos that you might like. And, well, enjoy the rest of your day. Ciao for now, and see you in the next one.
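
Editor's sketch: the captions describe the Titanic cleaning step only in words (missing ages filled with the median, outliers kept but softened with a robust scaler). The snippet below is a rough illustration of that idea in plain Python, not the code that Code Interpreter generated in the video. The sample ages are made up, and the function names `fill_missing_with_median` and `robust_scale` are hypothetical. Robust scaling is assumed here to mean centering on the median and dividing by the interquartile range, which is the usual definition.

```python
from statistics import median, quantiles


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    Outliers stay in the data, but because the median and IQR are barely
    affected by extreme values, they no longer dominate the scale.
    """
    mid = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # assumption: fall back to 1.0 if the IQR is zero
    return [(v - mid) / iqr for v in values]


# Made-up ages with two missing entries and one extreme value (80).
ages = [22, 38, None, 35, None, 54, 2, 80]

filled = fill_missing_with_median(ages)  # missing ages become 36.5
scaled = robust_scale(filled)

print(filled)
print(scaled)
```

With real data, the same two steps would typically be applied to the Age column of train.csv through a data-frame library rather than to a hand-written list.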
Info
Channel: Lore So What
Views: 801
Id: psXo54Av__w
Length: 13min 12sec (792 seconds)
Published: Mon Jul 17 2023