🔥| Effortless Data Science with ChatGPT Noteable Plugin: A Comprehensive Guide with Titanic Dataset

Video Statistics and Information

Captions
Hello everyone, welcome to my YouTube channel. Today I'm going to show you how to do data analysis with the Noteable plugin in ChatGPT. This tool is a game changer, making data science more accessible and interactive. Believe me, if you want to do data science projects and earn money, this tool can help you: it handles tasks like data preprocessing, data visualization, and building machine learning models. You can sign up on any freelancing website to find data science projects and start earning, and even with zero coding knowledge you can start working with it. Let's dive into how this tool works.

First, let's understand what Noteable is. Noteable is a platform that allows us to create and run Python notebooks. It is a fantastic tool for data analysis, machine learning, and much more. The best part: it is integrated with ChatGPT, enabling us to interact with our notebooks directly through conversation.

Let's get started. First of all, we go to ChatGPT and activate the Noteable plugin. If you haven't installed it yet, go to GPT-4, click on Plugins, and scroll down to the Plugin Store, where you will see all the plugins available for ChatGPT, with the most popular ones listed first. I already have Noteable installed, so I can simply select it; otherwise, click New or All and look for the Noteable plugin. Unfortunately there is no search bar to find a plugin directly. Once the plugin is installed, we can close the store window and come back to GPT-4.

Next, let's go to the Noteable website and sign up. Click Sign Up, and it will ask you to register with a Google account or an email address; at the moment it's free of charge. Once you sign up, you will see your workspace. Before we start using Noteable from ChatGPT, we need to create a project here: click Create, choose Project, and give it a name. I'll call mine "Project ChatGPT", and if you like you can also write a description, for example "This project is about data analysis". Then click Create, and you can see the project has been created. Once the project exists, copy its link, paste it into ChatGPT, and say "set my default project to this link". ChatGPT confirms: the default project has been set to Project ChatGPT, and all new notebooks will be created in this project by default.

Before we start, we can specify some ground rules, for example: "We want a very readable and presentable notebook. Please note that I'm not experienced with data analysis, so you should explain everything to me, and whenever you tell me about any analysis you made from an execution result, also make a markdown cell with that analysis." We press Enter, and it understands our requirements.

For the dataset, I'll be using the real Titanic survival dataset. Our goal is to create a machine learning model that predicts the survival of passengers based on various features, such as the passenger's age, class, and more. So the question is: how do we load the dataset into ChatGPT through Noteable? We have two options. Option one is to provide a link to the dataset: for example, I can copy the link to train.csv and ask ChatGPT to access the Titanic dataset from that link, and it will fetch the file. With this option, make sure the dataset is openly available, with no restrictions on access. To download the dataset from kaggle.com we need an account, so ChatGPT would not be able to access the data that way. That's why I'll use the second method: go back to the Project ChatGPT workspace and upload a CSV file. I click Upload, then "Upload from computer"; I have already downloaded the titanic.csv file to my computer, so I open it and it uploads the data to the ChatGPT project on the Noteable website.

Once it is uploaded, we close the dialog, go back to ChatGPT, and say: "Please use the dataset uploaded on Noteable with the path titanic.csv" (simply the name of the file) "and load the dataset into a pandas DataFrame." And look at that: our dataset is loaded and ready for analysis. If you want to see the notebook, you can click the link, and it takes us to noteable.io, where you can see it has created a Python notebook, uploaded the data, and provided some information about it. The dataset has these columns: PassengerId, a unique identifier for each passenger; Survived, whether the passenger survived or not; Pclass, the passenger class (first, second, or third); Name; Sex; Age; SibSp, the number of siblings and spouses aboard; Parch, the number of parents and children aboard; Ticket, the ticket number; Fare; Cabin; and Embarked, the port from which the passenger boarded the ship.

Now let's get back to ChatGPT. Before we jump into building our machine learning model, let's do some exploratory data analysis; this will help us understand our data better. I say: "Perform exploratory data analysis on the data to get some useful insights." I press Enter, and it starts doing exploratory data analysis on my data. You can see how simple that is: you don't need to write any code, you just give a command and it does the analysis for you. At one point it says "I apologize for the inconvenience, it seems there was an error while trying to execute the cell", and it retries automatically. If you want to see what's going on, you can come back to the notebook and watch what the plugin is doing.

Once the analysis is complete, we can inspect the notebook. It starts by explaining that EDA is an approach to analyzing datasets to summarize their main characteristics, often with visual methods. First it calls DataFrame.info(), which provides brief information about every column in the dataset: for example, PassengerId has 400 non-null entries of integer type, and similar information is given for all the other columns. From this we can see the dataset contains 400 entries and 12 columns (12 features) of different types: integers, floats, and strings. Some features have missing values, such as Age, Cabin, and Embarked.

Then it runs some basic statistics with DataFrame.describe(), which generates descriptive statistics for every numeric column. The range of PassengerId is 1 to 400. The mean of Survived is 0.39, which means that fewer than half of the passengers survived. The mean passenger class is about 2.32, suggesting that there were more passengers in third class. The average passenger age is about 28.68 years; the youngest passenger was 0.83 years old (an infant), and the oldest was about 71. Similar details are given for the other features.
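Behind the scenes, the plugin is generating and running ordinary pandas code. As a minimal sketch of the equivalent loading-and-inspection step (the file name titanic.csv matches the upload shown in the video; the plugin's actual generated cells may differ):

```python
# Minimal sketch of the loading and inspection steps described above.
import pandas as pd

df = pd.read_csv("titanic.csv")   # load the uploaded CSV into a DataFrame

df.info()                         # column names, dtypes, non-null counts
print(df.describe())              # descriptive statistics for numeric columns
```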
Next it finds the unique values in each column, provides a summary of each one, and then starts the visualizations. You can see how easy this was: I gave one command and it did everything for me. In the first plot, the x-axis shows the Survived column, where 0 means the passenger died and 1 means the passenger survived; we can see that most of the people on the ship died and comparatively few survived. Then it plots the same Survived feature split by male and female. In this plot we can see that most of the male passengers on the ship died and very few females died, while on the survived side very few males survived and most of the females survived. It also plots survival against passenger class: a large number of third-class passengers died, while most first-class passengers survived. That makes sense, because most third-class passengers did not have life jackets or quick access to the lifeboats to get off the ship.

Then it shows the age distribution: most of the people on the ship were young, around 20 to 40 years old. It also shows the age distribution split by who died and who survived; most of the young people between 20 and 40 died, and comparatively few survived. Similarly, there is the fare distribution: the histogram shows that the distribution is highly skewed to the right, which means the majority of passengers paid a low fare (they were in third class) and very few paid a high fare (they were in first class). You can see it keeps doing analysis and providing all the details. It also plots the fare distribution split by who died and who survived, noting that the majority of passengers who did not survive paid a low fare, while passengers who paid a high fare had a higher survival rate. That completes the exploratory data analysis, and the plugin asks whether we would like to perform any other analysis.
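The plots described above are standard seaborn/matplotlib charts. A rough sketch of equivalent code, continuing from the df loaded earlier (the panel layout is my own; the notebook the plugin generates may structure these plots differently):

```python
# Rough equivalents of the EDA plots described above, using seaborn;
# assumes the DataFrame df from the loading step.
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

sns.countplot(data=df, x="Survived", ax=axes[0, 0])                # 0 = died, 1 = survived
sns.countplot(data=df, x="Survived", hue="Sex", ax=axes[0, 1])     # survival by sex
sns.countplot(data=df, x="Survived", hue="Pclass", ax=axes[0, 2])  # survival by class
sns.histplot(data=df, x="Age", bins=30, ax=axes[1, 0])             # age distribution
sns.histplot(data=df, x="Age", hue="Survived", bins=30, ax=axes[1, 1])   # age vs. survival
sns.histplot(data=df, x="Fare", hue="Survived", bins=40, ax=axes[1, 2])  # fare is right-skewed

plt.tight_layout()
plt.show()
```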
Now let's do data preprocessing, because we want to build a machine learning model to predict the survival of the passengers. I say: "Perform data preprocessing on the data to make it ready for building a machine learning model", press Enter, and it starts. What is data preprocessing? It means checking for missing values in the data and replacing them with the mean, mode, or median; converting any categorical features in the data into numerical ones; and sometimes scaling the features with a standard scaler or a min-max scaler to bring them into a certain range. It has started the preprocessing and will finish in a while; remember that depending on the size of your data, it may take more or less time. If you want to see the progress, you can go back to the notebook. It notes that data preprocessing is a crucial step in the data mining process, and that it will handle missing values, convert categorical data to numeric, and remove unnecessary features. For example, the ticket number has nothing to do with whether a passenger survived, so that feature will be removed.

If you don't understand what's going on, you can always read the details provided in the Noteable notebook. First it runs a command to count the missing values: there are 78 missing values in the Age column, 309 missing values in Cabin, and only one missing value in Embarked. Then it handles the missing data: for Embarked it replaces the missing value with the mode; for Cabin, since most of the values were missing, it drops the column entirely; and for Age it fills the missing values with the median. It then checks for missing values again, and there are none left. Next it converts the categorical features into numerical ones: Sex (male or female) and Embarked (the port from which the passenger boarded) are categorical, so it uses pandas.get_dummies to turn them into numeric features. Finally, it drops the PassengerId, Name, and Ticket columns, since they have nothing to do with the prediction. In the output we can see those columns have been removed, and the dataset now contains only features that might be useful for predicting survival.

Back in ChatGPT, it summarizes: it has completed the data preprocessing of the Titanic dataset, and here are the steps it took: it handled the missing values, as we saw in the Python notebook, converted the categorical data into numerics, and removed unnecessary features. The dataset is now ready for building a machine learning model, and it asks whether we would like to proceed with modeling or have any other questions.

Now we want to build the machine learning model, so we say: "Build a machine learning model on the dataset to predict the survival of the passengers." It starts building the model, and again, if you'd like to watch the progress, you can always go back to the Python notebook on noteable.io. It explains every step it takes to complete the analysis: now that the data is preprocessed, it will split the dataset into training and testing sets (a requirement for any machine learning model), train the model, and then evaluate it to check how well it performs. It splits the data using train_test_split from scikit-learn, and then trains a logistic regression model on the training set. You can see it writes out the code for training the logistic regression on the training data, and according to the evaluation the model achieves 81.25% accuracy.

Back in ChatGPT it says: we have successfully built a logistic regression model to predict the survival of Titanic passengers, and the model achieved an accuracy of approximately 81.25% on the test data, meaning it correctly predicts survival for about 81.25% of the passengers. Please note that while accuracy gives a basic idea of model performance, it might not be the best metric, especially for an imbalanced dataset; other metrics like precision, recall, and F1 score can provide a more comprehensive view.
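For reference, here is a condensed sketch of what those preprocessing and modeling steps look like in plain Python. The exact code the plugin generates may differ; details such as the drop_first encoding, the 80/20 split, and the random seed are assumptions on my part, not taken from the video.

```python
# Condensed sketch of the preprocessing and modeling steps described above,
# using pandas and scikit-learn; the notebook the plugin generates may differ.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("titanic.csv")

# --- Preprocessing ---
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])  # one missing value -> mode
df = df.drop(columns=["Cabin"])                                   # mostly missing -> drop
df["Age"] = df["Age"].fillna(df["Age"].median())                  # missing ages -> median

# One-hot encode the categorical columns (drop_first avoids redundant dummies)
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)

# Identifier-like columns carry no predictive signal
df = df.drop(columns=["PassengerId", "Name", "Ticket"])

# --- Modeling ---
X = df.drop(columns=["Survived"])
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # assumed split; the video doesn't show it
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the held-out test set (the video reports ~81.25%)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Precision, recall, and F1 give a fuller picture on imbalanced data
print(classification_report(y_test, model.predict(X_test)))
```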
Okay, that's a wrap for this video. As you can see, the Noteable plugin in ChatGPT makes data analysis and machine learning modeling far more accessible and interactive. It's a powerful tool that can elevate your data science projects. Give it a try and let us know what you think. Don't forget to like, share, and subscribe for more tech content. Until next time, enjoy machine learning!
Info
Channel: Dataverse
Views: 9,137
Keywords: #ChatGPT, #NoteablePlugin, #DataScience, #DataAnalysis, #Python, #MachineLearning, #DataVisualization, #Coding, #AI, #ArtificialIntelligence, #OpenAI, #DataScienceProjects, #DataManipulation, #DataInsights, #DataScienceTools, #DataScienceTutorial, pythonfor
Id: YtGCxd0Pp7c
Length: 16min 1sec (961 seconds)
Published: Wed May 31 2023