Automated Data Profiling using Python Pandas (pandas profiling)

Video Statistics and Information

Captions
Hello, and welcome to this video on pandas tips and tricks. This video is on pandas profiling. This neat package helps you profile your data in Python pandas in just one line.

To install the package, all you need to do is take your Anaconda distribution, fire up your terminal, and type conda install pandas-profiling. If you do not use Anaconda as your distribution, you can simply do pip install pandas-profiling. You can also go to the "Not installed" section in Anaconda and look for pandas-profiling; it should be on the list there. I've checked that it is already available on conda, so you can safely do a conda install and have it in your Anaconda distribution.

There are a lot of things that pandas profiling gives us, such as unique values, missing values, quantile statistics, descriptive statistics, most frequent values, histograms, and correlations.

So let's look at one example. I have imported pandas as pd, and then from pandas_profiling I have imported ProfileReport. This is the key piece we use to profile the data, and it is just this one line that does the profiling. I import the data set, train.csv. It is available on Kaggle if you want this data set, but you can use any other data set. Then I simply run ProfileReport on it. It may take some time depending on the size of your data set; since this is a small one, it ran quickly.

As you can see, it gave us the whole profile report, which we usually have to build in multiple lines of pandas or Python. You can see the number of observations, the number of variables, the missing-value percentage, the size, and the number of numeric, categorical, boolean, date, and text variables. It also gives us the top warnings, that is, how many of the variables have missing values, along with the percentages. That's really neat. This is something we usually do when we first look at a data set, and the report gives it to us right away.

Next come the variables. If a variable is numeric, the report gives you the mean, minimum, maximum, and all of those details, along with the distribution. Similarly, it gives you the Cabin information, which is a categorical variable: it shows the categories and how many values are missing. There are some repeated cabins, which are listed here, but the distinct ones are grouped under "other values". Then you have Embarked, with its missing values shown here. For Fare you see the percentage of zeros, which is also crucial to know sometimes. Then you have Name, which shows the first and last three unique values; since most of the names are unique, this is how it is displayed. Then you have Parch, which is again numeric. PassengerId is numeric; we don't need this one. Pclass is again numeric. We have gender, which is a binary, boolean sort of variable, but it is treated as categorical because it holds the values male and female. The other variables follow in the same way, and you can keep scrolling to see all of them.

The report also gives the correlation between the variables, which is the obvious next step if you are doing a regression, and then it gives a sample of the data set. So many cool things are available within just one line of code using pandas profiling.

Hope you liked this tip, guys. Thank you for watching this video. If you liked it, please don't forget to hit the thumbs-up button and subscribe to the channel.
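To make the "one line" concrete, here is a minimal sketch of the call described in the video, next to a few of the overview numbers computed by hand with plain pandas. The small DataFrame is a made-up stand-in for the Kaggle train.csv (the column names follow the ones mentioned in the video, the values are invented), and build_report is a hypothetical helper name, not part of the package.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle train.csv used in the video.
# With the real file you would use: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 0.0, 8.05],
    "Cabin": [None, "C85", None, None],
})

# A few of the overview numbers the report shows, computed manually.
n_observations, n_variables = df.shape
missing_pct = df.isna().mean() * 100                     # percent missing per column
zeros_pct = (df["Fare"] == 0).mean() * 100               # percent of zero fares
distinct_counts = df.nunique()                           # distinct non-missing values
correlations = df.select_dtypes(include="number").corr() # pairwise Pearson correlation


def build_report(frame):
    """Hypothetical helper: build the profile report for a DataFrame.

    Needs the pandas-profiling package, so the import is kept inside
    the function and only runs when the helper is called.
    """
    from pandas_profiling import ProfileReport
    return ProfileReport(frame)
```

In a notebook, evaluating build_report(df) in a cell should render the report inline, and calling .to_file("report.html") on the returned object writes it to disk. Later releases of the package are distributed under the name ydata-profiling, so the import may need adjusting on a current install.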
Info
Channel: Kunaal Naik
Views: 6,080
Keywords: data profiling using python, pandas-profiling, pandas profiling, pandas profiling report, data profiling, python, pandas, dataframe, profiling, automated, data science
Id: vsL8osE_0HM
Length: 4min 26sec (266 seconds)
Published: Mon Oct 14 2019