Easy Spreadsheet Data Analysis Methods - Python Pandas Tutorial

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys welcome to the channel today I wanted to show you how quickly we can analyze Excel spreadsheet data using a Python package called pandas we'll be looking at a few methods that will do detailed insights into your spreadsheet data or just a few lines of code let's look at the data that we'll be using throughout this video the data that I have for this video is called electric motor data I got this data set from a website called Kegel goggle is just a website that is very useful for pulling datasets if you want to follow along with this example entirely the data set that I'm using is called electric motor temperature like I said our data set contains information about electric motors we have ambient temperatures coolant temperatures and a lot more we can see that this data set is very large and there's no telling how many rows are in it find that out using pandas in just a moment we'll open up a text editor I'm using Adam for this example and the first thing we'll need is to import pandas as PD if you don't have pandas just open up a terminal or a command prompt if you're on Windows and you can use pip install pandas if you're like me and you have two different versions of Python you might need to type in pip 3 install pandas but for now assuming that your pain is installed we'll need to read in that data set wherever you downloaded it that's the file path that you'll need to put here mine was on my desktop so I'll say Excel file path will be equal to just the name of the CSV file mine is called electric motor data dot CSV so electric motor that of the CSV now we need to create a data frame using the spreadsheet we can do that by reading in the CSV file so PD so we're referencing pandas and we'll type in read CSV as a method we'll pass at excel file path now we'll have a data frame that has all the information from that excel sheet within this one data structure let's go ahead and verify that we've read this in successfully we'll do that by just accessing an attribute of our data frame so let's say that we want to pull out the columns we'll save this and open up a terminal or command prompt and execute our script mine is just called example dot pi should get back all the columns our CSV file which we do so now that we've accessed their excel sheet successfully let's go ahead and pull out some more information in just a few lines of code to see how quickly we can analyze this data using pandas the first method I want to show you is a method called info let's go ahead and create a DF info and this will just be equal to our data frame and then we'll apply the method info well print this to see the result of what info does is just gives us general information about our data frame well execute and we see that we get back general information about our data frames so now we know that we have almost a million columns in this data frame the type of data within those columns and that each of the columns actually contains information and is not null this is a good starting step to see how many rows of data you have and what type of data type each of those columns contain but let's look at something that is a little bit more insightful or use a method called describe so let's print D F dot describe and see what this returns well execute again instead of just getting back the columns and the type of data within those columns the describe method will give us a lot more information about each specific column we see that we're given account which is the same information that we had before and the info method but now we have things like a mean a standard deviation min and Max values and then percentiles of that information there's 13 columns within the spreadsheet and that's a lot of information to display in the terminal so that's why some of is hidden with these dots but if you wanted any specific information about any of those columns and you wanted to see it in your terminal you could just index this data frame using that column so let's go up and we'll pull a column out that we can currently see let's say the ID I want to take this and we'll index it out like that now we'll execute again and we should get back just the ID column being described which is exactly what we get now let's say that we wanted to narrow down our data frame a little bit more and we only care about certain columns let's say that we only cared we'll scroll back up to the very top let's say that we only cared about the tort in regards to the profile ID so let's say that we want in the maximum torque number for each of the profile IDs we can do this using a group I method will put grouped data frame will be equal to the beginning data frame will use the method and group by and we just need to pass on the column that we want to group on for us we said that we wanted the profile ID so I'll take that and paste it here we need to apply some type of function to it so let's say that we wanted the maximum torque and now we just need to print group data frame indexed on the torque so this should return us the maximum torque grouped on the profile IDs so for every profile ID will have a maximum torque value well execute again and now for each of our profile IDs we get a maximum torque value so now we know how to get general information we know how to describe every column in our data frame how to pull out certain columns in our data frame and then how to group on those columns in regards to another but now let's say you only wanted certain values within a column so you wanted to only pull out values that match a certain criteria we can do this with boolean indexing let's say that we wanted to access only certain values in our data frame where the data from column profile ID was equal to 4 so if we're doing this using boolean indexing we're saying we only want our data frames values or the data frame column a profile ID is equal to 4 let's save and execute this this returns us all the rows where the profile ID is equal to 4 but this is a lot of information to analyze just in your terminal so let's say that we wanted to create a new spreadsheet with just this information we can do this by creating a variable here so let's say profile ID for data frame is equal to this statement and then all we need to do is say profile ID data frame 4.2 Excel and then we'll create an output xlsx file now when we execute this we should be returned an excel file to the same directory that we're working in once that finishes up let's go access that new output dot xlsx file which is been written to my desktop we'll open that zoom in separated only the information that we wanted so profile ID equal to four and put it in a brand new spreadsheet there's tons of ways that we can do data analysis using Pena's so if I missed your favorite method be sure to post in the comments so I can see it in others as well if you have any questions or comments please let me know until next time [Music]
Info
Channel: Derrick Sherrill
Views: 36,774
Rating: undefined out of 5
Keywords: Python, Python Automation, Tutorial, How To, Derrick Sherrill, Pandas, Excel, csv, spreadsheet, data analysis, code along, python pandas, pandas 0.25, information, info, read_csv, to_excel, coding, programming, python 2019
Id: AW9Kh3UIo9E
Channel Id: undefined
Length: 8min 10sec (490 seconds)
Published: Tue Nov 19 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.