Clean Excel Data with Python and Pandas - 5 Minute Python Scripts - Full Code Along Walkthrough

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] hey guys welcome back to the channel today I want to go over a subscriber request that we have we have a subscriber here that has a very large excel sheet and he needs to be able to clean the data before you uploads it to a different program we'll talk about how we can use Python and pandas to do just this let's jump over to a text editor to get started so let's just cover the requirement so I will need to do to cover this problem we need to import Excel data into Python then once we import this excel data we need to be able to clean it so we'll type in clean Excel data lastly for him to upload this to a different program we'll have to write this excel data to a different excel sheet so right back to excel I've created a sample excel sheet that we can use for this example there Excel she probably has a lot more entries in this but this is the power of Python we can code it once and then it can be applied to everything in the excel file not just one entry so here we have a column that has first-name and lastname combined into one column we have an important number column and then we have a location column back in our text editor the first thing that we need to do is to import the excel sheet we can do this with a package called pandas so we'll import pandas as PD and then drop down a few lines next we need to define our Excel workbook so let's create a variable we'll say Excel workbook then type in the name of your excel file so mine is clean Excel dot xlsx now we need to read in the sheet that we want to use as a data frame so let's say sheet 1 is equal to P be read Excel as a function we'll pass in the Excel workbook that we defined in the previous step and then we'll say sheet name is equal to sheet 1 next we'll go ahead and print sheet 1 dot head and then passing the number of lines that we want to print this is just a check to make sure that our excel file is loaded in correctly save your file open up a command prompt or terminal depending on your operating system we'll type in Python 3 mine is clean Excel example dot pi we'll hit return execute and then we should be printed our data frame which we are this just returns the number of lines that we specify right here now let's say that you want to clean the first name and last name column which is right here and that you need a different column for the first and last name let's go ahead and create a first names list so we'll say first names list and then we'll pass in empty brackets then we'll do the same thing with the last names so last names list and then empty brackets this is just creating two lists that we can add to later but with no items in them right now now we need to create a for loop so we need to say for every entry separate the first name and the last name and add them to these lists but first we have to pull out this column so we can work with it so let's type in a new variable will say Excel names and then we'll pull out that column by specifying the data frame that we want to pull it from so sheet 1 square brackets and then we'll pass in the column name I'm going to go down here and copy and paste and then we'll print Excel names to make sure that we pulled the column successfully once we execute again we see that we pull all the first and last names in that column let's go ahead and comment out this and comment out this that way it doesn't interfere with their terminal now we need to create a for loop so for every name in this column we want to separate it into a first name and a last name so for name in Excel names and put a colon we'll save first name last name equals name dot split and then we'll put in what we split on so we want to split on the space in between these names then we need to pass in one what this is doing is it's splitting the first name from the last name by the space value now we just need to append these names to the list values first names list dot append and then first name next we'll do the same thing with the last name so last names list dot append and then last name well drop down a few lines and get out of the for loop by decreasing our indent and then we'll print first names list and make sure that we're appending these successfully well drop down and execute the file again we see that we now get all the first names in a list we can easily add this back to your data frame we can do this by using an insert function so we'll say sheet 1 dot insert to the zero position the call name of first name and then we'll pass in that list so first names list now we need to do the same thing for the last name so sheet 1 dot insert into the second position or the first index position the column name of last name and then we'll pass in the list so last names list now since we have two separate columns for the first and last name we no longer need a single column that includes both the first and the last name we can use the delete keyword and then we can call out that column of the sheet we do that the same way that we did before so we'll say delete sheet 1 first and last name column now we should have one column with the first thing one column with the last thing and no columns that contain both the first and the last name we can it print sheet 1 dot head to see this when we execute it we see that we now have two columns one with first names and one with last names and we also were able to delete the column that had both the first and the last name now on the request from the subscriber we wanted to change all of these to an upper case we can do this by using an upper command so wherever you designated the string we can put an upper so first name dot upper so we're changing the string to the upper case value we'll save this file again go down and execute it now instead of having all lower cases we have all upper cases using the upper command to format that string we're also able to clean numbers so let's say important numbers equals sheet 1 and then we'll pass in this column important number this is just pulling this column out so we can clean it sometimes when using Excel you may have a different format than what you think you may have while these may look like numbers they could actually be strings so let's use PD to numeric and then important numbers now we'll print important numbers to make sure that we're pulling these in successfully which we see that we are next let's create a new variable for these will say edited important numbers here we can do whatever we want to this variable and pass it back into our data frame I'm just going to multiply it by 2 but the idea here is that you can do more advanced stuff - these numbers - and then we'll override it to our data frame by pulling how we pulled it out before and setting that equal to our new edited important numbers when we print this back to the terminal we see that everything has been multiplied by 2 which is exactly what we wanted now that we've covered how to clean Excel data let's talk about how we can write it back to an excel sheet to do this all we have to do is specify the data frame so R is a sheet 1 and then type in the function - Excel now we just put in the value that we want for Excel files name so let's put output dot xlsx we'll save this execute again now when we execute this it should write a new excel file that has our data frame opening up the excel file we see that we were able to clean all the data from the original excel sheet and write it all back into a new one and that's pretty much it for this video I hope this is a good baseline on how we can clean excel data and write it back to excel if you have any questions or comments please let me know until next time [Music]
Info
Channel: Derrick Sherrill
Views: 39,140
Rating: undefined out of 5
Keywords: tutorial, automation, python automation, python script automation, python scripts, scripts, python code, cool python scripts, useful python scripts, automate excel, python excel, python tutorial excel, code along, code tutorial, python code tutorial, clean excel data, write excel files, write excel, python write excel, excel automation, python and excel automation, clean data, python data cleaning, five minute python scripts, Derrick Sherrill
Id: o8uGCFZfBmM
Channel Id: undefined
Length: 7min 41sec (461 seconds)
Published: Fri May 03 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.