Combine Excel Files with Python | Beginner Friendly | Excel Python Automate with Pandas

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
i'm so excited to be sharing this tutorial with you today because automating my excel files with a python script have not only saved me time at work but it allows me to reduce any error had i done this manually so if you find yourself dealing with a ton of excel files that you manually and repeatedly have to put together then this is the video for you my name is laura and let's get started for two python scripts that will output two excel files and i want to show you what those outputs look like uh right now so you know what to expect so let me first show you the data i'm on my desktop and i created a folder called data up here i'm gonna click right into that and if you see here i have from december to august this is my sleep data from fitbit if you don't know what fitbit is it's a watch i'm actually wearing it right now that records all of your activities one of those activities are your sleep so i got this from online um you can use your own fitbit data if you really want to do this or your own excel files and if you want to follow along with me i'll make sure to link them down in the description below so you guys can follow along so let's go ahead and open one of these up just to see what they look like so this is my uh sleep data from december you can see that i didn't record every single day that's alright um it has the start time the end time and all these different metrics all the way down to deep sleep and all of the months have the same data for whenever i recorded my sleep so we are going to create is on all sleep data where it shows all of the months on top of each other on all one excel file and then the second thing we're going to want to do is something similar they're going to be in the same excel workbook except in different sheets so you see over here i have the september october november and so on uh so let's get right to it the id i'm gonna be using is pycharm make sure you have an ide installed or wherever you can run this and make sure you have python installed so i created a folder called automate excel file i'm gonna create a new python file all sleep data and the first thing you're going to want to do is import all the libraries so you're going to put import pandas as pd pans is the library that you're going to be using in order to manipulate all the data to really glue together all the excel files and we're going to import glob before we go any further if you don't have pandas make sure to install it you can either install it in the terminal or with pip install or you could go to file um settings then you're going to see the project click on the interpreter and then there's going to be a plus sign over here or over here depending on the version of python that you have you're gonna click this you're gonna go to you're gonna search for pandas and it's gonna be a lot of different things don't get overwhelmed just click the first one and install the package i'm gonna go ahead and close this up because i already have installed i don't want to get an error i'm just going to put okay close that up so uh we have import pandas import glob globs already um with python um this is going to help us retrieve the file name paths which i'll show you in a second so the first thing we're going to need is the location the location that i have here is called uh sleep data so i'm going to click right in here i'm going to copy that and i'm going to put that in quotes and if you see here you're gonna have to put a double slash otherwise it won't work and then two more flushes at the end and we're gonna put dot asterisks.xl and essentially what this is doing if we look over at my sleep data it's taking all of the files with the extension.xls dot xls is a very old excel version of fitbit this is the excel version that they provided yours may be a newer version of excel so put xlsx so whatever you need and this is really important because imagine you have a file this doesn't have anything else other than excel files but let's say you have csv here's your pictures um it's really useful because you're not going to have to move all these excel files to a different folder if you don't want to so we're going to close that up we're going to put excel files equals dot location and i want to go ahead and print that out for you so you can see what that looks like excel files um if you've never used pycharm just uh right click and then run all sleep data or you could do control shift f10 which is what i'll be using and then toggle this up and now you see it creates a list of all the different file path names so we have august december november um and so on october september over here somewhere so let's uh delete this and now what you're going to want to do is create a data frame if you don't know what a data frame is it you could think about it like an excel file but in python uh an empty one so you're gonna put uh data frame one i'm gonna set that to pv dot data frame parentheses and now we have our um you could think of a spreadsheet to put in all the other excel files so we have multiple excel files so we're going to need to iterate through a loop so we're going to vote for excel file for each excel file in excel files um df 2 equals pv dot read excel you're going to see that it has a whole different options like excel csv that's the great thing about pandas that it reads so many different um so much data from different places so essentially this is reading the excel file and putting it into a data frame format so we can put it into this new one that we created here and uh for each excel file and then here's really where the magic happens we're going to put df1 which is the empty data frame and we're going to go pd.com cat where it's going to glue them all together and we're going to put this in a list so df1 is the empty data frame and then we're going to concat df2 before we send it into excel i want to show you how this looks like print df1 and then we're going to put ctrl shift f10 and you see that well this doesn't have all of our data it's in between these little dots here um if you want to see how the data looks like because i do um you could put an option here pd dot set option wouldn't put display max rose there are 91 rows according to over here and then we're going to run this ctrl shift f10 and if you open this up now you have all of our data we can do the exact same thing for columns and set this to max columns and then i think that there's nine columns here but it's become very messy for me so this is good enough so now what we're going to want to do is look at it and you see that over here it has from 0 to 31 and then 0 to 8 like how many days like i recorded that and i think that that's the august and then here's the december one if you wanted to be chronologically from like one to 90 91 what you're going to want to do is put over here um index ignore index i'm going to equal to true run that and now we have all the way down to 90 from 0 to 90. so perfect so now what we're going to want to do is send it over to excel and we're going to want to do that by putting df one that's the one that we're going to send over we're going to put da to excel and then uh we have to put in the location or you could just simply put the the file path name why i want to put it in this empty folder called combined leap data and just click in here copy this over and i'm going to put this in quotes again make sure to put your flashes and then two more lashes and then the name of the file you want to create so i want to put all sleep data dot xlsx i'm switching it over to sx the newer version of excel because i want the newer version there run this now so ctrl shift f10 and then what we're going to do is we're going to look at the empty um combined sleep data and it's here so let's look in there now you see that we have all of our sleep data almost we almost there if you see here we have this extra index here and we don't really need it because we already have the one on the left side so we want to get rid of that and then if you notice here we have some blanks and if you look at the previous excel file let's look up december instead of blanks there's n a's here you may want that to be blanks but i wanted to say n a just like so we could follow the pattern of our file and so what we're going to do is fix those two things take away that index and put an a's where they need to be so the way you take away those that extra index you could put index equals false okay this will work equals false and then fill in all those lengths with nas or whether appropriate uh setting would be df1 dot fill n a i'm gonna put that into parentheses and then we're gonna go value is equal to n a and in place equals true so it can run in place so we're going to go ahead and run this again ctrl shift oh also you're going to want to make sure to close your the file like let's say that we have combined all sleep data make sure to close that otherwise you're going to get an error we're going to now run that cool it runs here and there's no errors let's go look at our combined sleep data it's still here it just overwrote it now you could see that we don't have that index anymore and the blanks have become n a's just like how we wanted to alright cool so we're done with the first part let's create the multiple sheets so now that you understood this a little bit um creating the other one's gonna be simpler so we're gonna create a new python file we're gonna call this multiple sheets of sleep oh boy and we're going to import pandas spv import glob and import os and os is going to also help us retrieve our file path names so again we're going to want to just use this up so we're going to accept the location we already have it there might as well use that again and then what we're going to want to do is create an excel writer so we're going to do this by writer equals pd dot excel writer and we're going to create um one called multiple sheets so we're going to take this because i want it in the same place but i'm going to call it a different name i want to call this multiple sheets okay and essentially what accelerator is it's a class for writing um data frames and the reason why we're doing this and now and we didn't do it in the other one is because um since we're putting this in sheets sheets is something that's very particular or unique to excel and if we do it the way we did it before it's gonna overwrite them and it's gonna it's not gonna create multiple sheets so we're gonna put four excel file in excel files and what we're going to want to do is create a sheet name um essentially i wanted to be the name of december october november december um so if we just put excel file just so i could just remind you all again and we're going to just print the the sheet just to show you how that looks let's run this i just want the august or like the december or the november i don't want all of this information because it's going to be too long i think that it'll actually cause an error because i don't think she means could be that long uh so what we're going to want to do instead is use um os so we're going to use os.path dot base name excel file and then i just want to show you how that looks print sheet run this and now we're almost there i just want to get rid of the xls but we're going to split it sheet equals sheet dot split split it by uh the dot that it has there and then we're going to get the first item of the list because now it's creating a list and let's just print that over again print sheet i shouldn't have deleted that like three times but it's all right um now we have august december november october september perfect so that's what the sheet name is gonna be called so we need to do is again create a data frame we're going to it's going to read all the excel files read excel i'm going to put excel files perfect and now we're going to want to do is also fill in all the endings that are going to turn into blanks so df 1 dot fill n a value equals n a and then in place okay cool and then we're going to send it over to excel so df da 2 xl is going to be writer because now we have a here the location that we're going to send it to writer then we're going to write the sheet name each sheet name i want to call it sheet and then index equals false so we don't um get that index that we don't need and then we're gonna put writer dot save and we're gonna put control control shift f10 and for some reason we got an error okay right here [Music] okay so now we have it there let's go see if we have it in the combined sleep data we do we have one called multiple sheets let's open this back up and now you see that we have all the data the august december november october september and yeah so now we created the two excel files that we wanted to please let me know if this helped you please let me know if you guys use this i really would love to know your feedback and please like comment share whatever you must do and i'll create another one like this again you
Info
Channel: The Friendly Coder
Views: 9,445
Rating: 4.9731545 out of 5
Keywords: #automatewithpython #exceltopython #pandas
Id: x_k1N1ZjJFM
Channel Id: undefined
Length: 15min 42sec (942 seconds)
Published: Sat Jan 09 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.