Python Pandas Tutorial 4: Read Write Excel CSV File

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
dear friends in this tutorial we are going to look into reading and writing CSV and Excel files in pandas and here is the list of items we are covering in this video okay let's get started what we are going to do first is start a Jupiter not book I prefer Jupiter not book because it is great with data visualization you can use whatever IDE you prefer py charm not that plus plus it really doesn't matter I have started my not book and the first thing I am going to do is create a new notebook by selecting new and python first thing we are going to do is read a CSV file into a pipe pandas dataframe I have this CSV file which contains stock market data I have tickers earning per share revenue price and the people associated with that company CSV file is nothing but a comma separated value file so if you open it in plain not paired plus plus it looks something like this where you have value separated by comma these files are lot similar to Excel files so whatever methods and properties we are going to cover they pretty much apply to Excel files as well okay so let's begin by importing pandas as PD as usual and then you will create a data frame by saying DF equal to PD dot read CSV and in the bracket you supplied the name of the CSV file which is stock underscore data dot CSV okay and then you print your data frame when you do this it will successfully read this CSV file into a data frame so let me now keep this csv file open on the sides so that you can see what's going on so i'm going to arrange my windows here so that you can see csv file as well as the not all right this looks good okay now here you will notice that it read the whole CSV file into a data frame and it successfully imported the column names and all the rows okay now what can happen is sometimes you might have an extra header in your CSV file so I'm just going to insert an extra header here and let's say you can have like one row you can have couple of extra rows and what you want to do is really start from start reading from here okay so let me first just read this as is and see what happens so let me save it and run it so when you do this what will happen is it will treat this particular line as a header now what you want to do is you want to really skip this row okay so the way you do that is by supplying skip rows arguments so they skip underscore rows and when you save one it's gonna skip one row from the top so when I run this I'm getting and so looks like let me see what is the excitment method name so if you google pandas read CSV it's gonna list all the methods that you have so if you look at here and if you search for skip so here it says skip rows okay so I provided underscore which was wrong so now you can see that it skipped this first line and started reading from second line you can also do the same thing by saying header equal to one and it will produce the same output so it's saying that my header is located at row number one okay all right now we can have a reverse situation that so that instead of having this let's say extra header you don't have this header at all so let's see your CSV file is something like this where you don't have header but you somehow know that this is sticker this is EPS and so on so what you want to do in that case is you want to say header is equal to none okay so what it will do is it will automatically generate these columns names but these column names are not good it says 0 1 2 3 4 which I don't like so the way you supply these column names is by providing names argument into your real CSV method and you can provide the column names in sequence so your first column is sticker your second column is let's say EPS your third column is let me see what it was it is revenue revenue price and people okay so revenue price and people revenue price [Music] people when you run this you will notice that it will create this column so always use header equal to none and names equal to column names whenever you have a missing header in your CSV file okay now let me revert back my CSV file to its original state which is this and I am going to this extra thing here okay now what can happen sometimes is your CSV file is let's say pretty big and you want to read only few rows from it let's say you want to read only maybe three rows okay so the way you do that is by doing n rows equal to three so n rows means number of rows that you want to read into dataframe and when I say three it will exactly read three rows excluding your header okay so sometimes if you have millions of live lines in your CSV or Excel files and you want to limit how many lines you want to read then you you can use n rows argument okay the next use useful argument is any values so here you can see that in my data frame there are a couple of problems couple of cells I have as not available any etc okay so I want to read these things as not in number values n a n okay so that you can do by saying any values equal to not available and and dot a what this means is whenever you are reading CSV file and if you encounter not available so not available and any turn that into n a and value okay so if you do this you will notice that okay I was missing dot so that makes a good one so not is that it replaced n dot a with an A and it also replaced not available as and then and here also there was so if you look at this file here this this and this these three cells are now replaced by n n okay so this is useful in cleaning up the cleaning of the messy data that you get from your CSV or excel file okay so we took care of not available and any values but there is one more problem with this data frame if you notice it carefully I have revenue column and I have minus one as a revenue now any company's revenue cannot be negative it is either zero or greater than zero so this is clearly a problem and I want to convert this to n a n now if I put minus 1 here then what's gonna happen is it's gonna convert this to an end but at the same time it is going to convert this guy also as any n now earning per share could be negative companies don't report negative owning per share but in theory it could be negative so you don't want to convert this guy to n n so how do you do that if you put minus 1 directly here then you notice that this guy got converted to an end well in that case instead of supplying a list you can supply a dictionary so what you can do is you can supply a dictionary and you can save this you can say that in my earning per share column whenever you find n dot a or not available convert that to an end ok but in my revenue column you can convert both of these things - ok you can convert both of these things to any end but in addition to that convert minus 1 also into an N because I that revenue cannot be negative so negative value is always NN okay and same thing now with people you can say not available or or any both are and and when you run this you will notice that -1 still stayed as minus 1 and my revenue which was negative here got converted to n a n so this is pretty powerful feature this any underscore values could be useful in cleaning up your messy data and this allows you to do data munging or data wrangling ok alright so that was about read CSV now how about writing back to CSV so for writing back to CSV what you can do is so I have already this data frame with me whenever I want to write this back to CSV the way I have read CSV file there is it to CSV file so I have to CSV file where I can say new dot CSV so I am writing this particular data frame which is this guy here to a file called new dot CSV and when I run this it executed fine so if I go to my directory I will find this noon new dot CSV file and when I open it ok let me minimize this file when I open it you can see that it wrote this CSV file ok now the one observation I have with this file is it wrote this index also so by default it writes the index if you don't want to write this index then you can say index equal to false and when you run this ok permission denied cause this CSV file is open right now so let me close it do not save and when I run this again it worked this time so if I open new dot CSV now what happens is I now don't see that index being written to this file okay it has a couple of other arguments also sometimes you don't want to write let's say all the columns let's say I want to write only two columns tickers and EPS and I want to skip exporting these three columns to my CSV file and the way you would do that is by providing columns argument so first first let me just print the DF columns so I have these many columns and what I want to do now out of these columns I want to write nu dot CSV and export only two columns which is tickers and DPS so DF not to CSV the name of the CSV file where you want to export and that column names when you run this looks like it worked open new dot CSV file and you will notice here that this file has now only two columns to it excellent okay now sometimes you want to skip exporting the header so to do that you can provide header argument and say header equal to false so when you do this let me close this guy here and run this looks like it worked again and if you look at this thing now here you don't see any header so it skipped a writing header to my CSV file okay that was all about to writing to CSV file now let's look at read Excel okay so I'm going to clean up all these cells and I have an excel file here so let me just open the excel file it is basically same as the CSV file that we had okay so I'll just put it here you can see the same stuff I have this sheet one here and all the same columns so import pandas SPD and the way you read excel file is by doing PD dot read Excel and the name of the excel file is doc data dot XLS X now I have my file in the same folder as my Jupiter not book if you have it in a different folder then you need to provide the complete part okay all right the next argument is the sheet name here it is sheet 1 so she'd one here ok and print EF when you do this you can see that it created data frame successfully okay now sometimes you want to do conversion of your cell content from excel ok so here you can see that I have any in Walmart's people column now I know that Walmart's key person was Sam Walton so if you want to convert this any into Sam Walton then you can use converter so the way you use your converter is basically this so what you do is you will define a function so let me first define a function you will say convert people okay convert people sell and this is your cell content and you will say if cell is equal to n dot e okay then what you want to do is return Sam Walton otherwise return self whatever you got okay and then here in read excel file you can supply converters argument which is which takes converters argument will basically take Python dictionary okay and in Python dictionary you can supply the name of the columns so for people column I want to use this particular function so whenever it is reading people column is gonna call this function for every single cell for every single cell that you're reading in this column is gonna call this function and do this conversion okay so let's read this so when you read this what happened is you can see that instead of Walmarts key person to be an a now it replaced it with Sam Walton okay now this is not a very good example because because I'm always converting any to Sam Walton I should be looking at the ticker but this is just to give you a demonstration on how converters exactly what okay you can also do the same thing with with other columns let's see if you have some kind of logic to convert EPS cell then you can do the same thing you can write a converter routine for your earning power she'll unimportant share column and you can say okay whenever you find not available you want to return none okay and returns out back and when you're done this you will see you got any n okay so this is how you use converters to convert your masse data into something meaningful that can be post processed later on easily okay last thing we are going to cover is a writing excel file okay so now I have my data frame here and if I want to write it back to an excel file I can do something like PD dot to excel okay to excel and the to excel function takes the name of the file as an input so my name of the excel file is xlsx new dot xlsx and you need to supply the sheet name where you want to write to so you can see stocks okay and this should be data frame when you run this looks like it wrote to this file successfully so if you open the directory AHA you find this new dot xlsx file and when you open it you will see that it wrote this file successfully now again same stuff if you don't want to write your index you can supply index to be false also let's say in this excel file you want to start writing at certain offset let's say you want to start writing at okay so start so there is a there is an argument called start row you want to start writing at row number one and column number let's say - okay so let me close this don't save run this again and when you do this your new dot excel file hat is now returned from column number two so 0 1 2 so it started from column number 2 and row number 2 so 0 & 1 row number 1 okay so this is how you can write at certain offset or location in your excel file also same stuff if you don't want to write this index column to excel then you can always skip it by saying index equal to false okay let me close this and run this again and if I open new excel file you will notice that I don't have that index column being written to my excel file okay now if you have let's say two data frames and you want to write these two data frames to same excel files in different sheets how do you do that well there is a class called excel writer which you can use so let me first create two data frames so I'm going to erase all this self and create two data frames so these two data frames are stocks and weather okay so I have these two separate data frames which I created successfully and what I want to do is write these two data frames into one single Excel files as two different sheets okay so you will in that case use this excel writer class so in panda's data frame in pandas module you have this excel writer class where you can say with PD dot excel writer whatever is the name of your excel file writer and you use that writer to write your individual data frame so I am writing my stocks data frame in sheet called stocks and my weather data frame in a sheet called weather okay and when I run this it works and I have this stocks underscore weather file created when I open this file you will notice that this file has two sheets talks where you can see all the stocks information and weather where you can see all my weather information and if you look at your data frame here you can see you have your weather data 30 to 35 30 28 okay and if you look at your stocks you have your stocks data you can see all these values which is same as this particular data frame okay so that was all about reading and writing to excel files I didn't cover all the properties that reading and writing excellent CSV files offers if you google panda's documentation on less read CSV or write CSV you will find all these useful properties so for example for read CSV you have so many other important properties that you can use for example you have separator and Eliminator okay so now in my case I had a comma separated file where a separator is comma but sometimes you might have pipe or some other character as a separator but using read CSV file you can handle those characters by using this separator or delimiter argument you have other useful properties as well such as true values false values where you can say that if I have this particular value then consider it as or a false value you can also use a footer for skipping your footers you can also use skip blank lines so sometimes if you have blank line in your file you can just set this to be true and it will just ignore them next tutorial is going to be how do you handle missing data in pandas so stay tuned and have fun learning pandas if you liked this tutorial please give it a thumbs up below and also share and comment on whatever feedback you have for me okay so that was all I had thank you for watching
Info
Channel: codebasics
Views: 477,551
Rating: 4.9049788 out of 5
Keywords: pandas excel, python pandas excel, pandas csv, python pandas csv, pandas excel tutorial, pandas csv tutorial, pandas tutorial, pandas read_csv, pandas to_csv, pd read_csv, pandas write csv, python read excel, python csv tutorial, python write csv, python csv module, python excel, pandas read, pandas write, pandas read excel, pandas write excel, pandas read csv, how to read csv file in python, read csv file in python, csv python, pandas read csv file
Id: -0NwrcZOKhQ
Channel Id: undefined
Length: 27min 3sec (1623 seconds)
Published: Sat Feb 04 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.