Python Excel - Reading Excel files with Pandas read_excel

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello my name is xander and  welcome to the python excel series   in this first tutorial of this series  we take a look at panda's module   read excel function which reads the  excel sheet data into a data frame object   let me quickly take you through what's  going to be covered in this tutorial   so first of all we take a look at the process of  actually getting an excel file into a data frame   once we've done that we can discuss converting  that data into different data structures   and then having a look at some of the other  parameter values of the read excel function   so we know how to work with the readout  excel function to meet our needs   at that point we're focusing really on utilizing  data frames panda data frames so we go ahead and   inspect the new data that's been moved into  our data frame and then go ahead and select   and extract data from the data frame we'll finish  up with just looking at how to select multiple   individual sheets because most of the  time excel we're utilizing multiple sheets if you have just started thinking about  or needing to utilize python with excel   it's worth directing you straight away to  the open pi excel which we will be installing   and what pandas will be utilizing here in this  tutorial so this is a python library which is   probably one of the most popular libraries  to work with python and excel and as we go   through the series we will dip in and out of  pandas and working directly with open pi excel if you are completely new to python there is  a little bit of setup here i want to take you   through if you are already familiar with python  and virtual machines just skip ahead to the next   section in the video timeline all right so  let's go ahead and just make a new folder   i've made a folder on my desktop i'm going  to select open folder and open that folder   i've gone ahead and opened this folder  which was on my desktop and i've made this   new file called example.pi which i'll  be utilizing throughout this tutorial so because i'm going to be installing some other  modules some libraries i'm going to create a   virtual machine or virtual sorry virtual  environment so i just drag this up here   and i'm going to type in py for pi so i'm assuming  you've got python installed and then we're going   to use the m flag to then i'll create a new  virtual environment folder called vemv so this   virtual environment just allows me to create  a new environment to work within so that when   i install new modules new libraries it means it  doesn't update the main um python insulation on my   computer it just works within this small virtual  environment here so now i've done that let's just   go ahead and activate this so vent script activate  so i can tell it's activated because it has   the word vamp here so i've activated now i'm  working in my virtual environment so in order   for us to work with pandas we're going to need to  install pandas so let's go ahead and pip install pandas now if you're working on a are working if  you're working on a um a computer at work it might   be that this might be blocked in your firewall you  might not be able to install pandas for example   so if it does come up with an error that might  be the case so you can see we've installed that   pip install pandas there we go right so pad is  installed so we can now go ahead and import pandas as pd pandas relies on some other tools  that's needed and this is where we're   going to need to pip install the open pi excel  library so let's go ahead and just copy that   and i'll go ahead and also install that  and there we go so we're now ready to start   so we're going to need some data so i've  got two files here which you'll find in   the code repository there's a link in the  video description to the github repository   there you'll find all of these examples as well  as this data so this is just uh a simple set of   uh books see if i can open this up okay so we've  got two sheets here books and books underscore two   you can see that um we've got a  number of different titles here   headings sorry and then we've got ten  books here and four books here right so let's go ahead and import our file and convert  this into a data frame so let's go ahead and   import books.xlsx and we also do the same for  csvs just so you can see the fact that it's not   just excel file sheets that we can work with  there are other file types too and we go ahead   and convert that so pd from pandas read excel  so we use a read excel function from pandas   and then we pass in the excel file in this case   and then we go ahead and create a new function  here called dfcsv again this time using read   csv to read in the csv file so that's how we're  going to get this started by importing in our book so let's go ahead now and create our data frame  from these files so first of all the excel file   so that's the books.xlsx and then we just do  the same thing you might already be familiar   with this with cfv files so we do the same  thing again there and then we go ahead and   now create a new variable df and then we use  pandas and then the panda function read excel   and then we pass in the first parameter which  is going to be the actual file that we want to   then convert that data into a data frame so we  do the same thing again here for the csv file so we can now access this data straight  away by referring to df or df csv so i'm   just going to remove this because this isn't  going to feature in the rest of a tutorial   right so let's just go ahead and  let's just print this out so df   for example now we need to run this  so let's go ahead and run this file   so you can see the output here is 10 rows you  can see we've selected all the data um all   the headings all the data is there and we're  outputting the 10 rows from our book dot xlsx so from this point we can now start manipulating  if we like or selecting different parts of this   data from our data frame so it might be that you  prefer to utilize this data this new data as a   dictionary or you might want to utilize it as csv  or potentially convert it to json so that's easily   done at this point so we can use uh again df to  dick to csv or to json so we can convert that   that data frame into these different data  structures and then we can go ahead and utilize   it however we want from there so let me just go  ahead and print out for example the dictionary so we'll just go ahead and run that so now have  a new detriment of that dictionary of the data   that's been passed in from the excel  file if you take a look at the video   description there is a link there that  will take you to a github repository   all the code that's in this tutorial will be  there for you to have a look at as well as some   reading resources and one of them being the pandas  read excel function api reference documentation   so here you can take a look at all the  different parameters that you can pass in   to the function which may come in use for  your particular use case so we can see here   that the read excel takes in essentially the the  file that's the first parameter but we can add   additional parameters here so there's multiple  parameters we can add so i will be talking a   little bit further about the the sheet sheet name  because you probably already seen from the excel   file that we looked at there were two sheets so  we want to work with different sheets potentially   now notice that earlier we printed let me just get  rid of some of this we printed out the data frame   and what we saw originally was the data from  the first sheet only so we do have two sheets   um in our xls file but it's only showing the  first sheet here so the question being how to   access other sheets we'll look at that at the end  of this tutorial but that is just one parameter   just to give you an idea of what we can pass in  so here for example i could um select the sheet by   the actual index so the first sheet  in your excel file will be sheet zero   the next one one and so on or else i can go  ahead and just select by the name of the sheep   sheet okay so that's a possibility there or one  of the parameters that we can utilize so second   to that is we can also for example specify what  column we want to use so use calls equals and   then we go ahead and just select the column that  we want to actually utilize so now for example if   i print this out again you and now i've now only  selected the title heading or the title column of course i can go ahead and expand that if  i wanted to also include multiple columns let's just give that a go so now return the  title and authors so now we have a better idea   of how to get our data from our excel file into  a data frame let's just go ahead and think about   inspecting this data frame so far we've  seen that we can utilize uh or just call   df function which is going to then show us all  the data um within our we're in our data frame   with from our excel sheet so that can be a little  bit overbearing sometimes um overwhelming sorry   we've got a lot of data so obviously we can go  ahead and start manipulating this so for example   we can utilize head which will allow  us to select the top five or top 10   or the first sorry 20 items or rows columns rows  rows in our data so let's go ahead and have a look   at that so obviously i don't have 20 so let's just  go ahead and just select the first two there we go   so we just select the first two rows and you can  see that we've got 12 columns of this data so that   gives us an idea there so we can do the same  thing again if we just wanted to see the last   two items or the last 10 items by utilizing tail  so let's go ahead and have a look at the columns so that's going to give us a  list of all the columns here   and then finally we can have  a look at the data types so now we have a better idea what data we have or  data we're working with let's now have a look at   some of the techniques to select parts of the  data or some of the data from our data frame   there are various ways of selecting data from  our data frame for example we can directly from   our variable df here we can then select  for example the headings that we want to   display so we'll have a look at that first  and then we're moving to add so here we can   select or access a single value at a time  for example or move into and utilize lock   which can select the multiple rows and all  columns have left a link in the github repository   which will take you to a resource where  you can read through this in more detail   let's go ahead and select some data um  yeah so we're going to print this out   print this out t f uh so let's just select for  example the the heading title so we just want to   display this column let's go ahead and just print  that out okay so here we have the title column   now we can go ahead and select more columns  but if we do this we're going to need double square brackets so let's go  ahead and do that so no horses so now we've selected multiple columns and they're just thinking about the tools that we  learned earlier we can then go ahead for example   and maybe we just want to display a  certain amount so let's just go for   five so that will return the top  five items from our data frame here   if we wanted to be a little bit more  specific and select a specific piece of   data or maybe a specific row from our data  frame here we can utilize this index here   so notice this index starts from zero so we could  say that this row here is referred to index zero   for example so what we can do now is we can  utilize add and let's go ahead and select   that so position zero or row zero and let's  go ahead and also just output the the title and there we go so we've outputted just the  title so that's the the first item in our data   frame here if we wanted to for example select a  range of data well maybe we might want to choose   lock so df lock and this time we might want  to select a range so we select a slice zone 0   3 for example and in addition to that  we may also want to select multiple   columns so let's try that out author horses  so print that and there we go so we've now got   our slice and we've now also selected title and  office so now we can see the general principles   of selecting data from our data frame of  course like we alluded to earlier our excel   file may have multiple sheets so we may want  to work with these multiple sheets individually   in a later tutorial we'll have a look at merging  data from multiple sheets but let's just see how   we can work with initially multiple sheets  that we might have within our excel file   we can select what sheet we want to  work with by passing in the sheet name   uh so that can be for example by index or the  actual sheet name so what we're referring to   here by index for example uh is if i go into  my excel sheets here we can see that we have   two sheets so this will be referenced  as sheet 0 this would be sheet 1 and   so on and so on alternatively we  can just use the name of the sheet let's try and access the second  sheet so that would be sheet name one and now let's go ahead and select one and  let's go ahead and now print out that so   let's just double check um so when  go back here we've got hitchhikers to   bill's book and you can see that that is the  case here so we asked and we have now selected   working with this sheet one now let's go  ahead and i think that was called books   underscore two so if we do that now you can see  that that isn't going to work so now of course   we can just specify the name here so this is going  to be books two so that's the data we want to work   with and there we go so you might want to be  working with multiple sheets so let's go ahead   for example and we can just use the index of the  sheet and so on or we could just use the names   so there we can just specify the sheets we want to  use and then select the sheet we want to use and   then in this case just output some simple data so  for example if we wanted to select all the sheets   we could just place none here for example  and then from here this is still going to   work absolutely fine or for example we want to  get the data from the first sheet so just use the   index reference here which isn't going to  work so that won't work with the reference   so we could do that with one but if we  were for example going to utilize a name   which was just books that's the name of  the first sheet books and there we go so hopefully that gives you a general or  initial feel of selecting multiple sheets so i do hope there's some value there for you  and don't forget there is a github repository   and use that repository to start making some notes  or kind of a quick guide for yourself so that you   can start to remember some of these different  tools now this type of tutorial is bound to get   some thumbs down simply because there's just so  much more to tell you and i've just tried to kind   of skim the fat to get to that kind of initial  starting point and hopefully i've given you a good   grounding of initial information which  will help you now move forward in this area   this tutorial will act as a starting point for  many of the other tutorials that we feature in   this tutorial series so i had that in  mind also when i built this tutorial   i didn't want to show you too much so that  we can progress further in other tutorials   so thank you very much for watching hopefully  you stay tuned to watch the other tutorials in   this series thank you very much for listening  and i hope to see you in the next tutorial
Info
Channel: Very Academy
Views: 71,122
Rating: undefined out of 5
Keywords: python excel, openpyxl, pandas to excel, pandas excel, python xlsx, to excel pandas, pandas xlsx, python and excel, reading excel in python, read_excel(), read_excel, pandas read_excel, read excel pandas
Id: bI68wnoINwc
Channel Id: undefined
Length: 19min 33sec (1173 seconds)
Published: Wed Aug 11 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.