What is Pandas? Why and How to Use Pandas in Python

Video Statistics and Information

Captions
What is pandas? Why and how should you use it? Keep watching to find out.

Hi, my name is Giles McMullen, and on this channel I talk about everything related to Python. There's a set of Python tutorials where you can learn Python for free from scratch. I also review other learning materials like courses and books. We talk about data science, machine learning and everything you can think of that's related to Python. If that sounds interesting, then do consider subscribing. But in this video we're going to talk about pandas: what it is, what it's used for and how to use it.

Pandas is a Python library that gives you a fantastic set of tools to do data analysis. In fact, if you're going to work with data using Python, then you're gonna need to learn pandas. And that's data analysis, data science, machine learning: if it involves data, you'll need to know how to use pandas. Trust me though, once you know pandas you won't want to use anything else. You certainly won't want to go back to Excel. And pandas is free.

With pandas you can load, prepare, manipulate, model and analyze data. You can join data, you can merge data, you can reshape data, you can take data from different databases and put it together and analyze it. You can do pretty much anything you want to with data, and it all revolves around a structure called a data frame. Let's take a look at some examples.

I'm gonna show you how to use pandas very briefly on this Titanic XLS file. So this is a file, an Excel file, that shows the passengers on board the Titanic. It has what class they were in, whether or not they survived, their names, their sex and age, and so on. So there's a lot of information about the passengers. It's the entire passenger list, and there are just over 1,300 entries. And this is quite a famous data set, so let's explore it.

We're gonna go to our Jupyter notebook, and we're going to import numpy as np. We're going to import pandas, which is what we're interested in here, as pd. So we've created a variable called Titanic DF. This read_excel is a
pandas function, and there's a read_csv and other functions that will read different file types. We will run these two cells, and then using this method here we can look at the data that we've got. So we've now created a data frame, and this is a really important structure in pandas. When you learn more about pandas you'll learn all about data frames, and these are the data frames that you can merge and join and do all sorts of things with. It's a really, really useful tool.

And so we can see here that we've got the first five records of this data set. The next thing we can do is we can describe the data set. So it'll tell us the count: we can see we've got 1046 age records, and so a little short of the full amount. We get the minimum age and the maximum age and then the quartiles there, so that's quite useful. And if you look at the fares, that's quite interesting: the mean fare was thirty-three pounds, I guess, but the maximum fare was 512.

Now, using this drop command we are going to get rid of some of the data from our data frame, because it's not going to be that relevant. So we're going to get rid of the ticket column, the cabin column, the boat column and the body column, and then we're going to have a look at what's left. So let's have a look at that. We now have a new data frame with less information, but it's more relevant information. So we've trimmed the data frame a little. Let's carry on.

So we're going to have a look now at doing a plot. What I'm going to do is I'm going to use this value_counts function on our data frame, but I'm going to do it just on the survived column, and then I'm going to plot it using a bar plot. Let's run that, and there we have the results of that plot. Very quickly you can see that where we have a zero, those are the people that died, and where we have a one, those are the survivors. So data visualization with pandas is very quick indeed.

Let's have a look now at the proportion of people that survived. Let's get that as a figure. So we run the
mean command on the survived column and we get 38 percent. So you can see with pandas there are a lot of tools that allow you to do statistical investigation into your data very easily indeed.

Now what we're gonna do is we're gonna group the data in a different way. I want to group it by the sex of the passenger to see how that affected the outcome. Okay, so what we have here is we have everything now grouped by male and female. I also want to see whether the class of the traveler played a role in their likelihood of survival. So now we do group by, but this time we do it on sex and the class, and we're gonna get the mean figures of both. So let's have a look. So now you can see we've broken this down into female, male and then the class of travel, and this is really revealing, isn't it? So females in first class had a 97 percent chance of survival, whereas males in third class had a 15 percent chance of survival.

And finally, let's see what effect age played, because they said, didn't they, women and children first. So perhaps we can see whether that was true. To do that, we do the same command as before, but this time we do it only for ages under 18, and these are the results. So in first class, of those under 18, eighty-seven and a half percent survived; in second class all of them survived, out of the females; and in third class 54 percent of females under the age of 18 survived. If we look at the males, in first class eighty-six percent of males survived, in second class it was 73 percent, but in third class it was only 23 percent.

So in a very few lines of code we've managed to really examine our data very well indeed, and that's really one of the strengths of pandas: you can do so much with so few commands.

Another strength of pandas is working with time series, and it's used a lot for this in academia. I want to show you an example now of pandas using time series, with an example from the stock market. This is a short example just to show you a few of the things that
pandas can do with time series, but obviously it can do so much more than this. We're going to have a look at some stock price data. So we've got Apple and Microsoft, and I've used Quandl to import the data; that's already loaded. So let's just have a look at the head of Microsoft, the share price, just to see what we've got here. Okay, so we can see the first five entries. The data goes back to 1986. We've got open, high, low, close; we've got the volume; we've got the ex-dividend information, the split ratio; we've got the adjusted prices as well. So we've got a lot of information there.

Now I just want to plot the adjusted close price for Microsoft, so let's have a look at that and see how we do that. As you can see, it's very easy: we just take the data frame name, we choose the column name that we want, and we just type plot next to it. And if we do that, we get a nice graph going back to 1986 up to the present, showing us the price of Microsoft.

Okay, now let's have a look at the index of Microsoft, because this is the key bit. If we type ms.index, that tells us that the index is a datetime index. So this data has been loaded into our data frame with the index being the date, and that's really useful. What that means is we can do some very interesting things. So, for example, if we just wanted to see the price of Microsoft in 2018, we can just choose 2018, just like that, and pandas does the rest for us. So let's have a look, and that's the price in 2018. If we just want to have a look at the price in March 2018, then we just put in 2018-03 and pandas does the work for us, and there it is, that's the price in March. And if we wanted to do a range, say from the beginning of 2018 to the end of March, again we just put in the range that we want, we put the column that we want to plot and then type plot, and pandas does that for us too.

What I want to do now is combine both stocks into one data frame so I can see that information plotted together, and here I'm going to
join MS price with Apple price. If we run that, now we have this data frame, and now we just want to plot it, and there you have the plot of the two stocks. Okay, now what if we wanted to just look at what happened to the price in 2017, for example? Well, we do what we did before: we can have a look just at the 2017 data. And then what other things might we want to do? Well, you can do a rolling average. Let's have a look at the rolling average, and there it is. And if you wanted to do a rolling standard deviation, to see how much the stock prices move on a daily basis, then you can do that too, and you can see there straight away that Apple is a little more volatile than Microsoft. This example really is just to show you how powerful pandas is.

Now that should give you some idea of the capabilities of pandas. Obviously we've only scratched the surface, and there's no way I can possibly show you everything that pandas can do, but I hope I've shown you enough to whet your appetite to go on and learn pandas. If you want to learn pandas, then start at the pandas website. I put a link to the website in the description of this video. It's free, and they cover every aspect of pandas in a lot of detail. There's even a sort of ten-minute quick start guide, which, you know, is the best place to start. Also, I would recommend a book by Wes McKinney. Now Wes McKinney is the man who wrote the pandas library, so he knows all about pandas, and his book is fantastic. There's a review of his book on this channel, and you can take a look at that too.

If you've enjoyed this video, then please do press that like button. Also consider subscribing to the channel. There are over 100 videos on this channel, all about Python. There are Python tutorials, there are reviews on books about Python, there are reviews on courses, there are videos about why Python is good for data science, and there are videos about machine learning. So if that sounds interesting to you, hit the subscribe button, check out the videos on the screen now that you can see, and I will see you in the next one. Bye bye.
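
The code shown on screen is not part of the captions, so the following is only a minimal sketch of the kind of calls the walkthrough describes: dropping a column, counting values, taking a mean, grouping, selecting dates by partial string, joining and rolling. The data is made up for illustration (it is not the real Titanic or stock data), and every variable and column name here (passengers, ms, aapl, ms_price, aapl_price, the 30-day window) is an assumption rather than what the video used. Plotting is left out so the sketch runs without a display.

```python
import numpy as np
import pandas as pd

# Made-up passenger records for illustration only (NOT the real Titanic data).
passengers = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 3, 3, 3],
        "survived": [1, 0, 1, 0, 0, 1],
        "sex": ["female", "male", "female", "male", "male", "female"],
        "age": [29.0, 40.0, 12.0, 22.0, np.nan, 5.0],
        "ticket": ["A1", "A2", "B1", "C1", "C2", "C3"],
    }
)

# Drop a column that is not needed for the analysis.
passengers = passengers.drop(columns=["ticket"])

# Count outcomes, overall survival rate, and survival rate per group.
outcome_counts = passengers["survived"].value_counts()
survival_rate = passengers["survived"].mean()
rate_by_group = passengers.groupby(["sex", "pclass"])["survived"].mean()

# Same grouping, restricted to passengers under 18 (missing ages are excluded).
under_18 = passengers[passengers["age"] < 18]
rate_under_18 = under_18.groupby(["sex", "pclass"])["survived"].mean()

# Made-up daily prices with a DatetimeIndex (hypothetical names and values).
dates = pd.date_range("2018-01-01", "2018-12-31", freq="D")
ms = pd.DataFrame({"ms_price": np.linspace(85.0, 100.0, len(dates))}, index=dates)
aapl = pd.DataFrame({"aapl_price": np.linspace(170.0, 160.0, len(dates))}, index=dates)

# Partial string indexing on the date index: one month, or a range of months.
march = ms.loc["2018-03"]
first_quarter = ms.loc["2018-01":"2018-03"]

# Join both price columns on the shared date index, then compute rolling statistics.
both = ms.join(aapl)
rolling_mean = both.rolling(window=30).mean()
rolling_std = both.rolling(window=30).std()
```

In a notebook, appending .plot() to a column such as both["ms_price"], or .plot(kind="bar") to outcome_counts, would draw the kind of charts described in the video, provided matplotlib is installed.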
Info
Channel: Python Programmer
Views: 393,146
Rating: 4.9559884 out of 5
Keywords: python, pandas, what is pandas, pandas python, how to use pandas in python, learn pandas
Id: dcqPhpY7tWk
Length: 10min 7sec (607 seconds)
Published: Thu May 24 2018