Python Pandas Tutorial 8. Concat Dataframes

Video Statistics and Information

Captions
Dear friends, welcome to the codebasics coding tutorial. In this tutorial we are going to learn about pandas concatenate. Concatenate is an operation that you do when you want to join two or more data frames. Without talking much about it, let's jump straight into it.

I'm going to run Jupyter notebook as usual. If you have not watched my Jupyter notebook tutorial before, then I would say you should pause here and watch that tutorial first. Okay, so I have my notebook up and running. I'm going to click on New and start a new notebook, and the first thing that we always do is import pandas as pd.

Now I'm going to create my data frames. We are going to have weather data stored in two different data frames. Let's say I have India weather, and for India weather I have the name of the city, so let's say I have three different cities in India. For these cities I have temperature data, which represents the average temperature throughout the year in these three cities. Another thing I have is humidity. The humidity levels in Mumbai, I know, are pretty high, so I'm just putting some numbers here. Delhi is pretty dry, so the humidity would be low, and Bangalore again should have high humidity. Now I'm going to create a data frame out of it. This is how you can create a data frame, by giving a Python dictionary as input, and when you press Ctrl+Enter it's going to show you that data frame.

So I have this data frame. Now I'm going to create a second data frame, which will be, let's say, weather in the USA, so this will be US weather, and I'm going to put some dummy data here. Let's say I have New York, Chicago, and Orlando, and some average temperatures: let's say the temperature in New York is this much, Chicago I know is pretty cold, whereas Orlando is hot. And let's say these are the humidity levels in these three cities. Now I have two data frames, and I want to
join these two so that I can get a single data frame which has weather data for all the cities in India and in the USA. For this, pandas provides the pd.concat function. In the pd.concat function, the first argument you pass is the list of data frames that you want to join together. I have two data frames here, India weather and US weather; you can actually pass more than two data frames here and it should work. This returns a data frame object, so when I execute this you can see that it created a single data frame which has data from all the cities.

Now, what it is doing here is using the index from the original data frames, so you can see that New York's index is still zero. Sometimes you want a continuous index, so you want to have 0, 1, 2, 3, 4, 5. In that case you can pass an argument to the concat function called ignore_index: ignore_index is equal to True. When you do that, you will see that now you get a continuous index.

If you want to know more about all the arguments that the pandas concat function can take, just go to the pandas documentation website, type in concat, and it should give you the documentation on the various arguments. We just used ignore_index, which is False by default; we made it True, and then it just ignores the index in your original data frames.

Now, you can also pass keys. Let's say I have this data frame here, and I want to retrieve the weather data for Indian cities from it. The way you can do that is, when you create this data frame right here, you can pass an additional argument called keys. Let me increase the font size a little bit so that you can see better. What keys does is let you associate a key with each of the data frames that you have passed in this list. My first data frame is India weather, so I can pass India as a key
and US as a key for the second data frame. And when I run it, this is what I get. I think something is wrong; let me edit this. Okay, so it looks like if you have ignore_index then keys doesn't work, so you have to remove that. Now you can see that it created an additional index. You have this numerical index for each of the rows, and in addition to that it created this additional index for each subset of the data frame. The way you can use this index is you can simply say df.loc (loc stands for location), and if you say India then you can retrieve that subset of your data frame. Similarly, if you say US, you will get back your US data. So having this index is useful when you have merged your data frames into a big data frame, and then from the big data frame you want to get your original data frame back for specific criteria. That's the use of keys.

Until now, what we did is append two data frames on top of each other, but sometimes you might have a case where you want to append the second data frame as columns instead of as rows. Let's look at such a use case. Let's say you have a temperature data frame, and this data frame contains only temperature data, so it looks something like this. When I print it... okay, I made a mistake here. And when I execute this I get a temperature data frame. Then you can have wind speed data in, let's say, another data frame. So what you have is data from the same cities, but instead of temperature you now have wind speed. I'm just going to put some numbers here, and this needs to be a DataFrame; it can't be a plain dictionary. Your wind speed data frame looks like this.

So now you have these two data frames, and when you append them, ideally you want to see wind speed appear as a column in your original data frame. You want to get a final data frame which has city, temperature, and wind speed. So how do you do that? If
you just do pd.concat and pass in these two data frames as the argument (let me pass these two guys here; okay, this should be pd) and save your result into df, then what happens is it just appends the second data frame as rows. What you want here is for wind speed to appear as a column, so that this part right here should go here; it should not create additional rows. In order to do that you can use the axis argument, so you say axis is equal to 1. Let's look at the documentation: by default axis is 0, which means it will append the second data frame as rows, but when you change axis to 1 it's going to append them as columns. You can see that now the temperature of Mumbai is 32 and the wind speed is 7.

Now, what might happen if the order of these cities is different? Data is not always perfect, so let's say you're missing data for Bangalore and the order of the cities is different here: first you have Delhi and then you have Mumbai. When you execute this, see what happened: it just went by the rows. The first row was Mumbai there and the first row was Delhi here, so Mumbai and Delhi ended up in the same row; it just appended them. This doesn't look correct; you want Mumbai to be here, right? In order to do that you can use the index argument. In pandas, while creating a data frame, you can always pass an index. Let's say my index here is 0, 1, 2, so 0 corresponds to Mumbai, Delhi is 1, and Bangalore is 2. I just pass 0, 1, 2; even if you don't pass it, it still takes the same index by default. Then here, in my wind speed data frame, Delhi's index is 1, so I will first pass 1, and Mumbai's index is 0, so I will pass 0. When I do that I get the correct index: Delhi has 1 there, so here also Delhi has 1, and Mumbai has 0, so here also Mumbai has 0. Now when I run this, it works. Mumbai has 32 as its temperature, and as for wind speed, here you
can see 12 is my wind speed, so now this is working correctly. So remember to use index: the index is the way to align rows from different data frames when using the concat operation.

The last thing we are going to cover is that we can also join our data frame with a series. Let's say again I have this temperature data frame, as you can see here, and let's say you have a pandas Series, and the name of the series is, let's say, event. What this series contains is an event, like how the weather is overall in these three cities. In Mumbai it is often humid, so humid is the main event in Mumbai. In Delhi it is often dry and sunny, so I'm going to put dry, and in Bangalore it often rains. So this is how I created my series, and now I want to append this series to this data frame. Again you can use the pandas concat operation here: the first item you pass is your temperature data frame and the second is your series, and here you say axis=1 because you want to have event as a column; you don't want it to be appended as new rows. Then you get your resultant data frame, and when you execute this you can see that you just added your series as a new column in the data frame.

So that's all we had for this tutorial. In the next tutorial we are going to look into a better way of joining two data frames, and that better way is merge. For example, in this case here, when you joined two data frames you had to explicitly mention this index. Wouldn't it be nice if it could automatically detect the values of the cities and just join them in the correct way, so that you don't have to pass the index? That better way is given by merge, and that's something we are going to talk about in our next tutorial. Until then, goodbye, and thank you very much for watching. The notebook created in this tutorial is available in the video description below. I have a GitHub
link; it is free to download, so you can download it and play with it. All right, thank you very much for watching.
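The row-wise part of the walkthrough (plain concat, then ignore_index, then keys) can be sketched roughly as follows. The transcript does not state the variable names, city spellings, or numbers, so `india_weather`, `us_weather`, the lowercase keys, and every value here are placeholder assumptions.

```python
import pandas as pd

# Placeholder data: the transcript does not give the actual numbers.
india_weather = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
    "humidity": [80, 60, 78],
})
us_weather = pd.DataFrame({
    "city": ["new york", "chicago", "orlando"],
    "temperature": [21, 14, 35],
    "humidity": [68, 65, 75],
})

# Default behaviour: rows are stacked and each frame keeps its own 0, 1, 2 index.
df = pd.concat([india_weather, us_weather])

# ignore_index=True drops the original labels and renumbers the rows 0..5.
df_continuous = pd.concat([india_weather, us_weather], ignore_index=True)

# keys adds an outer index level with one label per input frame.
df_keyed = pd.concat([india_weather, us_weather], keys=["india", "us"])

# Selecting an outer label with .loc returns the rows of that input frame.
india_rows = df_keyed.loc["india"]
print(india_rows)
```

Combining keys with ignore_index=True would reset the index and lose the key labels, which matches what the video runs into.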
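The column-wise case, including the row-alignment problem, might look like the sketch below. Only Mumbai's temperature (32) and wind speed (12) come from the transcript; the other numbers and the names `temperature_df` and `windspeed_df` are assumptions for illustration.

```python
import pandas as pd

# Temperature for three cities; the index is passed explicitly for clarity.
temperature_df = pd.DataFrame(
    {"city": ["mumbai", "delhi", "bangalore"], "temperature": [32, 45, 30]},
    index=[0, 1, 2],
)

# Wind speed with Bangalore missing and the cities in a different order.
# The index labels (1 for delhi, 0 for mumbai) match temperature_df.
windspeed_df = pd.DataFrame(
    {"city": ["delhi", "mumbai"], "windspeed": [7, 12]},
    index=[1, 0],
)

# axis=1 places the frames side by side, aligning rows on index labels.
aligned = pd.concat([temperature_df, windspeed_df], axis=1)

# For contrast: with a default 0, 1 index the rows pair up by position,
# so mumbai's temperature lands next to delhi's wind speed.
misaligned = pd.concat(
    [temperature_df, windspeed_df.reset_index(drop=True)], axis=1
)

print(aligned)
print(misaligned)
```

Both inputs carry a city column, so the result has two columns named city; rows with no matching label (Bangalore here) are filled with NaN.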
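Joining a data frame with a series, as described in the last part of the tutorial, could be sketched like this. The series name event and the Humid, Dry, Rain entries follow the transcript; the variable names and the temperatures other than Mumbai's 32 are assumed.

```python
import pandas as pd

# Temperatures other than mumbai's are placeholders.
temperature_df = pd.DataFrame({
    "city": ["mumbai", "delhi", "bangalore"],
    "temperature": [32, 45, 30],
})

# The Series name becomes the column name in the concatenated result.
event = pd.Series(["Humid", "Dry", "Rain"], name="event")

# axis=1 adds the series as a new column instead of as new rows.
df = pd.concat([temperature_df, event], axis=1)
print(df)
```

The series uses the default 0, 1, 2 index here, so its values line up with the data frame's rows by label.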
Info
Channel: codebasics
Views: 110,419
Rating: 4.933661 out of 5
Keywords: pandas concat, pandas concat function, concat pandas, pandas tutorial, pandas python tutorial, pandas dataframe, pandas dataframe tutorial, pandas tutorial for beginners, pandas python data analysis, python pandas tutorial, python data science, python data science tutorial, python concat function, concatenation in python, string concatenation in python, python pandas, data science
Id: WGOEFok1szA
Length: 15min 13sec (913 seconds)
Published: Sun May 14 2017