Basic Guide to Pandas! Tricks, Shortcuts, Must Know Commands! Python for Beginners

Video Statistics and Information

Captions
Hello everyone! Today we will talk about pandas, and we will not just cover the basic commands but also the tricks and all the shortcuts. Now, pandas is a Python library that gives structure to data. It also makes it very easy to access this data as well as manipulate it, and after today's tutorial you guys will be using pandas for almost everything, I guarantee. So are you ready? Let's roll.

We will begin by installing pandas with pip install pandas. Then we can go ahead and import it into our Python file with import pandas as pd. We will begin with some very simple data. Let's say we have a list of names where the first name is Maria, the second name is Batman and the third name is Spongebob. It doesn't really matter. We will then assign this list to a variable called column, and we will then print column. Let's go ahead and save this file and let's run it. Now, whenever we print a Python list, all we get in return is a bunch of plain text with no special structure, and this is where pandas comes in.

So let's take this list and let's insert it into something called a data frame. We can do this right below our column definition: we will type pd.DataFrame, with a capital D and a capital F, and inside a set of round brackets we will pass our column list. Then let's go ahead and assign this expression to data, and then instead of printing column we will print data. Let's save it and let's rerun it, and there you go: our list is now organized into rows and columns, which makes it much easier to view. But the only problem is, zero is not a very informative name for our column.

Now let's go ahead and take care of that. So right below our column definition we will create a brand new variable called titled column, and this time we will create a dictionary where the key of name corresponds to our list column. Then instead of passing our column into our data frame we will pass our titled column. We will save our file, we will rerun this, and there you go, it is no longer zero. Our column now officially has
a name.

Let's add some new columns to this data frame. So inside our dictionary we will create a brand new key which we will call height. Now this key will equal a list where the first value is 1.67 meters, that would be my height. In the case of Batman that will be 1.9, and in the case of Spongebob, on a good day he's probably a quarter meter, so let's say 0.25. We can then add a brand new dictionary key called weight, which will correspond to another list. This time, in my case, my weight is 54 kilos. Well, at least I hope, it's been a while since I checked. Now in the case of Batman that's probably 100 kilos, because he's big and muscly, and in the case of Spongebob let's say he's one kilo. Well, I don't know, what's the weight of a wet sponge? I'm not sure, whatever. And let's go ahead and change this titled column name to titled columns, because it's a bit more accurate. Now we can go ahead and save this file and let's rerun it, and there you go: we have added two brand new columns to our data frame. Good job, awesome.

But how do we select values from this data frame? Let's say we would like the entire column of weight. We will simply type data in the key of weight, and that's it. We can assign it to select column, we will print it, and there you go, we are selecting the entire column of weights. But what if we're only interested in Batman's weight, how can we do it? We simply add an index value to our data in the key of weight. So in the case of Batman, the index would be one. If we now save this file and rerun it, we are getting only 100 kilos in return.

But that's not all, we can also select by row. To do this we will create a brand new variable called select row, and we will assign it to data.iloc, which represents a row, and then inside a pair of square brackets we will specify the index of the row. So in the case of Batman that will be one. Now let's go ahead and print select row instead of select column, and there you go, we are selecting all the values of Batman. But we're
only interested in the weight value, so how can we do it? As you may guess, we simply add the key of weight to the end of our select row command. We can now save this file, we'll rerun it, and there you go: we are extracting 100 kilos instead of the entire row of Batman.

Now let's see how we can manipulate this data. We will create an empty list called bmi, which stands for body mass index, and if you guys are not familiar with it, it's just a measure of body size. So basically we have a formula of weight in kilos divided by squared height in meters, and what we would like to do is take this formula and apply it to all the people in our data frame. Then we will take the results and we will store them in an additional column.

So to do this we will create a for loop. For i in range of the length of our data we will do the following: we will create a brand new local variable called bmi score, and we will assign it to data in the key of weight in the index of i, divided by a set of round brackets where we place data in the key of height in the index of i. But this time we are looking for the squared version of our height, so we will add a double asterisk and two to the end of this expression, and that's it, we are done with the formula. But the last line of our for loop would have to append our bmi score to the empty list we have created earlier, so we will type bmi.append(bmi_score). Then outside of our for loop we will simply create a brand new column inside our data frame, data in the key of bmi, and we will assign it to bmi. And then lastly, in the very last line of code, we will print our data instead of the select row. We will go ahead and save this file and let's run it, and there you go: we have just created a brand new column based on data from other columns, which is amazing and is exactly what I mean when I say data manipulation.

Now let's say our data frame is complete, we are happy with it. How can we save it into a file? To do this we will type data
.to_csv, and then inside the round brackets we will select a name for our file. In my case I'll call it bmi.csv. We will now save this code, we will rerun it, and once we do that we have a brand new file popping up in our file system. Let's click on it, and there you go: here are all our data entries separated by commas. We call these types of files comma delimited.

But we don't necessarily have to use a comma, we can select any other character as a separator. So we will go back to our Python file, we will go back to our to_csv command, and we will add an additional argument. We call this argument sep, as in separator, and if we'd like to create a tab delimited file, for example, we can do this with a backslash t, and that's it. Now if we save this code and rerun it, we will go back to our bmi.csv file, and there you go: we are now separating all our data entries with a tab character, so this file is officially tab delimited.

Now, we are not exactly limited to csv files only. If we'd like to change it to txt, we can now save this code, we can rerun it, and we have a new bmi.txt file popping up with the exact same information. So just because this command is called to_csv, it doesn't mean that it only produces csv files. Awesome.

So now we know how to create, manipulate and save data frames, but how exactly can we load them from an existing file? To do this we will first import pandas as pd, because this is a brand new file, and we will then use the pd.read_csv method, and we will pass the name of the file we would like to select into it. Now in our case we will use bmi.csv, and if you guys remember, this is a tab delimited file, which is not the default comma delimited file. That's why we will need an additional argument. As you may guess, that will be sep, which will equal in our case to backslash t, as in tab. Then we can go ahead and assign this expression to data, and we will then print this data. Let's save it, let's do it, and there you go: here's our list of data from the previous
file, which we have loaded from bmi.csv. Now if you'd like to get rid of this unnecessary column, we can simply go back to create data frame, we'll scroll down, and whenever we save our file we will specify index equals False as an additional argument. And now when we rerun this file we can see that our unnecessary column is gone as well.

But what if instead of a csv file we would like to load a database file, for example our gta database from the SQLite tutorial? In that case we will first import sqlite3. We will then connect to the database by typing connection equals sqlite3.connect, and inside the round brackets we specify the name of the file, in our case gta.db. Then right below we will simply use the pd.read_sql method, to which we will pass the SQL command of our choice. In my case that would be select all from gta, where gta represents the name of the table inside the database, not the name of the database. In our case these two just happen to match, but yeah, keep that in mind. As a second argument we will also need to pass our connection variable, so connection, and that would be the connection to our database itself. Now we can assign this expression to gta data, just so we don't get confused, and we will then print this data, so print gta data. Let's save it, let's run it, and there you go, that looks quite familiar, eh?

Now, we don't always have to print the entire data frame. Sometimes it makes more sense to print only the first five rows. We can do this by specifying gta_data.head and a set of round brackets as well. We can now save this code, we can rerun it, and we are only printing the first five rows. If we'd like to print the first two rows, we will simply specify the number two inside our empty set of brackets from earlier, and there you go, we are only printing two rows. Now what if we want to print the last two rows? We will simply replace head with tail, we will save it, and there you go, we are only printing the last two rows. So whenever we do not specify a
number inside tail or head, we are going back to the default five, and this comes in very handy when you're dealing with enormous amounts of data.

Alrighty, now let's go ahead and filter our data frame entries. Let's say we only want to select the rows where the city equals Liberty City. We can simply do this by typing gta data and opening a set of square brackets, and then inside the set we will type gta data in the key of city, which must equal Liberty City. Now, with the first instance of gta data we are specifying a filtering command. With the second instance, gta data in the key of city, we are selecting an entire column, and if the value that is found within the column equals Liberty City, only then this command returns the entire row. So let's see how it works. We will assign this expression to filtered row and we will print it right below, and we are getting two different data frame rows in return. Both of them are happening inside Liberty City. Perfect.

Now what if we want to replace every instance of Liberty City with New York, how can we do it? So right below our filtered row command we will create a brand new variable called replaced_city, let's call it that way, and we will assign it to gta_data.replace, and inside the round brackets we first specify the string we would like to replace, in our case Liberty City, and next to it we specify the string we would like to replace it with, in our case New York. And now if we go ahead and print replaced city instead of filtered row, we are getting our data frame where every instance of Liberty City is now equal to New York.

Now let's remove some data. Let's say we would like to get rid of the entire city column. We can do this by typing gta_data.drop, and inside a set of round brackets we will specify city as well as an additional argument, which would be axis equals one. So axis one is our columns and axis zero is our rows. We can assign this expression to remove column and we can then print it, and there you go, our
city column is gone. Now what happens if we want to get rid of two columns? So let's just copy this release year column name, and then instead of passing a string into our drop command we will pass a list of strings which includes city and release year. Now when we rerun this code we are only getting our release name as a result.

Okay, so this is very handy, but how can we get rid of rows? So right below we will create a brand new variable called remove row, and we will assign it to gta_data.iloc, which represents rows, if you guys remember, and then as index we will select all the values starting from row number one, which is our second row, and ending at row number four. Now let's go ahead and print this remove row variable, and there you go, we are only selecting the second, third and fourth row, and that's it.

Now one thing I forgot to show you is how to add new rows to our data frame. This is actually quite annoying, and I am yet to find a good shortcut here, so if you have one definitely let me know in the comments below. What we'll do is we will create a new variable called row. We will assign it to a dictionary where the first key would be release year, and this has to be a hundred percent match to the name of our column, otherwise it's not gonna work. So we will assign this dictionary key to 2021, and then our next key, as you may guess, would be release name, and in our case that would be Natural Vision Evolved, which is not an official GTA release, you didn't miss anything. This is a GTA mod, I've been playing it for quite some time. It takes my GTA 5 and it turns it into GTA 6.
It has ray tracing technologies, it's insane. Okay, so the last dictionary key would be our last column, in our case city, and that would be Los Santos, or actually Los Angeles. Now let's go ahead and add this row into our data frame. We will type gta_data.append and we will append our row. Now as a second argument we would also specify ignore_index equals True, and then just in case we will reassign this append command to a new variable called new row data. Let's then go ahead and print it, and we can see our brand new row entry was added at the very, very end of our data frame.

And the last trick for this tutorial actually came in very handy just in the past few days. So imagine you're organizing a code jam and you're asking people to fill out a form, and then all of a sudden somebody that calls themselves zero day x decides to attack your form with a classic denial of service attack. So long story short, instead of receiving 207 form entries from real people who actually want to join our jam and want to participate, we have received 99899 bot generated responses, minus the 207 legitimate ones. So how exactly can we get rid of the bot entries without getting rid of the human entries?

There's actually two ways. We can either do it the long way: we can make a for loop, we can iterate over each of our data entries, we can evaluate some data, and this will take quite some time and it will take up quite some resources. Or we can use the pandas drop duplicates method. So right after we load our csv file into a data frame called jam data, we will type jam_data.drop_duplicates. We will open a set of round brackets where we will pass an argument of subset, which will equal a list where we include the name of the column from which we'd like to get rid of all the duplicates. So for example, we will go back to our data, we see that all the entries of zero day x have the exact same name, which is a great feature we can target. Now if we scroll up, we need to look for the
column name in which this zero day x lives. In the case of my particular form that would be the name slash nickname optional column name. We will copy this and we will paste it inside our Python file, within the subset. Now let's just go ahead and reassign jam data to jam_data.drop_duplicates. Now let's save it, let's rerun it, and that's it, zero day x is gone.

Now, I have two additional pandas tricks with their very own tutorials, so definitely check out the read_html method, which allows you to web scrape HTML tables like this, it's insane. And there's also the plot method, which allows you to create graphs and charts from your data frames. Definitely check those out.

Now thank you guys so much for watching. I really hope you found this tutorial helpful, and if you did, please leave me a like, maybe leave me a comment, subscribe to my channel, turn on the notification bell and share this video with many many many many many many many many people. Thanks again, I'll see you soon!
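For readers following along without the video, the data frame creation, selection and BMI steps described in the captions might look roughly like this. It is only a sketch: the lowercase key names and the snake_case variable names are assumptions, since the captions give the spoken names only, and the vectorised line at the end is an addition that does not appear in the video.

```python
import pandas as pd

# Column data as described in the captions (key spelling is an assumption).
titled_columns = {
    "name": ["Maria", "Batman", "Spongebob"],
    "height": [1.67, 1.9, 0.25],  # metres
    "weight": [54, 100, 1],       # kilograms
}
data = pd.DataFrame(titled_columns)

# Select a whole column, then a single value from it by row label.
select_column = data["weight"]
batman_weight = data["weight"][1]

# Select a whole row by position, then a single value from that row.
select_row = data.iloc[1]
batman_weight_from_row = data.iloc[1]["weight"]

# BMI computed with the explicit loop from the captions:
# weight in kilos divided by squared height in metres.
bmi = []
for i in range(len(data)):
    bmi_score = data["weight"][i] / (data["height"][i] ** 2)
    bmi.append(bmi_score)
data["bmi"] = bmi

# Not in the video: the same calculation on whole columns at once.
bmi_vectorised = data["weight"] / data["height"] ** 2

print(data)
```

Both BMI variants should produce the same numbers; the loop mirrors the tutorial, while the column arithmetic is the shorter form many pandas users reach for.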
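The saving and loading steps (to_csv with sep and index, then read_csv with the same sep) can be sketched as a small round trip. The two-row data frame and the temporary directory are assumptions made to keep the example self-contained; the video writes bmi.csv next to the script instead.

```python
import os
import tempfile

import pandas as pd

# A small stand-in data frame (the video uses the full BMI table).
data = pd.DataFrame({"name": ["Maria", "Batman"], "weight": [54, 100]})

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bmi.csv")

    # sep="\t" makes the file tab delimited; index=False leaves out the
    # row labels, which otherwise come back as an extra unnamed column.
    data.to_csv(path, sep="\t", index=False)

    # Peek at the header line to confirm the separator that was written.
    with open(path) as f:
        first_line = f.readline().rstrip("\n")

    # The same separator has to be passed when reading the file back.
    loaded = pd.read_csv(path, sep="\t")

print(repr(first_line))
print(loaded)
```

Leaving out sep when reading a tab delimited file would not raise an error; pandas would simply treat each line as a single column, which is why the argument has to match on both sides.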
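The database loading and the head and tail calls could look something like the sketch below. The gta.db file from the video is not available here, so an in-memory SQLite database with an illustrative gta table stands in for it; the column names (release_year, release_name, city) and the rows are assumptions, not the contents of the original database.

```python
import sqlite3

import pandas as pd

# Stand-in for sqlite3.connect("gta.db"): an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE gta (release_year INTEGER, release_name TEXT, city TEXT)"
)
# Illustrative rows only; the table in the video may differ.
connection.executemany(
    "INSERT INTO gta VALUES (?, ?, ?)",
    [
        (2001, "Grand Theft Auto III", "Liberty City"),
        (2002, "Grand Theft Auto: Vice City", "Vice City"),
        (2004, "Grand Theft Auto: San Andreas", "San Andreas"),
        (2006, "Grand Theft Auto: Vice City Stories", "Vice City"),
        (2008, "Grand Theft Auto IV", "Liberty City"),
        (2013, "Grand Theft Auto V", "Los Santos"),
    ],
)
connection.commit()

# "gta" in the query is the table name, not the database file name.
gta_data = pd.read_sql("SELECT * FROM gta", connection)
connection.close()

first_five = gta_data.head()   # default is five rows
first_two = gta_data.head(2)
last_two = gta_data.tail(2)

print(first_two)
print(last_two)
```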
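The filtering, replacing, dropping and row-adding steps are sketched below on an illustrative data frame (the rows and the snake_case column names are assumptions). One deliberate swap: the video adds the row with the data frame's append method, which was removed in pandas 2.0, so this sketch uses pd.concat instead to reach the same result.

```python
import pandas as pd

# Illustrative stand-in for the gta table loaded in the video.
gta_data = pd.DataFrame({
    "release_year": [2001, 2002, 2004, 2006, 2008, 2013],
    "release_name": [
        "Grand Theft Auto III",
        "Grand Theft Auto: Vice City",
        "Grand Theft Auto: San Andreas",
        "Grand Theft Auto: Vice City Stories",
        "Grand Theft Auto IV",
        "Grand Theft Auto V",
    ],
    "city": ["Liberty City", "Vice City", "San Andreas",
             "Vice City", "Liberty City", "Los Santos"],
})

# Keep only the rows whose city column equals "Liberty City".
filtered_row = gta_data[gta_data["city"] == "Liberty City"]

# Replace every "Liberty City" value with "New York" (returns a copy).
replaced_city = gta_data.replace("Liberty City", "New York")

# Drop one column, then two columns; axis=1 means columns.
remove_column = gta_data.drop("city", axis=1)
remove_columns = gta_data.drop(["city", "release_year"], axis=1)

# Keep rows at positions 1, 2 and 3 (the end position 4 is excluded).
remove_row = gta_data.iloc[1:4]

# Add a row. Keys must match the column names exactly.
row = {
    "release_year": 2021,
    "release_name": "Natural Vision Evolved",
    "city": "Los Santos",
}
# pd.concat replaces the removed DataFrame.append used in the video.
new_row_data = pd.concat([gta_data, pd.DataFrame([row])], ignore_index=True)

print(new_row_data)
```

None of these calls modify gta_data in place; each returns a new data frame, which is why the tutorial assigns every result to its own variable.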
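The drop_duplicates trick from the code jam story might be sketched like this. The form data is invented for illustration and the column name "nickname" is a hypothetical stand-in for the form's actual name column. Note that with the default keep="first", one copy of each repeated name is kept; passing keep=False is an option (not mentioned in the video) that removes every row whose name is repeated.

```python
import pandas as pd

# Invented form entries; "nickname" is a hypothetical column name.
jam_data = pd.DataFrame({
    "nickname": ["alice", "zero_day_x", "zero_day_x", "bob", "zero_day_x"],
    "answer": ["yes", "spam", "spam", "yes", "spam"],
})

# Default behaviour: keep the first row for each nickname, drop the rest.
deduplicated = jam_data.drop_duplicates(subset=["nickname"])

# keep=False drops every row whose nickname occurs more than once.
without_repeats = jam_data.drop_duplicates(subset=["nickname"], keep=False)

print(deduplicated)
print(without_repeats)
```

Which variant fits depends on the data: keep=False would also remove legitimate participants who happen to share a name, so the default is the more cautious choice.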
Info
Channel: Python Simplified
Views: 85,638
Keywords: pandas, pd, python pandas, python data science
Id: zN2Hua6oII0
Length: 20min 51sec (1251 seconds)
Published: Sun Jan 23 2022