SQL Queries For Pandas DataFrames

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
what is going on guys welcome back in today's we're going to learn how to use sql statements to query data from pandas data frames so let's get right into it [Music] all right so obviously for this video we're going to need pandas which is an external python module so we need to install it and in addition to that we're also going to need pandas ql which is basically pandas sql but with only one s so pandas ql which is also an external python module so in order to install those two we open up a command line and we type pip install first of all pandas if you don't have it already and then pip install pandas ql to enable to allow us to do uh the sql querying with pandas data frames and then what you need is you need some data frames so you can either use something like pandas data reader to get some financial data from the web you can create your own data frame by creating a dictionary parsing it into a data frame or you can import just some csv file which is what i'm going to do so i have here this simple csv file uh where i have the id name age height gender of a bunch of artificial people here so we have one to eight for the id we have a bunch of names here we have ages and heights and then we have mf for male female um that is just some sample data you can create whatever data you want you can put it into a csv file and then you just import it so the focus is not going to be obviously on the data the focus is going to be on querying using sql statements and for that we're going to start by importing pandas spd which is the usual alias and by importing pandas qlsrp i'm not sure if that is like the uh the go-to best practice alias or if you would use an alias at all but we're just going to use ps for this video here and now we're going to say data equals pd read csv and we're gonna get the data.csv file here and we're gonna say data set underscore index is gonna be uh id and we're gonna assign that to data and then we're gonna take a look at it by printing data to see if that works and now you can see we have name age height gender as the columns here we have the id column which is the index and this is basically what we have now and now we might want to find for example all persons all people that have an age above 30 for example now before we go into the sql querying we're going to do a simple pandas query to see how it would work without sql statements so what we can do here is we can say print data dot query which is a pandas function without using pandasql and here we can just specify a string expression that is the query that we want to have for example we can say okay the h has to be above 30. and then what happens is we get only the entries where the h is above 30. so you see one three five six eight are the ids where this is the case we can also of course say greater or equal to 30 and then we get i think an additional record here anna because the age of anna is exactly 30. so this is how you do it without sql and this is fine you can also do stuff like for example gender equals then single quotations f to only get the women as you can see here so this works and this is the simple way to do it uh with pandas with only pandas now we can use pandas ql to do the same thing with sql statements and for that what we can do is we can say uh print or actually let's first define the sql query in a statement here we can do something like select everything so select star from and then essentially what we have to provide here is we have to provide the data the data frame so in the string we're going to say data and this name here has to be the same as this name here so this query will be used on this uh data frame here so the data frame will be treated as a table which it is essentially so this is a database table now i'm going to say select everything from data where and then age is greater than 30 for example and then what i can do now is i can say print ps sqldf so ps.sqldf and then we execute the sql query and we pass the locals function here as well and then you can see we have the exact same output so now the difference here is that we get now um again the id as a column here so we have now a different index we could also just store the data frame set index to id again and drop the index that we have here but essentially we get the entries that we're interested in and of course we can do the same thing here with for example gender equals i think we need to provide it here again in single quotations right there you go gender equals f uh and i think we should also be able to do and so h is above 30. i think this should work as well there you go we only have one entry left here so this works uh and we can do even more complicated stuff like for example let's say we have now uh data two which is going to be a second data frame we're going to just create it now from scratch we're going to say here the pandas data frame of a given dictionary and the dictionary is going to have an id list and it's only going to have one id in that list id2 and then we're going to say that we want to specify a job here and we want to say the job list is just programmer so this is essentially just a data frame now we can actually print this data to let me just oh come on two let's comment this out here this is now just a simple id job data frame that we have here now we can take this and merge it we can join it via an sql statement to make an advanced query with pandasql so what i can do here now is i can go ahead and say the sql query that i want to make is the following it is select everything from data and i can also provide an alias like d so select everything from data d left join um and then i can say data2 d2 is the alias now on d dot id is the same as d2 dot id and of course you can play around with more details you can make it more professional everything but we can now go ahead and say print ps dot uh sql df sql query and then locals again and then you will be able to see that we have now ignore the first column here but that we have here now um anna which is id2 and she now has a job all the others do not have a job this is because we did a left join which means that we keep all the entries even though the job is none for all of them um we could also drop the second id column here but essentially uh since we joined on the id we now have for anna which is which has the same id here the job programmer this is just some very basic example but that's how you do it this is how you can use sql queries to get data from data frames and you get a resulting data frame from that so you can get a smaller version of the data frame where certain conditions are met you can do joins you can do a lot of things you cannot do everything i think stuff like i try to do i think a full outer join and i think the error message said it's not supported as far as i remember there you go um what was it right and full outer joins are not currently supported so you have some limitations here but essentially this is how you can do it in python so that's it for today's video i hope you enjoyed it and hope you learned something if so let me know by hitting a like button and leaving a comment in the comment section down below and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free other than that thank you much for watching see you next video and bye [Music] you
Info
Channel: NeuralNine
Views: 25,876
Rating: undefined out of 5
Keywords: pandas, python, pandas sql, pandasql, pandas sql query, python sql query, pandas query, pandas filter, pandas database
Id: oPuVYSC_kpo
Channel Id: undefined
Length: 8min 28sec (508 seconds)
Published: Wed Aug 03 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.