Level Up Python Pandas GroupBy | Five Minute Python Scripts | Subscriber Request

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey everyone i hope you're all doing well in this one i wanted to tackle a few subscriber requests that we have around the group by method and pandas they're asking how can we conditionally do different functions on other columns based on a column value for example say we have the data that looks like this where we have an order id column a sales amount and an order type however we want to go through and find where order ids are the same such as this entry right here of 4702 do a sum of the sales amount and concatenate two strings of order type that way we can view one entry for this order id see a total amount and see that was a downgrade and an upsell let's see how we can do that and power up our group by function in pandas let's get started [Music] starting out we'll be using pandas so go ahead and import that if you don't have this installed go ahead and pip install it then we can read in our data frame as usual so we can read excel and then pass in the path to our excel workbook mine is called order history.xosx so order history.xlsx going back and viewing that workbook we see that the order ids are all over the place so let's go ahead and sort those we'll do that by saying df is equal to df sort values then we'll say buy and then we'll do order id then we'll pass in a keyword argument of ascending and set this equal to true finally we'll print df.head to view the first five lines and make sure that we've read this end successfully opening up a terminal or a powershell we'll say python and then execute our script once we do that we see that our order ids are now ascending and value all right great so we have our data read in and now we can get to simplifying our data frame so remember what we're trying to do is simplify all of our data to where if the order id is the same we want to combine or sum this column here and then concatenate this column here let's get started doing that we'll say simplified df will be equal to our data source of data frame dot group by and here we could pass in multiple columns or just one our use case means that we just need this order id column and this is great so far however we need to determine what happens to the other columns and since we want multiple operations to happen what we can use is a function called dot a g g this function's short for aggregate and pretty much this is what we want to happen with the other columns we can use a dictionary to do that we'll say ag functions and then we'll create that dictionary and what this takes in is a column heading so we'll say sales amount and then the function that we want to apply to that column so here we're doing a summation and we can just pass in sum for our second one we have a little bit more complex of an example we have order type and this will actually be a function what we're doing here is concatenating so we'll use a dot join and let's add in a comma and a space after each one so we'll say comma space and then dot join getting this level of control with a group by function is pretty awesome to have in your toolkit we could also pass in a list here instead of just one value so if we wanted a sum and also a mean we could do it in this example we'll just use a sum because that's all we need but in your own case you may need more now that we have these ag functions let's go ahead and pass it to the ag method down here great i'm doing this in multiple lines here because i feel like it's easier to teach however you could just copy this right here and paste it within your add function to get rid of this variable completely great so now we have our simplified df and if we wanted to we could just write it to excel so we could say simplify df to excel and then pass in a file path we'll say simplified order history dot xlsx this will write it to your desktop and let's go ahead and view it so we'll run our code and we'll open up this excel simplified order history workbook right here now we see that we have entries where we have multiple order ids and we have a concatenated string for the order type the sales amount has been summed as well great so we got to our solution but let's take this one step further we'll go ahead and open our editor back up and let's say that we only wanted the entries that had multiple items on the order we can do a conditional index to get those we'll say something like item orders and then we need to say simplified df so we're accessing the data here and then we can put in our condition inside of it our condition is where our field of order type and then we can use a string method so dot string dot contains and we can pass in whatever string we want in this case we know that if multiple items are on the order we're creating our data frame to include a comma in the order type there's definitely a lot cleaner ways that you can do this but for me this works then we could say something like print multiple item orders we'll save this and execute it when we execute we see that we got a permission denied and this is common if we still have this workbook open we'll go ahead and exit out of that and then we'll save and execute again this time we'll get the result that we expect we get only the entries by order id where they have multiple items on the order and that's it for this subscriber request if you have any questions or comments about it please let me know if you have your own requests that you want to make please drop that in the comments and maybe i'll be able to do it in the next video until next time [Music] i
Info
Channel: Derrick Sherrill
Views: 7,440
Rating: undefined out of 5
Keywords: Python, Python Automation, Tutorial, How To, Derrick Sherrill, pandas, groupby, method, functions, excel, excel automation, subscriber request, codewithderrick, python pandas, python automation excel, python excel automation, agg, aggregate, python pandas aggregate, dataframe, dataframe code along, data
Id: F1KYHydTetc
Channel Id: undefined
Length: 7min 23sec (443 seconds)
Published: Sat Apr 17 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.