26. GroupBy agg() function in PySpark | Azure Databricks #spark #pyspark #azuredatabricks #azure

Captions
Hi friends, welcome to the WafaStudies YouTube channel. This is part 26 in the PySpark playlist. In this video we are going to discuss the agg() function, which is available on top of the grouped-data object. The groupBy() function, which we discussed in the previous video, helps you group rows with identical values, similar to the SQL GROUP BY clause. If you haven't watched the previous video, please watch it first, and all the videos in my PySpark playlist are in sequence, so please watch them in order to get the most out of them.

So let's discuss this agg() function. It helps you calculate more than one aggregation at a time on grouped data. For example, assume you have grouped employee data by department. In my previous video I showed how to apply aggregations such as count of employees, minimum salary, or maximum salary, but only one at a time; we could not apply all of these aggregation functions at once on the same group of data. The agg() function will help you apply all of these aggregations at the same time. Let me explain this with a practical demo so it makes more sense.

Let's go to the browser, where I have already opened my Databricks workspace. Under Workspace, Users, my name, let me right-click and create a notebook. Let's name it something like "aggregate"; Python is the default language, pick a cluster, and hit the Create button. Once the notebook is created you may see a tooltip or pop-up, which you can close.

Now let's create a hard-coded DataFrame. In a past video I already created a DataFrame with some sample data, and in the interest of time I will copy-paste that same logic. Here, data is a variable holding a list of tuples, and every tuple represents one row of the DataFrame. For these rows we create a schema and hold it in a schema variable: an id column, name column, gender column, salary column, and department column. The createDataFrame() function then builds the DataFrame. We have discussed all of this, including createDataFrame(), earlier in the PySpark playlist, which is why I always insist you watch the videos in sequence.

Now let's use df.show() to display the data in tabular form. You can see the command running; once the Spark job completes, the DataFrame is displayed as a table with the id, name, gender, salary, and department columns. So far it is good.

As I showed in my last video, we can apply groupBy() on top of this. I can call df.groupBy() and group by the department column. This returns a grouped-data object, and on top of that you can apply aggregate functions such as count(), which gives the count of employees per department. Let's call show() to display the result.
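A minimal sketch of the demo so far, assuming hypothetical sample rows (the exact values typed in the video are not visible in the transcript; only the per-department counts of three IT, two HR, and two Payroll employees are mentioned):

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession named `spark` already exists; this line
# is only needed when running the sketch outside a Databricks notebook.
spark = SparkSession.builder.appName("agg-demo").getOrCreate()

# Hypothetical employee rows matching the schema described in the video:
# id, name, gender, salary, department
data = [
    (1, "Maheer",  "M", 2000, "IT"),
    (2, "Asha",    "F", 3000, "IT"),
    (3, "Wafa",    "M", 4000, "IT"),
    (4, "Annie",   "F", 5000, "HR"),
    (5, "Charlie", "M", 2500, "HR"),
    (6, "Dan",     "M", 3500, "Payroll"),
    (7, "Erin",    "F", 4500, "Payroll"),
]
schema = ["id", "name", "gender", "salary", "department"]

df = spark.createDataFrame(data, schema)
df.show()

# groupBy() returns a GroupedData object; count() counts rows per group
df.groupBy("department").count().show()
```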
Now let me hit Shift+Enter and you can see the result right here. If you closely observe, it says we have three IT employees, two HR employees, and two Payroll employees. That is how you perform a groupBy with count.

But maybe I want the minimum salary per department. For that I remove count() and use the min() function instead, passing the salary column. I need to supply the column, because if no column is passed, the minimum is calculated for every numeric column, and I want only the minimum of salary. Let me hit Shift+Enter, and now you can see I am getting the minimum salary.

But I want to apply the count function, the min function, and the max function all at once, on a single grouped-data object. How do we do that? To do that you can use the agg() function. If you want the documentation for agg(), you can use the help() function; hitting Shift+Enter gives you the entire documentation. I am not going into detail about it here, I am just giving you a hint.

To agg() we pass the aggregate functions we want to apply on the grouped-data object. So from pyspark.sql.functions let me import the count function, the min function, and also the max function. Now to agg() I can pass count(), and count() takes a column name as well. If you want to know more, you can run help(count) in another cell; when I hit Shift+Enter, it says you have to supply the column on which you want to apply the count.

So let me close the cell I just executed, and in the agg() call pass count("*"), meaning count of all columns, and give it an alias such as count_of_emps, for count of employees. Finally let's call show() and see whether this works. Now you can see the DataFrame, and I am getting the count of employees correctly as well.

Here I am applying only one aggregate function, but I said we can apply multiple aggregate functions too, right? So let me add a comma and the second one. I don't want to write everything on a single line, because that is not very readable, so let me use a backslash to break the lines of code in Python. Here I apply another function, min of salary, and give it an alias like min_salary; then another comma, a line break, and max of salary as well. So for each department, what is the count of employees, what is the minimum salary, and what is the maximum salary? That is what I am calculating here.

Now if I hit Shift+Enter and look at the results, it will make more sense. You can see the DataFrame was created, I grouped the data using the department column, and I was able to apply three aggregate functions, count, min, and max, all on a single set of grouped data.
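Putting those steps together, a sketch of the min() call and the final agg() call; the alias names (count_of_emps, min_salary, max_salary) follow the ones spoken in the video:

```python
# Note: these imports shadow Python's built-in min/max in this notebook
from pyspark.sql.functions import count, min, max

# Single aggregation: minimum salary per department
df.groupBy("department").min("salary").show()

# Multiple aggregations at once via agg(); backslashes break the
# chained call across lines for readability, as done in the video
df.groupBy("department") \
  .agg(count("*").alias("count_of_emps"),
       min("salary").alias("min_salary"),
       max("salary").alias("max_salary")) \
  .show()
```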
So this is how you can use the agg() function on top of grouped data. I hope you got an idea. Thank you for watching this video. Please subscribe to my channel and press the bell icon to get a notification whenever I add videos. Thank you so much.
Info
Channel: WafaStudies
Views: 12,898
Keywords: PySpark for beginners, PySpark Playlist, PySpark Videos, Learn PySpark, PySpark for data engineers, dataengineers PySpark, PySpark in Azure Synapse Analytics, PySpark in Azure databricks, Understand PySpark, What is PySpark, PySpark in simple explaination, PySpark Overview, synapse pyspark, spark, pyspark, azure databricks, groupBy(), groupBy() in dataframe, agg() groupby in pyspark, group by agg() in pyspark, apply multiple aggregate functions on group by in pyspark
Id: wRHfkdh4s60
Length: 8min 18sec (498 seconds)
Published: Fri Dec 02 2022