SQL Window Function | How to write SQL Query using RANK, DENSE RANK, LEAD/LAG | SQL Queries Tutorial

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys in this video let us understand how to write sql queries using window function window functions are also referred to as analytic function in some of the rdbms if you're planning to attend an sql interview then you can almost be certain that there will at least be one question related to window function so it's not just about clearing interview but by knowing how to write sql queries using window functions you should be able to solve some of the most complex sql queries pretty easily for example let's say you have been asked to write a query where you want to fetch the top three employees from each department in your employee table who earn the max salary or let's say you have been asked to write a query where you want to display the minimum and the maximum salary in each department corresponding to each employee record then this kind of queries would almost be impossible to write without using a window function the syntax to writing sql queries using window function is slightly different from writing any other sql queries so i'm going to explain you the exact syntax and also the different types of analytic or window functions that you can use in sql in this video the window function that i'm going to cover are rank dense rank row number lead lag and i'm also going to show you how you can use aggregate functions as analytic function in sql before we can start writing queries if you find this video useful then please make sure to like it helps me a lot also i regularly post videos related to learning sql python and database concepts as well as data science related concepts so if you are interested in these kind of concepts then please make sure to subscribe to the channel thank you and let's begin the best way to understand window function is by first going through an aggregate function to do that i have already created an employee table as you can see here so it has four columns employee id employee name department name and salvy under department name i have four different departments so i have admin hrit and finance under each of these departments there are a few employees totally there are 24 records and the salary details of each of these employees is mentioned in this column now let's say if i wanted to find the max salary earned by an employee in this table then i can just write the query like select max of salary as max salary okay from employing now if i execute this you can see that i'm able to extract the max salary here okay but let's say if i wanted to extract the max salary earned by an employee in each department then i can do that by using a group by clause so let's say i just add a group by department name and let me just show it here as well now if i execute this query you can see that i'm able to extract the max salary corresponding to each department in the employee table this is fine but let's say if your requirement is to extract the max salary in each department but along with that you also want to display all the other details of the employee tables then we would not be able to do that just by using aggregate function or group by clause maybe we can use a with clause or write some other sub queries to do that but the best way to form this kind of query would be using a window function or analytic function now let me write a query how we can do that so i'll just say select star from employee okay i'm going to give an alias like e so i just say e dot star and now i'm going to use max but this time i'm going to ask sql to consider max as a analytic function or a window function how i can do that is by using a clause called as over okay i'm going to give an alias here saying like max of salary now if i execute this query you can see that i'm able to extract all the records in this table so even all the columns from this table are fetched along with that i have an additional column here that is the max salary what this column is having is for every record it's having the same value that is the max salary that it was able to find from the salary column in this table now let's look at this syntax in detail so basically i'm using a max of salvy which basically is an aggregate function but also i'm using an over clause so since i'm using an over clause what sql will do is it will not treat max as an aggregate function it is going to basically treat it like a window function over is basically used to specify sql that you need to create a window of records so if i don't specify any column like i have done here i have not specified any columns inside this what happens is sql will create one window for all the records uh in this result set so basically there are 24 records it's going to create one window and for that one window it's going to apply this particular function so in all of these 24 records it's going to look for the max salary which in this case is 11 000 and that value is going to be shown corresponding to each record in this result set okay now let's say if i wanted to extract the max salary corresponding to each department then what i can do is i can use a clause called as partition by so let's say i just mentioned partition by and then i mentioned the column name so in this case let's say department name what it's going to do is for every distinct value that is present in this column it's going to create one window okay and then it's going to apply this function to each of those windows okay now if i execute this query the max salary that you can see here is different for each of the department for example for admin the max salary that is being shown here is 5000 because the maximum salary that is earned by an employee who belongs to the admin department is only 5 000 hence 5000 is displayed here but when it comes to finance the maximum salary earned by an employee who belongs to the finance department is 6500 and 6500 is displayed for every record in the finance department record okay now the same thing happens for hr and the same thing happens for the id department as well so this is basically the syntax of how you can use a window function in this case i'm just using an aggregate function that is max i can also use min sum average count or any of the aggregate functions i hope this was clear now let's look at some of the functions which are specifically used as window functions in sql and these are common across most of the rdbms like mysql microsoft sql server oracle postgresql or any other popular rdbms let's start by looking at row number as the name suggests it's just going to assign a unique value to each of the record in the table i can write the query like select e dot star from employee i'll give the alias as e and i'm going to use the row number so i say row row number over will not pass any column for now and i'll say rn okay as alias now if i execute this query you can see that for every record in this table i'm able to get one unique uh identifier now in this query i have not passed any column inside the over clause and this is why sql will treat all of the records in this table as one window and for that whole window it's going to assign a row number okay so there are 24 records so 24 numbers have been assigned now let's say if i wanted to assign a number based on different departments then what i can do is i can just say same like the previous query so partition by department name okay and now if i execute this query you can see that i have the department admin it has four records and for this uh records the role number is assigned as one two three four and after that for the records belonging to the finance department the row number gets reset so it again starts from one and then goes up to four and then for the next department hr again the row number will reset from one and again go up to the number of records that is present in the hr department okay and the same thing happens for it department as well so as you can see what's happening is because i used over and i i mentioned the department name as the column inside my partition by clause for each unique record that is present in the department name column it's going to create one window and inside that each of this window a row number function will be executed so for each window based on the number of records one unique value will be assigned okay i hope this was clear now you may ask what is the actual use of having this kind of functions where can it be useful to showcase that let's say if i had a requirement where i wanted to fetch the first two employees that joined the company in each department let me make an assumption that the employee id of employees who joined previously would be lower than the employee id of the employees who joined uh later on okay so to do that i'm just going to copy this whole thing here and i'll paste it here so if i execute this query again you can see that i'm able to get the row number uh that is assigned for records belonging to each department but then if you look carefully the the employee ids are basically not sorted right so if i if i if you just see the records in the finance department you can see that the first employee id is 118 then 116 106 and 104 so this is not very sorted so it looks like the first employee to join was darwin that is 104 and then 106 and so on okay now to sort this data in the proper order we are going to use the order by clause so after this partition by the department name i can just mention the order by here saying that order by let's say i want to order it by based on the employee id okay now if i execute this query you can see that now if i go back to finance the data is now sorted uh as per the employee id what my requirement is that i want to fetch the first two employees from each department okay so i know that i want to fetch these two employees that is one zero one and one zero eight and then i want to that is from admin and then from finance i want to fetch these two employees and then i want to fetch the first two employees from the hr department and so on uh for the id department as well now how i can do that is i can just use a sub query okay so i just say select star from this one and i'll just move it ahead and i'm going to give an alias to to this like let's say x and i say where x dot rn is less than three okay now if i explain in this query what i'm basically doing is this inner query that i'm having here if i execute it's basically fetching all the records from the table and for based on different group names that is mentioned here it's going to assign a specific row number okay now i i'm only interested in the records where the row number is one and two okay because these are my first two employees in each department so i'm passing this whole thing as a sub query and i'm giving an alias to it as x and i know that the row number one and two is what is the record that i want to fetch so i just say x dot row number less than three now when i execute this query you will see that i'm only able to extract eight records because it's two records from each department and i have four departments here so the first two employees from the admin department are extracted here the first two employees from the finance department are extracted here and so on for all other apart for all other departments so this is one use case of where you can use row number to extract a specific record from your table okay i hope this was clear now going ahead let's look at another window function that is rank so let's say you have a requirement where you want to fetch the top three employees who earn the max salary in each of the department so basically what you need to do is you need to rank the employees based on their salaries and then you need to fetch the top three employees from each of the department okay so to achieve this kind of results the best way to write a query is using the rank window function okay or the dense rank based on what you need but let's first look at rank so let me write a query for that so let's say select e dot star from employee alias e and i'm going to use an rank function here so let's say rank okay i don't need to specify a column name here and i just say over and i'm going to because i need to rank employees based on their department i'm just going to partition them by the department name so i just say partition by department name and now i'm going to order by their salary so i want to find employees who are earning the max salary right so i say order by salary but i need to make this order in the descending order so i say desc so this will order uh by salary but in the descending order okay and i'm going to give an alias saying like rnk now when i execute this query you can see that i'm able to extract all the records and there is a rank that has been given to each of the record corresponding to each department okay so let's first look at the admin department under admin department you have four records so you have the employee with the highest salary is 5000 that is at the top and it has got the rank as one next there are two employees who earn the same salary that is 4 and both these records have been given the same rank that is 2 000 okay and then the next record is the next employee who is earning 2000 is actually given the rank as for this because whenever we are using rank it's if it finds a duplicate value it's going to assign the same rank to both of these records but for every duplicate record it's going to skip a value so it finds a duplicate record here so it gives the same rank but for the very next record it's going to skip a value okay when it's assigning the next rank this is what rank does now let's say if i wanted to find the employees with the top three salaries okay what i can do that is again i'm going to place this inside a sub query so i just say select start from okay and here i'll just right ahead and i'm going to give an alias to this sub query as x i say where x dot r and k is less than four okay so if i execute this uh query now as you can see only 14 records have been extracted here and it's only extracting those records where the rank is less than four so basically we were interested in only employees who had the salary rank as one two or three so you can see in admin i only had uh one and two because uh i think the there are four records now if i just check back here you can see that under admin i only had the rank as one two and four so there was basically no rank as three because these two had the same rank and the next rank got skipped so when i look at the whole data here this query is going to extract me all the records of employees who are earning the max three salaries in each department okay so i hope this was clear now this is how we can use the rank function to achieve this kind of uh requirement now there's another uh function that is called as dense rank which is almost very similar to rank and is just one difference so i'm going to talk about uh that right now so for that let me just copy this uh query and okay so what i'm going to do is i'm going to use the same query but i'm just going to copy the same thing and paste it and just make it a dense rank okay so i just say dense rank and now if i just execute this query you can see that now there's another column added here which is dense rank and if you see the output here what it's doing is for every salary basically for every record in your department in this case so let's say admin department i have four records uh the first record is the one with the highest salary so the rank is one and the dense rank is also one this next two employees are having the same salary so the rank is 2 and the dense rank is also 2 but then the next employee who is having the salary as 2000 is having a rank as 4 because 1 value got skipped because the previous record had duplicate rank but in dense tank as the name suggests it will not skip any value so this is the only difference between using a dense rank and rank rank will skip a value for every duplicate value that it found previously whereas a dense rank will not skip a value okay so i hope this was clear next uh let me just add another um column here so just to explain you how different row number is now you might also ask why we are not passing any column inside any of these functions because rank dense rank and row number are just functions which will assign a value for every record which has been detected or identified by what you have mentioned inside your over class so now you can see the difference here the the record with 5000 is all having rank has one dense rank one and row number as one but then the next two records uh which have the same salary the rank and dense rank are having values as two whereas a row number does not care if there are duplicate records or not it's just going it's just going to add a unique number to every record so this is basically the difference between a row number rank and dense rank okay finally let's look at the last two window function that we are going to cover in this video which is lead and lag so let's say you have a requirement where you want to check if the salary of the current employee is higher or lower than the previous employee then we can use the lag function to do that so let me write a query for that so let's say select e dot star from employee and i'm going to use the lag window function here so i mentioned salary as the column that we need to check and then i mention over and i do a partition by the department name and order by the sorry order by employee id i want to check for the previous employee right so the data needs to be in the order of the employee id so i say as and i am going to pass a previous employee salary okay something like this now if i execute this program or this query you can see that what's happening is in this new column the previous employee salary it's actually going to fetch the salary of the previous record so in this case i have grouped the data or partitioned the data based on the department name and then i have sorted it based on the employee id now in the first record the employee id is 101 and the salary is 4000 but there is no record previous to this so by default the lag function will return as null because there was no previous record when we come to the second record it's going to look for the previous record in this case so the previous record the salary was four thousand so in this record the previous employee salary will be displayed as four thousand now when we come to the third record you can see that the salary of the previous record was four thousand hence that is what is being displayed here and when we come to the fourth record the same thing continues basically the value of salary that was present in the previous record will get returned here okay so basically as a lag name suggest it's basically the record from your previous rows now there are a couple of arguments that we can pass here for example let's say if i pass 2 and then if i pass here as 0 what this will do is by default it will always look for the previous record that is one row prior to the current row okay but i can mention how many rows prior you want to check so if i mentioned two what it's going to do is it's going to look for two records previous to the current record okay and then the last argument here is basically a default value so in this case since there was no previous record it returned null by default but here i can specify a particular value what needs to be displayed here okay so if i execute this program now you can see that it's basically checking for two records previous to this so two records previous to the first record basically there is nothing so it's going to return the default value that is zero the same thing happens on the second record when we come to the third record it's going to look for two records previous to the current record that is this one so it has it's four thousand so that is what is getting displayed here and so on and so forth okay so this basically what a lag does now almost similar to this is lead and as you can already imagine lead is basically the rows that are following the current record okay so i'll just remove all of this and i'll also remove this now and i'll say this is not previous but it is the next employee salary okay so if i execute this now you can see that in the first record it's going to look for the value of salary in the next record that it is 4000 so that is what is getting displayed here now when we come to the second record it's going to look for the salary in the next uh record here this salary so that is 2000 and that 2000 is what is getting displayed here and when we come to the third record it's going to look for the salary in the fourth record that is 5000 and that 5000 is actually being displayed here okay so this basically what lead and lag window function can be used there are a few use cases where this kind of queries can really be helpful for example let's try to execute a query which will return this kind of result where we want to fetch if the salary of the current employee is higher than the previous employee then we display something like higher or we display something like lower okay so to do that let me just copy this query here and i'll remove this i don't need this and what i'm going to do is i say case when salary right e dot salary is greater than this whole thing okay so greater than the previous uh salary so i just paste it here and i say then if it is greater than i say higher than uh previous employee okay and now let me just copy this whole thing and i just paste it here and i'll i'll paste it here again and i say end and i'll call this as salary range okay now if this is greater i display higher but let's say if it is lower then i say display as lower and if it is equal to then i say same as the previous employee now if i execute this program you can see that okay so you can see that the previous salary here is null so basically it returns me as null maybe we can add another condition where it is null for now let's ignore that and since the current salary and the previous salary are same i'm just printing it as previous the salary is same here the salary of the current employee and the pr is lower than the salary of the previous employee hence it is printing lower and whereas in the record number four the salary of the current employee that is 120 employee 120 is higher than the salary of the previous employee that is employee number one one three so it's displaying higher than previous employee so this is a just a simple use case where we can use lead and lag to do certain analysis i hope all of this was clear there are a few other window functions as well such as nth value first value last value ntil etc so if you would like me to make another video covering all the other window functions then please leave a comment below i'll try to do that if you like this video or if you found it useful then please make sure to like and also subscribe to the channel for more such learning sql and learning python or data science related videos thank you and see you soon in the next one
Info
Channel: techTFQ
Views: 18,151
Rating: 4.9773159 out of 5
Keywords: SQL Window Function, SQL Window Functions, Window Function in SQL, SQL Analytic Function, Analytic Function, Window Function, How to write SQL Queries, RANK, DENSE_RANK, LEAD, LAG, ROW_NUMBER, PARTITION BY, OVER clause, SQL OVER Clause, SQL Partition By, Parition by query, rank function in sql, rank function, dense rank function, lead function, lag function, row number in sql, SQL query tutorial, sql queries tutorial, sql tutorial, tutorial, sql query writing tutorial
Id: Ww71knvhQ-s
Channel Id: undefined
Length: 24min 54sec (1494 seconds)
Published: Fri May 21 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.