SQL Window Function | How to write SQL Query using Frame Clause, CUME_DIST | SQL Queries Tutorial

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys a couple of months ago i posted a video about sql window functions where i explained how you can write sql queries using window functions such as rank dense rank row number lead and lag now a lot of people liked that video and i received a lot of positive feedback a lot of you guys reached out to me in the comment section of that video and also on my linkedin and instagram stating that the video was really helpful and they asked me to make another video covering the remaining sql window functions and that is why i'm making this video where i want to cover everything that i was not able to cover in the last video so the contents that i'm going to cover in this video are i'm going to talk about an alternate way of writing sql queries using window function and then i'm going to talk about the frame clause how you can use a frame clause when writing sql queries using window functions we'll then see a few other window functions such as first value last value nth value end tile percent rank as well as qm dist hopefully with this you will get complete idea about all the window functions available in sql as always before i can start writing queries to explain all of these concepts if you like what i do then kindly do consider subscribing to the channel also like this video and leave a comment below with your feedback it really helps the channel to grow and also motivates me to make more interesting and useful videos for you guys thank you and let's begin okay so i have created a new table called as product and this table has four different columns so i have product category brand product name and price under product category i have five different products so there is some data related to some phones and then i have data related to laptops then some data related to earphones headphones as well as smartwatches okay and then i have a column product name where it indicates what what is the product name so i have all the phone names under the category phone under laptop category i have all the laptop names and so on and so forth and the company to which these products belong it's mentioned under the column brand and then the price of these products is mentioned under the column price okay so we are going to use this data in this table to explain all the different contents that we are going to cover in this video okay let's start by looking at the first value window function now as the name suggests first value can be used to extract a value or a column value from the very first record within a partition okay so let's say you have a requirement where you want to write a query which will fetch the most expensive product under each category and it and it should display this data corresponding to each row of your result set okay then in that case using first value window function would be a good choice so let's see how we can write this query so we start by writing select star from the table name in this case it's product so star will fetch all the columns from this table so that is okay and after that we want to also extract the most expensive product or basically the most expensive product name under each category so to do that i'm going to use the first value window function so i just say first value now first value window function will accept one argument so i need to pass one argument and this argument is basically the value that is going to get displayed in that column so i want to display the product name so i'm just going to use the argument as a column that is product name and as you already know when you are using a window function you need to use the over clause so it's mandatory to use over clause whenever you're using a window function it's not mandatory to specify anything inside the over clause i can leave it as it is but the data will not be as what i want so if i understand this requirement properly i want to extract the most expensive product corresponding to each category so for each category so here i have five different categories so i have phone so for phone i have around 10 records here so under this category i want to extract the most expensive product based on its price and then under the category laptop again i want to extract the laptop product which with the most expensive price and same way for all the other categories so i need to partition this result set based on the product category so that is why i need to use the partition by clause okay so i'm just going to say partition by product category now if i did not use this partition by clause then what will happen is all the result set of this particular query that is this table in this case would be considered as one partition so totally this table has 27 records so if i did not use if i skip this partition by clause then all of these 27 records would be treated as one partition or one window and then any window function that we use will be applicable to that particular partition so in this case since i am creating a partition by product category what sql will do is sql will create five different partitions for this record based on the distinct values that are present under this column product category okay so i have partitioned my data now for each of this partition in this case for each of this product category i need to find the most expensive product so in order to do that i can use the column price so i just need to sort the data based on the price in s in descending order so to do that i can just say order by price and then in descending order so i just mentioned this and so this should do the trick and then i'm going to give an alias to this column naming this like most expensive product so that's all now if i just execute this query you can see that now i have got all the 27 records so all the records from this table is still showing up but there's an additional column that is added that is the most expensive product now let's look at each partition in detail okay so we have partitioned the database on product category so the first product category here is showing is earphone so under earphone i have four different records and if i look at the price i know that the record with the price 280 is the most expensive uh earphone so and this 280 is corresponding to the product name airpods pro and that is what is getting displayed in my new column most expensive product and this is because first value what it will do is within a particular partition it will just go to the very first record and extract whatever column that you have specified inside as an argument so in this case i have mentioned argument as product name if i mentioned something else maybe let's say brand then it would display here as apple but since i have displayed mentioned here as product name it will go to the very first record of this partition and it will take the name the value that is stored in the product name so similarly if i come to the second partition here headphones it will see that there are totally three records and under this partition the very first row belongs to this particular row that where the product name is airport's max and this is what will get displayed in all the corresponding records of this partition okay similarly if i go to the next partition which is of laptop i have i think six records here and very first record has the product name as xps 17 and that is why it's printing xps 17 corresponding to every record in this partition and so on and so forth for all the different product categories okay so this is basically how a window function works and this is basically how you can use first value okay so now let's talk about last value so as the name suggests last value will basically fetch a value or a column from the very last record of a particular partition okay so so let's say you you are asked to write a query where you want to extract the least expensive product under each category corresponding to each record so almost the opposite of what we did in first value okay so let's try to write the query but i can basically use reuse the same query that i wrote for first value okay so i'm just going to type it here so i have this query so this query is basically having just the first value here now i'm just going to write another column which will display the last value okay so i'm just going to say last value and then here i need to pass an argument same like first value last value also needs an argument and this argument is basically the column or the value that needs to be displayed so again i'm going to use the value as product name and then again we need to use the over clause because window function over clause is mandatory and then again we need to find the least expensive product under each category so we need to make sure that this window function will execute for each each set of uh product category so i need to partition the whole result set based on the unique records present in product category so i'm again going to use the partition by clause so partition by product category okay so after this again i need to sort the data based on its price because i'm trying to fetch the least expensive product so what i want is the least expensive product should be at the bottom of my partition so i need to sort the data in the descending order so again i'm going to say order by price in the descending order okay and i'm going to name this column as least expensive product okay so now if i just execute this query you can see that the query is executed successfully and there is a new column added that as least expensive product but if you see properly this least expensive product is actually not correct the data is actually not correct so if i look at all the data corresponding to this product category earphone i can see that the most expensive earphone is airpods pro it its price is 280 dollars and the cheapest earphone is of 170 dollars which is galaxy but's life so what i wanted this query to output is that under this column least expensive product it should have displayed galaxy buds live in all of these records but it's not doing that the reason why it's not doing that is because of the default frame clause that sql is using now what exactly is a frame clause but before that what exactly is a frame whenever you are using a window function the the most important thing that window function does is it creates a window okay or it creates a partition and it applies that window functions to each of those partitions inside each of these partitions we can again create some subset of records which is called as frames so basically a frame is a subset of a partition so in this case the this particular partition which is the earphone partition has four records but the last value window function when it's going to process all the records in this partition it's not basically going to use all the four records at once it's going to use all the records which are within its frame now i'm going to show you the default frame which sql uses so this one is a default frame in postgresql because i'm using postgresql here but i'm sure it's the same for any of the other major rdbms like oracle mysql server mysql or any other now i'll modify this query in order to showcase how the default frame clause looks like and then we will change that frame clause so that our last value function can fetch the data that we want okay so this line is already too long so let me try to split it into few uh lines so it becomes more easier for you to read the frame clause that i am talking about is actually mentioned inside the over clause and it's mentioned at the end that is after you mention after you specify your order by clause okay so after you specify the order by clause inside the over clause you can mention the frame clause now the frame clause can be mentioned in a few ways i'm just going to tell you what is a default frame clause that is used in sql okay especially in postgre sql but i'm sure it's the same in other rdbms as well so the default frame clause is range between unbounded preceding preceding and and current raw okay so this whole statement here which reads like range between unbounded preceding and current row is the default frame clause that is used in sql so even whether i mention this or i don't mention this sql is actually considering this statement okay now i'll explain you what this statement exactly means so i have something like a range here range basically tells what is the range of records that this last value window function needs to consider okay so it's telling the range must be between that means consider the rows between unbounded preceding and current row so unbounded preceding means it's basically the rows proceeding to the current row the and unbounded basically means from the very first row of the partition so let's say if your query is executing and if it's processing the very first record for whether for the very first record it's going to see that it's going to consider all the rows within the unborn preceding and the current row now unbounded proceeding for the first row would again be first row because there is no rows prior to first row so it will just consider the first row and the current row again is just the first row so when sql is going to process the first record the unbounded process proceeding will also lead to the first record and current row also is the first record so the frame will only be pointing to this first record so basically last value will only have access to this particular record that is this that is the first record and from this first record it's going to fetch the last value since there is only one record the last value will also be the same as the first value because it's just one record and that is why both the first value as well as the last value is returning the same output okay now let's say when sql is trying to process the second record okay now what happens is the unbounded proceeding will point to the first record of the partition so this is one partition that is this basically this whole earphone product category is one partition under this partition the very first record which represents the unbounded proceeding will be this record where the brand is apple and the product name is airpods pro and the current row when sql is processing the second record would be the second row so when sql is processing the second record the last value would have access to these two records and from these two records it's going to fetch the last value and the last value between these two records will be the value corresponding to the last record of that partition or that frame which in this case is the wf 1000xm4 and that is what is getting displayed here now similarly when we come to the row number three when sql tries to process the third record the frame clause would let the last value to access the first three records okay because unprecedent unbounded proceeding will point to the first record and current row is the third record so from the very first till the current record that is three records and within these three records the last value is uh the galaxy bus pro and that is why it's printing galaxy buds pro and the same thing happens for the fourth record and so on and so forth for all the different partitions okay and this is why the frame clause becomes very important whenever you're using the last value not every window function will really be impacted by this default frame clause so it generally impacts the last value and also the nth value and also all the aggregate functions most of the aggregate functions so in order to fix this what basically i need to do is you can see that the the reason why last value is not able to extract the proper value that is the least expensive earphone in this case is because it does not have access to all the records of this partition so basically what we need to do is we need to modify this frame clause in order such that it will be able to access all the records that is from the very beginning of the partition till the very end of the partition so this unbounded proceeding is fine but we just need to change the current row to something that can point to the very last record of that partition and that we can represent by saying unbounded following so basically following means all the records following the current row and preceding means all the records prior to the current row so and unbounded means either to the end or to the beginning okay so now if i execute this query just by changing this current row into unbounded following and you see the output here now and when i execute this you can now see that the least expensive product this particular column is actually displaying the correct values so under my category earphone the most expensive is airports pro as you can see it's 280 dollars and the least expensive is galaxy but live which is 170 and that is what is getting displayed here same way if i go to the next product category which is my next partition the least expensive here is the surface headphones 2 which is 250 and that is what is getting displayed here and same thing happens for the laptop category the least expensive laptop is macbook air that is at the end that is at the last row and that is what is getting displayed here as the least expensive product so just by changing the frame clause we are now able to extract the value that we wanted last value to extract i hope this was clear and it was not too much confusing and there are a few other modifications also that we can do so whenever we are specifying the frame clause we can either specify it as a range or we can specify it as rows so instead of range i can also specify rows and if i execute this basically what rows will tell is it will consider what are the different rows that needs to be considered within a frame okay and a range is also something similar to that but the difference between a range and rows will come when there are duplicate records okay so so let's say if i am looking at phone okay so i must be having some duplicate data here uh yes i have some duplicate data here so so let's only consider this phone okay or let me just put a filter condition here we are saying product category equal to phone just so that it i can explain you better and i'll just run this so you can see that instead of range i'm using rows and it's still printing fine the most expensive phone is galaxy z fold tree and the least expensive phone from this data you can see which basically cost 300 dollars is the one plus not and it's displaying properly now i'm using rows here but let's say if i instead of unbounded following i just mentioned current row okay and now if i just execute this i just want you to focus on this last three values that is five six and seven these three values three records here is basically having the same price that is one thousand dollars okay and you can see the least expensive product here is different for each row so for row number five where we have the product as galaxy z4 flip 3 it's printing the least expensive product as the same product for iphone 12 it's printing again iphone 12 and galaxy s21 it's printing galaxy s21 now instead of rows if i just change it to range and if i execute this and just focus on this 5 6 and 7 line okay so let me just highlight that again you can now see that the least expensive product is printing as galaxy s 21 for all the three records okay so this is basically the difference between range and row what row will do is whenever we say current row in rows when i am saying when i specify here as rows and if i use the current row it will it will consider the exact current row but what range will do is if that particular row has some other rows with duplicate values so in this case this 1000 is repeated in three different rows and why it's considering this 1000 is because we have ordered by that price value okay so since these three rows are having the same price range will consider the last row with that price okay so when sql is trying to process the line number five the last value function will have access to all these seven records okay so all these seven records will be within its frame and within this frame it's going to look for the least expensive product which is the galaxy s 21 and that is what is getting printed to all these three records but if i had used row here rows then and if i executed this when it's trying to process the line number five it will only have access to these five records under its frame and within this frame the least expensive is galaxy z flip three and that is what is getting printed here okay so that is the difference between using rows and range the only difference would be when you have duplicate data range will consider everything all the duplicate data also will be considered until the last row when it is when you're using rows it will only stick to that current row okay and there are a few other modifications i can do as well so i can i can specify instead of unbounding i can specify 2 3 or any other number and instead of here also i can just say let's say two following something like this so what this will do is it will consider the rows two rows prior to the current row and two rows after the current row so if you are processing row number three then it would get two rows prior to the current row and two rows after the current row so all these five rows would be under its frame okay so these are the different things that we can do when you're specifying a frame and it's very important to specify the proper frame based on what is your requirement especially when you are using the last value function and the same thing will happen when you are using the nth value function okay and also some of the aggregate functions like average sum min max etc i hope this was clear and not too confusing okay so let's proceed further the next thing that i want to talk about is i'm going to show you the an alternate way of writing this window function so you have seen this query here so i have used two different window function first value and last value and under each of this i have mentioned this over clause right now if i had to write a query where i was using multiple window functions then repeating the same over clause again and again would simply be taking a lot of lines in our code as well as it would not be the best way of reading a query an alternate way of writing a window function would be so what i'm just going to do is okay i do i will just copy this whole query okay so and i'm just going to print it here and here okay i don't want this so i'll remove the where clause or i can keep it as well no problem and what i want to do is instead of mentioning the whole window function here with its over clause and everything i can use a clause called as windows and inside the windows clause i can specify everything that i have mentioned in the over clause okay so how i can do that is i after my from after my from as well as after my where clause at the end but it should be before the order by clause i can just mention a clause like window okay and then i give a name for that saying that windows w i can you any name i want and then i specify what i need to mention within the over clause okay so i'm just going to copy this whole thing so i'll just copy this whole thing and i'll just put it here okay so this everything that i was mentioning in the over clause i can just mention it here okay and then i don't need to specify everything under this over clause here but i can just mention this as w what's happening is this window clause is where i'm going to specify everything that was supposed to be defined inside the over clause and i'm going to give it a name like w and this w is what i'm going to specify here so whenever you i'm using a window function and when i need to specify the over clause i can just specify this alias here okay so let me just remove all of this spaces and so i'll just bring it in single line and i'll do the same here so i'll remove this over clause and you can see that i did not use the frame class here but i can use frame class here as well it will not cause any uh difference to the data so instead of this i'm just going to say w and then i'll just remove all the spaces okay so now you can you can see that i had this above query where i had this first value and then the over clause and then i had the next window function then again the over clause and if i had 10 different window functions then i would have specified 10 different over clauses and in most cases this over clauses would just be the same you'd just be repeating the same code again and again but by writing the same window function in this format what i'm doing is i'm using a window clause and inside the window clause i am specifying everything that i wanted to mention within the over clause and then when i'm using the window function after the over i just give the alias to that window clause okay and this will automatically consider everything that was mentioned here okay so even if you are using 10 different window functions you can just use this w to represent whatever you wanted to mention within the over clause so this will save a lot of your lines within your query and also it becomes much easier when com when compared to readability so let me try to execute this query and see if the data is the same and you can see that the data is exactly the same okay i just need to make this change so i'm just going to say unbounded proceeding to unbounded unbounded following okay and if i just execute this you can see that it's basically working fine and it's fetching only for phone so let me remove this filter condition let it fetch for all the different categories and now you can see that i'm having all the 27 records and the most expensive product is displaying the most expensive product under that category and the least expensive product is displayed here okay so this is basically an alternate way of writing a window function hopefully this was helpful okay so the next thing that we are going to talk about is the nth value so i'm just going to print this so what is nth value so as the name suggests it's very similar to the first value and last value first value would fetch a value or a column from the very first record of your partition last value would fetch the value or a column from the very last record of your partition and nth value can fetch a value from any particular position that you specify so we can specify a position and from that position or from that row a value will be extracted okay so let's write a query to do that and in fact i'm actually going to copy this same query so i'm just going to copy this and i'm going to paste it here and here i already have first value and last value so i'm just going to create another column here saying that nth value okay and what i'm trying to do here is so i just want to extract the second most expensive product under each category so this particular first value is extracting the most expensive product last value is expecting the least expensive product and by using nth value i want to extract the second most expensive product okay so to do that i'm just going to say nth value and i need to pass two arguments here okay the first argument is basically the column that i want to display so in this case again i want to display the product name i can display any other column i want but this is more meaningful for this query so i'm just going to stick to product name and then i want to specify from which position i want to fetch this okay i'm interested in the second uh most expensive so from each partition it's going to look for the second row within that partition i'm going to fetch the value that is of product name from that particular row so i'm going to say 2 and then here i just mentioned the over clause again i'm going to use w and then i just give an alias to this column saying that this will be second most expensive product okay so if i just execute this program you can see that now there's a new column added here and this column will always print the second most expensive product from each category so if i look at the category earphones you can see that airpods pro is the most expensive product and galaxy but slide is the least expensive product and the second most expensive product is wf-1000xm4 and that is what is getting displayed here okay so and the same thing happens here in headphones so the second expensive product here basically we have three records only is this wh 1000xm4 and that is what is getting displayed here and when i come to the category of all the laptops i can see that all the laptops here the second most the second record for this partition is xps 15 hence the second most expensive product is printed as xps 15. okay so this what nth value does now instead of two if i mentioned three then it would go to the third record of each partition and from that third record it would print the value so in this earphone the third record is galaxy buds pro and that is why it's printing galaxy bus pro okay and this is what happens under each category now let's say instead of three i just mentioned five and in my first two partitions i don't have five records so what happens is in this case it's just going to return null value okay so it's going to find it under each partition nth value will look for the fifth record and in that partition if it does not find five records it's just going to return null so under my partition earphone i don't have five records so it's returning null under headphones i don't have five records it's going to return null under laptops i do have five records and under this partition it's going to look for the fifth record the fifth record here is this xps 13 and it's going to print that xps 13 and the same thing happens for other partitions so this basically what nth value does and one thing to remember here is for nth value as well specifying the proper frame clause is important so let's say i did not specify this okay i'll comment out this and i'll just close the bracket here with the semicolon and in this case by default sql will consider the range to be unbounded preceding and the current row and now if i execute this program or this query you can see that it's and let's say not five i want let's say the second most expensive product okay and you can see that it's basically not printing properly so in in the first partition here i have four records the second most expensive is wf uh 1000xm4 for the last three record it's printing fine but for the first icon it's not printing fine the reason for that is when it's processing the first record it's only the frame is only having access to the first record and so within that first record it's not able to fetch the second record so it's printing null but when it comes to the processing the second record it has access to the second record so it's printing this and the same thing happens when it's processing the remaining records okay so that is why whenever you're using last value and nth value make sure to specify the proper frame clause so you get the data as per what you desire okay so now let's go to the next one and let's talk about another window function that is yen tile n-tile basically can be used to group together a set of data within your partition and then place it into certain buckets okay and sql will try its best that each bucket that it creates within a partition will have almost the equal number of records okay if you don't understand this now once i write the query it will be very clear so let's say i want to write a query where i want to segregate all the most expensive phones mid-range phones and all the cheaper phones so i want to basically create three different categories or three different buckets where the first bucket should have all the expensive phones the second bucket should have the mid-range phones and the third bucket should have all the cheaper phones so when you want to segregate your data equally within a few different groups and that is where we can use the entire window function so let's try to write this query okay so i'm going to write this query from scratch so i'm just going to say select star from the product table okay and then here i'm going to use the entire function so n tile and this entire function will accept one argument which is basically the number of buckets that you want to create so let's say i want to create three buckets because my first bucket or the first group of records that i want to create should be for expensive phones the second group will should have all the records which has the mid-range phones and the third group should have all the records related to cheaper phones so i need three groups or three buckets so i'm just going to say end tile of three and then i'm just going to use the over clause i'm not i'm not going to use the window clause because i'm just going to use one window function here so i'll just use a single over clause here so i'm just going to say over clause and i do i need to partition here i actually don't because it's telling me that i the query is only interested in phones so we are only interested in fetching the data for phones we don't need other product categories so i can just put a filter condition here saying that where product category equal to phone so we are only interested in this particular category so inside my over clause i don't need to specify partition in this case but i need to specify the order by clause and the order by class should be based on the price of each phones so that we know at the top we have all the expensive phones followed by all the less expensive phones so i just say order by price in descending order and then i'm going to give an alias saying this will be something like my buckets okay so if i just execute this query you can see that i have this new column created buckets and it has values like 1 and then i have something like 2 and then i have 3 so totally i have 10 records because i have 10 phones mentioned in this table under this category phone and what sql is doing is it's since i have specified three buckets okay so all these ten records it will it will try to equally split among these three buckets now we cannot equally split ten but into three buckets and so that is why what happens is it will give precedence or it will give priority to the first bucket so the first bucket will get one additional record that is four records here the second bucket will get three records and the third record the third uh bucket will get three records okay so this is basically how entire works it basically groups together different sets of rows depending on how many buckets you want to create and under each bucket it will try to provide equal number of records as much as possible so let's say instead of three buckets if i wanted to create five buckets okay so now if i execute this query you can see that i have one two three four and five there are five different buckets and since i have ten rows each bucket has been equally distributed with two records each so our requirement for this query was that we wanted to write a query which would display all the more all the phones under the most expensive category mid-range category and cheaper category so we'll stick to three buckets and i have all these buckets created here and then i'm going to write put this inside a sub query and in my outer query i'm just going to say i just am interested in the product name okay and then i'm interested to say whether this product is expensive mid-range or cheaper ones okay so for that i'm just going to use a case statement okay and but before that i'll just put all of this into my sub query so i'll just push this ahead and i'll close the bracket i'll give an alias like x and that's it so here i'll use a case statement so i'm going to say case when x dot buckets equal to 1 then i want it to be printed as expensive fonts okay and let's say if i just copy this here and here and then i'm going to say end and okay so i just click something so here i'm just going to give a name for this something like um phone category okay and i'll just say this so if this is 1 then it is expensive if it is two then it will be mid-range phones and if it is three then it will be cheaper fonts okay so i'll just execute this query now and you can see that i'm having 10 records and under this column phone category it's thing that these four phones that is galaxy z4 3 iphone 12 pro max galaxy note 20 and iphone 12 pro are the expensive phones then i have three phones under the mid-range category and then i have the cheaper phones at the end okay so this is basically how we can use entire function whenever we want to group together a few records into some buckets okay so now let's talk about the dist function cube dist basically stands for cumulative distribution it's basically a mathematical formula what this will do is it can basically be used to identify the distribution percentage of each record with respect to all the rows within a result set so the queuedist function will always provide a value which is within a range of 0 and 1. now the best way i can explain queue test function is by writing the query and then i'll explain you the formula and where we can use this qmdist function so let's say you have a requirement where you want to fetch all the products which are basically constituting the first thirty percent of all the data in your products table based on price and then in this case we can use the qmdis function so let's try to write a function so before that let's try to understand what we are trying to do so i have this product table it has the list of all the products so let me just do a select star from product so select star from products product okay so i have all of these products so you can see i have 27 records there are all sorts of products so i don't need to partition based on different category because this query is for the entire product table so what it's telling is out of all your product table if you wanted to fetch what are the products which are constituting the first 30 of your data so how do you identify which is a 30 of your data okay and in this case it's just 27 records you might be dealing in real time projects where you have thousands of records or millions of records so if you want to find out 30 percent of that data and in that 30 percent of the data if you wanted to know which are the products that constitute this 30 percent of the data then in order to write this kind of query we can use the qmds function okay so let me try to write this function so i'm just going to say select star from and i'll say cubedist queue dist and qmdist we don't need to specify any argument it does not accept any argument and then i just say over and inside over i don't need to specify a partition because i want to deal i want this queued window function to run on the entire result set so i just need to specify an order by clause so because it's told that i need to sort the data based on price based on the price i want to identify which are the first 30 of the whole data the products that belong to the first 30 of the data so i'm just going to say order by price and in descending order and then i'm going to give an alias for this something like let's say qm distribution okay something like this and if i execute this query you can see that what's happening is there's a new column created and this value might scare you and it might not be very clear on what this value is but i'm going to explain you so basically just for better understanding what i'm going to do is i'm going to create another column here the same thing i'm just copying that but here instead of displaying this in this decimal format i'm just going to convert it into a percentage format okay so to convert that i'm just going to use the round function i'm just going to say round off this and then i'm just going to convert it to numeric and then i'm going to say i'm going to multiply by that by 100 because it's in the form of percentage and then i'm just going to fetch two decimal points okay so this column i'm just going to name it like dist percentage okay so if i execute this now i think this will be more clear to understand so basically this both columns are the same it having the same data actually but just that this column i have just modified that by changing it into in percentage format okay by multiplying this value by 100 and then only fetching the first two decimal points okay so basically this column represents the dist value of each of the record in this result set or in this table okay so the first record which has a dist of 0.03 which basically corresponds to 3.70 percent means that the distribution of this particular row when compared to the entire result set is 3.7 percent so meaning that this row has a distribution of 3.7 percent compared to the entire result set of these 27 records okay and same way if i wanted to identify the first 30 percent of the records of this result set then i can just do that by less than 30 so all this so because i don't have anything like 30 so less than 30 would be all of the seven rows the first seven records here basically constitutes the thirty percent of my data from this table okay now we are dealing with a very small table here so it might seem uh not that useful but when you're dealing with thousands of records and if you have you've been given a task of fetching maybe the last 20 so especially in the sales data if someone tells you that in the sales data go and fetch the data for the 20 of products uh which are performing very badly then you can go and use the dist function to fetch the least the the last 20 of the data and then maybe extract whatever columns or values that you want in order to know which of the products that are performing badly so that is vacuum this can be really helpful now for our query what we want to do is we want to fetch all the product names which constitute the first 30 percent of the data in this table so i have the dist value displayed here now in order to fetch the first 30 percent of the data i can just use a sub query so i'm just going to say select and i'm only interested in the product name so i'm just going to say product name and then i want the dist percentage so i'm just going to copy this big column name so this one and then i'm just just for more understanding clarity i'm just going to concatenate it with a percentage symbol okay and then i'm going to give an alias to this i'm just going to call it with the same thing okay so i can just put it inside a bracket just for better clarity and after as i'm just going to leave a space so this is my column and after this i'm just going to say from and i'll just remove this bracket i'm just going to move this uh to the right so just to make it more clear and i'm going to give an alias like x okay and here i'm only interested in the first 30 percent of the data so i'm just going to say where x dot and i'm going to use this particular function here so i can use this one or this so maybe i'll try to use this so i'm just going to say this one is less than so 30 so i'm just interested in less than or equal to 30 right so this one so if i just execute this program or this query you can now see that i'm able to extract the query where it's returning me all the product names which constituted the first 30 percent of the data in my product table okay so this is basically how we can use cumdist function i think i forgot to explain you how this dist is calculated so let me go back and run this query here okay and i have this data here now this is a formula that is used to do this calculation and the formula is pretty straight forward so the formula is what it will do is and i have just mentioned it here so this is a formula it just basically takes the current row number and divides it by the total row number total number of rows so when it's in when the sql is trying to process or find the queue list of the first row it's it knows that this belongs to row number one so the row number is one and it knows that the total number of records in this table is 27 so it's just going to do one divided by 27 and hopefully you should get this output let's just try to check that so i have my calculator here and i'm just going to say 1 divided by 27 and you see that i'm getting 0.037037 something like that so if i see here and that is exactly the value so for every record it's just going to divide the row number divided by the total number of rows until it finds a row where there are duplicate values so let's say row number four and five okay so as soon as it finds a duplicate record what it's going to do is instead of considering the row number of the very first record it's going to see how many duplicate rows are there so here i have two duplicate records so and duplicate record is based on what you have mentioned in the order by and order by i have mentioned is this price column so the price is 2000 for both these records so it's going to the row number it's going to consider is of the last record that is the record number five so it's going to do five divided by the total number of rows 27 so let's try to do that so if i do 5 divided by 27 i'm going to get 0.185 185 and that is what is getting displayed and this list will be common for all the duplicate records or all the records which has this duplicate value okay so this is basically how the dist value is calculated the formula is pretty simple i hope this was clear and not too confusing so let's proceed okay so now let's talk about the last function that we are going to cover in this video that is of percent rank now percent rank as the name suggests basically stands for percentage rank it basically provides a relative rank to each row in the form of percentage so it's kind of like a percentage rank for each row in your result set okay now the percent rank is almost similar to qmds but the formula to calculate percent rank is slightly different from that of queue list so to understand this better let's try to write a query okay so let's say i want to write a query and i want to identify how expensive is a particular product so in this case let's take the product as galaxy z fault tree i want to identify how expensive is galaxy z fold tree when compared to all other products okay in my products table to write this kind of output or this kind of query i'll just use the percent rank function so i'm just going to write the query like select star from product and here i'm going to use the percent rank function so i'm just going to say percent rank and again this will not accept any argument so i can just say over and here inside over i'll just order the data because i want to know uh how expensive uh i want to sort the data based on its price so i'll say order by price i'll leave it in ascending order let the phone with or the product with the most expensive price be at the bottom and i'm going to give an alias to this something like percentage [Music] rank okay so if i just execute this you can see that i'm getting the data and there's a column that is added at the bottom here there is a lot of data here it's not really very clear right so again what i'm going to do is as i told you the data that is shown by percent rank is also again in the percentage basically it represents a percentage rank so we can just represent this decimal point number into more meaningful way that is by converting it into a percentage so to do that again i'm going to use the round function same like how i did in the previous function that is qmdist so here i'm just going to convert this to numeric and then i'm just going to multiply this by 100 to convert it into percentage and then i'm going to leave consider only two decimal points so and then i'm going to name this something like first per rank okay or something the naming is not that good but uh let's use it for now and you can see that i have two different columns the last column is more meaningful and this as i told you the formula to calculate percent rank is slightly different now the difference is that the first value the value that is shown or the value that is calculated by percent rank or by disk will always be within a range of 0 and one and in percent rank the first record will have the percent rank value as zero and the last record would generally have one it's not mandatory to have one but in most cases it would have one now what basically the calculation formula here is it's going to consider the row number that is when it's processing the first row it's going to consider the row number of the first row and it's going to do a minus 1 so row number 1 minus 1 is 0 and then it's going to divide it by the total number of rows minus 1 so the total number of rows in this table is 27 again minus 1 is 26 so here current row 1 minus 1 is 0. so 0 divided by anything is 0 and that is why it's getting 0 okay when it comes to the second row what it's doing is it's calculating row number in this case is 2 2 minus 1 is 1 and 1 divided by 26 because total rows is 27 it's again subtracting by 1 so it's 26 so it's going to do 1 divided by 26 let's go to our calculator and let's check it so 1 divided by 26 it's returning 0.03846 0.03846 so this is what it's getting displayed so this is basically how the calculation of percent rank works it's just a simple uh mathematical formula and the same thing happens for all the different rows okay so the formula of calculation of percent tank and qm disk is different but they're almost identical in their uh functionality so so let's say if i wanted to identify how expensive is galaxy z fold tree so where is galaxy z fold tree i can see that i think here okay so galaxy's default tree is 80.77 percent more expensive than all the other phones that are available in this table so this is basically what percent rank uh value means okay so if i find a value something like let's say galaxy watch 4 i can say in other words that galaxy watch 4 is 38.46 percent more expensive than all the other products in this table this is how we can use the percent rank function okay so now in order to define our query what we wanted to do is again i'm going to use a sub query here so i want the product name that is should be my product and then the percentage so i'm just going to display per rank okay and then i say from and i'll just move this to the sub query and i'll give an alias like x and here i'm going to say x dot this per rank correct it should be um okay i don't need the per rank i want to know what is a percentage how expensive is galaxy z43 right so i'm just going to copy this product name and here i'm just going to say product name equal to this particular value okay and if i just execute this you can see that it's telling galaxy z4 3 is 80.77 percent more expensive than all the other products in this table so this is basically how we can use percent rank uh and also all the other window functions hopefully this video was clear you got good idea about window functions if there are any confusions and i believe there would be a lot of confusion because it's not very easy to understand all of these concepts at once you need to work you need to make especially work on some projects where you'd be using all these window functions to completely get a hold of this but i hope the syntax and frame clause and the alternate way of writing window functions was very helpful if you found value in this video then do consider subscribing to the channel also like this video and leave a comment below with your feedback thank you and have a good one
Info
Channel: techTFQ
Views: 270,084
Rating: undefined out of 5
Keywords: sql window function, sql window functions, window function in sql, sql analytic function, analytic function, window function, how to write sql queries, partition by, over clause, sql over clause, sql partition by, parition by query, sql query tutorial, sql queries tutorial, sql tutorial, tutorial, sql query writing tutorial, cume_dist, nth_value, ntile, percent_rank, first_value, last_value, How to write SQL Query using Frame Clause, frame clause, SQL, techtfq
Id: zAmJPdZu8Rg
Channel Id: undefined
Length: 54min 30sec (3270 seconds)
Published: Fri Aug 13 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.