Introducing window functions in DAX

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
ciao Franca and welcome to a new video from SQL bi in this video I want to talk about the new window functions that have been introduced in daxa with the December 2022 version of power bi desktop window functions are powerful because they are the first time that we have functions that can work on a table sorting the table and navigating through it so once you have a table like in the variable or with some code that generates a table you can easily use these functions to go to the first row or to the last row or while you are scanning the table you can go to the previous row or generate a moving window that contains let's say the last 10 rows starting from the current row so the power of this function is that they make it extremely simple to build calculations that were much harder before because you had to use a mixture of filter indexer Rank hexa and all these complex functions besides window functions introduced for the first time apply semantics now apply semantics is kind of tricky and I will not have time to discuss apply semantics during this video even because we are still studying and understanding exactly how it works we do have a white paper currently in draft mode for SQL bi plus but it is what it is it is a draft we are still studying and understanding and that's the main reason I'm not going to tell to tell you anything about apply semantics we do not understand all the tricky details in order to give directions the right way nonetheless let's go back to window functions there are three window functions index offset and window they're simple to use and in this video I will focus mainly on their usage and what you can do with them in order to simplify your Dax code so let's get started writing some decks together I'm going to use our regular contoso model so there's nothing fancy here and as usual we start with a blank page and we start writing together some Dax code there's a lot of Dax code that I need to write so I will read from time to time what is the next topic otherwise that the video will become a nightmare to record the first function I want to talk you about is the indexer as I said in the introduction window functions require a table so you first generate a table and once you have it index gives you the capability of grabbing one row from the table like give me the first row or the second row or the last row and it's a very easy way to extract one row from a table that is sorted let's see how it works well first of all we need a table and the table contains Brands and sales so let's start writing evaluate a bit larger the font okay a bit smaller okay let's start with a variable let's call it Brands and Brands and sales where I use add columns I start from all product brand so that returns all the values from the product brand and then I add a new column that I call sales that just contains the sales amount let's use return and let's start looking at Brands and sales so we have a view of the content of this table now Brands and Sage just contains the brand and the column sales now what if out of this table I want only the first row sorting it for example by brand or by sales I can use index to do that so instead of using Brands and sales I can use index indexer is a table function so its result is a table and let's say that I want the first row out of Brands and sales and because I want the first row I also need to tell index how I want to sort the data and so I need to use order by or Dubai is one of the modifiers of index that tells you how you want to sort the data and because the table contains product brand and sales amount we can sort it for example by product brand and the result will be the first row of the table after having sorting it by product brand I do that and the first row is a datum I can also decide that I want to sort it descending and if I do that I will have the last row because I sold the sending and then I obtained the first row you can actually obtain the same result sorting ascending but instead of using one you can use -1 if I use -1 and I run the query I still obtain whiteboard importers and if I sort it descending of course I will obtain 8 atom that is the first row so you need to provide you need first to build a table and once you have a table you sort it in some way and you grab the row that you want now this could have been done previously with top n that was a simple function to obtain this scope but index is powerful because it gives you access to any row like what if I want the second row after having sorted ascending I do that and I obtain Adventure works so I can easily grab the individual rows from a table using just the indexer now index Works nicely but it requires all the rows in a table to be different what happens if you have ties now for the name we do not have ties but for example we can change the order by and say Hey I want to sort it by sales and if I do that and I get the first one I have a taste pin toys that is uh the brand that sells the list so we can go for -1 or use order by sales on the sending and Adventure works is our top seller the brand that sells the most it it works and one thing that I need to tell you before we move further is that the capability of sorting by columns that are created on the Fly is unique for Index right now neither offset nor window support this feature but the feature is all the window functions are currently in preview so we expect that the feature to be added over time and it's likely that before General availability they will also add this feature that is not for sure but hopefully it will anyway as I said all the rows in the table need to be unique now I'm sorting by sales descending and it works because if I return the Full Table Brands and sales you see that the value of sales is always a different number I do not have ties I do not have two rows with the same value but what if I change the definition and I use for example m round sales amount and I round it to 1 million if I do that and I return still Brands and sales I have two Millions two millions three millions one million one million so now if I only consider the sales column I will have a lot of ties and if I'm interested in the first row the sending let's order by uh sales descending you see that with three millions we have only Adventure works but if I ask for the second row to Millions to billions to Millions I have three possible results still it works because if I go back to the previous query I order by sales the sending if I use one I obtain Adventure works if I use two I obtain contoso accounts that it work there are three rows all with two millions the thing is by default window functions if the rows in the table are not unique they will add that columns in the table in order to guarantee that there are only unique rows in the table so in this special scenario we can go back to the table the engine said well if I only order it by sales I will not be able to guarantee that all the rows are unique so I will make the choice of adding the brand as part of the order by closer and I will use two columns I'm not gonna tell you anything but I will guarantee uniqueness the engine tries to do that by itself and it might fail because uh maybe you do not have the option you do not have uh rows that are unique and if all the rows in your table are not unique then these functions will produce a network they will never return multiple rows and that is an important difference between a top end and index for example the pen can return multiple rows in case the r ties whereas index only returns a single row and it never returns more than one row index function accept another argument that is the way you need to handle blanks it is actually the way you can sort the way blanks are sorted when considering the order by right now the only value that you can use is keep they will probably add the further values in the future but for now you can only use keep and another important thing about index functions is that you are not forced to use it so you can just skip an argument and the engine will just use the function as it is so keep is not mandatory it's a useless argument for now but you can just avoid it and avoid using it another argument that is in instead important is Partition by because index Returns the first row but you might have a table that contains multiple columns and you want to first split the table into different partitions and then use an index that is local to the single partition now with only the brand we cannot do much but what if we extend the table and we use a product category and then product brand let's do this way so it's a bit clearer I sorted not by sales but I sorted by category and then by sales so now my table will contain three rows I have the category audio three rows controls a whiteboard whiteboard importers and Northwind Traders and cameras and concorders with the brand and so on with the amount of sales let's not use them round we go back to our original sales amount okay so we have some numbers now I might be interested in grabbing the first brand per category so whiteboard importers will be the first one fabricam will be the first one for cameras and camcorders and the phone company would be the first one for the cell phones if I want to do that I use again my index function I use order by and then I use another argument that is Partition by I can Partition by product category by using Partition by what happens is that the engine will create an index a different index for each partition and then it Returns the first row for each individual partition I can run it and you see that now the result contains one row per category audio TV and video computers cameras and camcorders and only one brand per each category that is the top sell the top brand seller in that specific category and before we move further an important note I told you at the beginning that index always returns one row no matter if the r ties there will only be one row but if you look at that it's not only one row we have a table a full table that is return that contains all the rows that have the index one after the partitioning the way it works is you have to apply semantics apply semantics as I told you in the introduction is not among the topics that I'm able to cover right now but it's a unique feature that make the calculation work also in filter context where it is not expected to work so even though I do not have only one partition and one current row all these functions will work thanks to apply semantics and as soon as we will understand it properly we will provide you videos and articles that describe the feature the right way so indexer oh one thing that I was forgetting is uh if what if you omit the source table look at that I have created Brands and say so I have created a table and then I order it by something but you are not actually forced to provide the source table you can also create the source table automatically that is not something that I suggest you to do because whenever the engine does things automatically there's always the chance that it does it the wrong way or not in the way you want other than the further example let's say that I want to produce a report that shows uh by brand and by year the sales amount and then I also want to report the sales of the first year to do that well I need to start again from scratch let's start from evaluate and I want to group by product brand so I use summarize columns and I group it by product brand and then by date year day yeah then I want the sales amount and this is just say some I this is just the sales amount if I run this query I obtain by brand and year the value of the sales amount then I want the sales for that specific brand in the first year so I want to report 2017 in all the rows for contoso so this value repeated for all the rows of contoso I can do that I can write sales first year and I can use index now index one because I want the first year and then at this point I should provide the source table I can skip it I can just say order by no order by date year no actually I want calculate uh sales amount because I want the sales amount for the first year let me see if it is yes it is correct uh if I run it now this calculate will say compute the sales amount but move the year to the first year ordering by date here and if I run it you see that five one one eight one zero that is the value of the first year is reported for all the rows in contoso and then once I once I have whiteboard importers I have the value that is repeated only for white War importers so you have the option of omitting The Source table and use only order by what does the engine do if you omit the order the source table well it defaults to all select the date here so this all selector is not needed because the engine adds it automatically if you omit the source table that's not something that I suggest you to do well first of all it uses all selector that is probably the function index that scares me the most it's very very complex we wrote a lot of article and topics about all selected and I suggest you to study them before using it as a default it's a complex function and its results are kind of hard to understand besides it is very important that you know the table that you are scanning that you know the source table you are working with and that is the reason why I suggest you never to omit the source table always build the variable store The Source table so you can later inspect it with a debugger or just by returning its results and then you start navigating through it I pre I quite never use um I quite never omit when I write code the The Source table and that's it for indexer so index gives you the option of finding the nth element inside a table navigating through a sorted table the index function requires you absolute references so it goes on the first on the second or on the fourth element there's another function offset whose goal is different offset returns you a relative index of a table so you can go to the previous row or to the next row or for the second previous row or stuff like that so that gives you the option of navigating through a table and for example compare the sales of a brand with the sales of the previous brand or the sales of a date with the sales of the previous date or if you have orders and customers you can compare the amount of an order with the amount of the same of the previous order for the same customer all these kind of calculations were extremely complex to developing tags you have the option of doing that but they were slow first and hard to author now with offset they are much simpler to do and mostly they are faster mostly because they are not only always faster when when it comes to Performance window functions might be extremely fast or they may be slower than regular dark Squad again that's a topic we are still starting in order to give you uh better advices later but let's go back to uh offset offset gives you the option of returning to the previous row and in order to show you an example I want to show quite a more intricate piece of code well not really intricate but at least interesting not only to show offset but also to show a use case where it makes sense let's pretend that I want to analyze the sales in my database and I want to return only the dates with sales amount where there were no sales in the previous day so for whatever reason in the previous day there were no sales and I want to report the sales only for the days when I do have sales but there were no sales in the previous day if you think about writing that index in regular Ducks that's far from being simple because you need to compute you you need to mix a filter Max X iterations in quite a complex way whereas with the window function that is way simpler because you can first build a table that contains only the dates with sales then you scan it row by row and if a row has a value that means the r sales and in the pre the previous row is not the day before that means in the previous day there were no sales and you you can return that row so let's do that together we start with evaluate and then we first build the source table let's call it dates let's use the same name dates and sales and that contains a filter of add columns onto all the data so we start getting all the dates from the date table then we add a new column that we call sales that is just the sales amount and we only want dates where there are sales therefore we just check that the sales is greater than zero let's start looking at this first result return dates and sales and we do have a problem let's add columns with a C so that returns of 18 19 20 22 with the value for sales and you see that we have a whole here between the 20th and the 22nd for on the 21st of May 2017 there are no sales and I want to return there for only the 22 with this amount because there were no sales before now to do that I need to uh let me again I always need to check how I wrote it in the article in order to avoid having different code uh we can use filter dates and sales dates and sales and now for each row I want to check the value of the previous row so to do that I grab the carbon data that is Day date I'm iterating dates and say so carbon date is a variable that contains the current day then I want the previous date previous date and here I can use offset I can say go back one row over the dates and sales table dates and sales and I always need to provide the order by order by date date uh let's also add ascending just to make sure that we go on the previous day now that returns a table offset is a table function so it Returns the full row I'm not interested only in the first in the full row I only want the value of the date column and if I want the value of the date column then I use a select columns and I only grab the day date now previous date contains now the previous date how do I know that I want to keep the filter well I want to keep the filter If the previous date is not the current date minus one so I just return previous date previous date different then car run date -1 this closes filter and if I did it correctly that should be it let me show you just the entire piece of code we have our data and sales variable and then here we filter it only returning the rows where we do have uh the current date and the where the previous day is not the current date minus one if I did it correctly I run it and that should return you see the 18 and then the 22nd the 29th the 5th the 12th so only the dates where there are no sales in the previous day now that was quite a complex example because I wanted to show you some intricate use case but you can use these functions also to solve problems which are actually much much simpler to do so for example an important an interesting example is what if I want to show the sales in the previous year we do have the same period last year function the time intelligence function that does the job for us but we can use the timing a window function to obtain the same goal so what we can do is Define a measure let's call it sales and I named this function previous year sales okay so let's do that uh previews year sales and I want to go back on the previous Dr so I will have my summarize columns so it's actually better if I do that the opposite way let's start with evaluate summarize columns some rice columns I grew up by day year and then I write sales and I just return the sales amount that returns a 2017-2018 2019 and the value of the sales amount but then I want to return the sales in the previous year now to do that I still have my defined measure in the clipboard I can use calculate I compute the sales amount and then I use offset -1 order by date year now this calculate will go back on the previous year so I expect my sales previous year says to return I name it previous year sales and I use previous year sales and if I run it you see that now I have 2018 2019 2020 and I have a blank at the beginning because I do not have sales before 2017. now when you use window function you need to pay attention to the fact that calculations are visual that means the default is all selected on the date here we didn't specify it here but we can use all selected the ear the result is not going to change because also like the date here was the default of the function but what happens if I reduce this to a couple of years I use let's say 2019 and 2020 I filter only 2019 and 2020. our date year now 2019 has a 355 as at the sales amount and we do have previous year sales but if I apply a filter the filter over date year will be used by this all selected therefore oscillator will only contain 2019 and 2020 and instead of having 4984 for the previous year sales if I run this code I will obtain a blank for 2019. I do have sales in 2019 but they are not visible and because I used all selected the net effect is that I do not have a value when you use window functions in your code you need to pay extra special attention to the details of the filter context and build a source table that is actually a good source table that contains all the data you want to show here instead of using all selected date yeah it is way better to use or date here so even though I'm only showing two years in the window function I use the entire range of years and I show data also when it is not visible in the current visual if I run it with all date here now you see that previous year sales shows a good value even though it is not visible in the visual but it is in the database if you were using same period last year same period last year always goes to the previous year regardless of the current filter context whereas window function when they defer to all selected they only show visual results and if you do not specify the source table then it's on you but the engine will use all selected by default which is quite never the best choice unless you want to do a calculation that use pure revision if that is the case then it makes sense to use all selected as The Source table now the functions we have seen so far are index and offset index for an absolute reference offset for a relative reference there is a third function window window does not return only one row Windows returns a real table that contains a range of rows and you can go from the first row up to the current one to build something like a running total or you can build a moving average for example by taking the last five rows the last six rows so it mixes a index because it gives you the capability of using absolute indexes it uses the capabilities of offset because you can use relative indexes and you can build an entire window the syntax is a bit more intricate and of course the function is a bit more complex to use so let's see that with some examples first example that I want to show you is a moving average now if you look at this chart now we are using power bi desktop we have the sales amount that is uh the light blue line and the darker line is a moving average over six months now moving averages can be computed in quite a simple way using time intelligence functions by why not using window functions and that's exactly what we are going to do I do have already the measure here but we get rid of it let's get rid of six month average and we also get rid of it deleted from the model so we write it together again let's build a new measure okay A bit larger font now the six month average what we want to do is build a moving window containing six rows and we do that by using the window function it's an average so we start with other Rejects and then we use window accepts two types of references both absolute references like indexer or relative references like offset here we want to start from -5 relative because we are on a given month and we want to go back to five months before and then zero again relative order by or the buy date okay date year month number and because we have both the year month no actually it's enough but I have the year month so let's also use date year a month because we have it in the visual so we want to get rid of that too that generates a window containing um that should work yes that generates a table that always contains six rows and then out of that we compute the sales amount now I think this is just an intelligence problem let me check it is indeed the measure is actually working and if I place it here now you see that we have the six month average we should clean up the measure get rid of the values at the bottom but it's a very simple way to create a moving window it can be six months or 12 months or five days or whatever performance wiser is going to be faster because it sorts it builds the lookup table then it sorts it and then it scan it in quite a simple way you can use window not only with absolute values but also not sorry not only with relative values but also with absolute values for example let's pretend that I want to compute a running total of the amount of sales so I want to sum all the values from the beginning of time up to the current row again window is going to help them with help me with that and I can do that uh this time we do that using DAC Studio let's start with evaluate we use summarize columns first we compute the sales amount and we show the sales amount by year amount so day the year month number and also the date year month just to see that in a human readable way and then I want sales that is the sales amount I run it and I obtain the German number that is just a number that makes no sense to a human but then we have May June July and so on let's also order by the year month number okay so we are sure to have May June July then I want a running total now in order to compute the running total I'll build a window that starts from the beginning and goes to the current row so I call it sales run in downtown and I use calculate compute again the sales amount and then I use window now this time I want to start from the first row error so one this time is not relative it's absolute because it's the first row ever and I want to go to the current row that is zero relative and then I use order bye and we use year month number and year month as we did before okay get rid of that comma and close calculator a bit smaller the font okay so we are grouping by month Computing the sales amount and this goes back to the to a window that contains from the first row up to the current row sorted by month number and by month and if I run it I have an error the Syntax for order is uh yes I'm missing a parenthesis here not yet so window this closes window this closes calculate this close is summarize column so what am I missing uh all the Buy calculate sales amount that should be it there is a narrow somewhere unexpected value for order argument in order by function oh yes I need to provide ascending and ascending to if I provide multiple order by I need it's a mandatory argument ascending or descending and if I run it I have my running total it starts with the sales amount and then it grows it grows it grows and goes on growing all the values an interesting feature of Windows is that like all the other window functions it supports a partition by I can turn the running total into a year-to-date by partitioning it by year so that it resets the value at the beginning of each year and to do that it's enough if we add the Partition by so we say order by year month number Partition by date yeah foreign it I obtain a result that contains that grows up to December and then resets you see that in January it resets and it goes on it resets again in January 2019. now it is interesting to note that in this last example that I'm going that I'm showing you I did not specify the source table The Source table is automatically built by using all selected on all the columns that are used both in the order by and in the Partition by part altogether they instruct the engine to create a table containing the year month number the year month and the date year and then using these three column it uses all selected builds the source table sorts it and then lets you navigate through the table in a simple way as you have seen window functions are quite powerful they are a bit complex to use because they're natural the way they work is um well not extremely intuitive it's quite different than usual bags because they rely on sorting they have this concept of partitioning and most relevant they do have applied semantics now believe it or not we used apply semantics during uh the entire demo because it's always there and it made it simple to compute code in code that is would not be all that simple it all has to do with the concept of carbon draw when we say that we are interested in the previous row or in the next row we always need to understand how the engine finds the current row I totally know the concept right now is kind of Cloudy it will become clearer as soon as we provide more article and more content about it because apply semantics is all about that finding the current row and making all this function work also in the feeder context where the concept of current row is not all that simple and as soon as we have videos and articles about that we will publish it for now what I can suggest you is start playing with the feature nothing that window functions are the Holy Grail that will solve all performance issues and they will make ducks super simple they are not they are powerful there are scenarios where they really shine and perform Masa has an incredible boost there are scenarios where performers are just okay and you can obtain better performance just by using regular function again we are investigating on it but it is worth if you are serious about dags and to start using them starting to understand them right now they are not yet generally available so it's the right moment to start studying them understanding them so that when you will need them you will know all the details about how to use them enjoy ducks [Music]
Info
Channel: SQLBI
Views: 23,978
Rating: undefined out of 5
Keywords:
Id: bxSWQNSQ8rQ
Channel Id: undefined
Length: 40min 57sec (2457 seconds)
Published: Tue Jan 31 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.