DAX Query Best Practices and Historical Context (with Jeffrey Wang)

Captions
Reid: Hi Jeffrey, good morning. How are you doing today?

Jeffrey: All right. Thanks for having me on this last episode of 2023.

Reid: Absolutely. I feel a fun deep-dive session with you is a good way to usher in the new year, with what I'd almost call a new product for DAX. Over the last few years we've gotten some language changes, but not very often do we get a new way to actually play with DAX in a sandbox environment, so I think this will be a very fun session. Thanks for taking time around the holidays to come on.

Jeffrey: It's interesting to learn some hard DAX right before the holidays, so you have something to think about.

Reid: Exactly, something to play around with for people who have a bit of downtime and want to dabble in the language. I saw a comment from Brian about the hoodie; thank you, it's something to keep me warm over the winter. For people who tune in regularly: I finally upgraded my setup with one of those little mirrored screens a lot of people put in front of their cameras. I just bought a new Elgato one, so I'm able to keep pretty near consistent eye contact, because I can see your image directly in front of my camera lens. I used to have to look below my monitor to talk, but now I have something that helps me make more eye contact with my guests.

Jeffrey: That's a rare thing. I feel much more comfortable knowing you're watching over me.

Reid: Exactly. I think this is going to be a really great session. You've been on before for a couple of deep dives.
Reid: Those were about a year and a half ago. But for the few people who might not know who you are, or your involvement in the history of the VertiPaq engine and everything around it, do you want to give a little background about yourself?

Jeffrey: Sure, absolutely. I'm on Microsoft's Power BI engine team. I joined Microsoft almost exactly 20 years ago, and I've always been on the BI engine team, so I was part of the original Power BI foundation team and part of the effort to invent DAX. After that I've been working on DAX engine development and the development of the DAX programming language all these years.

Reid: Out of curiosity, before it was even named and all that, what was the first year the concept started to be discussed?

Jeffrey: Twenty years is my anniversary at Microsoft; DAX itself was maybe 13 or so years ago. My memory may not be precise, but I think that's the rough time frame.

Reid: Is it about that, because Power BI has been public since 2015, so seven or so years for Power BI, and Power Pivot was around for years before that?

Jeffrey: That's about right, yes.

Reid: It's something that has come a long way since then. I remember when it first came out there were only a couple of people blogging about it, Rob Collie, Kasper, a few other individuals, and it was still a very new thing. Now it's pretty much the lifeblood of the entire BI stack for Microsoft, so it's been nice to see the VertiPaq engine become so massively widespread as the foundational platform.

Jeffrey: You're right. It actually surprised all of us how fast the thing became popular over the past couple of years. At the beginning, like any new product, adoption was slow.
Jeffrey: Back then, I remember, Tableau was still pretty dominant in this space, but just in the last two years I've noticed many people talking about Power BI being the default solution, and everything else needing to justify why it's better. That's a very interesting change of dynamics.

Reid: I think it also helped when they started including Power BI Pro with E5 SKUs, because that pretty much gave a Power BI seat to everybody at a Fortune 500 company that owned any kind of Office subscription.

Jeffrey: Absolutely, the pricing model definitely helped a lot, I agree.

Reid: Exactly. But fast forward to today: back in November we got the new DAX query editor that we'll be talking about. What's some of the material around best practices that we'll be getting into, as part of our discussion of DAX and some of the releases of the last few months?

Jeffrey: That's a good point. DAX query view is, I believe, the only new view added since the inception of Power BI. The original three views, the report view, the data view, and the model view, have been there from the beginning, so this is the first time since then that a new view has been added. I believe it will definitely help a lot more DAX developers start learning how to write DAX queries, and it really makes Power BI Desktop much more powerful. Desktop has always been a very important booster for Power BI adoption because it gives users a very safe sandbox to experiment in without worrying about their live reports, and DAX query view gives them additional power to do even more complex experiments while still feeling safe that they're in a sandbox environment.

Reid: Exactly, and as we'll see once we actually get started, I find this is going to make it a lot easier to save notes and developer pages.
Reid: I'm sure pretty much everybody in the chat who has built a Power BI report has at some point saved a PBIX file with hidden pages full of visuals and other things designed literally just to analyze the code, especially if they weren't using external tools. So it's nice to clean up the report layer and have that special view where you can have as many tabs as you want, and save a lot of your testing work, or notes for clients or whoever you're handing the file off to.

Jeffrey: Actually, just yesterday it helped me in my own work. I was trying to extract some data from a model, and while writing the DAX query I realized it would be more convenient if I created a calculated column in the middle to help simplify my DAX expression. But I really didn't want to pollute my model with a new calculated column; I only needed it for this one query. So I created it in the DAX query view, and the nice thing is I realized I can now save it along with my model, so it stays part of the file without actually polluting the model. Previously I would have had to create the calculated column and hide it, and that still didn't feel right to me, because it doesn't belong in the model.

Reid: So the final measure didn't need a calculated column, but it's one of those things you create temporarily, usually in a table, just so you can see row by row what's happening before you finish the iterations. You basically just have a summarized table or something where the table with the extra column is saved as a query. I hadn't even thought of that yet, but that's a great use for it.

Jeffrey: It was not my plan either; I just found it really convenient to have a column like that.
Jeffrey: It's confined to this one report, yet it's saved with my PBIX, so every time I open the file I don't lose it. It doesn't pollute my model, yet it stays with the model all the time, packaged together. There will be many more interesting use cases of this feature that people will find.

Reid: If you're ready to get started, I can flip over to your screen and we can take it from there.

Jeffrey: Absolutely, thank you. I'm ready. Okay, I can see my screen, that's great. So, as Reid has already hinted, today we'll be talking about how to write DAX queries using the DAX query view. I want to show many different ways of writing the essential core part of any DAX query, and then explain why one of them is recommended over the others. Many of them have very similar behavior for beginners; only after extensive use do you find the special behavior of each method. We don't want to go into all those technical details, so instead I want to add some history about the introduction of these functions and the rationale behind their invention. This way I hope I can convince you why I think one approach is better than the others, without getting into too many technical details.

Before we look at the actual DAX patterns, let's first review the most important ingredients, what I call the fundamental trio of any DAX query. There are three essential elements that are central to the vast majority of BI queries, not just Power BI queries: the first is the grouping columns, the second is the set of filters, and the third is the measures.
Jeffrey: Basically, when people build a visual, they drag some columns onto it, then they drag some measures or aggregations onto it, and finally they set some filters, either through filter cards, through slicers, or through interactions with other visuals. For people who are more familiar with the SQL SELECT statement, we can roughly say the grouping columns are similar to the GROUP BY columns (actually more powerful; I can explain why), and the filters are similar to the WHERE clause of a SQL SELECT. For measures there is really no true equivalent in SQL; this is actually one of the biggest inventions of Power BI, the concept of measures. In the past two months we officially introduced the idea of a semantic model, and Power BI has always been super strong on the semantic modeling aspect precisely because it has the concept of measures, which has no real SQL equivalent. But for simplicity, in the simplest case, measures are just basic aggregations. So if we have to map the three concepts onto a SQL statement, roughly: the grouping columns are the GROUP BY columns, the filters are the WHERE clause, and the measures are the aggregations, though Power BI takes all three much further than a simple SQL SELECT.

For people wondering whether Power BI queries have a SQL equivalent of the FROM clause, the answer is no. DAX queries don't need a FROM clause, because the DAX engine can automatically deduce which tables are needed and how to join them from the relationships in the model. Of course, you can use the USERELATIONSHIP and CROSSFILTER functions to modify the join behavior at query time, but fundamentally the FROM clause is automatically inferred; there's no need to spell it out explicitly. That's one of the reasons DAX queries look a lot smaller than equivalent SQL queries.
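The rough mapping Jeffrey describes can be sketched side by side. The table, column, and measure names below are illustrative assumptions, not taken from the demo model:

```dax
-- An illustrative SQL query such as:
--   SELECT d.[Year], c.[Country], SUM(s.[Quantity])
--   FROM Sales s
--   JOIN Customer c ON ...      -- in DAX, FROM/JOINs are implicit,
--   JOIN [Date] d   ON ...      -- inferred from model relationships
--   WHERE c.[Country] = 'United States'
--   GROUP BY d.[Year], c.[Country]
-- corresponds roughly to:
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],                                        -- grouping columns ~ GROUP BY
    Customer[Country],
    TREATAS ( { "United States" }, Customer[Country] ),  -- filter ~ WHERE
    "Total Quantity", [Total Quantity]                   -- measure: no true SQL equivalent
)
```

A model measure like [Total Quantity] can encapsulate far more than a plain SUM, which is why measures have no real SQL counterpart.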
Jeffrey: Also, if you look at the fundamental trio plus the implicit FROM clause, they are exactly the first four stages in the order of execution of a SQL SELECT statement: first the FROM clause, then the WHERE clause, then the grouping, then the aggregation. The rest of the SQL operations, HAVING, the select list, ORDER BY, LIMIT, and all the others, come at a later stage, so we're not going to talk about them today. In DAX, if you're used to writing more complex queries, you know we do things step by step, so this fundamental trio is always the most important first step, and the rest of the operations follow, just as in SQL; but instead of combining everything into a single command, we do the rest in follow-up steps, which I'm also not covering today.

Let's look at a concrete example of how these three things, the grouping columns, the filters, and the measures, show up in the different ways of writing DAX queries. I'll switch over to my desktop. Here I have a very simple visual, a matrix with two columns: one is the year and the other is the country. I actually have a single country, because I also set a filter, Country equals United States. Then I added two measures, one called Total Quantity and the other Min Price. I set the two filters through two slicers: on the country slicer I selected a single country, and on the product color slicer I selected two colors, blue and red.
Jeffrey: So it's a very simple visual: two grouping columns, country and year; two filters, one on country and one on color; and two measures, Total Quantity and Min Price. I get a small result of six values: three quantity values and three price values. Let's see how I can achieve the same thing using a DAX query, and all the different ways of writing it. Now let's go to the DAX query view, the fourth icon on the left. I have prepared a few different approaches.

The first is the well-known SUMMARIZECOLUMNS. In SUMMARIZECOLUMNS I can put all three ingredients into the same function: the year and the country as the two grouping columns, then the two filters in the middle, one on country and one on color, and finally the two measures, Total Quantity and Min Price. A single function takes in all three elements that are essential to any DAX query. If I run it, I get the same values we just saw on the visual: the three quantity values and the three price values.

That's one way of doing it, but there are others. Let's look at another, using the SUMMARIZE function. SUMMARIZE can also take the two group-bys, the year and the country, and then the two measures, Total Quantity and Min Price. But SUMMARIZE takes a table as its first argument, and it doesn't take filters explicitly, so one of the common ways of applying the filters is to use the CALCULATETABLE function to supply them.
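The first pattern, SUMMARIZECOLUMNS with all three ingredients in one call, might look like the following. The table and column names (Sales, Customer, 'Product', 'Date') are assumptions, since the demo model's exact names aren't shown in the transcript:

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],                                         -- grouping columns
    Customer[Country],
    TREATAS ( { "United States" }, Customer[Country] ),   -- filters
    TREATAS ( { "Blue", "Red" }, 'Product'[Color] ),
    "Total Quantity", [Total Quantity],                   -- measures
    "Min Price", [Min Price]
)
```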
Jeffrey: I can apply CALCULATETABLE from the outside or move it inside; there are many different ways, and I'm just choosing one of them: using CALCULATETABLE to supply the two filters, one on country and one on color. If I run this query, I get exactly the same result, the three quantity values and the three price values, exactly the same as the SUMMARIZECOLUMNS result.

Reid: I have a quick question about SUMMARIZECOLUMNS. Earlier on, when I was going through my own DAX education, I think a lot of people got tripped up trying to use SUMMARIZECOLUMNS in a measure, rather than SUMMARIZE, ADDCOLUMNS, or anything in between, and would often get an error. So when and where should SUMMARIZECOLUMNS be used, and can it be used with measures?

Jeffrey: That's an excellent question. Hold on to it; I'll answer it later. Actually, I have good news for you.

Reid: Oh, that's a big hint.

Jeffrey: So now we have seen two ways: one using SUMMARIZECOLUMNS and the other using SUMMARIZE. There is a third way: the GROUPBY function. With GROUPBY, again, I supply the two grouping columns, the year and the country. GROUPBY is very similar to SUMMARIZE; the first argument must be a table. In my case I happen to use a FILTER function so I can supply the two filters, one on country and one on color, as part of the table argument. One of the limitations of GROUPBY, and the main reason it's not recommended, is that it doesn't truly support measures; but since my two measures happen to be simple aggregations, I can express them directly.
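The second pattern, SUMMARIZE wrapped in CALCULATETABLE to supply the filters from the outside, might look like this (same hypothetical model names as before):

```dax
EVALUATE
CALCULATETABLE (
    SUMMARIZE (
        Sales,
        'Date'[Year],                        -- grouping columns
        Customer[Country],
        "Total Quantity", [Total Quantity],  -- measures
        "Min Price", [Min Price]
    ),
    -- filters supplied by the outer CALCULATETABLE
    Customer[Country] = "United States",
    'Product'[Color] IN { "Blue", "Red" }
)
```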
Jeffrey: I just use two aggregations: a SUMX over the quantity and a MINX over the unit price. So again, two grouping columns, two aggregations, and two filters, and if we run this one we also get exactly the same result.

The fourth method is to use ADDCOLUMNS. ADDCOLUMNS is mainly used to add something to an existing table, so I'm using it to add the two measures, Total Quantity and Min Price. Its first argument is a table, and in that table I supply the group-by columns; I have to use the SUMMARIZE function for that purpose, but only to provide the group-by. This is different from the previous use of SUMMARIZE, where I supplied both the group-bys and the measures; here I'm only using SUMMARIZE to group by the year and the country, with the two measures added by ADDCOLUMNS. Then I again use CALCULATETABLE to supply the two filters, one on country and one on color. So once more: two measures, two group-bys, and two filters, and if we run this one we again get the same result.

Last but not least, let's look at the newest addition to DAX. It's not that new anymore, because composite models, that is, DirectQuery to a semantic model, went GA about a year ago, but in the grand scheme of things it's still a relatively new feature considering the power it gives the user. GROUPCROSSAPPLY is intended to be an internal function; it was introduced to support composite models, so support for it is still limited. You can see all these red squigglies because IntelliSense support hasn't been added yet; that's one of the reasons DAX query view is still in public preview.
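The GROUPBY and ADDCOLUMNS patterns might be sketched as follows. Note that GROUPBY's aggregations must iterate CURRENTGROUP() rather than reference measures, which is the limitation Jeffrey mentions. For the GROUPBY sketch, a single flat Sales table carrying Year, Country, and Color columns is assumed for simplicity; in a star schema the grouping columns would need to be present in, or added to, the table expression:

```dax
-- Pattern 3: GROUPBY; aggregations iterate CURRENTGROUP(), not measures
EVALUATE
GROUPBY (
    FILTER (
        Sales,
        Sales[Country] = "United States"
            && Sales[Color] IN { "Blue", "Red" }
    ),
    Sales[Year],
    Sales[Country],
    "Total Quantity", SUMX ( CURRENTGROUP (), Sales[Quantity] ),
    "Min Price", MINX ( CURRENTGROUP (), Sales[Unit Price] )
)

-- Pattern 4: ADDCOLUMNS adds the measures; SUMMARIZE only supplies the group-by
EVALUATE
CALCULATETABLE (
    ADDCOLUMNS (
        SUMMARIZE ( Sales, Sales[Year], Sales[Country] ),
        "Total Quantity", [Total Quantity],
        "Min Price", [Min Price]
    ),
    Sales[Country] = "United States",
    Sales[Color] IN { "Blue", "Red" }
)
```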
Jeffrey: It's not GA yet; I believe that should be fixed before GA, and the public documentation is still being prepared. But the GROUPCROSSAPPLY function is very similar to SUMMARIZECOLUMNS: as you can see, it also takes all three ingredients, and in the same order. The two grouping columns come at the beginning, the year and the country, and the two filters in the middle, one on country and one on color. You'll notice there is a third filter here, even though it doesn't truly filter anything; it's not a subset of the year values. The two measures are added at the end, similar to SUMMARIZECOLUMNS. That additional filter exists because, as I said, this function is internal, and all the grouping columns must come from the filters. That's one of its constraints, so it's less flexible than SUMMARIZECOLUMNS; I'm showing it only because it's also one of the ways to get exactly the same result.

So we've seen five different ways of achieving exactly the same result. I wanted to show them because this is one of the reasons people find DAX hard: there seem to be multiple ways of doing the same thing, and for some users there's no clear reason for so much redundancy. That's exactly what today's presentation is about. I want to give people a little history about each method and each function, so you can understand why we recommend one of them: the SUMMARIZECOLUMNS approach.

Reid: I definitely found that one the easiest to write, because there are no extra functions; you just pick whatever columns you want from the model, build the table out, and it summarizes it up. But, as you know, it has historically had some nuances about where you're supposed to use it that have tripped people up over the years.
Jeffrey: I would say ease of writing is definitely one of the important reasons it's recommended, but of course it's more than that. People are animals of habit, right? If you look at the other approaches, they're not too bad; I agree they're more verbose and less elegant, but they're not too bad once you get used to them, and you find they solve almost all your problems, especially the initial ones. Then people tend to get stuck: they already fully understand the nuances of a particular approach, so they want to stick with it instead of learning to switch to a different one.

Reid: I'm sure you know that iconic book from the 90s, Who Moved My Cheese. It's the whole idea that even though a better approach might come out, it's hard, and we're slower at it because it's new: "No, I don't have time for that; I'm just going to do whatever I've been doing for the last 20 years, because it's my pattern, I'm used to it, and I can get it done by 5:00 p.m. rather than retraining myself on this new fancy thing."

Jeffrey: Yeah, absolutely. I remember it took me a couple of years to convince one of my relatives to finally switch from their GPS to Google Maps, even though the latter is much, much faster to set up. It's hard to break habits.

Reid: Exactly.

Jeffrey: Okay, so we looked at the five different options. It's not just one function each; we saw that we need a combination of functions, but I want to single out the most distinctive feature of each approach. Number one: SUMMARIZECOLUMNS. Number two: using SUMMARIZE to get the grouping columns and the measures, and CALCULATETABLE to supply the filters.
Jeffrey: Number three: the GROUPBY function, to get both the simple aggregations and the grouping columns. Number four: ADDCOLUMNS to add the measures, on top of some other approach like SUMMARIZE or GROUPBY to generate the grouping columns, with the filters supplied either through the FILTER function or through CALCULATETABLE. And the newest addition to the family is the function used internally to generate queries for DirectQuery to a semantic model, GROUPCROSSAPPLY, which seems almost identical to SUMMARIZECOLUMNS except for some small differences, such as the requirement about where the grouping columns must come from. So today we want to compare these different approaches from a historical perspective and understand why the SUMMARIZECOLUMNS approach is the right way to go.

Number one is the official approach, SUMMARIZECOLUMNS. If you deep dive into Power BI and look under the covers, many people have already captured the queries actually generated by the different visuals, and DAX query view definitely makes that easier. You'll notice that SUMMARIZECOLUMNS is the fundamental function Power BI itself uses to generate almost all of its DAX queries. Let's take the simple matrix visual we just saw and capture the underlying DAX query. We go to Performance Analyzer and start a recording; I refresh the visuals, I have the matrix right here, and now we can run the captured query directly in DAX query view, which is really convenient.

Reid: That shortcut is fantastic; it saves a lot of time not having to copy and paste around.

Jeffrey: First of all, it's a much bigger query.
Jeffrey: Near the beginning you should see a variable called __DS0Core, with a double underscore in front. It is defined using the SUMMARIZECOLUMNS function, and there are the two grouping columns, the year and the country. The two filters come in as variables: if we look at the variable right above, it's the filter on the country column, and the other one, even further up, is on the year. And the two measures are right here. Since Power BI uses SUMMARIZECOLUMNS to generate the core of its queries, obviously there's a good reason we recommend it to users. And if you look at the rest of the steps in the query, they all build on it: if I search for the __DS0Core variable, you'll see that all the remaining intermediate steps are built on top of it.

Reid: Somebody mentioned this in the comments, about filtering: years ago, when I was looking at this and reading a couple of blogs, one thing I came to love is putting those complex filter tables into a declared variable and then just calling it. Like every other use of variables, it really cleans up your logic when you separate the two. Even when I'm writing measures, I'll often declare my filter table as a variable and then call on it in a CALCULATE function. It really adds to readability and helps break up those chunks, rather than having one giant block of code.

Jeffrey: Yes, everything is great about variables, except the name: it should not have been called "variable."
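A query captured from a visual typically has the shape Jeffrey describes. The sketch below is simplified; the exact variable names, window sizes, and ordering steps vary by visual, and the model names are again hypothetical:

```dax
DEFINE
    -- filters arrive as variables
    VAR __DS0FilterTable =
        TREATAS ( { "United States" }, Customer[Country] )
    VAR __DS0FilterTable2 =
        TREATAS ( { "Blue", "Red" }, 'Product'[Color] )
    -- the fundamental trio, packed into one SUMMARIZECOLUMNS call
    VAR __DS0Core =
        SUMMARIZECOLUMNS (
            'Date'[Year],
            Customer[Country],
            __DS0FilterTable,
            __DS0FilterTable2,
            "Total_Quantity", [Total Quantity],
            "Min_Price", [Min Price]
        )
    -- every later step builds on __DS0Core, e.g. windowing what gets displayed
    VAR __DS0PrimaryWindowed =
        TOPN ( 502, __DS0Core, 'Date'[Year], 1, Customer[Country], 1 )
EVALUATE
    __DS0PrimaryWindowed
ORDER BY 'Date'[Year], Customer[Country]
```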
Jeffrey: Back then we were debating whether it should be LET, or CONST, or VAR, and I lost the battle on the naming. Partly that's because I first needed to convince other people that we should take variables on as a feature at all; there was a strong pushback against introducing them. People said, let's just have users combine everything into one query. I said no, variables are useful everywhere, including in measures. So at least I won that battle, but the naming battle, unfortunately, stuck us with "variables," even though the name is a little restrictive in terms of what you can declare: it's not just a scalar variable, it can be a table, a lot of things.

Reid: It was one of my favorite additions to DAX in the last eight years. When variables were released with one of the updates, so many measures instantly became faster, especially with things like SWITCH statements. I love variables.

Jeffrey: Yes, and they also make debugging so much easier, right? You can just return any of the intermediate variables as the result.

Reid: Marco and Alberto blogged on that years ago, and that's exclusively the pattern I use: every measure that has a variable also has a result variable, so you can swap what you return.

Jeffrey: So the reason I call the grouping columns, the filters, and the measures the fundamental trio is that the result of that trio is used to feed all the other calculation steps for the rest of the query. The final query does not return it directly; if I run it, I don't get exactly the same thing. I actually get two different results: one is just the country, and the other is more like what we just saw, the year, but this time without the country, with something called a column index, and then the three quantity values and the three price values.
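The variable-plus-result pattern Reid mentions looks like this in a measure (the measure and model names are hypothetical). To debug, you can point RETURN at an intermediate value, such as COUNTROWS ( FilteredSales ), instead of Result:

```dax
Quantity US Blue Red =
VAR FilteredSales =                 -- complex filter table in its own variable
    CALCULATETABLE (
        Sales,
        Customer[Country] = "United States",
        'Product'[Color] IN { "Blue", "Red" }
    )
VAR Result =
    SUMX ( FilteredSales, Sales[Quantity] )
RETURN
    Result                          -- swap for an intermediate value to debug
```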
Jeffrey: So the final query shape is not exactly the result of the SUMMARIZECOLUMNS, but it is the result of the SUMMARIZECOLUMNS that is used to build the other things that are eventually returned as the query rowsets. The most important point: one of the key reasons SUMMARIZECOLUMNS is the recommended way to build the core of your query is that Power BI itself uses it to build its core. In the future, when you're debugging one of your visuals, you can use Performance Analyzer to capture the DAX query underneath, and most of the time you should look for this variable name, __DS0Core. Almost every query has it; certainly the vast majority of the non-trivial ones. It supplies all the data used to shape your visual, even though the visual may not eventually display all of it. For example, there can be sampling: a line chart can't fit all the data points if there are too many, but the data points are still returned by __DS0Core; the follow-up steps just do additional sampling, either with a TOPN or with one of the sampling functions, to reduce the total amount of data eventually displayed. On the calculation engine side, the DAX engine side, all the data points are prepared and stored in this variable.

So if SUMMARIZECOLUMNS is so essential, why are people still considering the alternative approaches? As I said, one reason is habit. But why did they have the habit in the first place? This is exactly the question you asked a few minutes ago.
SUMMARIZECOLUMNS cannot be used in measures, and so people are forced to use other approaches to produce a similar effect. As one of the intermediate steps, they need to build a small table as part of their measure calculation that has some grouping columns, applies some filters (or just picks up implicit filters from the evaluation context), and then calculates some aggregations or some measures. But since SUMMARIZECOLUMNS is banned inside measures, they are forced to learn the other approaches, and once they've learned them, maybe they say, okay, maybe I can just use this for DAX queries in general. So that is, in my opinion, potentially one of the most important reasons. But the good news is, I hope there will be a New Year's gift coming up in 2024, and this restriction will finally be lifted, hopefully in the early part of 2024.

That's some of the more exciting news I've heard in a while, because of all the patterns you've shown, it's the easiest approach I've found for somebody who's just getting familiar with DAX. It lets you pick a column from anywhere; you don't have to write a measure with a MAX or a MIN of another column to add the special ones. You just select the measures from anything that's related in the model, build your table, and it's the easiest way to get started. But right now you get those syntax errors trying to use it, so if that actually comes through, I think it will lower the bar for intermediate DAX even more, because it's the most intuitive way, in my opinion, of generating those custom filter tables that you need. That's awesome.

Yeah, if nothing else, it will definitely simplify the syntax, right,
because the function takes in the triad all as its arguments, so it becomes a more elegant, simplified expression.

It's syntax sugar, pretty much, yeah.

Exactly. Well, it's actually more than syntax sugar, but for the vast majority of basic scenarios, yes, it definitely is. And one of the reasons why we have all these different alternatives is not just simplicity of syntax; more importantly, it is because of semantics and performance. Okay, so SUMMARIZECOLUMNS is recommended also because it has, relatively, the most correct semantics. By "correct" I'm not talking about bugs, where it doesn't return the result it was designed to return; that's not what I mean. I mean that even when a function returns the result as designed, that result may not be as intuitive or as understandable to end users. SUMMARIZECOLUMNS has the least of such side effects compared to all the other approaches, so this is another reason why we recommend it. As we've already seen, for basic things all the different approaches return exactly the same result; they all work. And one of the great things about DAX is that we do our best to bring every function to its full potential. Okay, we don't put in artificial hard limits saying, oh, I'm going to raise an error simply because this is not my primary usage scenario. We said, okay, as long as this makes sense from that function's perspective, we'll support it; we will fully implement the functionality for you to use. That's why, no matter which option people have chosen, they find they can go very far with that approach. They can solve almost all their initial problems, so they never realize that there can be potential limitations lurking under the surface. Further down the road, when you run into more complex scenarios, you may run into those kinds of things. This is what I call the semantic issue of the different approaches.
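As a sketch of what "taking the triad as arguments" means, here is the same logical query written both ways, using hypothetical table, column, and measure names. With SUMMARIZECOLUMNS, the grouping columns, the filter, and the measures are all arguments of a single function call:

```dax
-- grouping columns, filter, and measures in one call
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    'Customer'[Country],
    TREATAS ( { "Canada" }, 'Customer'[Country] ),  // filter
    "Total_Quantity", [Total Quantity]              // measure
)
```

The alternative spreads the same intent across two functions:

```dax
-- the same intent with CALCULATETABLE supplying the filter
-- and SUMMARIZE supplying the grouping columns and measure
EVALUATE
CALCULATETABLE (
    SUMMARIZE (
        Sales,
        'Date'[Year],
        'Customer'[Country],
        "Total_Quantity", [Total Quantity]
    ),
    'Customer'[Country] = "Canada"
)
```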
Among all the different approaches, SUMMARIZECOLUMNS has the least of those issues, so you will not get unpleasant surprises later on. Let's say you get used to SUMMARIZECOLUMNS and then start adding arbitrary combinations of grouping columns, filters, and measures: you will have the least amount of surprises. But it's not fully immune yet. SUMMARIZECOLUMNS does have one semantic issue, and the SQLBI team has published a good blog post about it. It's called the auto-exist behavior of SUMMARIZECOLUMNS. Why do I always lose my cursor the first time? Let me put it in the middle.

I'll see if I can get the link to that and post it. Here we go: DAX auto-exist. Was this a recent post or one from a couple of years ago?

It's from a couple of years ago.

Okay, then I've got the link.

SUMMARIZECOLUMNS is actually doing the designed behavior, but the designed behavior simply doesn't make sense to many users who use certain combinations of grouping columns with certain types of measure expressions. And it's still not that uncommon: every single year, Alberto or Marco will send me a new incident from one of their customers saying, well, it's not returning the expected result, and I would say 80% of the time it's because of the auto-exist behavior. It's very surprising that the function can return results that way. So this is another piece of potentially good news: hopefully this problem will finally be solved sometime in 2024.

So I'm guessing, then, that basically when it eventually gets used inside of a measure, there will be some things happening on the back end that recognize that and account for this auto-exist scenario; that's what's been needing to get patched, so to speak?

No, no, no, that's not what I mean.
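The auto-exist surprise usually looks something like the following sketch, against a hypothetical model where both grouping columns come from the same 'Product' table and a slicer-style filter is applied to one of them:

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Brand],
    TREATAS ( { "Red" }, 'Product'[Color] ),  // slicer-style filter on the same table
    "All_Color_Sales",
        CALCULATE ( [Sales Amount], ALL ( 'Product'[Color] ) )
)
```

Because the filtered column and the grouping column belong to the same table, the engine first reduces 'Product' to the existing Brand/Color combinations that survive the filter (auto-exist). As a result, the `ALL ( 'Product'[Color] )` inside the measure no longer restores every color for each brand, and brands with no red products disappear entirely, which is the "unexpected result" users typically report. This is a hedged reconstruction of the scenario the blog post covers, not a copy of its exact example.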
When I say hopefully it will finally be solved in 2024, I mean two separate things. First, we hope 2024 will bring the good news that you can finally use SUMMARIZECOLUMNS in measures. Secondly, regardless of whether SUMMARIZECOLUMNS is used in measures or in queries, hopefully we will have a solution to the auto-exist issue.

Okay, just globally. Okay, perfect.

Yes, globally, yes. But you can understand why we are cautious about it: SUMMARIZECOLUMNS is such a fundamental part of Power BI, and there are so many existing reports out there, that we have to be very careful about changing the default behavior of the function, right? It has a potentially huge impact. Even though the combination is not that common, when you have hundreds of millions of queries running every day, chances are even 1% is a very big number, right? Even 0.1%. So that's the first option. The second option is to use CALCULATETABLE to supply the filters and SUMMARIZE to supply the grouping columns and the measures. The reason this approach exists is that it can be used to create those temp tables, those temp results, inside measures, and this method also works in almost all cases, very similarly to SUMMARIZECOLUMNS, because it's actually how Power BI, at least the initial implementation of Power BI, generated queries. The query generated was not what I showed in the report; let me go back. What I showed is: I use CALCULATETABLE to supply the filters, and I use SUMMARIZE, where the first argument is a table. For simplicity I simply used a plain table name, but that's not how the actual query was generated. If you look at something closer to what was actually generated back then, during the Power Pivot and Power View era of Power BI, the query looks like this. Again, this is not the full query; I'm only highlighting the essential part of it.
So again, back then we were using CALCULATETABLE to supply the filters, and SUMMARIZE to supply the grouping columns, which are here: the year and the country. The measures are the total quantity and the minimum price. But the table that is fed to the SUMMARIZE function is actually more complex. It's not exactly the same, but it's very similar to what was generated back then: this part. We are actually using a GENERATE function to combine the individual grouping columns, one being the year column and the other the country column, and then we are repeating the measures here (you can see the measures here), and we are using a FILTER function to do it. Back then, this was a more explicit way of applying what we call the implicit non-empty filter. Okay, so the non-empty filter says: of the combinations of a year and a country, keep only those where at least one of the measures has a non-blank value. So back then the non-empty filter was explicit; today it's baked into the implicit semantics of SUMMARIZECOLUMNS, but it was not in the initial version of Power BI queries. Okay, so that was the old way of generating queries back then. So why did we switch away from that approach to SUMMARIZECOLUMNS? We actually discovered both semantic issues and performance issues. As you can already tell, the measures are repeated twice, right? The same set of measures is supplied twice: once as part of the filter condition for the non-empty filter, and once as the final output of the SUMMARIZE function. This is not elegant, for sure, but it's more than just syntactic simplicity; it actually caused significant performance issues as well. So we had performance issues from the duplication of the measure references in different parts of the same query, and we also had issues with semantics.
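A rough reconstruction of that old query shape, with hypothetical names, might look like this; note how the measures appear twice, once inside the explicit non-empty FILTER and once in the SUMMARIZE output:

```dax
EVALUATE
CALCULATETABLE (
    SUMMARIZE (
        FILTER (
            -- cross-combine the individual grouping columns
            GENERATE ( VALUES ( 'Date'[Year] ), VALUES ( 'Customer'[Country] ) ),
            -- the explicit non-empty filter: keep combinations where
            -- at least one measure is non-blank
            NOT ISBLANK ( [Total Quantity] )
                || NOT ISBLANK ( [Minimum Price] )
        ),
        'Date'[Year],
        'Customer'[Country],
        "Total_Quantity", [Total Quantity],   // measures repeated here...
        "Minimum_Price", [Minimum Price]      // ...and in the FILTER above
    ),
    'Customer'[Country] = "Canada"            // the filters
)
```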
It has a much worse version of the auto-exist behavior, in a sense, which we sometimes call the clustering behavior. Okay, so this query pattern unfortunately has both severe semantic issues and performance issues in more complex scenarios. That's why it's difficult to fix both, because semantics is something you don't want to just change randomly. That's one of the main reasons we introduced the SUMMARIZECOLUMNS function, in a project internally code-named the Super DAX project. So yeah, we did that. For the early adopters of Power Pivot and Power View, you know those were not nearly as popular as Power BI is today. This is also one of the reasons it surprised even us that Power BI got such wide adoption so fast, because we had very slow adoption back in the Power Pivot and Power View days. But the good news is we learned a lot from that, including what kind of query behavior and query performance is desirable to the user, and we started this Super DAX project: we introduced SUMMARIZECOLUMNS, we introduced the variable feature, for example. It was actually a major effort, but the good news for us is that when we finally finished the project, the new Power BI was launched. Even though the initial Power BI, after being relaunched from Power Pivot and Power View as a standalone product, could only do very simple things, like basic aggregations and basic grouping columns, the old one did not even have the semantic issue; you couldn't even see it, because you couldn't even create it. You couldn't even create a measure back then. But the performance boost alone was one of the main reasons Power BI was popular even out of the gate. And you may not know this: back when I was playing with the initial version of Power BI
after it became a standalone product, the client team had to completely rewrite the entire client, so a lot of the old optimizations were not picked up in the initial release. Back then, I remember, if you had ten pages in your report and each page had three visuals, then no matter what you did on the canvas, like just dragging one visual from the left side of the canvas to the right side, all 30 queries would be executed. It was loading all pages at the same time and running all the queries behind the scenes.

Oh my goodness. I mean, I was using Power BI Designer; I forgot about that.

Yeah, so the performance improvement aspect of Super DAX was really one of the reasons people didn't reject Power BI immediately. Think about how slow things could be, right? No matter what you did: I simply resized the window, I didn't change any data, I just swapped the order of two columns; every single query of every single visual on every single page, visible or not, was re-executed in the back end.

I feel like there are some reports built today where, if that happened, the computer would catch fire if every DAX measure tried to run like that. That's a good improvement. I didn't realize that was an issue; so very early on it was quickly addressed with Super DAX.

Yeah, very few people even noticed the slowness, and one reason is that back then the data volume was small and the queries were simple; there were no complex measures. But still, with around 30 queries for every single UI gesture, you would be annoyed if performance was a real problem for you.

Yeah. So, coming back to here: the reason why this CALCULATETABLE plus SUMMARIZE approach is one of the popular options out there is that it can be used to recreate pretty much the same result inside your measure definitions, and also they
are, in terms of capabilities, very close to SUMMARIZECOLUMNS, without the constraint of not being permitted in measures. But we still want you to adopt SUMMARIZECOLUMNS, exactly because ten years ago we discovered that there are issues when the combinations of the triad become complex enough. You will run into problems that may make you scratch your head for days, if not weeks, and to save you all that trouble, just stay with SUMMARIZECOLUMNS, so you don't run into those problems in the first place. But that doesn't mean we want to deprecate the SUMMARIZE function. Even though SUMMARIZE was invented for query generation, we recommend that you use the restricted version of it. SUMMARIZE can supply both the grouping columns and the measures, and the measure part is where, depending on the measure expression, the more complex behavior may manifest. But if you only use it as a simple way to generate the valid combinations of grouping columns, then you should still use SUMMARIZE.

So just use it to generate the combinations of grouping columns.

Yes, yes, that is a valid usage. And the GROUPBY function: why did we invent GROUPBY? GROUPBY was also part of the Super DAX project. It was a function strictly for the UI, to do some additional aggregations on top of the core results. SUMMARIZECOLUMNS goes to the physical table to do the grouping and the aggregations, while GROUPBY's purpose was to take the result of SUMMARIZECOLUMNS, which is stored in a variable as an intermediate step and is no longer a physical table, and do additional aggregations on it. Back then this was strictly for purposes like, say, a line chart needing to calculate the maximum value and the minimum value in order to calculate the range
of the axis, those kinds of things. So that was the original reason to introduce this function. But unfortunately, for people coming from a SQL background, a relational database background, GROUPBY may be the first function they run into, because it has a really nice name, right? It maps exactly to the concept of GROUP BY in relational algebra. So I'm afraid many people want to use this function simply because they like the name of it. And it can do the basic aggregations, even though, if we look at the syntax of GROUPBY, it only takes aggregation functions, and it doesn't take the simple versions: it requires the X versions of the aggregation functions, because you always have to say CURRENTGROUP. It can supply the grouping columns and the basic aggregations, but it's more verbose in terms of syntax. Still, some people like it because of its name. And the reason I want to stress that it was introduced exactly for the post-processing step of SUMMARIZECOLUMNS: even if I only want to use it to do what the previous SUMMARIZE approach was doing, well, SUMMARIZE was doing something similar as well, right? It provides the grouping columns as well as the measures, and GROUPBY does the same. But SUMMARIZE at least is friendly to the storage engine, and by storage engine I mean the VertiPaq engine or the direct query to the relational database. Fundamentally, the DAX engine sits on top of a query accelerator: at the bottom layer, the VertiPaq engine (let's only talk about import mode for the time being) is much closer to the hardware, and therefore it can deliver supreme performance, while the DAX engine is mostly there to ensure the full feature set and the full functionality.
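For illustration, this sketch (hypothetical names again) mirrors that original post-processing role: SUMMARIZECOLUMNS produces the core result, and GROUPBY re-aggregates it, using the X-style aggregation over CURRENTGROUP that the function requires:

```dax
EVALUATE
VAR __DS0Core =
    SUMMARIZECOLUMNS (
        'Date'[Year],
        'Date'[Month],
        "Total_Quantity", [Total Quantity]
    )
RETURN
    GROUPBY (
        __DS0Core,
        'Date'[Year],
        -- only X-aggregations over CURRENTGROUP () are allowed here,
        -- e.g. to find the axis maximum for a chart
        "Max_Monthly_Quantity", MAXX ( CURRENTGROUP (), [Total_Quantity] )
    )
```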
Meanwhile, the VertiPaq engine does the raw tuning of the code close to the CPU, and in the future potentially the GPU, to produce the actual result as fast as possible. Both SUMMARIZE and SUMMARIZECOLUMNS are optimized to push as much as possible down to the VertiPaq engine, or to the relational database, but the GROUPBY function doesn't do that. So if you use GROUPBY on a smaller dataset, you will not notice any difference, but on a bigger dataset, even if you do exactly the same thing, even if you only do basic things, the performance can be substantially worse compared to the other approaches.

So you're saying that, between the two, GROUPBY against hundreds of millions of rows will aggregate, or materialize, the data...

GROUPBY will be slower, because it's not pushed down to the VertiPaq engine.

Got it. What would your use cases be? Would there ever be a use case to recommend it over SUMMARIZE if it's less efficient?

I personally would not recommend it at all. As I said, many people discovered it through its friendly name. This is really supposed to be an internal function, mostly for query generation, to do some post-processing of the core result during query generation.

Yep, makes sense. Okay.

And ADDCOLUMNS is very popular, not because its syntax is better, but mainly because of how the syntax works. If you look at its syntax...

We're almost at time, but we're fine if you need to go over a bit, no worries.

Okay, thanks. So look at the syntax: it takes a table as the first argument (in my case it happens to be a SUMMARIZE, but it can be any other table, and you can supply the filter directly there if you want), and then it adds the measures. Therefore, unlike all the previous functions, where the grouping columns and the measures are fed into the same function, this one forces you to separate the evaluation of the grouping columns, if you have
any, or filters, if you have any, from the measures. Exactly because of this physical separation of the arguments, it has fewer side effects. In some more complex measures, you sometimes run into the auto-exist behavior that both SUMMARIZE and SUMMARIZECOLUMNS have, but ADDCOLUMNS doesn't have that problem. That's why people sometimes find ADDCOLUMNS to be a good way: again, you can use it to author expressions inside a measure, it's not restricted there, and it also doesn't have some of those side effects, simply because of the way the arguments are passed to it. The table where the grouping columns are calculated is already a separate argument; it's not the function's job to do the group-by, right? The function's job is only to add additional columns, and in my case, if I'm adding the measures, then this complex interaction between certain combinations of grouping columns and measures doesn't apply to ADDCOLUMNS. In that sense it's a good thing, but first of all it's still too verbose, in my opinion, and more importantly, it doesn't support the implicit... not auto-exist, the implicit non-empty filter, right? So many people who are not aware of the importance of that filter may run into some surprises when they use ADDCOLUMNS. The SQLBI team has published articles about the correct pattern: you have to explicitly spell out the non-empty filter, like the old Power View way of doing things, where you explicitly spell out the filters. So if you want to use ADDCOLUMNS, you have to be more careful to supply your own non-empty filter. If you have grouping columns from two different dimension tables... well, if you only have a single grouping column, that tends to
be okay most of the time; you may see some unpleasant blanks, but it's not a big deal. But if you're not careful, and you have grouping columns from two different dimension tables and you don't explicitly supply your own non-empty filter, you may get a huge, nasty cross join that you didn't expect.

A quick follow-up question from somebody in the chat, related to SUMMARIZE: would you ever use a combination, like summarizing the data first, declaring it as a table in a variable, and then using GROUPBY for further grouping? Or would you again be concerned about performance issues versus some other approach?

In that case I wouldn't worry about the performance, because by defining a variable you are potentially already doing the heavy lifting in the first variable. But since SUMMARIZE can be used as a group-by itself, and you are already using SUMMARIZE, you can just keep using SUMMARIZE to do the actual group-by.

Okay, so rather than having to switch functions, you could just nest it if you needed to.

Yes, you don't need to do another group-by using a different function; you can just use SUMMARIZE. So what I'm trying to say is that we still want people to use SUMMARIZE, but use it as a pure group-by function; don't add measures to it.

Gotcha.

And also, ADDCOLUMNS is not as optimized as SUMMARIZECOLUMNS, in the sense that when everything is simple, the filters are simple, the measures are simple, the model is simple, SUMMARIZECOLUMNS can basically generate a single storage engine query. In DirectQuery, the entire SUMMARIZECOLUMNS can be translated into a single SQL SELECT statement. But ADDCOLUMNS was never intended for that purpose, so it will always generate two or more storage engine queries. So it has performance issues as well, even if you supply your own non-empty filter.
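The correct ADDCOLUMNS pattern described in the SQLBI articles looks roughly like this, with hypothetical names: without the outer FILTER, the CROSSJOIN of two dimension tables returns every Year/Country combination, including the empty ones.

```dax
EVALUATE
FILTER (
    ADDCOLUMNS (
        -- grouping columns evaluated separately, as the first argument
        CROSSJOIN ( VALUES ( 'Date'[Year] ), VALUES ( 'Customer'[Country] ) ),
        "Total_Quantity", [Total Quantity]   // the measure added afterwards
    ),
    NOT ISBLANK ( [Total_Quantity] )         // the explicit non-empty filter
)
```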
So in the end, SUMMARIZECOLUMNS is the recommended way to go, especially if next year we finally remove the constraint that it is not allowed in measures. And the final option is GROUPCROSSAPPLY. I just want to mention it because, now that you can easily capture queries, you may actually capture some that use this function. This is also, internally, how the Analysis Services engine translates SUMMARIZECOLUMNS when it cannot be pushed as-is via direct query to a remote Analysis Services engine. So some people may say, oh, can I use this? It's very similar to SUMMARIZECOLUMNS in syntax, right? It has a small inconvenience in that all the grouping columns must come from the filters, so even if you are not trying to filter anything, you still have to supply a filter just so that you can use a grouping column, which is just not convenient enough. But on the other hand, it does seem to have very similar syntax and behavior to SUMMARIZECOLUMNS, especially in simple cases, and that is indeed true. The reason we had to introduce this function as part of the composite model work, the DirectQuery-to-semantic-model work, is exactly that there are some semantic differences between this function and SUMMARIZECOLUMNS. Internally, because we now have an internal representation of a more complex query structure, a more generic data structure, and we need to push the calculation to a remote Analysis Services engine instead of doing it locally (because it's a direct query to a semantic model), we had to introduce a new function to reflect the internal behavior and semantics when we evaluate these groupings, these measures, and these filters. But again, let's leave it as an internal function; there's no need to
learn a new thing from a user's perspective. You should stick with SUMMARIZECOLUMNS as much as possible; it's only the truly advanced developers, who want to learn everything about DAX, who may want to delve deeper into these kinds of internal functions. I wouldn't recommend people spend too much time trying to figure out how this function works. SUMMARIZECOLUMNS should be sufficient.

That will be very exciting once it finally comes out; quite a few people in the chat are quite eager to see that happen.

Okay, so those are all the different patterns. Again, multiple options, but in the end SUMMARIZECOLUMNS is the recommended way. And on a related question, for people asking whether they can supply the filters using CALCULATETABLE, or CALCULATE in general, versus using the FILTER function: I just want to let people know that these two kinds of filters, the CALCULATETABLE filters and the FILTER filters, are semantically different. Okay, so when we designed the DAX programming language, CALCULATETABLE got pre-filter behavior. By pre-filter we mean the filters are semantically supposed to be applied at the leaf level: with a more complex expression tree, the filter is applied first at the leaf-level nodes, and then the result is propagated up toward the top of the tree. The FILTER function, by design, has the opposite, post-filter semantics, which means the table expression is supposed to be evaluated first and then the filter is applied afterwards. But since FILTER is such a popular function, we actually do some deep analysis and say, oh, a lot of the time we can actually push the filter down; obviously you want to apply the filter as early as possible, so as long as the final result stays the same, we will push the filter down whenever possible. But just remember that they are semantically different: with a complex expression, the CALCULATETABLE one is supposed
to apply the filter at the leaf level by design, and the FILTER one is applied afterwards. So for people asking how to do the HAVING clause, like in a SQL SELECT statement: HAVING is obviously post-filter behavior, right? You have the fundamental triad to produce the initial group-by and aggregations, and the HAVING is supposed to apply afterwards. This is where the FILTER function should be used, to remove unwanted values from that result.

By the way, speaking of situations like that, the auto-optimization: I think in the last couple of years there have definitely been some pretty big improvements there, where you identify a bad pattern, where somebody is writing something that gives the right number but is not performant, and the engine recognizes, oh, we can do this function instead of that; you're going to get the exact same result, but it will be three times faster because you're not materializing nearly as much data in the storage engine.

You're exactly right. In that sense DAX is very similar to SQL: they are declarative languages. They declare the semantics, the intention of the user, and the actual execution order is left to the DAX engine to do the best it can. So yes, we do a lot of optimization. Actually, the vast majority of the development work in the DAX engine is optimization: for all the equivalent results, can we choose a different execution plan to achieve the result faster? Exactly as you said, we have done a lot of such optimizations over the years. So, last but not least, many of the functions we talked about today have been blogged about by the SQLBI team, and I provided some of the links. You can look around for the behaviors or the performance issues I alluded to today without getting into concrete examples; you can find many of those examples in the blog posts by the SQLBI team and by other folks as well, by the way.
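So a HAVING-style condition, sketched here with hypothetical names, wraps the core result in a post-filter, while CALCULATETABLE supplies the pre-filters:

```dax
EVALUATE
FILTER (
    CALCULATETABLE (
        SUMMARIZECOLUMNS (
            'Customer'[Country],
            "Total_Quantity", [Total Quantity]
        ),
        'Date'[Year] = 2023      // pre-filter, applied at the leaf level
    ),
    [Total_Quantity] > 1000      // post-filter, the HAVING-style condition
)
```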
I only listed a few of the links, from the SQLBI team.

And I made sure to drop all of those in the chat as well, so everybody has those links, and I'll update the description for the video so they'll be listed in the additional resources for people to check out.

Okay, yeah, that concludes my presentation today. Any questions?

Yeah, if people have any remaining questions that we haven't gone through, feel free to drop them in. Otherwise, each time you're on, it's always an informative and very deep-dive session, and I even come out learning quite a few new things about some of the nuances between these. I think a decent number of us who are intermediate with DAX know at least whatever SQLBI has blogged about SUMMARIZE and ADDCOLUMNS, but not how those evaluate compared to the GROUPBY function, or when and where to use each. But for sure the biggest and most exciting thing is that you mentioned that at some point in 2024, once all the back-end stuff gets fixed, we will finally get a chance to use SUMMARIZECOLUMNS in measures, and I know anybody who cuts their teeth on DAX is going to be very happy to start using that as a common pattern. That will be absolutely fantastic; I'm very excited for it.

Yeah, I'm excited for it too. Sometimes when I'm writing measures, I find that without SUMMARIZECOLUMNS it's just such a pain to write certain very basic things.

Yeah, a few extra steps and layers. Actually, I do have a question from Greg Kramer: how much are you using GitHub Copilot to create code in your daily work?

That's a very interesting question. GitHub Copilot performs differently on different programming languages, and it's excellent in Python, I would say. I found that it's an invaluable
companion when I'm writing Python code; it's just great at it. For some other programming languages it's not quite there yet. So yes, I do use it a lot.

Yeah, for the big languages it's helpful, for sure. I've used some AI-assisted stuff, Copilots and others, for DAX, but the problem is that unless it can actually connect to the model and you get the autocomplete for the fields, it's harder to write blindly. I've tried a few code assists, not for Power Query but just for DAX, and the one downside is that they won't do autocomplete for your fields, and that slows me down enough that I've not used them. There have been people who have built some free plugins with ChatGPT to help write DAX in Visual Studio, but it doesn't get the rest of the model context, like whether or not a relationship exists.

Yeah, I agree. I believe ChatGPT is not there yet for DAX development. To be honest, DAX is quite new; it's been changing in recent years, so clearly the training cannot be as up to date as it is for many other programming languages. More importantly, the amount of literature about DAX on the internet is nowhere close to that of other programming languages; for those there are so many open-source code bases out there, so there just isn't nearly enough training material for DAX. This is actually one of the research projects I'm working on nowadays: since internally we can build up a lot of the training material, I'm looking into whether we can enhance the existing large language models to help with DAX development. Just from my personal experience using the Python Copilot, I'm a firm believer that this is definitely doable for DAX, as an automated development helper, given that
Python is substantially more complex than DAX, and Copilot can do a very decent job in Python, so there's no reason why it cannot do the same thing for DAX. Yeah, I've had a couple of people on this stream demo some of that code assist that's built into Fabric for Spark and the others, and I can imagine someday Microsoft, with you and your team or whomever, will eventually have something similar in Power BI Desktop. If you're writing a measure, something will get better at that, beyond the quick-measure type stuff. It will see you're trying to do a pattern and suggest that maybe, instead of putting a measure inside SUMMARIZECOLUMNS, perhaps you should use ADDCOLUMNS instead; it's just going to help you auto-optimize as you write. So I'm hoping in the next couple of years we'll start to get that, and it will help auto-educate people as they're writing this stuff, just on the fly. Yeah, that's definitely one of the very interesting things that everybody's looking forward to in the next couple of years, for sure. Yeah, exactly, it's very enticing. With that, I don't think I see any other questions, other than a lot of accolades and appreciation from the chat for a really great session today and some very exciting announcements. So I just want to say thank you for taking time out of your Friday to come on for this; it has been fantastic, and I very much look forward to the next time we get some additional DAX announcements down the road and have an opportunity to bring you back on again. So thank you for taking the time and coming on today, Jeffrey. Thank you again for having me one more time on your show; it's always a pleasure, and Happy New Year. Happy New Year to everyone who is on the session listening. Exactly, everyone enjoy your new year, and I will see everyone in
2024. Yes, absolutely. All right, take care. Thank you so much for watching. Please consider hitting that like and subscribe button, and if you want to help support this channel, take a look at our channel memberships or our merchandise store for cool swag. And last but not least, please consider sharing this video on your social media platform of choice to help our channel grow. So until next time!
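The SUMMARIZECOLUMNS-in-measures pain point discussed above can be sketched in DAX. This is a hedged illustration, not code from the session: the Sales table, Product[Category] column, and measure names are hypothetical, and the "future" form is only the anticipated shape once SUMMARIZECOLUMNS becomes supported inside measures.

```dax
-- Today: SUMMARIZECOLUMNS cannot be used inside a measure, so a common
-- workaround is ADDCOLUMNS over SUMMARIZE, with CALCULATE forcing
-- context transition for each group row.
Avg Category Sales :=
AVERAGEX (
    ADDCOLUMNS (
        SUMMARIZE ( Sales, 'Product'[Category] ),
        "@CategorySales", CALCULATE ( SUM ( Sales[Amount] ) )
    ),
    [@CategorySales]
)

-- Anticipated pattern once SUMMARIZECOLUMNS works in measures:
-- the grouping and the aggregation are expressed directly, with no
-- explicit CALCULATE needed for the extension column.
Avg Category Sales Future :=
AVERAGEX (
    SUMMARIZECOLUMNS (
        'Product'[Category],
        "@CategorySales", SUM ( Sales[Amount] )
    ),
    [@CategorySales]
)
```

The first form is the extra "steps and layers" mentioned in the conversation; the second is why SUMMARIZECOLUMNS in measures is expected to become a common pattern.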
Info
Channel: Havens Consulting
Views: 4,756
Keywords: Power BI, PowerBI, PBI, DAX, Data Modeling, Visualizations, Tips & Tricks, Power Platform, Power Query, Power BI for Beginners, Power BI Training, Power BI Desktop, Power BI Best Practices, Power BI Relationships, Power BI Dashboard, Power BI Tutorial, Power Query Excel, Power BI Versus Excel, Power Query Tutorial, Power Query Functions, Power Query Parameters, Power Query Editor, Power BI Service
Id: bCcJorCKw8k
Length: 83min 56sec (5036 seconds)
Published: Sat Dec 30 2023