Propagating filters using TREATAS in DAX

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- Ciao, friends. In this video, I want to analyze with you different ways of moving a filter from one table to another table. Whenever you want to apply a filter that starts from one table to a different table, there are multiple ways of obtaining the same goal. A relationship, INTERSECT, TREATAS, CONTAINS. We will see all of them and as it often happens, because we have multiple choices, we will analyze performance in order to be able to make the right choice when you need to implement it in your model. Let's get started. The first thing is understanding what it means moving a filter from one table to another table. Even though it looks like a rather technical topic, it is actually happening all the times when you build your report. Let's take a look at that. Here I am slicing by brand and I'm showing the value of the sales amount. Brand is in the product table, sales amount is in the sales table. So in order to compute the value, the engine needs to move the filter that is coming from here from the product table, and move the filter from product to sales detail in order to compute the value. So moving a filter is something that happens all the time in your reports. And the usual method of moving a filter is to use a relationship as we are doing here. So the filter from product goes to sales detail. Sometimes, you don't have the option of using a relationship and therefore, you need to use DAX in order to move a filter. Let's see that with an example. I have a promotion table here. Now, the promotion table contains, let's zoom in a bit, the promotion code, the name, the discount and the promotion is applied in a given year to a category of product. So it's filtering a year from date and the category from product. And it would be great to be able to build relationships between these tables. So from promotion to date and then from promotion to product. But if I do that, or if I were able to do that, I would create an ambiguous model because the filter from promotion to sales detail has multiple parts. One goes from promotion to date, sales header and finally to sales detail and the other one goes from promotion to product to sales detail. Therefore, I cannot create the relationships the way I'd like. And whenever you cannot create a physical relationship, you revert to what we call virtual relationships. A virtual relationship is nothing but a piece of DAX code that moves the filter in a CALCULATE statement in order to obtain the same behavior that you would have obtained by creating a physical relationship. Of course, speed is no longer so cool but if you need it, it's a technique that is worth investigating. Let's see how you can do that. We leave for, well, we basically leave now Power BI because we are going to write all the code just using DAX Studio. So I just open DAX Studio. Let's go there. And we start writing a simple query that does EVALUATE, SUMMARIZECOLUMNS, promotion, let's take the promotion. And then sales, which is just the sales amount. Now, if I run the query right now, you see that the result is the same number everywhere. I will need a lot more space. It's the same number everywhere because we are missing the relationship. So if we want to make the relationship work, we need to move the filter from promotion to the sales table. Let's do that by defining a new measure. Let me define a new measure, sales, and we call this promotion sales. The first version will use TREATAS. As I said, what we need to do is take the selection on the promotion table and create a table that filters the sales table. Now, in order to filter the sales table, we have basically two ways. We can go from promotion to date and then from promotion to product, relying on the physical relationship that then will move the filter from date to sales detail and from product to sales detail. Or we can go straight from promotion to sales detail if in sales detail we have suitable columns that work for us. For now, we just go from promotion to date and from promotion to product because we are not interested in performance at all. So what I need to do is use CALCULATE. Compute the sales amount and here, I need to build a table that contains the values of the promotion year and the promotion category. So I can use SUMMARIZE promotion by promotion, promotion category and promotion year. And here, instead of using sales amount, we use promotion sales TREATAS. And let's also provide the right name. If I run the query now, I get an error. Why that? Cannot find, oh, yeah, yeah, it's not sales because that is sales detail. Sales detail. Okay, if I run the query, right now nothing changed. I have the very same value. Why that? Well, the reason is this SUMMARIZE returns columns in the promotion table. So the result is a table that is able to filter this area of the data model. I need to move the filter from here to date and to product. So to do that, I'm going to change the lineage of my table and say well, the year is actually a date year and the category is actually a product category. To obtain that, I need to use TREATAS to change the lineage of my table and say the first column is a product category. And oh, no way. The second column is a date calendar year number. Now, with TREATAS, the filter will go from promotion to date and product and finally, to sales. If I run the query now, you see that I have a correct value, which is the amount of sales in the years where the category was actually in the promotion. So this is a first way of moving the filter. There are other ways of obtaining the same goal. One is by using INTERSECT and the other one is by using CONTAINS. Let's start by looking at INTERSECT. We can write, yes, a new measure. Let's use create a new measure. A lot of the code will be the same, so it's worth copying it. But this time, we use INTERSECT. INTERSECT, again, we use it to change the lineage. We compute sales amount. We still take the promotion category and the promotion year. So this is the table containing the category and the year but we intersect it with a table that we create on the fly with the correct lineage. So we are going to use INTERSECT. And we create the CROSSJOIN of ALL product category. This returns all the product categories with the lineage of the product category. And ALL date calendar year number. Now, this part of the query is returning the full CROSSJOIN of category and year numbers with the lineage of product category and calendar year number. It has all the possible combinations. But then I'm intersecting it with the combinations which are visible in the promotion table. So the results are only the values that appear in both tables. Therefore, it basically contains the same result as this SUMMARIZE but with the correct lineage. Once I apply that with CALCULATE to the model, it computes the value. That is promo sales INTERSECT. So let's return that in the query. INTERSECT. And if I did everything correctly, now you see that I have the same result here and here. They are two different techniques. They obtain the very same result, even though performance-wise, there is a big difference but we will talk about performance later. There is a third technique that is worth learning, which is using CONTAINS. CONTAINS uses the same methodology as INTERSECT, it's just a different syntax. To use CONTAINS, we will iterate the table containing the CROSSJOIN of years and categories and for each of the pairs, we check if it appears in the currently visible promotion table. Let's see that. Again we copy the measure because a lot of the code will be very similar. And that we call CONTAINS. So we are no longer going to use INTERSECT. Instead, we are using FILTER. FILTER will iterate the CROSSJOIN of category and year number and for each of these pairs, it checks if CONTAINS, if the promotion table contains a row where the promotion category is equal to the product category, the currently iterated value of the product category. And the promotion year is equal to the date calendar year number. This CONTAINS basically checks if the promotion table contains, if, yes, the promotion table contains the currently iterated category and year number. It returns either true or false and then FILTER removes all the rows that do not appear in the promotion table. This is CONTAINS, no, we do that here. We add a third row. This time, it's using CONTAINS. And again, if I did everything correctly, you see that I have the same number here, here and here. There are no differences at all. Three different techniques to obtain the same goal. Now, so far, we used the promotion table that is really tiny. It contains a few tens of rows. So performance is never a problem. If we were to evaluate the performance, you will see that it's extremely fast. I need to expand that again. Let's enable server timings. And then we run the query. Let's look at server timings. The full query evaluating the three measures is running in 93 milliseconds. So it is so fast that it is nearly impossible to evaluate the difference between the three different techniques. But there is, indeed, a very big difference between the three different techniques and in order to appreciate the difference, we are going to use the same model but a different table from where to start moving the filter. Let's start by looking at the model. Moving from promotion to details, as I said there are too few rows. That is the reason why we have the sales header and the sales detail table. This is not a best practice having a header, detail table but I used it in this model because I wanted to have a bad model useful to evaluate performance. Indeed, the sales header table contains around 1.6 million rows. And the sales detail table contains around 12 million rows. So the cardinality of the relationship between sales header and sales detail here is around 1.6 million rows, which is quite large. And I will write measures that move the filter from header to detail, using the relationship, which is the fastest and the easiest way to move a filter from one table to another and then the three different techniques we have seen so far. In order to do that, I need to disable this relationship because I won't enable it on demand and I want to keep it disabled, otherwise the engine will use the relationship instead of using my code and that's not what I want. I want to write the code by hand. Okay, let's go back to DAX Studio. This time, I'm not going to write the full code because it would take forever to write it. I already have everything ready here. So let's just quickly look at it. The first MEASURE sales header amount RELATIONSHIP computes the sales amount moving the filter from sales header to sales detail, using the relationship. USERELATIONSHIP reactivates the relationship that I deactivated a few seconds ago. Amount TREATAS uses TREATAS to move the filter. Amount INTERSECT uses INTERSECT and amount CONTAINS uses the CONTAINS technique. So I have the three different techniques plus the relationship. And then my SUMMARIZECOLUMNS slices by calendar year and it's worth to note that calendar year is here, so the filter starts from date and it will reach detail through header. So date filters header and then from header, I move the filter to detail, using different techniques. Let's now run the queries one by one and let's see the result. First, let's look at amount RELATIONSHIP. We already have the server timing here. I can just run the query. And it takes 31 milliseconds to move a relationship that is 1.6 million rows against a table that is 12 million rows. So it's pretty fast and as I told you, it's the best way to obtain this goal. What about using TREATAS? We comment that and we remove the comment from here. If I run the query using TREATAS, as you are going to see, performance are very different. Several orders of magnitude of difference. It takes 5.5 seconds. So we move from a bunch of milliseconds to 5,000 milliseconds. There is a huge difference. Moving a filter with DAX is way slower than moving a filter using a relationship. That is why creating header, detail tables or creating models where you need to move the relationship by DAX is not a best practice because performance wise, you have an issue. But that was TREATAS. What about the other ones? Let's look at INTERSECT. INTERSECT is using the intersection, so it's using a different functionality. It's no longer TREATAS. TREATAS was 5.5 seconds. This is already running since 10 seconds and it's gonna take 12.8 seconds. 13 seconds. So TREATAS is much slower than a relationship but it is faster than using INTERSECT. What about CONTAINS? I honestly don't remember but I would guess that CONTAINS is the slowest one. Let's see that. We have INTERSECT, 13 seconds. Here we are already seven, eight. No, it's actually not that bad. It's in the middle between INTERSECT and TREATAS. So it's 7.9 seconds. With that said, there is still a huge difference. When you need to move a filter between two tables, using a physical relationship is always the best way. You revert to DAX only if you really need and if that is the case, of course you need to choose the right way. So as you have seen, you have multiple ways of moving a filter. The best, a physical relationship. Second, TREATAS. TREATAS is quite well optimized. It's way slower than a physical relationship but in some scenario might be acceptable. And an important detail, if your table is small, you don't have to worry at all because at that point, performance will always be good, no matter what. You worry about performance when the table gets larger and larger. So TREATAS, quite slow. CONTAINS was not that bad but slower than TREATAS. And INTERSECT, the slowest one. With all that said, beware that your experience might be very different. Whenever it's time to evaluate performance of DAX measures, you need to do them on your model. So the real takeaway of this entire video is the technique used to evaluate different measures, having different options, you evaluate them all and then you choose, depending on which one is the best in your specific model. Enjoy DAX.
Info
Channel: SQLBI
Views: 15,757
Rating: undefined out of 5
Keywords: DAX, PowerBI, Power BI, CONT, C0115, Alberto Ferrari
Id: 6WU7Ze32Q3w
Channel Id: undefined
Length: 19min 17sec (1157 seconds)
Published: Tue Feb 09 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.