Using OR conditions between slicers in DAX

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- Ciao, friends. In this video, I want to show you how to create a calculation that uses an OR condition instead of an AND condition with slicers. Before we dive into the details, let's review together the requirements and the calculation that we want to obtain. If you look at this report, we have two slicers. One which is filtering the occupation, only professional and the other one that is filtering the brand, Contoso. Now, these two slicers are working together on the report. And the number you see here, 204, represents the number of products sold that are both of the brand Contoso and they have been sold to professionals. So the condition we are using is professional, professional. And Contoso. And there is an AND condition between the two. This is the default behavior of any slicer. What we want to obtain is a different calculation. We want to build this calculation, the units or. Units or you see shows a much larger number and the reason is units or, instead of using an AND condition, uses an OR condition. So we are showing the products, which are sold to professional or that they are Contoso. Now, this is not the default behavior of slicers. So in order to obtain this goal, what we will need to do is, write some DAX code that gets rid of the default behavior of Power BI and replaces it with our own logic. Before we go farther with the video, a big warning for everybody. In this video, we are going to show all the details of how to write the DAX code. So if you are in a hurry and you are just interested in looking at the code and implement it in your model, well, you can just go to the SQLBI website, download the PBIX file, look at the code and you are just happy. If you are interested instead in seeing and learning the whole process of writing the different versions of the same code and how we obtain the last piece of code, then follow me and we will start from scratch, build a calculation, do a lot of reasoning in the meantime. We will see good solutions, bad solution, intermediate solution, a lot of DAX code. Ready? Let's get started. Let's start by writing the first solution. The first solution is easy to understand. It uses SUMX, iterators, and comes with a lot of limitations. But it's easy to understand and it's the first step in obtaining the final version. So let's get started. We need to build a new measure. I already called it or number one and what we need to do is first of all, retrieve the values of the occupation, retrieve the values of the brands because we will use that in our conditions and then we iterate over sales in order to compute our value. So we start saving the brands and we take VALUES of product brand. That is the slicer coming with the brands and then we also need the occupations. Occupations. VALUES that is in customer. Customer occupation. Then we need to go for the RETURN. As I said, we are going to iterate over sales and retrieve all the sales that either have one brand that is in the selection of brands or an occupation that is in the current occupation. So we do a SUMX over sales of quantity. This just returns the quantity but we want to filter sales where I need to access the customer occupation. So I need RELATED customer occupation IN occupations. And I need an OR condition. So OR customer occupation IN occupations or the product brand IN the brands. So I scan sales and I retrieve all the rows where either the customer occupation is in the slicer occupation or the brand is in the brand. That should do the trick. Let's give it a try. I need to take the measure, drop it in the report, get rid of this and it doesn't work. It doesn't work, it shows the very same value as the AND version, and the reason is, I'm iterating over sales, so this reference of sales is still working under the filter context coming from the report. I need to get rid of it. I could use ALL sales. Let's do a first try. If I use ALL sales, now the number is wrong, because by using ALL sales, I also got rid of the filter from the year and any other filter from the report. So ALL sales is a bit too much. What I want to do is get rid of the filter from occupation and brand, which is coming from the slicers and keep any other filter. So instead of using ALL sales, what I can do is embed this filter into a CALCULATETABLE that gets rid of the filter from the brands, ALL customer or the occupation first ALL customer occupation and ALL product brand. So these two ALL are needed in order to get rid of the filters coming from the slicer, which I will then replace with my OR condition that appears here. And if I did it correctly, now I see values and these values should be the correct one. Let me just check that they are indeed. Yes, they are the same value that I have here. So we have a starting point. Now, this formula works just fine and there is nothing bad about the formula in itself but it has a very strong limitation. Let's look at it into more detail. In order to compute the sum of quantity, which was my primary goal, I basically had to embed my condition by filtering inside this CALCULATETABLE and this CALCULATETABLE is inside an iterator. So I had to rewrite sum of sales quantity in order to push the filter inside it. It works, there is nothing bad with the formula but the strongest limitation is that it works with sum of sales quantity. What if I want to apply the same logic to another measure, for example? I have another measure, which is the DISTINCTCOUNT of sales product key. If I try to obtain the OR with a DISTINCTCOUNT or the OR with any measure that cannot be aggregated with a simple sum, this pattern, this way of writing code, doesn't help me. It will not go very far. So I need something different. And that something different is to replace the iteration, this SUMX with a filter. Think of that. If I take this CALCULATETABLE and instead of iterating over it, I use it as a filter, then I will be able to use whatever measure because the filter will be applied to the model and any measure will work just fine. So another way of obtaining the same result is to copy this entire code and write a new version of the same code. Let me paste everything and call it or number two. With or number two, we do not want to use SUMX but we want to keep everything else. So this CALCULATETABLE is fine but I will not use it to iterate. I will use it as a filter. So I add a CALCULATE around it. And I compute number of units, which happens to be exactly the sum of sales quantity and this entire CALCULATETABLE now works as a filter, which is added to the filter context, replaces the entire filter on sales with the new condition. This version is very close to the previous one. It has its own drawback but first of all, let's check that it actually works. We can put or number two here. The number produced is the very same. What is the problem, the main problem of this solution? Well, the main problem is this filter over sales. I'm filtering the entire sales table and when I mean the entire sales table, I mean the entire sales expanded table that is extremely large. So the optimizer might be able to work on my code and product something good but it is very likely that it will not be able to optimize it the best way and so my code will be suboptimal. Suboptimal basically means slow. Filtering sales, filtering an entire table is quite always a bad thing. It would be better to restrict the filter to only the columns which are actually needed for my model. And if you look carefully at this code, I actually don't need the entire sales table. What I need is a subset of the sales table that only contains the customer occupation and the product brand. So I can replace this filter sales with a filter that retrieves only the brand and the occupation. Now, brand and occupation, they belong to two different tables. So it will not be easy. I need to build a table that contains the occupation and the brand but I can do that. Instead of filtering sales, I can use SUMMARIZE. And I use SUMMARIZE to group it by customer occupation and product brand. Now, this SUMMARIZE returns only occupation and brand and by the way, I just noticed that I'm creating a new, I'm modifying the measure. What I wanted to do was creating a new measure. So I just copy the entire code in a new measure. This is the same code as before, but I need to call it or number three. So I SUMMARIZE sales by occupation and brand so that the result of this SUMMARIZE only contains two columns, occupation and brand and I can still work with my condition. Everything is embedded inside CALCULATETABLE. So this code is very close to the previous one, apart from the fact that it doesn't work. Yes, because I'm still using RELATED, which is no longer needed. Neither here nor here. Earlier, I was iterating over sales, so I needed RELATED to go through relationship. Now I'm iterating over a temporary table that I built on the fly. So there is no need to use RELATED. Now, this code, let me reduce the font a bit because the code start to be longer, this code should produce the very same result. We can get rid of or number two, put or number three. And it no longer has the drawback of iterating over sales. This is or number two, or number three uses SUMMARIZE. Again, this code is not perfect. I mean, not perfect doesn't mean it's bad but it has one drawback. It requires one scan of the sales table, which is happening here. The group sales by occupation and brand. And another scan of the sales table to compute the units. So at least I will need to scan the sales table twice. Once to build the temporary table containing occupation and brand and a second time to compute my units. This might take a lot of time depending, of course, on the size of the table. I can actually optimize it or at least, change the way the code is computed because this SUMMARIZE scans the sales table but I do that because I want occupation and brand. What I could do is different because what I want to do is replace the filter on occupation and brand, so I do a SUMMARIZE inside a filter over ALL. I can actually build a temporary table that contains the cross join of occupation and brand and then use that instead of this SUMMARIZE. This will save me both the CALCULATETABLE and the SUMMARIZE with the scan of the fact table. So I can copy again everything and build a new measure. It is or number four. Instead of iterating over SUMMARIZE, we iterate over ALL customer occupation. Actually the CROSSJOIN of customer occupation and product brand. Missing a comma here. And now that I have put ALL here and here, I can actually get rid of the entire CALCULATETABLE because I no longer need to get rid of filter, I'm just ignoring them using the CROSSJOIN. Now, this code is shorter than earlier, so shorter is typically good. I don't know if it's gonna be faster or not but at least it's a different way of writing the same expression. It is or number four and it might work. Let's give it a try. Let's get rid of or number three, or number four. You see it still produces the same result. So we moved from the first version that uses SUMX, bad, because using SUMX I have to rewrite sales quantity. And we created a second version that removes the usage of SUMX and instead it uses CALCULATE with the table iterated as a filter. Then we use SUMMARIZE in order to reduce the number of columns which are used in the filter in order to speed up the calculation and make the code better. And then we finally ended up with this version, which is shorter. It has the advantage of working with a smaller table. And it's kind of a good solution. But we can actually do something different. I don't know if better or worse, but instead of using this version of the code that embeds the OR condition, you see that we are basically building the table that is used as a filter and we can build a table also in a different way. To understand that, let's go back to the drawing and let's reason a bit on top of it because what we want to obtain is professional or Contoso. So let me clear some of this stuff. Now, how can I obtain professional or Contoso as a table, as a set just using set functions? Well, actually, if you think about that, you can think of the problem in a different way. When occupation is professional, let me clear everything. It's probably easier if we just write it down. Okay. When the filter on the occupation contains professional, any value for the brand is fine. So my condition can be occupation equal professional. And if the occupation equal professional is true, then regarding the brand, I can use any value. So if the occupation contains professional, then the brand can have any value, it doesn't matter. And at the same time, if the brand contains Contoso, so if the brand equals Contoso, then regarding the occupation, I can use again any value because if the brand is Contoso, any of the value for the occupation are good. I can create this set, the one containing occupation, professional and brand any by creating a CROSSJOIN that contains the selected brand, cross joined with any occupation. And I can build this other set, this one that contains the brand Contoso and any occupation by creating another CROSSJOIN. And finally, I can union the two together. Doing it this way, I can build the table in a single step, using table operations. Let's do that together. What we want to do is build the table with a simpler step. Let me copy again everything and we build a fifth version of the same code, which only uses sets. We no longer need those variables. And what we need to do is cross join not ALL customer occupation but VALUES of customer occupation. This is our occupation that was professional. Cross joined with all the brands. Then we compute the CROSSJOIN of ALL the occupation with the VALUES of the brand. The first CROSSJOIN builds professional, any brand. The second CROSSJOIN builds any occupation, Contoso. And then we union them together. This code is the same, produces the same set as the previous one but only uses set operation. This is or number five. Let's give it a try. We can get rid of or, put five. Again, we still have the same result. So you see that we have created several versions of the same code and we ended up with this version that just uses set operations. Whether this is the best solution or not, well, it really depends. It's hard to tell whether this solution is better than the other one. If we want to take a decision and we want to understand which one is better, what we need to do is work a bit more and do some further analysis. So let's draw some conclusions from all the trip that we did through this DAX code. We started from a first solution that uses iterator with a lot of limitations. Then we moved to more set-based operation using CALCULATE and the table as a filter and then we start refining the same code again and again until we ended up with a version that is very elegant because it's short, it only uses set operators. But the real question is which one is the best? As of now, it's a matter of personal taste because apart from the first version that clearly had limitations, all the other ones, they kind of work and choosing the best one takes time. So if we want to understand what is the best one, we will need to start a different trip and understand the characteristics and the features of all the measures in order to choose the best one, depending on the model, the size, the number of brands, the values of the slicer. But that would take a lot longer and it's probably the content for another video. For now, go to SQLBI.com download the file, start to play with the code and maybe, you come out with a different version that you can share in the comments. Enjoy, DAX.
Info
Channel: SQLBI
Views: 16,003
Rating: undefined out of 5
Keywords: DAX, PowerBI, Power BI, CONT, C0109, Alberto Ferrari
Id: l5JX3G5Ntzk
Channel Id: undefined
Length: 22min 42sec (1362 seconds)
Published: Thu Dec 03 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.