Analyzing a slow report query in DAX Studio

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- Hi friends. In this video I want to show you how to analyze a slow query in a report. In this example we have many visuals and with Performance Analyzer I can capture the information for the execution time of each visual. This way I will capture the slowest one so I start recording and I click on refresh visuals. At this point an important thing to do is highlighting what is the query that has the biggest execution time because it's what is causing the slowest visual in the report. And in order to retrieve this I have to sort by DAX query time and I have to sort by in a descending way. Doing that, I know that this matrix is what is the slowest DAX query in the entire report. And of course there is also other time to complete the rendering and other but now we want to ignore it because our focus is we want to test the execution time of this DAX query and we want to possibly to optimize the code. So in order to do that I can copy the query here and once I have copied the query in the clipboard I can paste the query in DAX Studio. I connect DAX Studio to Power BI Desktop and now when I paste the code here, this is the very same query executed by the visual that is displaying the matrix in Power BI Desktop. At this point you can click run and you run this query in DAX Studio. You can see that this query shows the result which is the first thing you can see in this pane but another important consideration is how much time we wait for executing the query. This important information is contained here in the output. You can see here that this query has been executed in 340 milliseconds. First of all we have to execute the same query multiple times in order to make sure that we are not measuring some noise caused by some other program running on my computer for example. So in order do that we can click run multiple times. One two and now I go back to output and I can see that the duration of the query is between 275 and 340 milliseconds. At this point I can also consider, wait a minute, am I executing the query in the worst possible condition? In order to do that, we have to make sure we don't have any data in the cache and this can be obtained, and it is a best practice when you analyze the query in DAX Studio, to run the query checking this option which is Clear Cache and then run. So with a single click, each click will clear the cache and immediately after execute the query again. Let's try to do this another three times and see what happens, so one, two, and three. Once we complete this and we go back to the output, we can see that this time, the execution time is considerably slower. We are at one second and a half, 1,500 milliseconds, 1,577 and 515. So you see that executing the query in the worst possible condition also stabilize better the execution time. Which is important when we want to analyze the performance. At this stage we know that this query is considerably slow especially when it is executed with a cold cache. So the next thing to do is gathering more information about the query plan and the execution timings. These two options, query plan and server timings are what I want to use in order to capture this information. And so I click on query plan and I click on server timings as soon as this operation completes and when you see here in the output, query trace started after you clicked on the button, you can start your next execution with run. Now as you see I don't look too much at the result, at what I get in the result, I'm more interested into the execution time in the output. But at this point you see that we have other two panes here, query plan and server timings that contains additional information about my execution. So let's see the query plan. The query plan is the actual query plan of the query. We have a physical query plan here and a logical query plan here. As you can see, this information is relatively complex. If I scroll down just in the physical query plan which is probably the more interesting part of the entire of the two query plans we have available. It's very long and believe me, I don't use this information as the first way to optimize the query. If I can I try to move to the other pane server timings here and server timings has a number of information that are related to the execution time of the query with some number that is important to know. So let me describe what we have here. Total is the total execution time, how much time we wait. This is the same amount of time we have seen in the output window but this time it's split between formula engine and storage engine. And the formula engine is the time spent waiting for the formula engine and the SE is the time spent waiting for the storage engine. So the sum of these two numbers correspond to the total. Now usually a DAX expression should be evaluated as much as possible in the storage engine and then in the formula engine because in theory the storage engine is faster than the formula engine but these details are something that you need to understand looking at how the VertiPaq engine works. There are other videos and other courses for that. Now let's just focus at the very high level about what is the problem of this query. Once I realize I have a problem between formula engine and storage engine I can also look at the other information like the number of storage engine queries, which is the number of requests made by the formula engine towards the storage engine. And here you have the details about each single storage engine request made by the formula engine. So you see that we have already a complex situation, a complex query plan and a lot of information about the storage engine. So let me spend some time looking at the query that we are executing. Now if you look at this query, you can notice that this is some DAX code that you might not have seen before because this is the syntax of a query in DAX, whereas you might be used to writing DAX measures. And what is the DAX measure used here? Actually I know that this is customers, the customers measure that we have in the report, but the easiest way to find this information is going back to DAX Studio, clicking on the matrix here. This click highlights this visualization and because I selected this visualization I have here in values, customers, which is the only measure I have. What can I do at this point? I can include this measure here in my query by searching for the measure Sales Amount here. You see that I can click here, I can click on sales amount in this list and I can right-click and click on Define Measure. When I do this, the definition of the measure is included in my DAX query and I can change, for example, I can format the query so that it is more readable. This is a good idea. And once I have this I can take a look at the sales amount. I should have used customers so let me do this again. Customers is the measure I want to include. I right-click here and I include the definition of customers. I click on Format Query again and now I can focus my attention on this definition of the customers measure. Why I want to have the definition of the measure here? 'Cause now if I want to test the behavior of the measure, changing something I will just change the code here, I click run again and I test the same query with a different version of my measure customers. And this time I can tell you this measure is easy to optimize. Why? This query is violating one of the best practices in DAX. Filter columns don't filter tables in CALCULATE. And what are we doing here? FILTER sales. So I can tell you this is an error following the best practices in DAX and the right thing to do is to filter just the column Quantity greater than zero. But in order to not change the semantics of this query, I have to make sure because I was filtering a table before, I have to include this condition in a KEEPFILTERS. This way I'm sure that I will obtain the same result, but in case you're not, pay attention before executing the query, you should go in the results pane. You should take a look at the results of the numbers you see here and in the next execution we have to make sure we have the same numbers. Let me try. I click run now, remember I am clearing the cache and then run, and I have an error here because I probably forgot something. Yes because I removed the filter before and I still kept the ending bracket of the FILTER so now I fixed the code, I run this again and if I go to results, you see that I have the same number we have seen before. You can go back to view the video to check the numbers are the same but believe me, they are the same. Now if I go to the server timings I discover that all of a sudden, my query has been executed in 12 milliseconds instead of one second and a half. And why this? Because my best practice application to the formula, reduced the time for the storage engine, for the formula engine, the number of queries and the overall execution time is now down to 10 milliseconds. So what should I do at this point? I just have to select this measure here in DAX Studio, copy and paste this measure as the new version of customers, of the customers measures in Power BI. Enjoy DAX!
Info
Channel: SQLBI
Views: 14,770
Rating: undefined out of 5
Keywords: DAX, PowerBI, Power BI, Marco Russo
Id: C5HBhlLUFsE
Channel Id: undefined
Length: 10min 33sec (633 seconds)
Published: Thu Jul 16 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.