- Ciao, friends. This video explains the differences between the formula engine
and the storage engine, which are part of the DAX engine included in Power BI and Analysis Services. Every time you have a Power BI report, your DAX queries are
executed by these engines and understanding the differences and how they collaborate is important to understand how to read
the performance metrics and to optimize your DAX expressions. We will start with a quick analysis of how the engines work together with some practical example and then we will see demos using Power BI and DAX Studio. So first of all, let's
see the architecture. When we have a DAX query
generated by the report, by the visual in your report, the query is received
by the formula engine. The formula engine is the first stage of the execution, which
generates the query plan, organizes how to execute the query and retrieve the data from the storage where the data is stored. The data can be stored
in memory, VertiPaq, or in DirectQuery, for
example, in SQL Server. The storage engine is a separate engine that provides the engine
to the storage engine to execute the query. And during the remaining
part of the video, we will understand this better but, before starting, let's take
a look at a practical example that shows you how the two
engines collaborate together. Let's say I want to
know the maximum number of CALCULATE words I have in
a single page of this book. How can I do that? I can read the book cover to cover. In each page, I look for the presence of the CALCULATE word and I count how many CALCULATE
words I have in each page. In order to help myself, I can use here a small sheet where I write down for each page, the number of CALCULATE words. So let's start. So I go to a page here, I open a page and I have to repeat this for every page and I scan the content. One, two, three, four, five, six, seven, eight. So this page, 153, 153, has eight CALCULATE words. And I continue to do the scan in every page of the book. So for example, here, page 155 has one, two, three, and four. So we have four words here. And I continue, I just
keep the page on this side because it is not visible in the camera but I repeat this over and
over for all the pages. One, two, three, four, five. Here we have five and we are at page 157. Now, once I complete my list, imagine I continue to do this
for all the pages of the book, I no longer need the book at this point, I can remove the book from here and I complete the job, finding the maximum value
here in this column. So this is an intermediate step in order to complete the
calculation that I write here, writing the final result
of the calculation, which in my case is the maximum value of the number of CALCULATE
words in a single page, which is eight. So let's recap what I did. I started from scanning the book. And for each page of this book, I wrote down an intermediate
step of the calculation here. Once I completed this intermediate step, I had to do a second
step in my calculation, which was scanning the result of the previous step and
finding the maximum number. Now, this is a representation of what the DAX engine
does when we have a query. The initial request is,
"give me the maximum value, "sorry, the maximum number of pages for, "the maximum number of CALCULATE words "in a single page of the book." And this produced this result. This is what the formula engine does. The yellow sheet here
represents the formula engine. The formula engine can do the calculation but has to access data
from the physical world, from the place where we have the data. But the formula engine
doesn't have a direct access. The formula engine has to request the data to a second layer, which
is the storage engine. Only the storage engine
has physical access to our book, to our data. And the result computed
by the storage engine is consumed by the formula engine. Why do we have this architecture? Because with this architecture, we can change the storage engine with a different one that has access to different
type of physical devices. In our case, we could have not something like a physical book but
something completely different. In the case of the DAX engine, we have a storage engine, which stores data in memory compressed but we have also DirectQuery that can store the data, keeping the data in SQL Server. It doesn't matter what
the storage engine does, from the point of view
of the formula engine, it is just an intermediate
step of the calculation and the formula engine
performs the remaining part of the calculation. The formula engine has the
more advanced calculations, whereas the storage engine
usually has a limited set of calculations that can implement and compute and the result
of the storage engine is consumed by the formula engine as we will see in the next demos. It is now time to see the formula engine and the storage engine in action. We will see how a Power BI
report generates a DAX query and how this DAX query is executed by the formula engine
and the storage engine. The first example is a Power BI report with a DirectQuery model. We have a single visual in this report that shows two measures, the sales amount and the
margin percentage by month. So this visual has to
retrieve one data point for each month, and this is obtained generating a single DAX query. We can capture the query using Performance analyzer and we can analyze the query that we we will obtain with DAX Studio. So as soon as this execution completes, I click on copy query. The query is copied in the clipboard and I can paste the query in DAX Studio to see the content of the DAX code that has been executed. Now, this code actually
includes also the SQL query. This is a DirectQuery model, we don't want to see the
SQL query at the moment, we just want to execute
the DAX query here. And I click Run so that I see what is the
result received by Power BI to populate the visual. And the result is this table that has three columns and one row for each month for which we have data. You see that there is a column with the end of the month for each month. We have a column with the
total of the sales amount and the value for the margin percentage. Now, this is the result that Power BI receives from the DAX engine but what happens during the
execution of this query? We can enable the Query Plan and we can also enable
the Server Timings pane, which provide me information about the actual execution of the query. Let me run this query again. And here, in the Query Plan, I will see a detailed list of the steps performed
by the formula engine to execute, to compute the
result that we received. Now, the formula engine, remember the formula engine doesn't have a way to directly access the data
from the SQL Server database, in this case, we have a DirectQuery. So looking at the content
of this query plan, we can see that there are several points where we see direct query result. Direct query result and we
have many of these points here. Now, the query plan describes the steps, the calculations performed
by the formula engine. But when the formula engine needs to retrieve the raw data from the storage, it has to send a query
to the storage engine. In this case, because we
are using DirectQuery, direct query result
represents in the query plan, a point where the formula
engine has to consume data coming from the storage engine. And what is the request
made to the storage engine? In server timings, we can see here a list of requests sent to the storage engine from the formula engine. In this case, we have only one, even though we have multiple nodes in the query plan we have seen before. But let's take a look at this query. I copy the code here, the SQL code in this
case, the SQL request sent to SQL Server and this time, I paste this code in SQL Server Management Studio, which is an environment
similar to DAX Studio to execute SQL queries. So I paste the code here and I execute the query. It takes a few seconds. Again, it's a complex query,
an expensive query for SQL but at the end, I obtain the result that is similar to what I have seen before because actually, if you
see the numbers here, we have one row for each month and we have two numeric columns. The question is, "is
this exactly the number "we have seen before?" If we carefully look at the
result, this is not true because these two numbers
are relatively large numbers, whereas the value that
I obtain in DAX Studio, if I go back to the results, is a number where we have the sales amount has an actual number here, which represent the number,
a number in millions probably but the margin is a decimal number. So if you think for a moment, the model we have shows sales amount, and margin percentage but
we didn't spend time looking at how these measures are written in DAX. So if we go back to the model and we look at the
definition of these measures, we can better understand
whether these calculations can be moved to SQL or not. So if I go back here to Power BI, and I click on margin percentage, which is the measure that
I didn't see in the result, you see here I have the margin percentage is the division between
two other measures, margin and sales amount. And what is margin? The difference between
sales amount and total cost. And what are sales amount and total cost? Well, sales amount is the sum of the multiplication of quantity multiplied by net price and finally, these two columns are two actual columns of the
data model we have in SQL. Quantity and net price,
like for total cost, total cost has quantity and unit cost. So these columns are the columns where we have the raw data, the data that is in the storage. And only the storage engine
can access to this information. However, we didn't copy the entire content of the sales table. We would have got 12 million rows. We only received one row for each month, which means that the
storage engine request transferred a request to aggregate data and because we have to sum the result of a multiplication row by row, I think that also the multiplication has been sent to the storage engine. So if we go back to the
SQL code we see here, I can tell you that the
number we see in a0 here is the value of sales amount, whereas a1 is probably the value of... Oh sorry, it's the opposite. A1 is the value of sales amount, yes, whereas a0 is the value for total cost. So total cost and sales amount are two measures that the
formula engine can request to the storage engine, in this case, so the storage engine has the ability to execute the result of our SUMX formula. However, the subtraction
between the two measures and the division between the result of the other two measures is something that the
formula engine has to do because we see that the storage engine didn't compute the margin percentage. If I go back one second to the whiteboard, just let me recap what happened here. So we have seen a SQL query sent by the formula engine
to the storage engine. And this query SQL retrieved a table that has, let me draw here, three columns. We have the month here, we have the total cost and we have the sales amount. Okay, and we have this total. However, the result produced
by the formula engine and provided to Power BI is different because the result here
has the same granularity, the same number of rows, because we always had one value per month, but the value here is sales amount but here, we have a margin percentage. So this calculation is performed in the formula engine. And if we go back to the query plan, we can see that there are steps that consume the result of this storage engine request and perform the additional calculation. In this case, our DIVIDE between the measure, one measure is the
difference between sales and total cost, so it will be sales minus total cost, comma, and at the denominator, we should have, if I remember well, sales. So this calculation is
performed by the formula engine because if we see that the result provided by the storage engine doesn't
include the result we see in the Power BI report, it must be the formula engine to do this calculation. And the reason why I
wanted to show this example with this schema, is because if I go back to DAX Studio, and so let me go to DAX Studio here, and I go back to the query plan, my problem with the query plan is that it's extremely detailed and verbose and it could be complex here to retrieve the single step that perform the division
and the subtraction. If you look carefully, you
can see this information but if you look at this code, it's many, many, many lines
of code in the query plan, sometimes for relatively simple queries. It is much simpler to just
look at the server timings, look at the result provided
by the storage engine in order to understand
what is the workload for the formula engine. You can also see that here, we have the formula engine and the storage engine represented here and in this case, most of the time is spent
in the storage engine. When you see that the formula
engine consumes more time, it's because the storage engine
retrieve the data quickly but then the formula
engine has to do more work. Sometimes it has to aggregate the data because maybe the storage
engine was not able to perform the aggregation based on the request you made. So this first example's shown how the DAX query produced a query plan that produced one request
to the storage engine and the same request made
to the storage engine was consumed in different parts of the query plan executed
by the formula engine to produce the final result. Now it is time to see the same example, using a different storage
engine because at this point, we have seen the storage
engine using DirectQuery. Now we see what happens
when you import the data in the model and you use another storage engine, which is VertiPaq. Every time you import the data in memory, the engine used to keep the
data compressed in memory in the Power BI file is called VertiPaq. This model has the same data
that we have seen before. The only difference is that now, we have the model imported in memory. So we can have the
ability to see the content of the data of all the tables here. You can see in the data view this. You can see that the
data model is the same that we used before. And the measures we have are the same too. Same visual, same model but this time, the data is imported in memory. So what changes when we look at the query plan and the execution? So if I go here and I enable
the Performance analyzer, start recording and refresh visuals, you see that this query has been executed in just a few milliseconds. Now I copy the query and
I go back to DAX Studio and I open DAX Studio here. And I will execute the
query in DAX Studio. So first of all, let's execute the query without any query plan just to see that the result is the same as the result we have seen before with the DirectQuery model. Now I enable the Query
Plan and the Server Timings and I execute this query again. You see here the query
plan, the description of all the steps performed
by the query plan and this query plan is slightly different from the one that we
have seen for DirectQuery because for a different storage engine, there could be a different query plan. The reason is that the
different storage engine can have different features and can perform
aggregations or calculations that don't have to be made
by the formula engine. In this case, this
difference is not visible but you can see that there
is a different naming, a different name of the node that retrieve the data
from the storage engine. This node here is called cache. So cache, cache, all these points in the query plan identify requests made to the storage engine. Now, the storage engine used by models where you import data in
memory is called VertiPaq. VertiPaq is the storage
engine used by Power BI, by Analysis Services when you
import the data in memory, whereas when you use DirectQuery, the storage engine is
your external database. Now, we can see in Server Timings, the detail of the request
made to the storage engine. And in this case, we see that
we had only one request made to the storage engine
that is described here. Now, let me reduce this
and I can copy this code and I paste this code in the
editor here at the bottom. Just to show you that the
storage engine request made to VertiPaq is actually a query in a language that resembles SQL. Indeed, this is called xmSQL. XM, sorry, this one, xmSQL. So we are looking at this code. Let me just go here, here we go. So this is the name of the
language that is described here. It's not an actual language. We cannot execute this query for real. This is a textual representation of a request made to the VertiPaq engine. Internally, this request
is made using a binary tree in memory, which is more
condensed and compressed but this textual representation help us to understand what is going on. And you see that this
syntax contains expressions, these two expressions are derived by the two measures, sales
amount and, oh, I removed this, sales amount and total cost. And here you see that the SELECT statement retrieves three columns. One column's used to do a group by and two columns to perform an aggregation of the result of these two expressions. One is multiplying quantity by unit cost. This is the total cost measure. And one multiplies quantity by net price. This is the sales amount measure. So the result of this xmSQL query is not visible unfortunately. We can imagine that like
we have seen before, the result here is a table
that has three columns, the month, the total
cost and the sales amount with one row for each month for which we have results. And the formula engine
receives this cache, this data cache, this is the internal name of the result of the
VertiPaq query data cache and this data cache is transferred to the formula engine that
performs the other calculation that, in this case, are identical to the calculation we described before because we have to compute the margin and we have to compute
the margin percentage. So this is DAX code that is executed by the formula engine, so we can say that what
the formula engine does, it knows, the formula engine knows DAX, whereas the storage engine knows, in this case, the VertiPaq engine knows xmSQL, which is the language
that we have seen before. Let me show you this this way. I was covering the name here. Okay, so this is the area. Now, once the formula
engine receive the data, once again, it builds
the final result sent to Power BI, which contains, for each month, the sales amount and
the margin percentage. As you see, this value has been computed in the formula engine, whereas this value, sales amount, was coming directly
from the storage engine. There was no transformation required by the formula engine. So we have seen that the
DAX engine has two engines, the formula engine and the storage engine. The formula engine is very sophisticated, has the full knowledge of the DAX language and can perform any calculation. It can perform also further
aggregations if required. Whereas the storage
engine is more limited. It has direct access to the physical data and this allows flexibility because this way, DAX can
be used with DirectQuery, with VertiPaq and potentially
with other storage engines in the future. So this is the first step to understand how DAX works, understanding
the formula engine and the storage engine. You can start to investigate
your measures using DAX Studio to see how they are
executed in the query plan in the server timings with
all the storage engine queries and of course, there is more to study to optimize your code but by far, this is the
most important concept to understand before you move forward with other additional details about the VertiPaq engine and the DAX engine. Enjoy DAX.