- Ciao friends. In this video, I want to answer
one of the most frequently asked questions about Power
BI and Tabular in general. It goes like this. If I need to create a
new column in my model, and I have the option of doing that in DAX or in Power Query, which one is better? You have asked it yourself, haven't you? Now, the thing is, answering this question can be rather complex or extremely simple. It all depends on what you mean by better. Does better mean smaller, so a smaller model is better? Does better mean faster? Or does better mean simpler to create? There are a lot of possible options, and answering all of them takes time and a lot of reasoning. That's why I divided the
video into two sections. In the first one, I just give
you my opinion and my answer. And if you trust my judgment,
then you just go for that. If you want to know more, then you will need to follow
me up to the end of the video, where I will explain why I gave that answer. Let's start with the answer. The answer is: it depends, but if your model is not really huge, it doesn't really matter. Go for what you like the most. That's the end of section one. You're still there? I know you, you want
to know all the details and you want to understand how I ended up with this conclusion. If that is the case,
follow me and let's start analyzing why this is the answer. We start with the usual
Contoso data model. We have sales, promotion, a
lot of different dimensions, but specifically for this model, we are only interested in
looking at the sales table. I loaded the entire Contoso model, so it's 12 million rows. The reason is that if we
don't have enough rows, it is impossible to measure anything. The first thing that we
will need to do is take some baseline measurements, because we want to measure the space used by columns created with DAX or with Power Query. So we need to know, at the beginning, the size of the model. Then we will do the two steps: we create the column in DAX, we create the column in Power Query, and we do some reasoning in the meantime. Let's start by measuring
the size of the database as it is now. To do that, we just
need to open DAX Studio. And from there, we take
a look at the metrics, and we save them just in case. So I just connected DAX Studio and took a look at the metrics. And now I have all the
information about my data model. You see that the only relevant table is sales, which contains around 12 million rows. I'm not interested in doing
any kind of analysis so far. I just want to take the metrics
and save them for later. So, next step: I export it and give it a meaningful name, like Contoso 12 million base. Okay, let's save it, and we have our numbers. Let's go back to Power BI Desktop. As I said, the first step is
creating the column using DAX. I'm gonna create a new
column that contains a small number of distinct
values, and it's the price range. The price range splits the sales depending on the net price. So let's call it price range. If the price is less than 100, we say it's low. If it is less than 1,000, it's medium. If it is higher than 1,000, it's high. So we use a SWITCH over TRUE, and we check if the net price, sales net price... sales, we don't have sales net price. I am probably creating a measure, yes. So let's start again. I wanted to create a new column. It was price range, which does a SWITCH over TRUE. And if the net price, now I see it, is less than or equal to 100, then we said it's low. Otherwise, if sales net price is less than or equal to 1,000, then it's medium. Otherwise, it's high. This column only contains
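(For reference, the definition narrated above is essentially the following DAX — a sketch; the exact table and column names in your model may differ:)

```dax
Price Range =
SWITCH (
    TRUE (),
    Sales[Net Price] <= 100, "Low",
    Sales[Net Price] <= 1000, "Medium",
    "High"
)
```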
three different values. So it should be a small
column in the model. Just a second to compute it... now I have my price range column, and it shows three values: high, low, and medium. Now that I have the column in the model, I can go back to DAX Studio
and look at the metrics now. So let's go back to DAX
Studio. We still have it here. And I can look at the metrics here, but I'd rather create a new window so that I have the option of looking at the metrics in both. So we look at the metrics right now. And as I always do, we
save it just in case. This is Contoso 12 million calc col in DAX. Now the sales table still
contains 12 million rows, but I want to dig a bit deeper, because I'm searching for the
new column that I created. Here it is, price range. Price range uses around
2.6 megabytes of space. So it's a column with 12 million rows and three distinct values, and it's around 2.6 megabytes in size. Creating it was rather simple and didn't take a lot of time. But now the next step is: let's do the same, this time using Power Query. First we delete this column, because we want to go back to the original size of the model, and then we do it in Power Query. So we go to Home, Transform data, go to Sales, and we create a new column: a custom column. Okay, now we need to write M code. The name... no, it's still price range. If the net price is less than or equal to 100, then low. Else, if the net price is less than or equal to 1,000, then it's medium. Otherwise, it's high. Okay, the code is not much different from what I wrote in DAX, and we just click OK. And finally, we close and apply. Now, here comes the first difference. When I created the column using DAX, it was quite fast, because
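(For reference, the M expression typed in the Custom Column dialog is essentially this — a sketch, assuming the column is called Net Price in the query:)

```m
if [Net Price] <= 100 then "Low"
else if [Net Price] <= 1000 then "Medium"
else "High"
```

The logic mirrors the SWITCH used in DAX; only the syntax changes.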
it created the column and it was immediately available. DAX is super fast when it
comes to creating columns. Whereas with Power Query, I now need to wait until the
full table is loaded again. I could have cut this
section from the video, but I decided to keep it here. The reason is that I want
you to experience the time needed to load 12 million rows. It's not fast, and it makes the whole process kind of annoying, because you have to wait a long time every time you make a small change. We are at around 9 million rows; three million still to go. And if, for whatever reason,
I need to update the column, I need to run again
through the entire process. So from the development point of view, it is kind of better to use DAX, because the entire development
process is much faster. Now it has finished. Keep in mind, though, that it's not that simple. It's not that DAX is
always the best option, because if you really have a huge table, then it is very likely that you
have partitioned your table. Now, if a table is partitioned, you probably always load,
especially in production, only the last partition, which is rather small
compared to the entire table. A calculated column will force its recalculation over the entire huge table, not only the last partition. Whereas a column computed in Power Query or in SQL will be loaded from the server and does not trigger any recalculation. So for a large model, the choice really depends
on different aspects, not only on size; for small models, the difference is not really relevant because it doesn't take a lot of time to load from a small table
or to recompute a column for a small table. Now let's go back to the size. We created the column,
we need to measure size. The thing is, if I measure
the size of this model, right now, I will get wrong values, because immediately after processing, the model looks larger
than it actually is. There is a good reason for
that; it's rather technical, but you cannot measure the size of a model immediately after a refresh. So what I need to do is save the model and load it again in order to take a good measurement. So I save it, then I close it, and
then I will open it again. Hopefully, it's not gonna take too long. So let's open it again. Now the model is being loaded, memory is being allocated, and now I will be able to
take the correct measures about the size. Just a few seconds, come
on Power BI, chop chop. We're just waiting for you. Here we go. Now the model is ready. I already have DAX Studio open. I just need to go there and
create a new connection. I need to connect again,
because I closed it earlier. So I now need a new connection, go to Advanced, View
Metrics and expand it. Now we can start to make some comparison. We have sales. We search for the price range, and here it is, the size
of the price range column. It's 470 kilobytes, so it's much smaller than the previous column. Let's compare the two: we had 2.6 megabytes, and here we have 470 kilobytes. And this is expected. A column created in Power
Query is generally smaller than the same column created in DAX. Again, the reason is rather technical: when you create a column in DAX, it does not participate in the choice of the optimal sort order used
to increase the compression. This column has a very small
number of distinct values. So it's likely that it will be very early in the list of columns
that need to be sorted. And this is the reason why, if
you load it from Power Query, it will be much smaller. But how much smaller? That is relevant. We are talking about 12 million rows and the difference in size
between this column, at 470 kilobytes, and the other one
is around two megabytes. Now, when I was a teenager, two megabytes was likely the size of a good hard disk. But as of today, two megabytes
are just a tiny amount. There is actually no real difference between half a megabyte and two and a half megabytes from the point of view of the
speed of querying this column. So I do not expect any difference at all in the query speed of this column. Besides, there is another important factor that you need to take into account. We've looked at the price range column. Let's look at the other columns, for example, the order number column. In the new model, the
last one that I created, where the column has been
created using Power Query, the size of the order number column is 82 megabytes. In the model that was created earlier, the one with the calculated column, it is 80 megabytes. So the two megabytes that we saved on the price range column have basically been moved to another column. Again, the reason for
this is technical. Whenever you add columns to your table, the sort order of the table being loaded is likely to change. And again, because this
column has a very small number of distinct values, it comes
very early in the sort order. So it changes the sort order of all the other columns and, this way, their size. Therefore, you cannot
just look at one column and conclude that that column is smaller, therefore, the model is better. You need to look at all the columns and reason on top of them in order to make sure that the columns that you use the most
are the best optimized. So don't fall into the trap of thinking that optimizing one column is always good. The entire model needs to be optimized. Moreover, we are not yet done. Let's look at the total size. The total size of the model
without the calculated column is 173 megabytes. With the calculated column, it's 175: two megabytes more for the calculated column, as we knew. And the model with the column computed in Power Query is, again, two megabytes more. So, all things considered, did we really save space? It's hard to tell whether the column computed in Power Query is better or worse. The overall size of the
model is nearly the same. The individual column is smaller, but the space that we saved on that column has been moved into all the other columns. So we made some other columns worse. It's not enough to reason
on top of one column. When you need to optimize a model, you need to look at the entire model and check that you are
actually obtaining your goal. Besides, it is very important that you do not focus on these two megabytes. That difference is negligible, it's
not important at all. There are no differences. It makes a difference if instead
of having 12 million rows, you have 10 billion rows. There, the difference is important. But if that is the case, you will not just look at one column. You will try to super optimize everything. Finally, it is very
important that you focus on the fact that, if you
have a very large model, optimizing the place
where you load the column is not the most important thing to do. You need to focus on other aspects: data types, cardinality, number of distinct values, size of relationships, and many more, which are much more relevant. That is why we come to the
conclusion of all this. Is it better to create the
column in Power Query or in DAX? It actually does not really matter. Or rather: it depends. It depends on what you mean by better. The overall size does not change by a lot. Yes, the single column size
might change, but overall the model does not
change in size by a lot, at least in this specific scenario. You might find different results looking at your model, but you need to run through the entire process. The reasoning that you do in choosing whether to compute something
in DAX or in Power Query is more about partitioning,
the time that you need to refresh your model, the option of doing that in
Power Query or in SQL or in DAX and the kind of calculated columns. There are calculated columns that are super easy to compute in DAX and extremely complex to compute
in Power Query or in SQL. Then for that column, you go for DAX, but you need to take into
account that there are drawbacks. So the answer really is: it depends, not only on where you compute the column, but on the overall effect that this has on your entire model. In the end, the net result does not depend only on where you compute that column; it depends on the other factors that you need to take into account. Enjoy DAX.