[MUSIC PLAYING] Welcome to MSPTDA
video number 18. Hey, in this video, we got to
talk about DAX iterators, DAX table functions, and
grain, cardinality, and materializing tables. Now, in this class
so far, we actually have already talked about
DAX iterator functions. We actually only
talked about a few of the X iterator functions,
like SUMX and AVERAGEX, that take a formula,
iterate over a table, create an array of
values, and then do the aggregate calculation,
either SUM for SUMX or AVERAGE for AVERAGEX. We've also seen two
different DAX table functions in the class so
far and, of course, that is a DAX formula that
delivers a table of values instead of a single value. We saw the CALENDAR
table function index, and we also saw the
ALL table function. Now, when we used ALL
inside of CALCULATE to calculate percent
of grand total, we didn't think of it
as a table function. But as we'll see in this video,
this is an important DAX table function. And we've learned
about grain and how it's the size or level of data
or summarization that we make. We learned how to
iterate in the AVERAGEX across the transaction-level
grain or a day grain. Now, what's new in this
video in regards to DAX iterator functions is we'll
see some new DAX iterating functions, including an X
in the CONCATENATEX function and a non-X iterator,
the FILTER function. We'll also learn that we
have to be careful when iterating over fact tables. And our formulas materialize
unnecessary tables. We'll also have to be careful
of context transition when we're iterating and the
dreaded Double Count problem. And we'll try not to
iterate over fact tables. And when possible,
we'll see if we can reduce the
cardinality, which just means the number of
items in the iteration. For example, is it possible to
go in an AVERAGEX calculation from a fact table down to
the product table, which, of course, is much smaller? And in this video, we'll see a
bunch of new table functions, including talking all about
ALL, VALUES, and CROSSJOIN. And sometimes we'll see
that the grain we want isn't directly in the
table, and so we'll have to use table
functions to determine the grain of the iterator. Now, what does DAX
stand for anyway? Data Analysis Expressions. That second letter
in expressions, X, that's the X at the end of DAX. And DAX is our amazing
function-based computer language we use to create our
measures, calculated columns, and DAX tables. Now, here are the
amazing topics we're going to see in this video. There's also a time
hyperlink table of contents below the video. There's also a bunch of
files you can download if you want to follow
along with everything you see in this video. This is the class website. These are also available in
the links below the video. Page 4 in the PDF
notes, we have a section on DAX iterator functions. Here are some examples. We've already seen
SUMX and AVERAGEX. But iterator functions
create row context and make a row-by-row
calculation on a table with a
certain granularity to generate an array
of answers which can be aggregated, joined,
or delivered as a table. The number of
elements in that array is equal to the number of
rows in the table that's being iterated. Now, SUMX, AVERAGEX, MAXX, those
are all aggregate functions that create an array of answers
and then deliver sum, average, and the maximum value. CONCATENATEX, that's another
X function like these three. It'll iterate a table and
then generate a single answer of joined elements. And we'll see an
example of that later. FILTER and ADDCOLUMNS,
those are iterators that go through
each row in a table and either filter the table
or add an extra column. But these two
iterating functions deliver a table, whereas these
four deliver a single answer. Now we want to go over to
Excel and learn more about how iterators work. And we're going to start
off in this Excel file. And later, we actually have
a few other Excel files and Power BI Desktop files. Now, I've already imported
the data using Power Query. We can see over here
I've imported the data. If we go over to the data
model where I've already imported this, we
can see we have four tables in diagram view. There's the
fTransaction surrounded by dimension tables in
one-to-many relationships and a star schema data model. Back over to Data View. This is the fact table. It has about 2.2
million rows of data. Now, down here in the fact
table down in the measure grid, we want to create
our first measure. We're going to create a
measure for total revenue, and it's going to
be an iterator. Now, remember, we
talked about the choice between a calculated column and
a measure in video number 15. If we create a line item revenue
calculated column and then use a sum down here
to add it, that column will be added to the data model,
and we'll take up RAM memory. If we create a measure, it will
simulate that entire column in a single measure. When we drop the measure into
a pivot table or a Power BI report, all of those values
will have to be calculated. So it's always a trade-off. Do you want to store the raw
data in the Columnar Database or do you want to calculate
each time you drag and drop a measure? We're going to create our
measure for total revenue. So we come up to
the formula bar-- total revenue that
iterates over a fact table. We're in Excel, so we
use colon, equal sign, and then we use SUMX. We need the table, F,
Transaction, comma, and then the expression, that's
our formula that will iterate over this table. FT and we're going
to do UnitsSold. That is a column in this table. So because this
is an iterator, it will create row context, which
means the column units sold will pick out each one of
the individual units sold and then multiply it by-- well, we have to get a column
from a different table. And because there's
a relationship between the fTransactions
and the dProduct table, we can use RELATED to
look it up, dProduct, and we want retail price. Now, Related goes through
each row in the fTransaction, sees the product, and can
pull the correct price. That's our formula
right there that we're going to use to iterate. Close parentheses. It'll calculate all of the
values as an array of values internally, and then
SUM will add it. And Enter. Now we can add some
number formatting. Now, I want to copy this. And a few rows down, I'm going
to come up to the formula bar, and I want to create a
very similar measure, but this one's going to be
called Average Transactional Revenue Iterate
the fact table F. The only difference
here is going to be, instead of the iterator
that aggregates a sum, we're going to use the iterator
that aggregates as an average. The table is the same. We're iterating over
the transaction table. And the formula is the same. When I hit Enter, it does
all of the same things, but now it's
calculating the average of all of those
internally-generated numbers. I'll add some number formatting. Now, we want to
think about this. We had a choice here
when we did AVERAGEX. We had a formula here to
calculate the line item revenue, but we could have, if
we wanted to, since we already had a measure here,
we actually could have put that measure as the
second argument in AVERAGEX. Now, there's a bunch of
problems when we do this. And so we want to see
what those problems are. I'm going to copy this. And below, I'm going to come up
to the formula bar, Control-V, and I'm going to
change F, which means we have a formula in
the second argument. I'm going to say, in
the second argument, we're going to have a measure. And instead of that
formula right there, I'm going to square bracket,
and there it is. There's our measure. It'll give me exactly the
same answer on this data set. Close parentheses and Enter. I'm going to add some
number formatting. We can see, on this
data set, whether or not we use a measure in the second
argument or we put our formula. We get exactly the same answer. Now, I'm actually going to
change the name of this to-- in the second argument,
there's a formula. That way we can differentiate
easily between these two. Now, for the measure, there's
a bunch of problems with this. The first problem is
internally inside that measure, this is the formula. And what's inside of it? The fact table. So that formula is iterating
over this entire fact table, 2.2 million rows. But guess what? That measure, as it starts
iterating over the transaction table inside of AVERAGEX,
that means for every row inside of fTransaction, it has
to then iterate over a fact table inside this measure. So visualize this. That means the measure, the
first row in this 2.2-million row table, the first row
it has to iterate over 2.2. Then it goes to the second row,
iterate over 2.2 million rows. That's a lot of extra
calculating as compared to this one right here. The second unnecessary
thing is, well, one advantage of using
a measure in an iterator is context transition. That just means that the
hidden CALCULATE function around every measure brings
in all available row contexts and converts it
to filter context. Well, the F transaction
table for every row is already there, so it
doesn't need to do that. Now, the third problem,
we'll see in the next example in the next file. But what we want to do right
now is go measure and time the difference between using
this iterator, AVERAGEX, with a formula and
with a measure, and then we'll clearly see
that this formula takes longer. We'll also be able
to see in DAX Studio that an extra table is
materialized internally that we don't really need. And that helps slow down
the calculation time. Now I want to save this
and close it and open up DAX Studio. Now we have this
Excel file open. I've already searched Google for
DAX Studio and downloaded it. It appears in Add-ins,
so I click DAX Studio from inside of the Excel file. It's smart. It knows that I opened
this from an Excel file. So I simply click Connect. This opens up the
DAX Studio window. Over here, we see into
our Excel data model. For example, we can open
up fTransaction and see our measures and our columns. This is the only one
that's not hidden. That one is hidden. Now, we want to come
up to the Run button, and I'm going to select
Clear Cache and then Run. That way, every time we
test one of our formulas, it will be as if we're
running it the first time. In order to time our formulas,
we'll come over to Traces and select Server Timings. Now, we're going to
come into the white. I'm going to hold
the Control key and roll my wheel to zoom in. We can see the percentage
zoom right there. And we saw this
earlier in the class that we have to use the
EVALUATE command in order to visualize a table. Now, wait a second. We're trying to time formulas,
not visualize a table. If I were to select
a table like this and now click the Run button,
over on the Results tab I could see the whole table. But that's a table. So we have to use
a little trick here to get a table to materialize
with our single formula. And then that way,
we can time it. And the trick-- I'm going to delete this. And the trick is we have to
use the ROW DAX function. Now, ROW DAX function
just needs a name of a column and then
some expression. It will return a
one-column, one-row table. In double quotes,
oftentimes people just put an X because they don't
care what the column is called. We're just trying to
use the second argument. I'm going to point to the
middle here and click and drag. There's our iterator's
second argument formula. So I'm going to double-click. I'm going to Control and roll
to bring this on the screen. Close parentheses. Now, when you're in a
hurry, we just put X. But I'm going to second argument
formula, then come to the end and close parentheses. Now, I want two
different formulas. So I'm going to hit Enter,
Shift-Tab, Enter, EVALUATE, Enter, Tab, ROW,
open parentheses. And in double quotes
in the second argument, this will be the measure,
N, double quotes, comma. We'll change that to second. And now I'm going to select
second argument measure. Double-click. Close parentheses. Now, we have two
different tables here. And because we don't
want both of them to materialize and be timed,
I'm going to highlight just one, then come up to Run. I can see it's very polite. It says boom, boom right there. It looks like I have
an extra parentheses. Now we're going
to try this again. Over here, we can
see the results of the single-column,
single-row table. There's the same answer. But what we want to look
at is server timings. Now, there's a split for each
formula between SE and FE. SE is Storage Engine. That's where all
the data is stored. In order for a
formula to calculate, it has to go down to
the Storage Engine. And remember, that's a
Columnar Database that stores each column as a unique list. So if it needs to
work on a full table, it's going to have to go
down to the Storage Engine and materialize it. You can see, over here,
Formula Engine, that's the calculation part of the
DAX that can't be done down in the Storage Engine. Now, the total time
is the addition of Formula Engine
and Storage Engine-- so 8 milliseconds. Now, if you click this
a few different times, it'll give you different
times each time. That's the same way we
time things over in Excel. Oftentimes, you have to
time it a number of times and then take an average. So there it is. It looks like 6
milliseconds in the Storage Engine, 1 millisecond
in the Formula Engine. Now, there is a preference
between these two. Formula Engine only gets
to use one processor. Storage Engine gets to
use multiple processors. So if we can force the
calculation of our DAX formula down to the storage
engine, that's a big advantage because it
can calculate more quickly. And later, we'll see this
little calculation right here, when it is using
multiple processors, it'll tell us how
many processors here. Over here, here's the one
we want to look at, Rows. This will tell us, in the
Materialize table, how many rows were materialized. We can also, if you
know how to read xmSQL, look at the code over here. Now, I actually took Marco
Russo and Alberto Ferrari's optimization course
and read their book, but I am no expert in reading
a query plan or this xmSQL. But some of you might. It's similar to SQL. That expression
means the formula. SELECT and then there's SUM,
comma, SUM, and expression equals expression. This actually happens when
you're doing an average. That's trying to add. This is trying to
calculate a count. Then select from what? Transaction-- LEFT OUTER JOIN,
just like in SQL, dProduct on, and there's the ProductID
column with the relationship. All right. So what we're going to look at
is how many rows in the table are materialized
and the total time. Now, if we come down to evaluate
the one with the measure, now we click Run. Oh, you can already see. Look at that. It had to materialize
that full fact table. It had to run two
different queries. Down here, 974 milliseconds
in the Storage Engine, 551 in the Formula Engine. And there we can see a 1.5. That's why this total, the
actual total down in the CPU, is higher than this
Storage Engine. If we take 1,454 and
divide it by a 1.5, that gives us that amount. And since these are the actual
total times, when we add these together, there it is. So we can see two things
when we have the measure. It takes a lot
longer to calculate, and it had to materialize
internally a big table. All right. So lesson number one,
when we're iterating, since we were iterating
over a fact table and then had a measure
with a fact table, we don't want to do that. We want to use the formula. Later, we'll see in this video
and in the previous two videos, definitely sometimes
we want to measure to do context transition, but
just not over a fact table. All right. Now, we can actually save this. I'm going to Control-S. I gave
it a name, chose a location, and click Save. Now I'm going to close
this and go back to Excel. Now, back over in
this Excel file on the Power Pivot for Excel
window, we have our measures. I'm going to click PivotTable,
on a new worksheet, click OK. I'll call this PT. Now, I want to drag-- from the dDate table, I'm
going to drag year and month. Then from our fTransactions,
I'm going to drag Total Revenue. Let's see how long it
calculates-- basically instantly. Average transaction with a
formula calculates quickly. And then our measure-- it's taking a while. So it took a while. So even over here in the pivot
table or over in Power BI, we can see there's a
lag in the calculation because when we drag that
measure into sum report, it has to calculate
for each cell with a different
external filter context, and then it has to do all
that iterating and context transition over a fact table. Now, I'm going to
save this file. We want to go see a third
problem with a measure iterating over a fact table. And we're going to open up
a Power BI Desktop file. Now, over in this
Power BI Desktop file, we have a much smaller data
set with an fTransaction and a dProduct. If we go over to
Data or Table View, we can see we have
a very small table. If we go over to
Report View, I've already created a multi-row
card and created two measures. Now let's look at these
measures-- total revenue. Oh, it's exactly the
same as we did over in Excel on a larger data set. And for average transactional
revenue second argument formula, there it is,
that's the amount. But let's create
this same formula with a measure in
the second argument. Come up to Modeling,
New Measure. And the second argument
will have an M. And over here in
Power BI Desktop, we use an equal sign
and then AVERAGEX, fT, that's our fTransaction fact
table, comma, square bracket, and we'll get our
total revenue measure. Close parentheses, Enter. We'll add some formatting,
come over and check. And what in the
world is going on? Why is it when we have a measure
in the second argument instead of a formula, we're
getting a larger number than this average
transactional revenue? Well, if we go look at
the data or table area, the problem is there's 1, 2
transactions that are exactly equivalent. Down in the middle, there
is another duplicate record. Now, as we'll see in
just a moment, when we have duplicate
records, the measure is doing context transition
and getting confused and double counting the
duplicate records each time it sees that duplicate record. Now, we'll see more
specifically why that's happening in just a moment. But think about this. This is just a small data set. But if you have a large
transactional data set, it's not uncommon to
sell a particular product at a particular price and
have a duplicate transaction with the same number of units,
or whatever it might be. It's not uncommon
to have duplicates. So we want to see that
there's two different ways to solve this problem. Now, if we go look
at the formula, the reason we're getting
double accounting, an average that's larger
than the correct average, is because of
context transition. Now, context
transition, any time in an iterator or a
calculated column, the measure with the hidden
calculate takes the row context and converts it into an
equivalent filter context. Now, what that means for
duplicates is that when it gets to this row right
here, instead of filtering down to a single row, it thinks that
there are two matching records. So it filters down to two rows. Now, to prove this to ourselves,
let's add a new column. We'll call it Line
Item Revenue, and we'll do Units times RELATED,
and we'll get our dProduct, close parentheses and Enter. Whoops-- dProduct Price,
close parentheses and Enter. We can clearly see we have the
correct line item revenues. Now let's do the same thing. We'll call it Line Item Revenue
Measure, square bracket, and we'll get our total revenue. And when I hit Enter,
what did it do? It double counted. When it got to this
row, context transition took the row context,
converted it to filter context, which meant it filtered this
table down to two records, doubling the sales
for that line. And of course, it did
it twice because when it got to the other duplicate,
it did exactly the same thing. Down here, also, for
this duplicate, boom, there they are, double counting. Now, obviously, the
best way to solve this is to use the formula
because the formula is not going to invoke
context transition and filter the fact table
down to two matching records. There is another way we could
solve this problem, also. Now, if we go back
to our Excel file and open up the Power
Pivot for Excel window, there's our two measures
we did over here in Excel, formula measure,
but we got the same answer. The reason why-- there
were no duplicate records. We had a Transaction ID column. So every single transaction
had a unique identifier. So that's why we didn't run
into this problem over here in this data set. So back here in
Power BI Desktop, we actually could
amend our table. I'm going to go to Edit
Queries, Edit Queries. Here's our fTransaction. We can go to Add
Column, Index, From 1. And just by doing this, now
there are no duplicate records. So when I go to Home,
Close and Apply, now we get the
correct number here because there are no
duplicate records. The moral of the story, of
course, is, in this situation, we just don't want to put a
measure in the second argument here. And the reasons are simple-- we don't want to have the
formula iterate over the fact table twice, we don't need
the context transition, and if there are duplicates
and no primary key, we don't want to get
the wrong answer. In fact, one other thing, if we
go back over here, we can see, of course, there's an index. So now, that doesn't happen here
in the measure column either. All right. I'm going to save this
as the finished file. All right. Back over here an Excel file
in our Power Pivot window, we don't want to use a measure
and iterate over this fact table. But guess what? What if our goal was,
instead of calculating average transactional
revenue, we wanted average daily revenue? Well, in that case,
average daily revenue, we definitely want to use
AVERAGEX and the table. Well, we can't use fTransactions
because that would give us the individual
line item revenue, and we want daily totals. So we use DD, the
dDate table, comma. And in this case, I want to use
my measure for total revenue, and I want context
transition to bring each individual day,
which is a filter, to flow across to the fact table
so that internal fact table has only the records for
each one of those days. Also, when this measure
iterates over the dDate table, there's never going to
be that double counting problem because the dDate
is a unique list of dates. Close parentheses and Enter. We'll add some
number formatting. We'll go back to our
pivot table sheet. I'm going to remove the
average with a measure. And there's our new
average daily revenue. I drag it down to Values
and it calculates quickly. Now, we did these two
calculations this video and last video,
and we were lucky because the average
transaction for each one of these calculations,
we just happened to have the right table with
the right grain, transaction and day. But what if we
wanted to calculate monthly average revenue? Now, before we see
how to do that, let's go remind
ourself about grain, the grain or the granularity
of a particular table. Now, all granularity
or grain means is what is the size or
level of the detail? In this fact table, we see this
is the transaction or product level. Invoice 27002 sold
product number four. But it also sold
product number three. So we've decided to break up the
numbers at the transaction line level or the product level. Now, that's different
in this fact table where the two numbers are
combined into total sales for the invoice. Now, the grain of a
fact table is important because it determines what
criteria conditions or filters we can apply to the table. Now, for this
invoice-level fact table, we cannot apply a product
filter, whereas, over here, we can apply a product filter. So the grains are different
between these two fact tables. Now, later in this class, we'll
actually solve this problem and combine them into
a single fact table where the grain is at
the smallest level, which will allow us to apply
conditions, criteria, or filters however
we would like. Now, in this video, we'd
like to talk about grain as it relates to the table
inside an iterator function. Well, in fact, we
just learned how to use SUMX to iterate the
transaction-level grain of the fact table. We also saw how to iterate
across the fact table using AVERAGEX. And then we saw how to
use AVERAGEX to iterate across the date table. Now, the grain of the date
table itself is the day level. Now, what if, instead
of like last video calculating the average
of daily revenue, what if we wanted to calculate
the average of monthly revenue? Well, we don't have a
table, like a date table, because if we get a measure
to iterate this table, it is going to do the day level. Well, in this video, we'll
see the DAX functions VALUES, ALL, and CROSSJOIN
to access certain columns and get a unique list. So internally, inside of
our iterator functions, we can iterate at whichever
grain we would like. Now, back here in our
Power Pivot window, we'd like to create
our calculation. But let's go look
at the dDate table. Now, luckily, we
created a date attribute column that has year and month. We can't just use month
because then all the Januarys are grouped together. We need an attribute column
that shows both year and month. If you don't have
this column, then you have to use CROSSJOIN to create
such a column or such a table combining year and month. Now, we'll see both methods. But by all means, when
designing the data model, if you're going to calculate
average monthly revenue, then this is the
attribute column you want. It'll make our life easy. We come back over
to fTransactions. Down here, up in the formula
bar, Average Monthly Revenue. Now, notice we're
acknowledging the grain in the name of our formula,
transactional, daily, and monthly. Now, AVERAGEX. And for the table, we're
going to use a brand-new DAX function called VALUES. VALUES can look at a column or a
table and return a unique list. So right here, Year Month,
I'm going to double-click. Close parentheses. And now we have our table that
the formula can iterate over. This is the month granularity
or grain, comma, square bracket. I'm going to get my total
revenue, and there we go. Close parentheses and Enter. Add some number formatting. Down here in the
measure grid, that's the average monthly revenue
for all of the transactions in the fact table. When we drop it into
the pivot table, it'll give us the correct
average monthly revenue for each one of the years. Now, VALUES is a
brand-new function. So I actually want
to show you two ways to see what VALUES is doing
inside of a pivot table. I'm going to close this
Power Pivot window. Now, before we go over and
prove what VALUES does, let's drag our Average
Monthly Revenue down to the Values area. And look at that. Uh oh, it's exactly equal to
total revenue for the month. Well, that's correct because
the total for the month divided by one is the total revenue. But it's the year area
of the pivot table that we get the correct
average monthly revenue. Now, actually, we could
come over to the side and use an Excel function and
average these monthly totals. And when I hit
Enter, I better get exactly the same
average monthly revenue. Yes, we do. Now, I want to go over to DAX
Studio, Add-ins, DAX Studio. We'll connect. In the white area, EV, Tab. Enter, Tab, VALUES, Tab. And from the dDate
table, I'm going to double-click Year and Month. Now, remember, that
column is an attribute that has many repeats. But when I click Run, sure
enough down in output, I can see I get a unique list
of all of the years and months. Now, while we're over
here in DAX Studio, let's see what happens-- EV, Tab, Enter, Tab-- if we don't have this
attribute column, which, again, would be
silly because it's so easy to create in your data model. But we can use the
CROSSJOIN function, and CROSSJOIN takes
two tables, and we're going to use VALUES of
Year and then VALUES of the Month column. Close parentheses,
close parentheses. Now, CROSSJOIN does
a Cartesian product where it matches up each
item from this table with every single item
in the second table. And because the
values are delivering a unique list of
years and months, when I highlight this and click Run,
I get the same basic thing, but with two columns. Since our goal is to
use this as a grain to iterate over to calculate
monthly totals before we average an AVERAGEX,
that will work just fine. Now, this will take
a little bit more calculating because it has
to do multiple calculations, whereas this one's
looking at a single column and delivering a unique list. Now, I want to go look at the
VALUES functions in another way because when I do this
over here in DAX Studio, this is showing me every
single possible combination. Now, when I do EVALUATE on
VALUES, Date, Year Month, that's a single column. But notice, it's given
me every single item from the Date table. It is a unique list,
but it's, in essence, without any external
filter context. So I want to show you another
demonstration of VALUES that will prove, internally in
each cell on the pivot table or over in Power BI Desktop,
it will give you a unique list in the current filter context. Now, I'm going to close
this and not save. I'm going to save this Excel
file and then close it. And we want to go over to
a different Excel file. And here's the name of
this new Excel file. Now we have this fact table,
a Product dimension table, and a Sales Rep dimension table. If we go over to our data model,
Power Pivot Manage Data model, in Diagram View, we could
see the fTransaction, Product and Sales Rep. I'm going to close this. Our goal to illustrate
VALUES is this-- I've already created,
from the data model, a pivot table with sales rep. But notice, here's
the sales rep ID. And for any particular
sales rep ID, there should be a number
of different products sold. So for example, for Sue, I
want my pivot table formula to list all of the products. Well, that's the
perfect job for VALUES because if I ask values
to look at this column, well, without a pivot
table filter context, it would give me
every single product. But let's look at Sue. Sue is number one. There's a Quad and a Bellen
and, also, a Sunspot. So if I ask VALUES to look
at this entire column, when it gets to that
cell right there for Sue, that condition filters this
table down to one record. It flows across the
relationship, which will show just the records for Sue. And since VALUES will
be pointing at product, we'll see only Quad,
Bellen, and Sunspot. Now I'm going to click
inside the pivot table. We could go over to
the measure grid. But another way to
create measures in Excel is right-click, Add Measure. This brings up the
Measure dialog box. We can put a measure name,
list of product values, and the formula will be-- well, I would like to
just use VALUES, fT, and I see right
there, Product column from the fTransaction table. Now, the problem
with that formula is that's a table function. We cannot display a table
in a pivot table cell. But guess what? We can use this as
our table, and we can join all the items together
using another X iterator function, CONCATENATEX. And there it is, CONCATENATEX. Now, it needs a table. That VALUES will--
as the formulas copy down in the pivot
table cell, for each cell, it will display a different
unique list of products. Comma. The expression will actually
need the same column. fT-- there's product. Comma. And then the delimiter-- double quote, comma,
space, double quotes, or whatever you want. Close parentheses. Now, the way this
iterator will work-- well, it has its
table, which will be a different unique list for
each cell in the pivot table. That column instructs
CONCATENATE to go through that table's column and
join them with that delimiter. When I click OK, now I have
a new measure over here. And when I drag it
down into Values, that is how Values is working. So now we can clearly
see that as VALUES copies through whatever pivot
table or Power BI visual, it will deliver a unique
list given the current filter context. Now, we'll come back
to this workbook because we're going to talk a
lot more about table functions. But remember, our
goal is to learn what the VALUES function does. So over here, in
this Excel workbook, we saw that VALUES delivers
a unique list of items in the current filter context. Now, I'm going to close
this Excel workbook. We'll come back to it later. Back over in this Excel
workbook here, now we get it. the VALUES function in
that cell right there, here we can see the measure
VALUES is delivering exactly 12 months for the year 2017. So VALUES, as we copy through
a pivot table report or a Power BI Desktop visual, sees
the current filter context. Escape. Over here in the
Power Pivot window, there's our average
monthly revenue measure. I'm going to copy
this, Control-C-Escape. Click right below. Up to the formula
bar, Control-V. And I'm going to change the name
at the end to CJ for Cross Join . And just to see
that this does work, we could use CROSSJOIN
and then VALUES. There's our dDate Year,
close parentheses, comma. The second table in CROSSJOIN-- VALUES. And I'm going to down arrow
to Month, close parentheses, close parentheses. There's the table. CROSSJOIN of year and month. And when I hit Enter, it
adds some number formatting. I get exactly the same
grand overall total for average monthly revenue. Back over here in
the pivot table, I can drag this down
to Values, and I get exactly the same numbers. So in this pivot table here, we
have 1, 2, 3 different grains, transaction day and month. So the first argument
of an iterator, you always have to
think about the grain. Now, another topic
for consideration-- when picking the table for the
first argument of an iterator is cardinality. Now, cardinality just means how
many items are in that table. Now, to illustrate
cardinality, I'm going to close
this Excel workbook and we're going to open up
a Power BI Desktop file. Now, we're over here in
this Power BI Desktop file. In Relationships View,
we have a fact table with one measure for total
revenue, the Date table, and the Product table. If we go back over to Data or
Table View, if I click on Date, I can see there's 730 rows. Product has 630 rows. But the fTransaction table
has over 21 million rows. So if we're going to
calculate total revenue-- and here's the measure under
the fTransaction table. We have, for total
revenue, SUMX. There's the fTransaction,
21 million rows. We're looking up the
price from the dProduct through the relationship, and
we're multiplying it by units. If we look over in Relationships
View, what we're doing is we're iterating
over this table. And because we're
iterating over this table, we have access to units,
and then we're looking up through the relationship price. So that means there are two
components to calculating total revenue, units and price. Well, what if, instead of
iterating over the fact table, we iterated over
the dProduct table? That means there's a
unique identifier product. So as we iterated across
this, for each row we'd see the price. Then all we need to do
is access all the units. Well, guess what? If we did the sum
of the units, that means we'd have to go
through the relationship. Well, as it's
iterating, the problem is there's no filter context. But if we wrap SUM
inside of CALCULATE, then for each row
in this table, which means product ID,
unique identifier, that single product would
flow into the SUM function as a filter. This table would
be filtered down to just the units for that
particular product ID, and then, of course, the
SUM function would add. So we're going to try that
alternative total revenue formula. Now, any time we try
something like this, we want to test it by
dragging it into our report and seeing if it
calculates quickly, or we can go to DAX Studio
and time, because sometimes, depending on the data
and the data model, we can get a
significant advantage by changing the calculation,
like we're going to try. Other times, it's not
such a big difference. All right. Let's go back to
Data or Table View. Select Transactions. Up in Modeling, I
want New Measure. I'm going to call this Total
Revenue for the product table-- 630 rows, SUMX. And the table in the
first argument of SUMX will be dProduct, significantly
smaller cardinality. 630 rows, comma. We want to access the price
column inside of dProduct-- times. And if we just did SUM
of fTransaction units. But the SUM function
does not have the ability to pull the row context from
dProduct into the fTransaction table. No problem. We know by now in
this class, CALCULATE can change the filter context. And for us, we're
taking row context and converting it
to filter context. Now, internally, inside
of SUM, fTransaction units is filtered down to just the
records for individual products from the dProduct table. And that formula, close
parentheses on SUMX and Enter will give us the same answer as
our total revenue over the fact table. We'll add some
number formatting. We can see our two
formulas over here. Let's go over to Report
View, over to Visualizations. I'm going to use
this multi-row card. Now we're going to come
over and check and check. There are our two measures. Now, let's come over to
the paint roller, Format, Data Labels, text size 15,
Category, text size 15. Now, we can take this
one step further. What if we took, instead of
iterating over the product table, let's go look
at the Product table. Here's Product. Look at that. There are tons of
duplicate prices. Actually, if we go back
over to Report View-- and from the Product table, I'm
going to check Unique Count. Look at that. Even though there's
630 products, there's seven unique prices. Now, I constructed and
created this whole data set just so we could illustrate. If there's a
situation like this, then I think we
could do one better. We can use VALUES
on the price column and iterate over just the seven. By the way, Matt
Allington, who's the author of three
DAX books and a teacher and answers questions
online, is the one who taught me these
formulas here of how to go from iterating the
fact table to iterating on the other side on
the Product table. All right. Let's go back to
Data, fTransactions. And I want to select
that measure right there, copy it, Control-C, Escape,
New Measure, Control-V, and I'm going to rename it. We'll call it VALUES Price 7. And for the table, we're
going to use VALUES. And right after
dProduct, square bracket. There's the price. Close parentheses. So now we've changed the
cardinality significantly from 630 to just seven. When I hit Enter,
add some formatting. Come over to report, and we're
going to check total revenue. And look at that. They're all three
of them the same. Now, we want to go and try
and use DAX Studio to time these formulas. Now, unlike Excel where we have
a button right inside of Excel, DAX Studio I'm just
going to open up from either the Start
menu or a shortcut. Because I'm not
opening it from Excel, I don't see any option here. Here it is, Power
BI Desktop, and it sees the file we're working on. Click Connect. We can see our measures over
here, over here on the white the Editor pane. I'm going to type EVALUATE
command, Enter, Tab, ROW, and then in double quotes, I'm
going to call this 21 million, comma. And there's our expression. So I'm going to double-click,
close parentheses. Now, I'm going to create the
other two EVALUATE statements. We have our three
EVALUATE statements. I'm going to change this to
Clear Cache and then Run. Over in Traces, I'm going
to turn on Server Timing. Over on Server Timing, we
want to see the total time. I'm going to highlight the
first EVALUATE statement. Click Run. I get 31 seconds. I'm going to click it again. 43 seconds. I'll go on to the next one. So that was the
21 million record. So now let's try
the Product table with the 630 rows in our table. Click Run-- so
about half the time. Click it again--
about half the time. Now let's try our
VALUES on the Price. Highlight. Click Run-- almost the
same as the previous one. I'm going to click Run again. So about the same-- these
two are about the same, but both are about twice
as fast as the fact table. Now, the truth is
these are milliseconds. So there's probably
not a big difference. But we can clearly
see that when we change the cardinality
on these two, they're going to run
faster than if we're going over the full fact table. Now, I'm not going to
save this DAX Studio. I'm going to close
this and not save. Now, back over here in
our Power BI Desktop file, I'm going to save this
as the finished file. Now I'm going to
close this, and we're going to open up a new file. Now, we're over in
this Excel file here, and our last topic of
discussion in this video is we want to talk about some of
the different table functions. Now, we already
talked about VALUES. But to start off, I want to
compare and contrast the table function ALL and VALUES. Here's what we did
earlier with VALUES. If I come over to the pivot
table field, let's right-click, Edit. There's our CONCATENATE
with VALUES. And VALUES will give
us a unique list, but it sees the filter context. So that's why, down here for
GG, it only sees two products, because in the filter context
of GG, Sales Rep three, this table is filtered
down just to the threes and VALUE sees just those
particular products. But ALL does something
totally different. If I right-click-- and I've
already created this formula. I'm going to click on Edit. We did the same formula,
CONCATENATEX, but instead of VALUES looking at product
from the transaction table, I used the ALL function. And look what it does. It totally removes
the filter context and shows all of the
products for every single one of the cells. So right off the
bat, VALUES and ALL, they both give a unique list. But whereas VALUES sees
the current filter context, ALL removes all the filters
from a particular column, in our case, product,
or a complete table. Now, there's one
other similarity between VALUES and ALL. I've created two pivot
tables, Total Revenue. We could see our
measure right there. But what I would
like to do is see what happens when I drag
Product from the Product table and then Product from
the Transaction table. So I'm going to drag
Product down to Values. And there it is, all the
products and a blank. Now, if I come down
to this pivot table, I'm not going to drag Products
from the Product table. I'm going to drag it from
the Transaction table. So I'm dragging it down to Rows. And what? There's no blank,
but there's an Aspen. And there it is. Up here, Product from
the dProduct table, I get blank 24,000. Down here, I get Aspen 24,000. This is what happens-- and we
actually talked about this back in video number 15. There are four products in
the dDimension Product table. Over here, if we create a unique
list, there's five products. Aspen is an unmatched
item in the relationship. That's why when we drag
Product from the fact table down to this pivot
table, it lists Aspen. But for this pivot table,
when we drag Product from the dDimension
table, because there's a missing item in
the relationship, it shows a blank
because it wants to have the correct
total at the bottom. So blank gets 24,540. And as we mentioned
back in video 15, I could have 20
products here that are not in the dimension table,
and that one blank picks up all of the extra numbers
for those unmatched items. Now, how does that come into
play for VALUES and ALL? Well, I want to go
over to DAX Studio. So I'm going to go to
Add-ins, DAX Studio. And over here, there's no
external filter context. But watch what happens
when I evaluate VALUES on the dProduct table and run. I get four products
and an empty cell. That only happens when
there's a relationship and there's an unmatched item. All does exactly the same thing. Now, because there's no
filter context out here, VALUES and ALL will deliver
exactly the same answer. Now, if we change the column
from dProducts to fTransaction, VALUES, when I click Run,
it shows me all the items, including Aspen. ALL will do the same. VALUES and ALL can
also be used on tables. Now, we're still out
here in a situation where we have no filter context. So these will deliver
the same things. But if I were to evaluate this-- I'm going to click Run. VALUES will just return
the entire table. ALL will do the same. I click Run. It returns all the
records in the table. Now, VALUES and ALL can have
a table or a single column. ALL function, however, can
have more than one column from the same table. So if I were to put
sales rep ID and product from the fTransaction
and run this, it will give me a
unique combination of all of the sales
rep IDs and product. Now, this is different
than CROSSJOIN. CROSSJOIN can do it
from different columns and different tables. But ALL, when I
click Run, does it just for these columns
in the same table. Now, if we compare and contrast
VALUES and ALL, both of them create unique lists. VALUES sees the
current filter context, whereas ALL removes the filters. Each will return one blank
cell when there are unmatched items in the relationship. And if there's no
filter context, they'll both return
the same items. And you can put a
column or a table into VALUES, whereas
for ALL you can put a column, multiple
columns from the same table, or a complete table. Now, as we will see next
video when we study CALCULATE, VALUES and ALL are
very common functions that help change the
current filter context. Now, we want to go
back over to DAX Studio and see how to get
rid of the blank if we have an unmatched
item in a relationship. Now, VALUES on dProduct product,
because there's an unmatched item, shows an empty cell. But instead of VALUES,
we can use DISTINCT. DISTINCT does the
same thing as VALUES, except for when I run it or
use it in some DAX formula, it does not return that
empty cell for an unmatched item in the relationship. Now, ALL removes
all the filters. So of course, it removes all
the filters wherever it is. But it will show
that empty cell. ALLNOBLANKROW-- well, it does
exactly what it's supposed to. It removes all filters, and it
will not return that empty cell when there are unmatched
items in the relationship. Now, if you actually
had an empty cell in one of these columns here instead
of just an unmatched item, they would both
return that because it would be considered
one of the items in the unique or distinct list. Now, really, if we are building
our data model smartly, we're not going to
have an unmatched item. So most of the time,
we're just not going to use DISTINCT
or ALLNOBLANKROW. Now, there's a few
other table functions we want to talk about. We already talked
about CROSSJOIN. ALLEXCEPT-- that's a function
we'll see next video when we learn about CALCULATE. But it takes a table and a
column you want to exclude. And then ALLEXCEPT will return
a unique list of records. Another important table
function is called FILTER. You give it a table
and a filter based off of a column in that table, and
it returns a filtered table. And FILTER is not
only a table function. It's an iterator, also. CALCULATETABLE-- that's
like the CALCULATE function. It can change the
FILTER context, but it returns an entire table. And then ADDCOLUMNS--
you give it a table, the name of the new
column and an expression, and it adds a new
column to that table. Now, we want to have an
introduction to these three table functions in this video. And then in the next video when
we talk all about CALCULATE, we'll talk a lot more about
FILTER and CALCULATETABLE. Back in DAX Studio, let's see an
example of the FILTER function. It needs a table. I'm going to fT to get
my fTransactions, comma. And whatever filter
we construct here based on a column in
this table, FILTER will return a filtered table. I'm going to say
fTransactions, square bracket, and we're going to do a
test on the Units column. I'm going to say, please
return all the records that are greater than 250. Now, when I click Run,
and on the Results tab, we can see our Filter table. Now, the FILTER
function is an iterator. It's not one of the X iterators
that delivers a single item. FILTER is an iterator that
returns a complete table. Now, another table
function is not FILTER, but it's CALCULATETABLE. Now, the difference is that
CALCULATETABLE can take a table in the first argument, comma. But we can put
filters or conditions from anywhere in the data model. So for example, I could say
dProduct, square bracket, and I'm going to look at a
completely different column in a different table
than fTransactions. And I'm going to say,
hey, please only give me, from the dProduct Product
column, the products that are named Quad. And so CALCULATETABLE is
not an iterator like FILTER, and it can use any column
that's in a relationship with fTransaction
to do the filtering. CALCULATETABLE, just like
the CALCULATE function, can see the entire data model. Then when I click
Run, I can see I have filtered the table based on
a column in a different table. Now, we'll talk a lot more
about these two functions and CALCULATE in
next video, including learning how to do AND logical
tests and OR logical tests. For the time being, we just
want to realize that these are both table functions. FILTER is an iterator
and can filter a table based on a column. Look at that. Did I leave that
out, fTransactions? That was a naked column. That violates our rule
that square brackets alone represent a measure. So I want to definitely
amend that and run it. So FILTER is an iterator,
and the condition has to be based on some column
in the first arguments table. CALCULATETABLE changes
the filter context and delivers a
table as a result. Now, ADDCOLUMNS-- ADDCOLUMNS. We need the table, the
name of the new column, and the expression. I'm going to say
fTransactions, comma, and the new column will be
called Total Revenue, comma, and we'll use our
measure, Total Revenue. Close parentheses. Now it'll take the entire
fTransaction table and Date, Products, Sales, Sales Rep,
and Units, five columns. So the resulting table
will have six columns. There's our five columns
and our total revenue. Now I want to go
back over to Excel, and I want to run at least
a couple of these formulas and remind you about the
existing connections feature. Now I'm going to scroll
over to the side. And in cell X13, go up to
Data, Existing Connections. And what I would
like to do is I'd like to use DAX code to
look into the data model. And as we learned back in video
15, existing connections-- we can go to Table and
pick any one of these. I'm going to pick a small
one, dProduct, click Open. There's our Import Data. I'm going to click OK. That's the Product table,
but it doesn't matter. I needed to connect
somehow to the data model. Now, because I used
existing connections, I can right-click,
down to table, and there's a new option
available called Edit DAX. I'm going to change this to DAX. Now I'm going to go
back over to DAX Studio and I'm going to cheat
because I need this same code. Control-C. It's much easier
to create the code over here in DAX Studio because in
this Edit DAX dialog box through existing connections,
it's hard to type down here. I'm going to Control-V.
Now when I click OK, I have a filtered table that
is looking into the data model and pulling data out. Now, check this out. I have a hurdle
right here, and I've converted it to an Excel table. I already imported this
into the data model, and this is a great trick. This is how to get a variable
from Excel into the data model. The only question is,
how do I access that 250? Because maybe I want to use it
in some DAX formula in the data model. But right here, I would like
to use it as my condition for filtering this column. Then I could change this
number in the Excel spreadsheet and refresh and pull
data from the data model. Now I'm going to go
back over to DAX Studio. And the way we do that is
we use the VALUES function. Now, we saw how to
use VALUES to return a unique list of
items or a table without any duplicate records. But now, because I already
have this in the data model, there it is. And there is a special feature
about the VALUES function. Normally, it returns a table. That means the field
name and all the record. But if it returns a single
item and the DAX formula needs it as a scalar value-- that means a single value-- it will convert this
to that single value. That means the 250. If I highlight
this and click Run, it works exactly perfectly. Now I'm going to
copy this, go back over to Excel, right-click,
Table, Edit DAX. And down here, Control-V. Now
it's connected to that cell right there. Click OK. If I change this
to 200, it's not going to immediately
update, because it's not like a formula. But I can come up to
Refresh All or use my keyboard, Control-Alt-F5. And look at that. Now I'm pulling
data from the data model based on a value
from the Excel spreadsheet. Now, in the finished
file, I did a number of existing connections, DAX
queries to the data model, and you can come
and look at those. All right. That was an epic video about
DAX iterators and DAX table functions. We reminded ourselves
about SUMX and AVERAGEX, and we iterated
over a fact table. We reminded ourselves
about context transitions, converting row contexts
into filter contexts. We were careful with
context transition when iterating
over a fact table. And we saw that
sometimes materializing unnecessary tables can really
slow down DAX measures. We definitely learned
how to use DAX Studio to time DAX formulas. We saw the context transition
and double count problem. And we solved it using a
formula rather than a measure, but we also saw that
adding a primary key will fix that problem. We reminded ourselves
about AVERAGEX iterating at the day grain
on the dDate table. Then we learned how to
use inside of AVERAGEX to get a month grain, VALUES,
and CROSSJOIN and VALUES together. We talked about how
reducing the cardinality of a particular iterator can
help speed up calculations. And then we learned a lot
about DAX table functions. And we finished it off
by reminding ourselves about Excel's
existing connections to pull data from the data
model into an Excel spreadsheet. All right. If you liked that video, be
sure to click that thumbs up. Leave a comment and subscribe
because there's always lots more videos to come
from ExcelIsFun, including next video, video 19. We'll learn all about
the CALCULATE function. All right, we'll
see you next video. If you like what you
see in this video, click that Subscribe
button and the bell icon to get notified about new video.