MSPTDA 18: DAX Iterators, Table Functions, Grain, Cardinality, Materializing Tables Excel & Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[MUSIC PLAYING] Welcome to MSPTDA video number 18. Hey, in this video, we got to talk about DAX iterators, DAX table functions, and grain, cardinality, and materializing tables. Now, in this class so far, we actually have already talked about DAX iterator functions. We actually only talked about a few of the X iterator functions, like SUMX and AVERAGEX, that take a formula, iterate over a table, create an array of values, and then do the aggregate calculation, either SUM for SUMX or AVERAGE for AVERAGEX. We've also seen two different DAX table functions in the class so far and, of course, that is a DAX formula that delivers a table of values instead of a single value. We saw the CALENDAR table function index, and we also saw the ALL table function. Now, when we used ALL inside of CALCULATE to calculate percent of grand total, we didn't think of it as a table function. But as we'll see in this video, this is an important DAX table function. And we've learned about grain and how it's the size or level of data or summarization that we make. We learned how to iterate in the AVERAGEX across the transaction-level grain or a day grain. Now, what's new in this video in regards to DAX iterator functions is we'll see some new DAX iterating functions, including an X in the CONCATENATEX function and a non-X iterator, the FILTER function. We'll also learn that we have to be careful when iterating over fact tables. And our formulas materialize unnecessary tables. We'll also have to be careful of context transition when we're iterating and the dreaded Double Count problem. And we'll try not to iterate over fact tables. And when possible, we'll see if we can reduce the cardinality, which just means the number of items in the iteration. For example, is it possible to go in an AVERAGEX calculation from a fact table down to the product table, which, of course, is much smaller? And in this video, we'll see a bunch of new table functions, including talking all about ALL, VALUES, and CROSSJOIN. And sometimes we'll see that the grain we want isn't directly in the table, and so we'll have to use table functions to determine the grain of the iterator. Now, what does DAX stand for anyway? Data Analysis Expressions. That second letter in expressions, X, that's the X at the end of DAX. And DAX is our amazing function-based computer language we use to create our measures, calculated columns, and DAX tables. Now, here are the amazing topics we're going to see in this video. There's also a time hyperlink table of contents below the video. There's also a bunch of files you can download if you want to follow along with everything you see in this video. This is the class website. These are also available in the links below the video. Page 4 in the PDF notes, we have a section on DAX iterator functions. Here are some examples. We've already seen SUMX and AVERAGEX. But iterator functions create row context and make a row-by-row calculation on a table with a certain granularity to generate an array of answers which can be aggregated, joined, or delivered as a table. The number of elements in that array is equal to the number of rows in the table that's being iterated. Now, SUMX, AVERAGEX, MAXX, those are all aggregate functions that create an array of answers and then deliver sum, average, and the maximum value. CONCATENATEX, that's another X function like these three. It'll iterate a table and then generate a single answer of joined elements. And we'll see an example of that later. FILTER and ADDCOLUMNS, those are iterators that go through each row in a table and either filter the table or add an extra column. But these two iterating functions deliver a table, whereas these four deliver a single answer. Now we want to go over to Excel and learn more about how iterators work. And we're going to start off in this Excel file. And later, we actually have a few other Excel files and Power BI Desktop files. Now, I've already imported the data using Power Query. We can see over here I've imported the data. If we go over to the data model where I've already imported this, we can see we have four tables in diagram view. There's the fTransaction surrounded by dimension tables in one-to-many relationships and a star schema data model. Back over to Data View. This is the fact table. It has about 2.2 million rows of data. Now, down here in the fact table down in the measure grid, we want to create our first measure. We're going to create a measure for total revenue, and it's going to be an iterator. Now, remember, we talked about the choice between a calculated column and a measure in video number 15. If we create a line item revenue calculated column and then use a sum down here to add it, that column will be added to the data model, and we'll take up RAM memory. If we create a measure, it will simulate that entire column in a single measure. When we drop the measure into a pivot table or a Power BI report, all of those values will have to be calculated. So it's always a trade-off. Do you want to store the raw data in the Columnar Database or do you want to calculate each time you drag and drop a measure? We're going to create our measure for total revenue. So we come up to the formula bar-- total revenue that iterates over a fact table. We're in Excel, so we use colon, equal sign, and then we use SUMX. We need the table, F, Transaction, comma, and then the expression, that's our formula that will iterate over this table. FT and we're going to do UnitsSold. That is a column in this table. So because this is an iterator, it will create row context, which means the column units sold will pick out each one of the individual units sold and then multiply it by-- well, we have to get a column from a different table. And because there's a relationship between the fTransactions and the dProduct table, we can use RELATED to look it up, dProduct, and we want retail price. Now, Related goes through each row in the fTransaction, sees the product, and can pull the correct price. That's our formula right there that we're going to use to iterate. Close parentheses. It'll calculate all of the values as an array of values internally, and then SUM will add it. And Enter. Now we can add some number formatting. Now, I want to copy this. And a few rows down, I'm going to come up to the formula bar, and I want to create a very similar measure, but this one's going to be called Average Transactional Revenue Iterate the fact table F. The only difference here is going to be, instead of the iterator that aggregates a sum, we're going to use the iterator that aggregates as an average. The table is the same. We're iterating over the transaction table. And the formula is the same. When I hit Enter, it does all of the same things, but now it's calculating the average of all of those internally-generated numbers. I'll add some number formatting. Now, we want to think about this. We had a choice here when we did AVERAGEX. We had a formula here to calculate the line item revenue, but we could have, if we wanted to, since we already had a measure here, we actually could have put that measure as the second argument in AVERAGEX. Now, there's a bunch of problems when we do this. And so we want to see what those problems are. I'm going to copy this. And below, I'm going to come up to the formula bar, Control-V, and I'm going to change F, which means we have a formula in the second argument. I'm going to say, in the second argument, we're going to have a measure. And instead of that formula right there, I'm going to square bracket, and there it is. There's our measure. It'll give me exactly the same answer on this data set. Close parentheses and Enter. I'm going to add some number formatting. We can see, on this data set, whether or not we use a measure in the second argument or we put our formula. We get exactly the same answer. Now, I'm actually going to change the name of this to-- in the second argument, there's a formula. That way we can differentiate easily between these two. Now, for the measure, there's a bunch of problems with this. The first problem is internally inside that measure, this is the formula. And what's inside of it? The fact table. So that formula is iterating over this entire fact table, 2.2 million rows. But guess what? That measure, as it starts iterating over the transaction table inside of AVERAGEX, that means for every row inside of fTransaction, it has to then iterate over a fact table inside this measure. So visualize this. That means the measure, the first row in this 2.2-million row table, the first row it has to iterate over 2.2. Then it goes to the second row, iterate over 2.2 million rows. That's a lot of extra calculating as compared to this one right here. The second unnecessary thing is, well, one advantage of using a measure in an iterator is context transition. That just means that the hidden CALCULATE function around every measure brings in all available row contexts and converts it to filter context. Well, the F transaction table for every row is already there, so it doesn't need to do that. Now, the third problem, we'll see in the next example in the next file. But what we want to do right now is go measure and time the difference between using this iterator, AVERAGEX, with a formula and with a measure, and then we'll clearly see that this formula takes longer. We'll also be able to see in DAX Studio that an extra table is materialized internally that we don't really need. And that helps slow down the calculation time. Now I want to save this and close it and open up DAX Studio. Now we have this Excel file open. I've already searched Google for DAX Studio and downloaded it. It appears in Add-ins, so I click DAX Studio from inside of the Excel file. It's smart. It knows that I opened this from an Excel file. So I simply click Connect. This opens up the DAX Studio window. Over here, we see into our Excel data model. For example, we can open up fTransaction and see our measures and our columns. This is the only one that's not hidden. That one is hidden. Now, we want to come up to the Run button, and I'm going to select Clear Cache and then Run. That way, every time we test one of our formulas, it will be as if we're running it the first time. In order to time our formulas, we'll come over to Traces and select Server Timings. Now, we're going to come into the white. I'm going to hold the Control key and roll my wheel to zoom in. We can see the percentage zoom right there. And we saw this earlier in the class that we have to use the EVALUATE command in order to visualize a table. Now, wait a second. We're trying to time formulas, not visualize a table. If I were to select a table like this and now click the Run button, over on the Results tab I could see the whole table. But that's a table. So we have to use a little trick here to get a table to materialize with our single formula. And then that way, we can time it. And the trick-- I'm going to delete this. And the trick is we have to use the ROW DAX function. Now, ROW DAX function just needs a name of a column and then some expression. It will return a one-column, one-row table. In double quotes, oftentimes people just put an X because they don't care what the column is called. We're just trying to use the second argument. I'm going to point to the middle here and click and drag. There's our iterator's second argument formula. So I'm going to double-click. I'm going to Control and roll to bring this on the screen. Close parentheses. Now, when you're in a hurry, we just put X. But I'm going to second argument formula, then come to the end and close parentheses. Now, I want two different formulas. So I'm going to hit Enter, Shift-Tab, Enter, EVALUATE, Enter, Tab, ROW, open parentheses. And in double quotes in the second argument, this will be the measure, N, double quotes, comma. We'll change that to second. And now I'm going to select second argument measure. Double-click. Close parentheses. Now, we have two different tables here. And because we don't want both of them to materialize and be timed, I'm going to highlight just one, then come up to Run. I can see it's very polite. It says boom, boom right there. It looks like I have an extra parentheses. Now we're going to try this again. Over here, we can see the results of the single-column, single-row table. There's the same answer. But what we want to look at is server timings. Now, there's a split for each formula between SE and FE. SE is Storage Engine. That's where all the data is stored. In order for a formula to calculate, it has to go down to the Storage Engine. And remember, that's a Columnar Database that stores each column as a unique list. So if it needs to work on a full table, it's going to have to go down to the Storage Engine and materialize it. You can see, over here, Formula Engine, that's the calculation part of the DAX that can't be done down in the Storage Engine. Now, the total time is the addition of Formula Engine and Storage Engine-- so 8 milliseconds. Now, if you click this a few different times, it'll give you different times each time. That's the same way we time things over in Excel. Oftentimes, you have to time it a number of times and then take an average. So there it is. It looks like 6 milliseconds in the Storage Engine, 1 millisecond in the Formula Engine. Now, there is a preference between these two. Formula Engine only gets to use one processor. Storage Engine gets to use multiple processors. So if we can force the calculation of our DAX formula down to the storage engine, that's a big advantage because it can calculate more quickly. And later, we'll see this little calculation right here, when it is using multiple processors, it'll tell us how many processors here. Over here, here's the one we want to look at, Rows. This will tell us, in the Materialize table, how many rows were materialized. We can also, if you know how to read xmSQL, look at the code over here. Now, I actually took Marco Russo and Alberto Ferrari's optimization course and read their book, but I am no expert in reading a query plan or this xmSQL. But some of you might. It's similar to SQL. That expression means the formula. SELECT and then there's SUM, comma, SUM, and expression equals expression. This actually happens when you're doing an average. That's trying to add. This is trying to calculate a count. Then select from what? Transaction-- LEFT OUTER JOIN, just like in SQL, dProduct on, and there's the ProductID column with the relationship. All right. So what we're going to look at is how many rows in the table are materialized and the total time. Now, if we come down to evaluate the one with the measure, now we click Run. Oh, you can already see. Look at that. It had to materialize that full fact table. It had to run two different queries. Down here, 974 milliseconds in the Storage Engine, 551 in the Formula Engine. And there we can see a 1.5. That's why this total, the actual total down in the CPU, is higher than this Storage Engine. If we take 1,454 and divide it by a 1.5, that gives us that amount. And since these are the actual total times, when we add these together, there it is. So we can see two things when we have the measure. It takes a lot longer to calculate, and it had to materialize internally a big table. All right. So lesson number one, when we're iterating, since we were iterating over a fact table and then had a measure with a fact table, we don't want to do that. We want to use the formula. Later, we'll see in this video and in the previous two videos, definitely sometimes we want to measure to do context transition, but just not over a fact table. All right. Now, we can actually save this. I'm going to Control-S. I gave it a name, chose a location, and click Save. Now I'm going to close this and go back to Excel. Now, back over in this Excel file on the Power Pivot for Excel window, we have our measures. I'm going to click PivotTable, on a new worksheet, click OK. I'll call this PT. Now, I want to drag-- from the dDate table, I'm going to drag year and month. Then from our fTransactions, I'm going to drag Total Revenue. Let's see how long it calculates-- basically instantly. Average transaction with a formula calculates quickly. And then our measure-- it's taking a while. So it took a while. So even over here in the pivot table or over in Power BI, we can see there's a lag in the calculation because when we drag that measure into sum report, it has to calculate for each cell with a different external filter context, and then it has to do all that iterating and context transition over a fact table. Now, I'm going to save this file. We want to go see a third problem with a measure iterating over a fact table. And we're going to open up a Power BI Desktop file. Now, over in this Power BI Desktop file, we have a much smaller data set with an fTransaction and a dProduct. If we go over to Data or Table View, we can see we have a very small table. If we go over to Report View, I've already created a multi-row card and created two measures. Now let's look at these measures-- total revenue. Oh, it's exactly the same as we did over in Excel on a larger data set. And for average transactional revenue second argument formula, there it is, that's the amount. But let's create this same formula with a measure in the second argument. Come up to Modeling, New Measure. And the second argument will have an M. And over here in Power BI Desktop, we use an equal sign and then AVERAGEX, fT, that's our fTransaction fact table, comma, square bracket, and we'll get our total revenue measure. Close parentheses, Enter. We'll add some formatting, come over and check. And what in the world is going on? Why is it when we have a measure in the second argument instead of a formula, we're getting a larger number than this average transactional revenue? Well, if we go look at the data or table area, the problem is there's 1, 2 transactions that are exactly equivalent. Down in the middle, there is another duplicate record. Now, as we'll see in just a moment, when we have duplicate records, the measure is doing context transition and getting confused and double counting the duplicate records each time it sees that duplicate record. Now, we'll see more specifically why that's happening in just a moment. But think about this. This is just a small data set. But if you have a large transactional data set, it's not uncommon to sell a particular product at a particular price and have a duplicate transaction with the same number of units, or whatever it might be. It's not uncommon to have duplicates. So we want to see that there's two different ways to solve this problem. Now, if we go look at the formula, the reason we're getting double accounting, an average that's larger than the correct average, is because of context transition. Now, context transition, any time in an iterator or a calculated column, the measure with the hidden calculate takes the row context and converts it into an equivalent filter context. Now, what that means for duplicates is that when it gets to this row right here, instead of filtering down to a single row, it thinks that there are two matching records. So it filters down to two rows. Now, to prove this to ourselves, let's add a new column. We'll call it Line Item Revenue, and we'll do Units times RELATED, and we'll get our dProduct, close parentheses and Enter. Whoops-- dProduct Price, close parentheses and Enter. We can clearly see we have the correct line item revenues. Now let's do the same thing. We'll call it Line Item Revenue Measure, square bracket, and we'll get our total revenue. And when I hit Enter, what did it do? It double counted. When it got to this row, context transition took the row context, converted it to filter context, which meant it filtered this table down to two records, doubling the sales for that line. And of course, it did it twice because when it got to the other duplicate, it did exactly the same thing. Down here, also, for this duplicate, boom, there they are, double counting. Now, obviously, the best way to solve this is to use the formula because the formula is not going to invoke context transition and filter the fact table down to two matching records. There is another way we could solve this problem, also. Now, if we go back to our Excel file and open up the Power Pivot for Excel window, there's our two measures we did over here in Excel, formula measure, but we got the same answer. The reason why-- there were no duplicate records. We had a Transaction ID column. So every single transaction had a unique identifier. So that's why we didn't run into this problem over here in this data set. So back here in Power BI Desktop, we actually could amend our table. I'm going to go to Edit Queries, Edit Queries. Here's our fTransaction. We can go to Add Column, Index, From 1. And just by doing this, now there are no duplicate records. So when I go to Home, Close and Apply, now we get the correct number here because there are no duplicate records. The moral of the story, of course, is, in this situation, we just don't want to put a measure in the second argument here. And the reasons are simple-- we don't want to have the formula iterate over the fact table twice, we don't need the context transition, and if there are duplicates and no primary key, we don't want to get the wrong answer. In fact, one other thing, if we go back over here, we can see, of course, there's an index. So now, that doesn't happen here in the measure column either. All right. I'm going to save this as the finished file. All right. Back over here an Excel file in our Power Pivot window, we don't want to use a measure and iterate over this fact table. But guess what? What if our goal was, instead of calculating average transactional revenue, we wanted average daily revenue? Well, in that case, average daily revenue, we definitely want to use AVERAGEX and the table. Well, we can't use fTransactions because that would give us the individual line item revenue, and we want daily totals. So we use DD, the dDate table, comma. And in this case, I want to use my measure for total revenue, and I want context transition to bring each individual day, which is a filter, to flow across to the fact table so that internal fact table has only the records for each one of those days. Also, when this measure iterates over the dDate table, there's never going to be that double counting problem because the dDate is a unique list of dates. Close parentheses and Enter. We'll add some number formatting. We'll go back to our pivot table sheet. I'm going to remove the average with a measure. And there's our new average daily revenue. I drag it down to Values and it calculates quickly. Now, we did these two calculations this video and last video, and we were lucky because the average transaction for each one of these calculations, we just happened to have the right table with the right grain, transaction and day. But what if we wanted to calculate monthly average revenue? Now, before we see how to do that, let's go remind ourself about grain, the grain or the granularity of a particular table. Now, all granularity or grain means is what is the size or level of the detail? In this fact table, we see this is the transaction or product level. Invoice 27002 sold product number four. But it also sold product number three. So we've decided to break up the numbers at the transaction line level or the product level. Now, that's different in this fact table where the two numbers are combined into total sales for the invoice. Now, the grain of a fact table is important because it determines what criteria conditions or filters we can apply to the table. Now, for this invoice-level fact table, we cannot apply a product filter, whereas, over here, we can apply a product filter. So the grains are different between these two fact tables. Now, later in this class, we'll actually solve this problem and combine them into a single fact table where the grain is at the smallest level, which will allow us to apply conditions, criteria, or filters however we would like. Now, in this video, we'd like to talk about grain as it relates to the table inside an iterator function. Well, in fact, we just learned how to use SUMX to iterate the transaction-level grain of the fact table. We also saw how to iterate across the fact table using AVERAGEX. And then we saw how to use AVERAGEX to iterate across the date table. Now, the grain of the date table itself is the day level. Now, what if, instead of like last video calculating the average of daily revenue, what if we wanted to calculate the average of monthly revenue? Well, we don't have a table, like a date table, because if we get a measure to iterate this table, it is going to do the day level. Well, in this video, we'll see the DAX functions VALUES, ALL, and CROSSJOIN to access certain columns and get a unique list. So internally, inside of our iterator functions, we can iterate at whichever grain we would like. Now, back here in our Power Pivot window, we'd like to create our calculation. But let's go look at the dDate table. Now, luckily, we created a date attribute column that has year and month. We can't just use month because then all the Januarys are grouped together. We need an attribute column that shows both year and month. If you don't have this column, then you have to use CROSSJOIN to create such a column or such a table combining year and month. Now, we'll see both methods. But by all means, when designing the data model, if you're going to calculate average monthly revenue, then this is the attribute column you want. It'll make our life easy. We come back over to fTransactions. Down here, up in the formula bar, Average Monthly Revenue. Now, notice we're acknowledging the grain in the name of our formula, transactional, daily, and monthly. Now, AVERAGEX. And for the table, we're going to use a brand-new DAX function called VALUES. VALUES can look at a column or a table and return a unique list. So right here, Year Month, I'm going to double-click. Close parentheses. And now we have our table that the formula can iterate over. This is the month granularity or grain, comma, square bracket. I'm going to get my total revenue, and there we go. Close parentheses and Enter. Add some number formatting. Down here in the measure grid, that's the average monthly revenue for all of the transactions in the fact table. When we drop it into the pivot table, it'll give us the correct average monthly revenue for each one of the years. Now, VALUES is a brand-new function. So I actually want to show you two ways to see what VALUES is doing inside of a pivot table. I'm going to close this Power Pivot window. Now, before we go over and prove what VALUES does, let's drag our Average Monthly Revenue down to the Values area. And look at that. Uh oh, it's exactly equal to total revenue for the month. Well, that's correct because the total for the month divided by one is the total revenue. But it's the year area of the pivot table that we get the correct average monthly revenue. Now, actually, we could come over to the side and use an Excel function and average these monthly totals. And when I hit Enter, I better get exactly the same average monthly revenue. Yes, we do. Now, I want to go over to DAX Studio, Add-ins, DAX Studio. We'll connect. In the white area, EV, Tab. Enter, Tab, VALUES, Tab. And from the dDate table, I'm going to double-click Year and Month. Now, remember, that column is an attribute that has many repeats. But when I click Run, sure enough down in output, I can see I get a unique list of all of the years and months. Now, while we're over here in DAX Studio, let's see what happens-- EV, Tab, Enter, Tab-- if we don't have this attribute column, which, again, would be silly because it's so easy to create in your data model. But we can use the CROSSJOIN function, and CROSSJOIN takes two tables, and we're going to use VALUES of Year and then VALUES of the Month column. Close parentheses, close parentheses. Now, CROSSJOIN does a Cartesian product where it matches up each item from this table with every single item in the second table. And because the values are delivering a unique list of years and months, when I highlight this and click Run, I get the same basic thing, but with two columns. Since our goal is to use this as a grain to iterate over to calculate monthly totals before we average an AVERAGEX, that will work just fine. Now, this will take a little bit more calculating because it has to do multiple calculations, whereas this one's looking at a single column and delivering a unique list. Now, I want to go look at the VALUES functions in another way because when I do this over here in DAX Studio, this is showing me every single possible combination. Now, when I do EVALUATE on VALUES, Date, Year Month, that's a single column. But notice, it's given me every single item from the Date table. It is a unique list, but it's, in essence, without any external filter context. So I want to show you another demonstration of VALUES that will prove, internally in each cell on the pivot table or over in Power BI Desktop, it will give you a unique list in the current filter context. Now, I'm going to close this and not save. I'm going to save this Excel file and then close it. And we want to go over to a different Excel file. And here's the name of this new Excel file. Now we have this fact table, a Product dimension table, and a Sales Rep dimension table. If we go over to our data model, Power Pivot Manage Data model, in Diagram View, we could see the fTransaction, Product and Sales Rep. I'm going to close this. Our goal to illustrate VALUES is this-- I've already created, from the data model, a pivot table with sales rep. But notice, here's the sales rep ID. And for any particular sales rep ID, there should be a number of different products sold. So for example, for Sue, I want my pivot table formula to list all of the products. Well, that's the perfect job for VALUES because if I ask values to look at this column, well, without a pivot table filter context, it would give me every single product. But let's look at Sue. Sue is number one. There's a Quad and a Bellen and, also, a Sunspot. So if I ask VALUES to look at this entire column, when it gets to that cell right there for Sue, that condition filters this table down to one record. It flows across the relationship, which will show just the records for Sue. And since VALUES will be pointing at product, we'll see only Quad, Bellen, and Sunspot. Now I'm going to click inside the pivot table. We could go over to the measure grid. But another way to create measures in Excel is right-click, Add Measure. This brings up the Measure dialog box. We can put a measure name, list of product values, and the formula will be-- well, I would like to just use VALUES, fT, and I see right there, Product column from the fTransaction table. Now, the problem with that formula is that's a table function. We cannot display a table in a pivot table cell. But guess what? We can use this as our table, and we can join all the items together using another X iterator function, CONCATENATEX. And there it is, CONCATENATEX. Now, it needs a table. That VALUES will-- as the formulas copy down in the pivot table cell, for each cell, it will display a different unique list of products. Comma. The expression will actually need the same column. fT-- there's product. Comma. And then the delimiter-- double quote, comma, space, double quotes, or whatever you want. Close parentheses. Now, the way this iterator will work-- well, it has its table, which will be a different unique list for each cell in the pivot table. That column instructs CONCATENATE to go through that table's column and join them with that delimiter. When I click OK, now I have a new measure over here. And when I drag it down into Values, that is how Values is working. So now we can clearly see that as VALUES copies through whatever pivot table or Power BI visual, it will deliver a unique list given the current filter context. Now, we'll come back to this workbook because we're going to talk a lot more about table functions. But remember, our goal is to learn what the VALUES function does. So over here, in this Excel workbook, we saw that VALUES delivers a unique list of items in the current filter context. Now, I'm going to close this Excel workbook. We'll come back to it later. Back over in this Excel workbook here, now we get it. the VALUES function in that cell right there, here we can see the measure VALUES is delivering exactly 12 months for the year 2017. So VALUES, as we copy through a pivot table report or a Power BI Desktop visual, sees the current filter context. Escape. Over here in the Power Pivot window, there's our average monthly revenue measure. I'm going to copy this, Control-C-Escape. Click right below. Up to the formula bar, Control-V. And I'm going to change the name at the end to CJ for Cross Join . And just to see that this does work, we could use CROSSJOIN and then VALUES. There's our dDate Year, close parentheses, comma. The second table in CROSSJOIN-- VALUES. And I'm going to down arrow to Month, close parentheses, close parentheses. There's the table. CROSSJOIN of year and month. And when I hit Enter, it adds some number formatting. I get exactly the same grand overall total for average monthly revenue. Back over here in the pivot table, I can drag this down to Values, and I get exactly the same numbers. So in this pivot table here, we have 1, 2, 3 different grains, transaction day and month. So the first argument of an iterator, you always have to think about the grain. Now, another topic for consideration-- when picking the table for the first argument of an iterator is cardinality. Now, cardinality just means how many items are in that table. Now, to illustrate cardinality, I'm going to close this Excel workbook and we're going to open up a Power BI Desktop file. Now, we're over here in this Power BI Desktop file. In Relationships View, we have a fact table with one measure for total revenue, the Date table, and the Product table. If we go back over to Data or Table View, if I click on Date, I can see there's 730 rows. Product has 630 rows. But the fTransaction table has over 21 million rows. So if we're going to calculate total revenue-- and here's the measure under the fTransaction table. We have, for total revenue, SUMX. There's the fTransaction, 21 million rows. We're looking up the price from the dProduct through the relationship, and we're multiplying it by units. If we look over in Relationships View, what we're doing is we're iterating over this table. And because we're iterating over this table, we have access to units, and then we're looking up through the relationship price. So that means there are two components to calculating total revenue, units and price. Well, what if, instead of iterating over the fact table, we iterated over the dProduct table? That means there's a unique identifier product. So as we iterated across this, for each row we'd see the price. Then all we need to do is access all the units. Well, guess what? If we did the sum of the units, that means we'd have to go through the relationship. Well, as it's iterating, the problem is there's no filter context. But if we wrap SUM inside of CALCULATE, then for each row in this table, which means product ID, unique identifier, that single product would flow into the SUM function as a filter. This table would be filtered down to just the units for that particular product ID, and then, of course, the SUM function would add. So we're going to try that alternative total revenue formula. Now, any time we try something like this, we want to test it by dragging it into our report and seeing if it calculates quickly, or we can go to DAX Studio and time, because sometimes, depending on the data and the data model, we can get a significant advantage by changing the calculation, like we're going to try. Other times, it's not such a big difference. All right. Let's go back to Data or Table View. Select Transactions. Up in Modeling, I want New Measure. I'm going to call this Total Revenue for the product table-- 630 rows, SUMX. And the table in the first argument of SUMX will be dProduct, significantly smaller cardinality. 630 rows, comma. We want to access the price column inside of dProduct-- times. And if we just did SUM of fTransaction units. But the SUM function does not have the ability to pull the row context from dProduct into the fTransaction table. No problem. We know by now in this class, CALCULATE can change the filter context. And for us, we're taking row context and converting it to filter context. Now, internally, inside of SUM, fTransaction units is filtered down to just the records for individual products from the dProduct table. And that formula, close parentheses on SUMX and Enter will give us the same answer as our total revenue over the fact table. We'll add some number formatting. We can see our two formulas over here. Let's go over to Report View, over to Visualizations. I'm going to use this multi-row card. Now we're going to come over and check and check. There are our two measures. Now, let's come over to the paint roller, Format, Data Labels, text size 15, Category, text size 15. Now, we can take this one step further. What if we took, instead of iterating over the product table, let's go look at the Product table. Here's Product. Look at that. There are tons of duplicate prices. Actually, if we go back over to Report View-- and from the Product table, I'm going to check Unique Count. Look at that. Even though there's 630 products, there's seven unique prices. Now, I constructed and created this whole data set just so we could illustrate. If there's a situation like this, then I think we could do one better. We can use VALUES on the price column and iterate over just the seven. By the way, Matt Allington, who's the author of three DAX books and a teacher and answers questions online, is the one who taught me these formulas here of how to go from iterating the fact table to iterating on the other side on the Product table. All right. Let's go back to Data, fTransactions. And I want to select that measure right there, copy it, Control-C, Escape, New Measure, Control-V, and I'm going to rename it. We'll call it VALUES Price 7. And for the table, we're going to use VALUES. And right after dProduct, square bracket. There's the price. Close parentheses. So now we've changed the cardinality significantly from 630 to just seven. When I hit Enter, add some formatting. Come over to report, and we're going to check total revenue. And look at that. They're all three of them the same. Now, we want to go and try and use DAX Studio to time these formulas. Now, unlike Excel where we have a button right inside of Excel, DAX Studio I'm just going to open up from either the Start menu or a shortcut. Because I'm not opening it from Excel, I don't see any option here. Here it is, Power BI Desktop, and it sees the file we're working on. Click Connect. We can see our measures over here, over here on the white the Editor pane. I'm going to type EVALUATE command, Enter, Tab, ROW, and then in double quotes, I'm going to call this 21 million, comma. And there's our expression. So I'm going to double-click, close parentheses. Now, I'm going to create the other two EVALUATE statements. We have our three EVALUATE statements. I'm going to change this to Clear Cache and then Run. Over in Traces, I'm going to turn on Server Timing. Over on Server Timing, we want to see the total time. I'm going to highlight the first EVALUATE statement. Click Run. I get 31 seconds. I'm going to click it again. 43 seconds. I'll go on to the next one. So that was the 21 million record. So now let's try the Product table with the 630 rows in our table. Click Run-- so about half the time. Click it again-- about half the time. Now let's try our VALUES on the Price. Highlight. Click Run-- almost the same as the previous one. I'm going to click Run again. So about the same-- these two are about the same, but both are about twice as fast as the fact table. Now, the truth is these are milliseconds. So there's probably not a big difference. But we can clearly see that when we change the cardinality on these two, they're going to run faster than if we're going over the full fact table. Now, I'm not going to save this DAX Studio. I'm going to close this and not save. Now, back over here in our Power BI Desktop file, I'm going to save this as the finished file. Now I'm going to close this, and we're going to open up a new file. Now, we're over in this Excel file here, and our last topic of discussion in this video is we want to talk about some of the different table functions. Now, we already talked about VALUES. But to start off, I want to compare and contrast the table function ALL and VALUES. Here's what we did earlier with VALUES. If I come over to the pivot table field, let's right-click, Edit. There's our CONCATENATE with VALUES. And VALUES will give us a unique list, but it sees the filter context. So that's why, down here for GG, it only sees two products, because in the filter context of GG, Sales Rep three, this table is filtered down just to the threes and VALUE sees just those particular products. But ALL does something totally different. If I right-click-- and I've already created this formula. I'm going to click on Edit. We did the same formula, CONCATENATEX, but instead of VALUES looking at product from the transaction table, I used the ALL function. And look what it does. It totally removes the filter context and shows all of the products for every single one of the cells. So right off the bat, VALUES and ALL, they both give a unique list. But whereas VALUES sees the current filter context, ALL removes all the filters from a particular column, in our case, product, or a complete table. Now, there's one other similarity between VALUES and ALL. I've created two pivot tables, Total Revenue. We could see our measure right there. But what I would like to do is see what happens when I drag Product from the Product table and then Product from the Transaction table. So I'm going to drag Product down to Values. And there it is, all the products and a blank. Now, if I come down to this pivot table, I'm not going to drag Products from the Product table. I'm going to drag it from the Transaction table. So I'm dragging it down to Rows. And what? There's no blank, but there's an Aspen. And there it is. Up here, Product from the dProduct table, I get blank 24,000. Down here, I get Aspen 24,000. This is what happens-- and we actually talked about this back in video number 15. There are four products in the dDimension Product table. Over here, if we create a unique list, there's five products. Aspen is an unmatched item in the relationship. That's why when we drag Product from the fact table down to this pivot table, it lists Aspen. But for this pivot table, when we drag Product from the dDimension table, because there's a missing item in the relationship, it shows a blank because it wants to have the correct total at the bottom. So blank gets 24,540. And as we mentioned back in video 15, I could have 20 products here that are not in the dimension table, and that one blank picks up all of the extra numbers for those unmatched items. Now, how does that come into play for VALUES and ALL? Well, I want to go over to DAX Studio. So I'm going to go to Add-ins, DAX Studio. And over here, there's no external filter context. But watch what happens when I evaluate VALUES on the dProduct table and run. I get four products and an empty cell. That only happens when there's a relationship and there's an unmatched item. All does exactly the same thing. Now, because there's no filter context out here, VALUES and ALL will deliver exactly the same answer. Now, if we change the column from dProducts to fTransaction, VALUES, when I click Run, it shows me all the items, including Aspen. ALL will do the same. VALUES and ALL can also be used on tables. Now, we're still out here in a situation where we have no filter context. So these will deliver the same things. But if I were to evaluate this-- I'm going to click Run. VALUES will just return the entire table. ALL will do the same. I click Run. It returns all the records in the table. Now, VALUES and ALL can have a table or a single column. ALL function, however, can have more than one column from the same table. So if I were to put sales rep ID and product from the fTransaction and run this, it will give me a unique combination of all of the sales rep IDs and product. Now, this is different than CROSSJOIN. CROSSJOIN can do it from different columns and different tables. But ALL, when I click Run, does it just for these columns in the same table. Now, if we compare and contrast VALUES and ALL, both of them create unique lists. VALUES sees the current filter context, whereas ALL removes the filters. Each will return one blank cell when there are unmatched items in the relationship. And if there's no filter context, they'll both return the same items. And you can put a column or a table into VALUES, whereas for ALL you can put a column, multiple columns from the same table, or a complete table. Now, as we will see next video when we study CALCULATE, VALUES and ALL are very common functions that help change the current filter context. Now, we want to go back over to DAX Studio and see how to get rid of the blank if we have an unmatched item in a relationship. Now, VALUES on dProduct product, because there's an unmatched item, shows an empty cell. But instead of VALUES, we can use DISTINCT. DISTINCT does the same thing as VALUES, except for when I run it or use it in some DAX formula, it does not return that empty cell for an unmatched item in the relationship. Now, ALL removes all the filters. So of course, it removes all the filters wherever it is. But it will show that empty cell. ALLNOBLANKROW-- well, it does exactly what it's supposed to. It removes all filters, and it will not return that empty cell when there are unmatched items in the relationship. Now, if you actually had an empty cell in one of these columns here instead of just an unmatched item, they would both return that because it would be considered one of the items in the unique or distinct list. Now, really, if we are building our data model smartly, we're not going to have an unmatched item. So most of the time, we're just not going to use DISTINCT or ALLNOBLANKROW. Now, there's a few other table functions we want to talk about. We already talked about CROSSJOIN. ALLEXCEPT-- that's a function we'll see next video when we learn about CALCULATE. But it takes a table and a column you want to exclude. And then ALLEXCEPT will return a unique list of records. Another important table function is called FILTER. You give it a table and a filter based off of a column in that table, and it returns a filtered table. And FILTER is not only a table function. It's an iterator, also. CALCULATETABLE-- that's like the CALCULATE function. It can change the FILTER context, but it returns an entire table. And then ADDCOLUMNS-- you give it a table, the name of the new column and an expression, and it adds a new column to that table. Now, we want to have an introduction to these three table functions in this video. And then in the next video when we talk all about CALCULATE, we'll talk a lot more about FILTER and CALCULATETABLE. Back in DAX Studio, let's see an example of the FILTER function. It needs a table. I'm going to fT to get my fTransactions, comma. And whatever filter we construct here based on a column in this table, FILTER will return a filtered table. I'm going to say fTransactions, square bracket, and we're going to do a test on the Units column. I'm going to say, please return all the records that are greater than 250. Now, when I click Run, and on the Results tab, we can see our Filter table. Now, the FILTER function is an iterator. It's not one of the X iterators that delivers a single item. FILTER is an iterator that returns a complete table. Now, another table function is not FILTER, but it's CALCULATETABLE. Now, the difference is that CALCULATETABLE can take a table in the first argument, comma. But we can put filters or conditions from anywhere in the data model. So for example, I could say dProduct, square bracket, and I'm going to look at a completely different column in a different table than fTransactions. And I'm going to say, hey, please only give me, from the dProduct Product column, the products that are named Quad. And so CALCULATETABLE is not an iterator like FILTER, and it can use any column that's in a relationship with fTransaction to do the filtering. CALCULATETABLE, just like the CALCULATE function, can see the entire data model. Then when I click Run, I can see I have filtered the table based on a column in a different table. Now, we'll talk a lot more about these two functions and CALCULATE in next video, including learning how to do AND logical tests and OR logical tests. For the time being, we just want to realize that these are both table functions. FILTER is an iterator and can filter a table based on a column. Look at that. Did I leave that out, fTransactions? That was a naked column. That violates our rule that square brackets alone represent a measure. So I want to definitely amend that and run it. So FILTER is an iterator, and the condition has to be based on some column in the first arguments table. CALCULATETABLE changes the filter context and delivers a table as a result. Now, ADDCOLUMNS-- ADDCOLUMNS. We need the table, the name of the new column, and the expression. I'm going to say fTransactions, comma, and the new column will be called Total Revenue, comma, and we'll use our measure, Total Revenue. Close parentheses. Now it'll take the entire fTransaction table and Date, Products, Sales, Sales Rep, and Units, five columns. So the resulting table will have six columns. There's our five columns and our total revenue. Now I want to go back over to Excel, and I want to run at least a couple of these formulas and remind you about the existing connections feature. Now I'm going to scroll over to the side. And in cell X13, go up to Data, Existing Connections. And what I would like to do is I'd like to use DAX code to look into the data model. And as we learned back in video 15, existing connections-- we can go to Table and pick any one of these. I'm going to pick a small one, dProduct, click Open. There's our Import Data. I'm going to click OK. That's the Product table, but it doesn't matter. I needed to connect somehow to the data model. Now, because I used existing connections, I can right-click, down to table, and there's a new option available called Edit DAX. I'm going to change this to DAX. Now I'm going to go back over to DAX Studio and I'm going to cheat because I need this same code. Control-C. It's much easier to create the code over here in DAX Studio because in this Edit DAX dialog box through existing connections, it's hard to type down here. I'm going to Control-V. Now when I click OK, I have a filtered table that is looking into the data model and pulling data out. Now, check this out. I have a hurdle right here, and I've converted it to an Excel table. I already imported this into the data model, and this is a great trick. This is how to get a variable from Excel into the data model. The only question is, how do I access that 250? Because maybe I want to use it in some DAX formula in the data model. But right here, I would like to use it as my condition for filtering this column. Then I could change this number in the Excel spreadsheet and refresh and pull data from the data model. Now I'm going to go back over to DAX Studio. And the way we do that is we use the VALUES function. Now, we saw how to use VALUES to return a unique list of items or a table without any duplicate records. But now, because I already have this in the data model, there it is. And there is a special feature about the VALUES function. Normally, it returns a table. That means the field name and all the record. But if it returns a single item and the DAX formula needs it as a scalar value-- that means a single value-- it will convert this to that single value. That means the 250. If I highlight this and click Run, it works exactly perfectly. Now I'm going to copy this, go back over to Excel, right-click, Table, Edit DAX. And down here, Control-V. Now it's connected to that cell right there. Click OK. If I change this to 200, it's not going to immediately update, because it's not like a formula. But I can come up to Refresh All or use my keyboard, Control-Alt-F5. And look at that. Now I'm pulling data from the data model based on a value from the Excel spreadsheet. Now, in the finished file, I did a number of existing connections, DAX queries to the data model, and you can come and look at those. All right. That was an epic video about DAX iterators and DAX table functions. We reminded ourselves about SUMX and AVERAGEX, and we iterated over a fact table. We reminded ourselves about context transitions, converting row contexts into filter contexts. We were careful with context transition when iterating over a fact table. And we saw that sometimes materializing unnecessary tables can really slow down DAX measures. We definitely learned how to use DAX Studio to time DAX formulas. We saw the context transition and double count problem. And we solved it using a formula rather than a measure, but we also saw that adding a primary key will fix that problem. We reminded ourselves about AVERAGEX iterating at the day grain on the dDate table. Then we learned how to use inside of AVERAGEX to get a month grain, VALUES, and CROSSJOIN and VALUES together. We talked about how reducing the cardinality of a particular iterator can help speed up calculations. And then we learned a lot about DAX table functions. And we finished it off by reminding ourselves about Excel's existing connections to pull data from the data model into an Excel spreadsheet. All right. If you liked that video, be sure to click that thumbs up. Leave a comment and subscribe because there's always lots more videos to come from ExcelIsFun, including next video, video 19. We'll learn all about the CALCULATE function. All right, we'll see you next video. If you like what you see in this video, click that Subscribe button and the bell icon to get notified about new video.
Info
Channel: ExcelIsFun
Views: 48,914
Rating: undefined out of 5
Keywords: Excelisfun, Highline College, Data Analysis, Business Intelligence, BI 348, Mike Girvin, Microsoft Power Tools for Data Analysis 18, MSPTDA 18, DAX Iterator Functions, DAX Table Functions, DAX Grain, DAX Granularity, DAX Cardinality, Materialize DAX Tables, DAX Studio, timing DAX Formulas, VALUES DAX Function, ALL DAX Function, CONCATENATEX DAX Function, SUMX, AVERAGEX, Day Grain, Month Grain, Learn DAX, Basic DAX, DAX Formula Lesson, Double Count DAX, Context Transition
Id: _DHeTnDa2Q4
Channel Id: undefined
Length: 67min 0sec (4020 seconds)
Published: Sun Dec 30 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.