Handling MULTIPLE fact tables in Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- Yooo! What's up, this is Patrick, from Guy in a Cube, and in this video I wanna talk about what do you do when you have multiple facts in your data model? Stay tuned. (upbeat music) If you're finding us for the very first time, be sure to hit that subscribe button to stay up to date on all the videos from both Adam and this guy. Okay, you've connected to your data warehouse and you look at the schema, or you building out your own model against, you know, some other source, and you've realized that you have multiple tables that contains your additive values, things like sales amount, and the quantity sold, and the number of enrollments and things like that. But you have multiple tables at different grains, and you need to bring them in at different grains. How do you handle those in your model? What do you do to make sure that you can report against them effectively, while at the same time not affecting refreshes or introducing complicated DAX, or missing values and things like that? What do you do? So in this video, that's what I'm gonna talk about. I'm gonna start by talking about two things I see people do that they should try to avoid. And at the end, I'm gonna show you what you should do. And a lot of you guys probably are already looking and go maybe I know, but I'm still gonna watch the video. Alright, so enough of all this talking, you guys know what I like to do, let's do what, let's head over to my laptop. So the first thing I see people do is that they duplicate dimensions in their model, let me show you what I'm talking about. So let's say for example, you have this model that has reseller sales and internet sales, and you wanna report against them. And you have some common things across those two tables like date, you can see there's a date, and then a product key. Instead of trying to figure out how I can relate one table to both, what I see people do is this, let's go to this layout and what you'll see is they'll have a the fact table, and then there they'll have a calendar table for reseller, but this particular fact table and a product table for this fact table. And then for the other fact table for my internet table, they'll go to Internet Sales, and then they'll have a product table for that one and an Internet Sales table for that one. But why are you doing this? Do you realize one, you're increasing the size of your model? Because now you have these duplicated tables? And number two, you could potentially slow down refreshes, because now it's more tables that you have to refresh that wanna refresh. That's two things. Number three, number three is a little more, you know, something you might not even think about. Let me show you. So let's say you have a requirement to create an element that displays both internet sales and reseller sales. Alright, so that works, but you wanna look at it annually, by year. Hmm, so I have these two calendar tables in my model, which one do I use? Do I use Internet Sales Calendar, or Reseller Sales Calendar, I'm gonna use Internet Sales 'cause it's first in the alphabet. Hmm, something's wrong, what's wrong? Well, if I look at this from a table perspective, you can clearly see what's wrong. It's repeating the total amount of reseller sales over and over and over again, it's because there's no relationship between that measure, Reseller Sales, and the Internet Sales Calendar table, what you need to do is, take a quick copy of this, what we would do is get rid of Reseller Sales here, get rid of Internet Sales here and here and take the year from the Reseller Calendar table. Now you can see that I have these two values in two separate tables. But what if you wanna put them in the same table? How would you do that? You can't with this model. So there's three problems with this model, you're introducing more data, so you're bloating the size of your model, you're increasing the refresh time, potentially increasing the refresh time, and now I can't correlate values on the same axis or using the same dimension. I just can't do it because they're across two different dimensions that are completely unrelated to the corresponding table. So what do I do, right? So then people get really creative. So number one, don't duplicate your dimensions. The second thing people do is they create what I like to call the consolidated fact table, they pretty much stack facts on top of each other. So if I have Internet Sales and Reseller Sales, they'll create a view or use power query to append the two tables together. Let me show you what I'm talking about. They'll create what I like to call the consolidated fact. And so if I go over to my data view, you'll see if I go to my all sales, first they consolidate them and during that consolidation, they'll add a type to each individual fact table before the consolidation, so when I consolidate it, I can distinguish between those two types. So now I can see I have my Internet Sales here, and if I sort this the other way, I have my Reseller Sales, and if you're paying attention, you'll notice that I have blank values. That's because there's no corresponding sales order number or sales order line number, yada, yada yada promotion key, currency key, for reseller but if I look at Internet Sales, there are corresponding promotions and sales order, but there's no employee key. So now I gotta deal with these blank values. If I start reporting and blank starts showing up in my reports, my end users, my report consumer is gonna go, what the french toast is this? Get rid of those blanks and you try to do some filtering. There, you can get creative with DAX to omit 'em, but why even introduce that problem? The second thing is, now you got this huge table that can take a long time to refresh, you don't wanna introduce, you know, something that could potentially cause problems in your models like increasing your refresh, right? Another thing is that when you write your measures, you will have to add an additional filter on the measures, not that this is a huge problem, it could cause a huge problem if you start if you have a lot, a lot of data in your model. But you'll have to do this for every measure, every time you wanna create a measure that is corresponds to Internet Sales, you'll have to add this filter. Every time you wanna create one that goes to Reseller Sales, you'll have to add this filter, if you don't, if I have a table that looks like this, and I just do a sum of total sales, and I add it here, what's gonna happen is I'll get this blank value I'm like why why do I get this blank value? Well, then they'll go, oh, Patrick, I can use this cell type and what I can do is use it as a filter, and then I can toggle between Internet Sales and Reseller Sales, now my Internet Sales is gone. You can add this on other things in the in reports also. But why jump through those unnecessary hoops? Somebody tell me why would I do that? You wouldn't, you don't need to do that, right? So duplicating dimensions, bad idea. Consolidating fact tables, another bad idea. Patrick, what should I do? It's really simple. If you have someone that's building a data warehouse, just follow the schema that you get they gave you, they're probably a really good data warehouse developer, that's why you hired 'em. And so use the data warehouse that they develop. A lot of times I see people with data warehouses, and then they adopt these patterns I have no idea why if your data warehouse doesn't do it, then you need to probably get a better data warehouse developer, or at least shape the model a little better. And let me show you how you shape your model. So let's go to this model. Let's create a new layout. So I have two fact tables. I have my Internet Sales, and I have my Reseller Sales, they both have a date, what do you think I should do? Use a consolidated date table that's related to both. The same thing for product, a central product table as a one-to-many relationship with both. Any dimensions that's just specific to that particular fact table like employee, for example, there should only be a relationship between Employee and Reseller Sales. If I do that consolidated table, then it's related to, it doesn't matter, it's gonna relate to Internet Sales and Reseller Sales, because they're in the same table. Now, what I can do is, remember the visual that I was creating earlier, if I drop Internet Sales and Reseller Sales here, and then I use my centralized calendar, boom, all right, now I can see that I was selling some stuff through resellers for a couple of years, and then it just completely fell off. But now you may be thinking, well, Patrick, how are they gonna know what to, you know, filter and aggregate by, because I could also make the mistake of bringing my employee name and drop in my Internet Sales right here and there's no relationship. What I like to do is I give my report consumers and my authors, my report authors a little hint and on those corresponding tables, I write a little description that says, hey, you can group by these tables. I put it on the table, and I put it on the measure. So now when they're over in the Report View design, in the report, they do a quick hover, and they'll say "Oh, wait, I can't use employee against none of these tables. Oh, let me hover over reseller I can for reseller." Okay, okay, great idea, so I'm gonna get rid of internet sales. I probably shouldn't be looking at this anyway. And it just works. So use the model, how it was designed. Use your conformed or shared dimensions, whatever you wanna call them, connect them to both facts, and then you can correlate to values in a single visual. Okay, what do you guys think? Are you running, have you ran into anything like this before, you solved it a different way? I love to know. You know what to do, let's continue the conversation where? In the comments below. It's your first time visiting the Guy in a Cube Channel, hit that Subscribe button, like my video, big thumbs up. As always, from Adam and myself, thanks for watching. See you in the next video.
Info
Channel: Guy in a Cube
Views: 75,204
Rating: 4.931406 out of 5
Keywords: power bi, power bi basics, power bi dashboard, power bi data modeling, power bi dataset, power bi demo, power bi desktop tutorial, power bi for beginners, power bi training, data modeling power bi, introduction to power bi, what is power bi, business intelligence, data modeling, data warehouse, data warehouse concepts
Id: TnyRsO4NJPc
Channel Id: undefined
Length: 9min 2sec (542 seconds)
Published: Wed Feb 10 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.