Data modeling best practices - Part 1 - in Power BI and Analysis Services

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
yo what's up this is Patrick from Guyana Q and in this video I'm gonna talk all about data modeling best practices stay tuned okay data modeling best practices everybody should follow him everybody should think about them when they're building the data model for power bi and so the reason what sparked all this what sparked the video was Adam did a video recently on reducing the data size right reducing your data set size and he mentioned a couple of really good practices for a data modeling you should go watch that video before before you get into this so I'm gonna wait go watch the video hang on I wait for you okay you're back alright watch that video he provides these two good tips about you know high cardinality columns and turning off auto daytime you should absolutely do those there's some other best practices and other data modeling best practice used to follow I'm gonna talk about them right until the first thing the first thing I want to talk about in addition to the great stuff that Adam did in this video is avoid wide wide tables make your tables narrow you may be going Patrick well I have a wide table and it's working just fine for my data model my reports right now and you're right right it can't work I'm not saying you can't use wide tables but the problem with wide tables is as the data volumes grow that as it starts to get longer and longer I get more and more rows in that data model it could perfect affect the performance of your data model of your slicers and things like that another thing it can do is it can start to abuse memory it just beats it up and misuses it just misuse your memory bloat your model gets big and it just you know use this more memory than it really needs because you're duplicating all these values over and over and it does probably does a good job of compressing compression but as those table get gets wide and more columns I had it right it can really start affecting it okay and you go patrick mindette my data is not gonna grow so I'm gonna have to worry about that and you're right you're right have you you're that person all right you can probably probably get away with a wide table but let me ask you a question I'm actually question what if someone says hey I need to introduce another table that has different but very similar data and I want you to correlate the values across those two tables how you gonna solve that problem so we're not talking about the number of roles in your data you know in that flat table we're talking about introducing a new table how do you solve that okay so you guys know I like to do instead of all this talking Hetal in my lap so I'm gonna show you head over my lap so let's say you have this data model right here right and it's just one flat table this works it's perfectly you create a bar chart and you analyze sales across product it's pretty easy so let's do that so I'm gonna go here and I create a very simple report so am i wide factor in that sales table all right all these columns I get it all up and down all these columns I'm gonna choose English product name and I'm gonna create a nice little bar chart out of it all right I'm gonna create it to be you know specific a clustered column chart and I'm gonna add my sales amount my internet sales sales amount right there okay perfect this is what they need let's just pretend let's just pretend this is all they need and then they come back and say hey we have this new table that we want to introduce because I want to compare internet sales to reseller sales all right what are you gonna do how are you gonna correlate those values in a single chart well so you go over click this one then I go to my reseller sales and I find the sales amount there you can name this stuff better and clean it up a little bit right but I find the sells them out there and I add it what's wrong well you guys know what's wrong with dumb videos on this in the past there's no relationship between those two tables so you get this Cartesian product and it just duplicates the same value over and over and over how do you solve this how do you just make this simpler so you can do it you can create two charts right that's easy so if you look back on my two charts report you can see I have two different charts and maybe you can all work now you don't do that how do you solve this so what what somebody said to me recently was oh this is easy right all you need to do is what are you trying to compare where I want to you know look at this by-product so make a relationship between product key and product key because it exists in both tables now I'm gonna get this nice little warning it's gonna say hey this is a mini many unless right if you don't completely understand many how many to many relationships work you should totally avoid these right now did a video and it's a while ago you should totally avoid many too many but if you know how it works and know that it could possibly introduce unexpected results because of the data right I'm dependent on this relationship to handle all that and so if you don't understand how it works it could introduce some unexpected results and so you should avoid meaning too many unless you really understand how they work and really understand your data so if I click OK something else you're gonna know this is what it does is it's going to make it a bi-directional relationship something else you need to know something else you need to consider when you set up something like this but then it gets even more complicated it gets even more complicated and they go hey Patrick but not only do I want to look at this for a product but I also want to look at this by sales territory so say ok well that worked really well for me so I'm gonna go find my sells territory key and I'm gonna drop it right there BAM then you get the same warning if I click OK but now this relationship all the lines look a little different because it's an inactive relationship how do you handle that hmm some another complexity that you'll have to deal with so now my data model is complex my Dax is gonna be complex and now your performance is truly gonna be affected by this because you have bi-directional relationships you have some really complex taxes you have to deal with and accommodate every time you create measures with this right so maybe your data volumes won't grow and you know performance your performance won't be affected by large volumes of data but when your data model starts to get complex and you have to figure out how to handle multiple tables then you have all these new things that are introduced how do you handle these things how do you manage with these things well that leads into the next topic the next best practice so the first thing is try to avoid white tables for all the reasons I said before write for all those things the next topic though I mean it just leads right into it what you should consider doing it's creating these really focus and narrow tables and this write this for those that are familiar with this these are called a start schemas so you want to create a star schema so there's two things I want you to do there's some guidance document Adams gonna post a link in the comments below to it you should go read that guidance document it tells you all about star schemas but also I wrote a vide I wrote a video I did a video with my daughter a while ago flat filed today to model I think it was with her and you should go watch that video and it shows you how to take this flat flat table and create a data model this is if your if you don't already have a data warehouse right if you don't already have a star schema and so what you can do is take a look at my screen instead of having these very wide tables you see these very wide tables you can make them narrow and start pulling out the things that are really specific into look-up tables or dimensions so if you take a look what I've done here I have this product table and now I have my my two sales tables these will be called fat tables and they were contained you know summarization data and my dimension tables they contain you know data that I will use for grouping and filtering it the de star schema is optimized for reporting okay it's optimized for reporting so now what I can do using this model is easily alright go to a single product table not as many columns right now click my product name make my bar chart and then I can go to each one of these go to my sales table my internet sales - sales amount and go to my reseller sales - sales amount probably give it better names but now you can see that the correlation that just happens it works because I'm using a single table this is just great my star schema optimized for reporting optimize for grouping filter filtering a hat fill team filtering and summarizations or aggregations right it's perfect so let me show you right let me wrap this up let me show you what a huge a nice nice star star schema looks like so you can see here I have my star schema I have my internet sales table right there at my reseller sales table and now I have my really narrow tables that are focusing in my dimension tables that are focusing in on specific things like product sales territory customer geography and this is great and what you can do and so this makes it easy for reporting right because they're categorized into these little buckets I like to call them little buckets or look-up tables or the formal name dimensions they're categorized that way and I can easily drag and drop and build my reports but part bi provides you with this ability to create different views of data right and so let's say I want to just have a view of my internet sales I can Greg drag this table right here on the modeling view and then say add my related tables it automatically adds it maybe the product has more related tables and it continues to add it and so I can call this internet sales and then if I want to write I can do one for reseller sales and just repeat the steps and my related table and it just does it for me and rename it so now I can see the different views what it's related to I can look at it in a consolidated view on an individual view by my fact tables this is great right so two quick tips two quick tips remember try to avoid the wide tables because they can produce some inconsistencies and affect performance and then instead of using that look at using star schemas which are optimized and built for reporting your dimensions for grouping and filtering and your facts for you know summarizations and aggregations alright what do you guys think you got any questions comments you know what to do post it in the comments below this is your first time visiting the guy in the cube Channel hit that subscribe button if you liked my video your big thumbs up as always from Adam and myself thanks for watching see you in the next video
Info
Channel: Guy in a Cube
Views: 301,031
Rating: undefined out of 5
Keywords: data modeling, kimball data modeling, power bi data modeling, power bi data modeling best practices, data model, data modeler, data modelling training, data warehousing basics, data warehousing tutorial, database design, logical model, physical model, power bi, power bi desktop, power bi star schema, power bi star schema vs flat table, power bi tutorial, power bi tutorial for beginners, star schema
Id: kiVXI7zjSzY
Channel Id: undefined
Length: 10min 5sec (605 seconds)
Published: Wed Aug 28 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.