2 ways to reduce your Power BI dataset size and speed up refresh

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
no sack some guy in a cube is your model too big is your report kind of sluggish and too slow is your refresh taking too long have I got the answer for you introducing the new revolutionary magic model powder introducing magic model spray maybe puppers drool can help you with your model come on guys who we're kidding there's some work to be done let's get into it if you're finding us for the first time be sure to hit that subscribe button to stay up to date with all the videos from both Patrick and myself Oh model performance and refresh and all this stuff I've been working with some customers and we found some great tips and tricks for how you can actually go identify these items and I wanted to share a few of those with you there's obviously a lot of things you can do to improve performance I just want to cover a few things just based on some customer interactions and I think will be really helpful for you and I honestly I see it quite a bit in a lot of stuff alright enough of all this talking let's head over to my laptop and actually take a look all right we are looking at my report inside a power bi desktop this is based on the contoso retail DW sample database and I modified it a little bit by cranking up the number of rows so it's way bigger than the default sample typically when I hit a power performance issue I asked them to send me the power bi desktop file because I want to see what it behaves or how it behaves inside of power bi desktop the first thing I go do is I open it up I add a blank page in this case I put some text on here this page left blank intentionally this is so I can avoid the visual cache and once I add that blank page I'll save it close it reopen it then what I want to do is go to the View tab and turn on performance analyzer this should be your first stop let's just see how things run so I'll hit start recording and then I'll go over to the actual report page and then this will go and do its thing and render those visuals so okay they came up you know took a little couple seconds on the it looked like a couple seconds I'm mr. spinny and if we go and look at some of these visuals we'll go in and look and we'll see we can look at the DAX query and the visual display and none of these items are really like huge we do see a good chunk of other but there is just you know background threads and stuff inside of power bi desktop not a lot we can do to improve that outside of maybe reduce visuals but looking at Dax and visual everything looks okay nothing that really sends alarms to me and so from this standpoint you know the report performance is kind of okay let's pretend that they actually came to us and said hey the real issue is you know my refresh is slow it may be that report performance is slow as well and you may see like hi Dax query numbers or things of that nature that's signal that maybe we've got some data model issue or Dax optimization that we can go do and so from a refresh perspective if we could jump over to the service and look one thing we'll see is we'll see some areas here we're still see two failed messages and I'll just close this so you can see the amount of uncompressed data on the Gateway exceeded the limit of ten gigabytes I was pretty pretty big model pretty hefty do some model optimizations or jump over to power bi premium to give you a little more breathing room and then going back to that refresh history looking at the times we can see it took a while to even get this error so it ran for almost an hour I can tell you refreshing and power bi desktop this model is about an hour hour and a half to finish all right so what do we do in this case I'm going to use two tools to dig into the model to see if we can improve something so the reason I want to go that way is because if refresh is taking a long time or report visuals are slow a lot of times it's a model optimization exercise and or you know optimizing Dax itself there's a lot of stuff you can do in terms of optimizing to get things to perform way better all right so let's start with Dax studio so both of these tools Dax to do and very PAC analyzer are available for free over at sequel VI comm shout out to Marco and Alberto and when we're in Dax to do we're gonna want to select PBIS SDT model choose the drop-down and choose our power bi desktop file or model it connect and then what we're gonna see in the lower right is localhost calling port number so this is our connection string to our model armed with that we can go to Verta PAC analyzers let me jump over my files Verta PAC analyzer is just an excel file but it comes pre-configured with a bunch of stuff and so what we want to do here is you may have to enable some things because there's some custom scripts and whatnot but once you're in go to the data tab manage the data model this will take us into PowerPivot for excel and then what we need to do is go to existing connections we're going to edit that s SAS connection hit build and then this is where we're going to put in that port number so come in go and hit OK or we need to choose the drop-down to make sure we choose our data model go and hit OK save close and then under refresh go ahead and refresh all pulled in all the relevant data from the data model it's going to close we're go ahead and close powerpivot and if we go back to our excel file we'll see all the information that is in our data model so this includes tables cardinality all sorts of useful information for us one item jumps out at me right away and that is all of these local date table tables so this is because auto date time is enabled in the model and if we look some of these are you know not insignificant like they've they've got some heft to them bunch of them are pretty small this is the first tip I'm gonna have for you which is disable auto day time you should have a date table in your model that can handle all of your date slicing and dicing needs so let's go back into power bi desktop real quick we'll show that we'll go to file options go to options there's gonna be two items here first under current file if you just want to do it for this one power bi desktop file you can go to data load under current file and uncheck auto daytime your other option is from a global setting perspective you can actually turn it off for all of your power bi desktop files I actually recommend unchecking this item under time intelligence Auto date/time for new files so in this case just from a demo perspective I'm going to uncheck Auto date/time from time intelligence before I hit ok first things I want to check out here is the file size of the power bi desktop file for reference so if we go to properties we will see that as 592 Meg 621 million bytes right so it's pretty good size model lets keep tabs on that number 5 ninety-two Meg so let's go in and we will hit okay can also stop the performance analyzer here let's get rid of that and then let's go ahead and save the file having Auto date/time enabled in your models every date field in your model is gonna have a hidden table underneath it that supports those hierarchies the default hierarchy so year quarter month day and it just adds overhead for your model especially if you have a really big model with a lot of dates right back in let's go back to verdict and that this is done because we can just say refresh all and once that's done BAM all the local day times are gone which is great so now we just have our main tables then the other thing we're gonna do is check the file size real quick and see where we're at alright so we can see it was you know six hundred and twenty-one million now it's six hundred and twenty million so size did reduce wasn't as drastic in this case I did have one personally where the file size immediately dropped twenty Meg on it so that was pretty insane and I've heard of others where they reduced the file size by fifty percent just by unchecking that checkbox so definitely check it out if you want us to try and recoup some size and you're not even using them just have a central date table you will thank me later on alright so that's the first thing I saw once I got rid of that the other thing that stood out to me the first thing I go look at is cardinality right so I want to see where that high cardinality is because that's gonna hurt in terms of compression and how the Verta pack engine actually works and when we look at Vertica analyzer one table screams at us which is the sales column and we see 25 million there and my initial thoughts when I'm looking at this is well we probably have an identity column that's just got a bunch of unique values in it right so ID columns are used for relationships and this one is actually probably we have an ID field on the sales table that's individual unique entries and there's 25 million of them and I can tell you there's 25 million rows in this table so that makes sense and you'll also see that this table in general this is where almost all of our size is coming from suite see total column size we can see data size we can see column hierarchy size as well also see dictionary size so these are all factors that play in from avert attack engine perspective what's going to expand that and see what we got and we look at all those columns in there that is a mess and so let's go look for our guy here and that is online sales key right so we said it's probably a key of some kind that's being used what I've done before is we say okay this is a key column do we need it that's your first question do we need this ID column the biggest answer with any column when I asked do you need it is well you know I may need it later or a you know the business told me that we've got to have it all right do you actually need it though is it it's taking up a lot of space what is the benefit of this column and if we actually go back to our power bi desktop file let's go to the data side and we'll look at the sales table the other thing we'll see here is boom this is our order ID and then we've got our online sales key right so the online sales key so we've got order ID online sales key and we've got sales order number right so there's a lot of identifiers in this table order ID also has got looks like a text string too so that's a that's a bad sign let's go back to verdict analyzer here real quick all right so that's our so we've got those two and then the other one we saw was the sales order number which is the third largest in terms of cardinality and whatnot so these are all really interesting so one thing we can do here this online sales key let's go back and we'll look at our data model from a relationship perspective we can let's go in and look at look at the relationships that are being used right so we've got one in here and let's go down that's gonna be currency so this is using currency key we can see that on the currency side if we go down here this is using customer key up on here this is product key so this is our product dimensions or unlikely yet product and then we've got a few other relationships here one is a product subcategory label and then also product category label and then this one's our date key and then there's another dimension I know that those are all the relationships we have so nothing is referencing order ID nothing is referencing online sales key and nothing is hitting our store whatever that other ID field was let's go back so our sales order number our online sales key and our order ID so there's no relationships there at all so why do we need them are we actually going to report on them one thing I've heard from folks is well you know I want to use this column to get a row count no no you don't need that field to do a row count I can do count rows and it will just count rows right it doesn't matter what columns they're alright so with that let's clean this up a little bit right so let's get rid of we will go into our edit queries will hit our sales table and we'll notice this is a little sluggish again this is hitting a very large table so your experience in the power query editor may vary and what we're waiting for this to come up let's go back to where to pick analyzer because one of the things I'm going to show is that we can remove these columns from a power query side of it right so we're just not even going to pull them in but this is also an opportunity to look at what else can we get rid of and one thing we'll see here is that there's a lot of fields in here this is an exaggerated example just because of the fact that what I did is I flattened all the tables in the contoso retail database and I just stuck them all in here right so I just have one giant table there's like over a hundred columns it's insane I've actually seen customer examples though where we have these really wide tables with a lot of columns and typically what I find is folks don't necessarily do the work to really pick and choose what are those columns that we need to bring in a big big big thing you can do to improve performance is don't pull in columns you don't need this ID columns are an example of it so if you're not doing relationships on them get rid of them you don't need them you're not gonna report on those the only one I could probably argue on is that sales order number like because maybe that or some sort of invoice ID like there may be some detail tables where you do want to represent that or to be able to link off to something else that may have to do with whatever you're trying to do from a report perspective so but just think about what's actually going to be used and don't pull them in it's much better to maybe pull them in later than it is to have them in early and not be able to get rid of them out to the facts so do take that time to go through your columns and figure out do I really need this in my table so what I'm gonna do is actually go and remove all the columns from this table that I'm not using from a reporting perspective and then we'll see how this affects our actual both our refresh time and the size of our data or power bi desktop file in general all right it's done so let's go up to choose columns and remember we're going to remove all these columns that we don't want so let's go through here a couple columns I'll point out here right away so one is this ETL load ID so this is used for your data warehouse right for your ETL operations for loading that back-end I don't need that in my report I'm never gonna report on it probably so let's go and remove that we'll leave title and check because we're gonna come back and uncheck all of these guys and then the other thing I'll come down down below you're gonna see load date and update date so these are two dates again these are used from an ETL perspective now you may want update date if you're doing something like incremental refresh and you can refer back to Patrick's video on how you can do that but that may be something that may actually be useful in your model but if you're not doing incremental refresh or this is just gonna be really small anyway don't bring those in right you don't need these they're just taking up space you're never gonna report on it let me go through and just uncheck all these real quick a few moments later alright that's all the columns I'm gonna go and hit OK alright it's done and look at this this is just amazing right now I said before there's a lot more things that we can do here but all I wanted to do is just remove these columns get rid of the auto date/time stuff as well and then we're going to hit close and apply and it will go ahead and refresh that table and pull everything back remember 25 million rows and it's gonna go back now and pull all that back in based on the change that we made but just like a cooking show cuz this will take a little bit I've already got a model up where I did all those changes and we can see over here if we go to sales a lot smaller which is great let's minimize that and we'll look at the file size here look at the difference there you know around 600 Meg down to 74 Meg that's same like look at that it's it's like just lost almost all of it just because of those columns that we didn't even use primarily those ID columns that we were using in this model that we had no one used for so getting rid of those cleared this up a lot and I've seen this at customers too where we just removed that one column and it just drastically improves the model size as a result the Refresh times a lot faster too so let's go back over to the service then we'll compare so that was looking at the the large one you know it took almost an hour if we go back and look at the refresh history here lay that about eight minutes that's insane same model we just removed the unused columns and bam refresh is faster model sizes a lot lower and we are potentially improving report performance as well because the engine doesn't it can it's optimized more to handle the load the data that's actually in the model again lots more you can do from a performance and optimization standpoint these are just some quick wins that you can get that I see in almost every model that I look at with customers so definitely take some time look at your model and see what you can use vertic analyzers a great tool dak Studios a great tool to help you do these things check out the new guidance documentation out on the power bi documentation that talks about star models and why that's important as well as data reduction techniques all of those links down in the description below for you alright I want to pass this off to you what do you think are there other techniques that you see a lot or you do with your models that are great leave them down in the comments below I know there's a slew of items and I'd love for you to share those down below so other folks can see that as well and just add more to this video if you like this video be sure to hit that big thumbs up button smash it if you so desire if it's your first time here hit that subscribe button and as always from both Patrick and myself thank you so much for watching keep being awesome and we'll see you in the next video
Info
Channel: Guy in a Cube
Views: 184,557
Rating: undefined out of 5
Keywords: power bi, power bi dashboard, power bi dashboards, power bi dataset, power bi desktop, power bi desktop tutorial, power bi desktop tutorial for beginners, power bi modeling, power bi modeling best practices, power bi refresh, power bi reports, power bi training, power bi tutorial, power bi tutorial for beginners, power bi videos, ms power bi, bi, microsoft bi, business intelligence
Id: c-ZqToc85Yc
Channel Id: undefined
Length: 17min 23sec (1043 seconds)
Published: Thu Aug 22 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.