Avoid the full refresh with Incremental Refresh in Power BI (Premium)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- Yooo! What's up? This is Patrick from Guy in a Cube. And in my last video on incremental refresh I took you on a magic carpet ride where I was showing you how to handle deletes with incremental refresh. In this video, in this video I'm taking it to the next level. I'm gonna take you to infinity and beyond. Stay tuned. (upbeat music) If you find this for the very first time, be sure to hit that subscribe button, to stay up to date all the videos from both Adam and this guy. All right. So, a lot of times the main reason we use incremental refresh is because we have a lot of data and we don't want to refresh the entire model every time we do a refresh. But the initial refresh does have to pull all the data If it's a billion rows, if it's 2 billion rows if it's a hundred thousand rows it's got to go back and get all those rows. And sometimes, sometimes when you do that incremental refresh because of the volume of data and because of the overhead that all the process that power BI may do you may get a timeout and you may be thinking to yourself, well, how do I get around this time out? Well, that's what I'm gonna show you in this video. Before we get into the details, there's three tools you're gonna need. If you're using SQL server as a source you'll need secret server management studio to do some alters to the source, using a different source. You need to have access to that tool. You'll need the ALM tool kit and you will need your XML end point to the workspace where the contents that data sets are deployed the data set or data sets are deployed. Yes. I said the XML points. So that does mean you will need premium per user or you will need a workspace that's backed by premium capacity. If you don't have premium if you don't have premium per user, still watch this video cause there's some things that you probably just can learn of how we actually solve problems here at guy in a cube. So enough of all this talking, you guys know I like to do, let's do what let's head over to my laptop. So this video is going to take two paths, one path. We're gonna assume that you have access to the source and the other path. I'm gonna assume that you don't have access to the source. I'm gonna start down the path assuming that you have access to the source. So in my case, the source is SQL server. And so what you would do with either one with either if you have access or don't have access you design your model out test your measures, make sure all the filters and the relationships and everything works as expected. And then what you need to do is you would go over to the source. And I like to abstract myself from the actual schema by using views in SQL server. And so what you would do is if you have access to this access to the source, or maybe, you know someone that you can pay, you can dose out a little money, who will make some changes or create some objects for you in the source. Either way, they have access to the source get them to create a view for you or some object that you can reference in power BI. That doesn't return any data. So like in my case, this is the data that I wanna return from my fact internet sales, but I'm adding a where clause that's gonna stop it from returning any roads. And this is just temporary. This is just temporary. So we'll run this and then we'll go over. We'll go back to my model and then I'll publish it to Power BI. Once it's published to Power BI what I'm gonna do is head over to the workspace. You can see my workspace right now is backed by premium per user. And you would go ahead and just refresh that model. And we'll stop right there because after the refresh of both that's where the paths will come together. So you do a refresh, you do a refresh of that model and then we'll come back and look at it later. Now, the second path that we're gonna take is assuming that you don't have access to the model. What if you don't have access like Patrick? I don't know anyone that will go and change the database for me. I don't have access to change the source. What do I do? Can I not use this method? Of course you can use this method. Let me show you. So what you do is after you just like if you had access to the source you go design your model, test everything out make sure your measures and all your relationships work. You go into transform, open up the query editor and you add an arbitrary filter to the model. So I'm going to do sales territory and an arbitrary filter that stops the source from returning data. So I'm just going to say equals. I'm gonna set a filter on this one. Cause I know there aren't any values that will match zero. So I'm gonna click OK. And then I'm gonna close and apply and then I'm going to publish them the model right after it does everything it needs to do. It'll grab my data. You'll see it's not gonna return any rows and I'm gonna publish it to the workspace. And then what I'm gonna do with that point is refresh both of these. So both of these will be refreshed. So I refresh the one. If you have access to the source and then I ask refresh the one where I don't have access to the source. And then this is where, regardless of what path you're using, you'll need to use SQL server. You grab your XML end point, you open up SQL server, you click connect, choose analysis services pop your end point in, choose, you know active directory with the password or with MFA and go ahead and get authenticated and then look for your model. So you can see right now I have several models in that workspace, but look for your model. So I have two models here regardless of which path you take we're gonna verify that the partitions are created. And what you'll do is expand it out, prying the table where you have incremental refresh configured, right click on it and choose partitions. If you using the power query path where you're gonna apply the filter, make sure that filter is applied to the table that hasn't, that you're configuring incremental refresh against. I forgot to mention that, but just remember that. So now once I do my incremental, my initial refreshes you'll see that all my partitions are created based on my incremental refresh configuration. And you can see all my roles as zero. And then if I go to the other one go to partitions and then you'll see it pop open. And then all my partitions will be created. And they're all zero. One thing that I need to point out is that when you configure incremental refresh and you're using this method to avoid the timeout. So if I go here and pop up on incremental refresh you see that I'm using months as my, for my partition but you can choose. You can test this out if you choose years because of the volume of data and the overhead of the processing it may still time out and you may need to get a little more granular. You may need to go down the quarters, month, days. And so the configuration that you choose is totally gonna depend on your data and your environment. So you'll have to test these things out before you publish it and try to get around that time out. But you need to test it out in your environment. In my case, I'm using months. So once that's done then you need to do a few things before you actually can start pulling data in. So if you have access to the source all you're gonna do is, or if you know someone that has access to the source you're gonna modify your view not to include the where clause that eliminates the roles. So we'll do that. And so now I have my rows there and now I can go and refresh partitions on that one. So if I go over to this one that's based on that view right there this internet sales table is based on this view. If I just go to partitions and open it up and then there's a little like circle that's made out of these green arrows, if you click it what you'll see is gonna give you a list of partitions. And at the important thing here is the mode do not use default, do, go ahead and do process data. Just go ahead and do process data trust me on this. And then you can do one partition maybe your environment allows you to do multiple partitions. I'm just gonna do a couple. And then you click OK. And what's gonna happen in the background is gonna start, it's gonna run queries it's gonna reach out to the source run some queries, pull the data in for those two partitions. If you're using the, and we'll let this run and we'll come back to this in a few minutes if you are going down path where you don't have access to the source. You had to do your filter and power query. You gotta take a little different steps. And so what you would do is you would click transform right here and that arbitrary filter that you set on the table that has incremental refresh configure, get rid of it, just get rid of it and close and apply. Let it load up the data. You'll see it loads, however many rows you got there. And then go ahead and click save. And now this is where you're gonna use the final tool the ALM tool kit, go ahead and click external tools and click ALM tool kit. It's gonna launch the ALM tool kit and you'll get a connection dialog window. You'll see it's connected your power BI desktop pop the workspace in. The XML point for the workspace that contains that data set and click OK. You may get this window where you need to authenticate just go ahead and authenticate. And so then we're gonna head over here. And before you do anything right here go ahead and click options. And just depending on what's selected. So I like to choose these options right here cause I want it to retain all my refresh partitions. I want it to display warnings and yours made, you know, the process and option may be default may be selected. I like to, just to make sure. And you've probably, you can accept the default but just to make sure it doesn't do anything else I choose do not process. And I check this box for a process only affected tables and I click OK. And then I do my compare. So I'm gonna go ahead and do my compare again. Cause whenever you change it it has to do some different things. Whenever you change the options, then go here select action and say, hide all the objects that have the same definition. Cause I only care about the things that are different. And then you'll see that there's one row here for my internet sales. If you scroll down, you'll see that the filter doesn't exist in power BI Desktop but it does exist right here in the service where the, the model is published and you may be thinking Patrick, well, why don't I just publish that up to power BI don't publish it up to power BI because it's gonna overwrite what you did and it's got this filter. And if you try to do the refresh it's gonna do the initial refresh again. I needed to persist the initial refresh because now I only care about my subsequent refreshes. And I'll show you that towards the end of the video. So what you do now is I want to do a schema only deployment. And fortunately the ALM tool kit is here and it's gonna allow me to do it. So if I go ahead and validate my selection, I'll click OK. And then I'm going to say go ahead update and then say, are you sure? Yeah I'm sure. Now we do this and now it says Look deployed metadata success, metadata deployed. And now let's let it do another comparison and we shouldn't have any differences at that point. So if we go ahead and choose this, say hide everything. We don't have any differences. Now let's head back over to management studio and we'll see that this finished processing, which is great. We'll go on it. We're gonna just try the refresh. There we go. Wonderful. And then let's go over to management studio one more time and let's go to our the one that used the power query. Let's go here. Let's go to partitions. And let's just process two of these. Let's just process a couple of these and remember, remember, change this to data. So regardless of which path you take if you have access to the source or don't have access to the source at this point, all you need to do is figure out the balance of what can you refresh. So in my case, I probably need to refresh one in each individually, or maybe I can do a couple at a time cause I've tested it out, but you can do all of these. I'm not gonna do all of them cause I want to show you something at the end. Go ahead and click OK. And we'll let it process. But now you may be tempted to go over to power BI after you've processed data on all your petitions you may be tempted to go over to power BI and start building the reports or live connected at data centers. So our building silver reports, but wait let me show you some more. So on the one where I had access to the source. If I go over to it, it's this particular dataset. If I say, go ahead and create a report I'm gonna choose, create a report. I'm gonna choose a measure or choose a measure. Where's my measure and boom. And you're like, all right, it's ready. If I go and choose a different column let's use calendar and let's choose month. We get an error. And this is a really good error because what it says is pretty much the database needs to be recalculated. And that's what happens when you do a process calculate on the database it creates your relationships and calculate columns. So when you do a process they don't want all your partitions when you're done. The very last thing you need to do is do a process recalc on your database. Let me say it again. So once we go over them, SSMS SQL server management studio and do a process data on every partition before anyone can actually start building reports against the dataset. You need to do a process calculate on a database. Let me show you how to do that. So before you can actually do anything with it let's go back to SSMS and you'll see that this processing is done on the one where I didn't have access to the source. It'll do a refresh. You can see the rows. And if I head over to SSMS and on either one, either one I'm gonna choose this one right click on the database itself. You'll see where it says process database. And then you'll choose. See where the mode is processed the fault. Look for process recalc click OK. It's gonna take up Cuppa could depending on the size of the database number of calculated columns, number of relationships things like that. It could take a little bit. This shouldn't take too long cause my model is relatively small in comparison to, you know, models with billions and billions of rows processing in progress. And it's a success. Go ahead and click close. And this is not gonna bring in any additional data. Cause I'm just doing a recalc on the data that's there. But if I head over to power BI to this particular one and I just click refresh, you'll see now that I have two months of data. Cause those are the two partitions that I processed. Now the big question is what happens if I do a refresh now because we don't wanna refresh the entire model cause our incremental refresh configuration policy says only refresh the last month in my case as a recording of this video, I mean, January of 2021. And that's the only day the only month that I should process. So let's see what happens. Let's see what happens. So what I'm gonna do is head back to the dataset either one of these data sets and what I'm gonna do is just say refresh I'm gonna choose this one. I'm gonna refresh the data set and we'll go into the settings for that dataset. Let's see, shouldn't take that long to refresh the data set. Cause there's only refreshing the latest month. Shouldn't only be refreshing the latest month. Cause that's how I have it configured here. All right, it's already done. That was really fast. Let's head back over to management studio and let's take a peak at the partitions for that table. And what you should see is only three partitions process. Perfect. The two that I manually did and only the most recent one. And I did it to show you that after you processed all those partitions it's not gonna go back and process the entire model. It's only gonna process based on your incremental refresh configuration pattern. All right. What do you guys think? Have you ran into this before? Have you got hit with that dreaded time out are you ready to go to infinity and beyond? I'd love to know if you have any questions, comments you know what to do post it in the comments below your first time visiting guy in a cube channel, hit that subscribe button. You like my video, give me a big thumbs up as always from Adam, myself. Thanks for watching. We'll see you in the next video.
Info
Channel: Guy in a Cube
Views: 24,019
Rating: 4.9582248 out of 5
Keywords: incremental refresh, incremental refresh in power bi, incremental refresh power bi, power bi incremental refresh, incremental load, power bi refresh, power bi refresh data, alm toolkit, alm toolkit for power bi, microsoft power bi, power bi, power bi desktop, power bi desktop tutorial, power bi for beginners, power bi premium, power bi premium per user, power bi training, power bi tutorial for beginners, power query, sql server management studio, ssms
Id: 5AWt6ijJG94
Channel Id: undefined
Length: 15min 41sec (941 seconds)
Published: Wed Jan 27 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.