- Yooo! What's up? This is
Patrick from Guy in a Cube. And in my last video
on incremental refresh I took you on a magic carpet
ride where I was showing you how to handle deletes
with incremental refresh. In this video, in this video I'm taking it to the next level. I'm gonna take you to infinity and beyond. Stay tuned. (upbeat music) If you find this for the very
first time, be sure to hit that subscribe button, to stay up to date all the videos from
both Adam and this guy. All right. So, a lot of times the main reason we use incremental refresh is because we have a lot of data and we don't want to
refresh the entire model every time we do a refresh. But the initial refresh does
have to pull all the data If it's a billion rows,
if it's 2 billion rows if it's a hundred thousand rows it's got to go back
and get all those rows. And sometimes, sometimes when you do that incremental refresh because of the volume of data and because of the overhead that all the process that power BI may do you may get a timeout
and you may be thinking to yourself, well, how do
I get around this time out? Well, that's what I'm gonna
show you in this video. Before we get into the details, there's three tools you're gonna need. If you're using SQL server as a source you'll need secret server
management studio to do some alters to the source,
using a different source. You need to have access to that tool. You'll need the ALM tool kit and you will need your XML end point to the workspace where the contents that data sets are deployed the data set or data sets are deployed. Yes. I said the XML points. So that does mean you
will need premium per user or you will need a workspace that's backed by premium capacity. If you don't have premium if you don't have premium per
user, still watch this video cause there's some things that
you probably just can learn of how we actually solve
problems here at guy in a cube. So enough of all this
talking, you guys know I like to do, let's do what
let's head over to my laptop. So this video is going to
take two paths, one path. We're gonna assume that you have access to the source and the other
path. I'm gonna assume that you don't have access to the source. I'm gonna start down the path assuming that you have
access to the source. So in my case, the source is SQL server. And so what you would do
with either one with either if you have access or don't have access you
design your model out test your measures, make
sure all the filters and the relationships and
everything works as expected. And then what you need
to do is you would go over to the source. And I like to abstract myself from the actual schema by
using views in SQL server. And so what you would
do is if you have access to this access to the
source, or maybe, you know someone that you can pay, you can dose out a little money,
who will make some changes or create some objects
for you in the source. Either way, they have access to the source get them to create a view for you or some object that
you can reference in power BI. That doesn't return any data. So like in my case, this is
the data that I wanna return from my fact internet sales,
but I'm adding a where clause that's gonna stop it
from returning any roads. And this is just temporary. This is just temporary. So we'll run this and then we'll go over. We'll go back to my model and then I'll publish it to Power BI. Once it's published to Power BI what I'm gonna do is head
over to the workspace. You can see my workspace
right now is backed by premium per user. And you would go ahead and
just refresh that model. And we'll stop right there
because after the refresh of both that's where the
paths will come together. So you do a refresh, you
do a refresh of that model and then we'll come back
and look at it later. Now, the second path that
we're gonna take is assuming that you don't have access to the model. What if you don't have
access like Patrick? I don't know anyone that will go and change the database for me. I don't have access to change the source. What do I do? Can I not use this method? Of course you can use this method. Let me show you. So what you do is after you just like if you had access to the source you go design your model,
test everything out make sure your measures and
all your relationships work. You go into transform,
open up the query editor and you add an arbitrary
filter to the model. So I'm going to do sales territory and an arbitrary filter that stops the source from returning data. So I'm just going to say equals. I'm gonna set a filter on this one. Cause I know there aren't any
values that will match zero. So I'm gonna click OK. And then I'm gonna close and apply and then I'm going to
publish them the model right after it does everything it needs to do. It'll grab my data. You'll see it's not gonna return any rows and I'm gonna publish it to the workspace. And then what I'm gonna do with that point is refresh both of these. So both of these will be refreshed. So I refresh the one. If you have access to the source and then I ask refresh the
one where I don't have access to the source. And then this is where, regardless of what path you're using,
you'll need to use SQL server. You grab your XML end point, you open up SQL server, you click
connect, choose analysis services pop your end point in, choose, you know active directory with the
password or with MFA and go ahead and get authenticated and
then look for your model. So you can see right now
I have several models in that workspace, but
look for your model. So I have two models here regardless of which path you take we're gonna verify that
the partitions are created. And what you'll do is expand
it out, prying the table where you have incremental
refresh configured, right click on it and choose partitions. If you using the power
query path where you're gonna apply the filter, make sure that filter is applied
to the table that hasn't, that you're configuring
incremental refresh against. I forgot to mention that,
but just remember that. So now once I do my incremental,
my initial refreshes you'll see that all my
partitions are created based on my incremental refresh configuration. And you can see all my roles as zero. And then if I go to the other one go to partitions and then
you'll see it pop open. And then all my partitions
will be created. And they're all zero. One thing that I need to point out is that when you configure
incremental refresh and you're using this
method to avoid the timeout. So if I go here and pop
up on incremental refresh you see that I'm using months
as my, for my partition but you can choose. You can test this out if
you choose years because of the volume of data and the
overhead of the processing it may still time out and you may need to get a little more granular. You may need to go down
the quarters, month, days. And so the configuration
that you choose is totally gonna depend on your data
and your environment. So you'll have to test these things out before you publish it and try
to get around that time out. But you need to test it
out in your environment. In my case, I'm using months. So once that's done then you need to do a few things before you actually can start pulling data in. So if you have access to the source all you're gonna do is, or if you know someone that
has access to the source you're gonna modify your view not to include the where clause
that eliminates the roles. So we'll do that. And so
now I have my rows there and now I can go and refresh
partitions on that one. So if I go over to this one that's based on that view right there this internet sales table
is based on this view. If I just go to partitions and open it up and then there's a little
like circle that's made out of these green arrows, if you click it what you'll see is gonna give
you a list of partitions. And at the important
thing here is the mode do not use default, do, go
ahead and do process data. Just go ahead and do process
data trust me on this. And then you can do one partition maybe your environment allows
you to do multiple partitions. I'm just gonna do a couple. And then you click OK.
And what's gonna happen in the background is gonna
start, it's gonna run queries it's gonna reach out to the source run some queries, pull the data
in for those two partitions. If you're using the,
and we'll let this run and we'll come back to
this in a few minutes if you are going down path where you don't have access to the source. You had to do your filter and power query. You gotta take a little different steps. And so what you would
do is you would click transform right here and that
arbitrary filter that you set on the table that has
incremental refresh configure, get rid of it, just get rid
of it and close and apply. Let it load up the data. You'll see it loads, however
many rows you got there. And then go ahead and click save. And now this is where you're
gonna use the final tool the ALM tool kit, go ahead and click external tools
and click ALM tool kit. It's gonna launch the ALM tool kit and you'll get a connection dialog window. You'll see it's connected
your power BI desktop pop the workspace in. The XML point for the workspace that contains that data set and click OK. You may get this window where
you need to authenticate just go ahead and authenticate. And so then we're gonna head over here. And before you do anything right here go ahead and click options. And just depending on what's selected. So I like to choose
these options right here cause I want it to retain
all my refresh partitions. I want it to display warnings
and yours made, you know, the process and option may
be default may be selected. I like to, just to make sure. And you've probably, you
can accept the default but just to make sure it
doesn't do anything else I choose do not process.
And I check this box for a process only affected
tables and I click OK. And then I do my compare. So I'm gonna go ahead
and do my compare again. Cause whenever you change it it has to do some different things. Whenever you change the options,
then go here select action and say, hide all the objects
that have the same definition. Cause I only care about the
things that are different. And then you'll see that
there's one row here for my internet sales. If you scroll down, you'll see
that the filter doesn't exist in power BI Desktop but it
does exist right here in the service where the,
the model is published and you may be thinking Patrick, well, why don't I just publish that up to power BI don't publish it up to power BI because it's
gonna overwrite what you did and it's got this filter. And if you try to do the refresh it's gonna do the initial refresh again. I needed to persist the initial refresh because now I only care about
my subsequent refreshes. And I'll show you that
towards the end of the video. So what you do now is I want
to do a schema only deployment. And fortunately the ALM tool kit is here and it's gonna allow me to do it. So if I go ahead and validate
my selection, I'll click OK. And then I'm going to say go ahead update and then say, are you sure? Yeah I'm sure. Now we
do this and now it says Look deployed metadata
success, metadata deployed. And now let's let it do another comparison and we shouldn't have any
differences at that point. So if we go ahead and choose
this, say hide everything. We don't have any differences. Now let's head back over
to management studio and we'll see that this finished
processing, which is great. We'll go on it. We're gonna just try the
refresh. There we go. Wonderful. And then let's go over to management studio one
more time and let's go to our the one that used the power query. Let's go here. Let's go to partitions. And let's just process two of these. Let's just process a couple of these and remember, remember,
change this to data. So regardless of which path you take if you have access to the
source or don't have access to the source at this point,
all you need to do is figure out the balance of what can you refresh. So in my case, I probably
need to refresh one in each individually, or maybe
I can do a couple at a time cause I've tested it out,
but you can do all of these. I'm not gonna do all of them cause I want to show you
something at the end. Go ahead and click OK.
And we'll let it process. But now you may be tempted
to go over to power BI after you've processed
data on all your petitions you may be tempted to go over to power BI and start building the reports or live connected at data centers. So our building silver reports, but wait let me show you some more. So on the one where I
had access to the source. If I go over to it, it's
this particular dataset. If I say, go ahead and create a report I'm gonna choose, create a report. I'm gonna choose a measure
or choose a measure. Where's my measure and boom. And you're like, all right, it's ready. If I go and choose a different column let's use calendar and let's choose month. We get an error. And this is a really good
error because what it says is pretty much the database
needs to be recalculated. And that's what happens when
you do a process calculate on the database it creates your relationships
and calculate columns. So when you do a process they don't want all your
partitions when you're done. The very last thing you
need to do is do a process recalc on your database. Let me say it again. So once we go over them, SSMS
SQL server management studio and do a process data on every partition before anyone can actually start building reports against the dataset. You need to do a process
calculate on a database. Let me show you how to do that. So before you can actually
do anything with it let's go back to SSMS and you'll see that
this processing is done on the one where I didn't
have access to the source. It'll do a refresh. You can see the rows. And if I head over to SSMS
and on either one, either one I'm gonna choose this one right click on the database itself. You'll see where it says process database. And then you'll choose. See where the mode is processed the fault. Look for process recalc click OK. It's gonna take up Cuppa could depending on the size of the database number of calculated columns,
number of relationships things like that. It
could take a little bit. This shouldn't take too long cause my model is relatively small in comparison to, you know, models with billions and billions of
rows processing in progress. And it's a success. Go ahead and click close. And this is not gonna bring
in any additional data. Cause I'm just doing a recalc
on the data that's there. But if I head over to power
BI to this particular one and I just click refresh, you'll see now that I have two months of data. Cause those are the two
partitions that I processed. Now the big question is what
happens if I do a refresh now because we don't wanna
refresh the entire model cause our incremental
refresh configuration policy says only refresh
the last month in my case as a recording of this video,
I mean, January of 2021. And that's the only day the only month that I should process. So let's see what happens. Let's see what happens. So what I'm gonna do is
head back to the dataset either one of these data sets and what I'm gonna do is just say refresh I'm gonna choose this one. I'm gonna refresh the data set and we'll go into the
settings for that dataset. Let's see, shouldn't take that
long to refresh the data set. Cause there's only
refreshing the latest month. Shouldn't only be
refreshing the latest month. Cause that's how I have
it configured here. All right, it's already done. That was really fast. Let's head back over to management studio and let's take a peak at the
partitions for that table. And what you should see is
only three partitions process. Perfect. The two that I manually did and only the most recent one. And I did it to show you that after you processed all those partitions it's not gonna go back and
process the entire model. It's only gonna process based on your incremental refresh
configuration pattern. All right. What do you guys think? Have you ran into this before? Have you got hit with
that dreaded time out are you ready to go to
infinity and beyond? I'd love to know if you
have any questions, comments you know what to do post it in the comments
below your first time visiting guy in a cube channel,
hit that subscribe button. You like my video, give me a big thumbs up as always from Adam, myself. Thanks for watching. We'll see you in the next video.