- Yooo! Adam Saxton with Guy in a Cube and in this video, we are gonna break down why your Refresh may be taking awhile even if your data source
is fast, let's do this. (tech music) If you're finding us for the first time, be sure to hit that subscribe button to stay up-to-date with all
the videos from both Patrick and myself, sometimes I
just get things in waves. And I've had this
question a bunch recently where folks are asking why
is my refresh taking a while? My database is fast, what is going on? The refresh takes forever. So I wanted to dig into that
and just think about a couple of things that may be coming
into play, to be aware to where you can try and
optimize, or at least know that it will have a potential
impact on refresh time. Alright, enough of all that talking. Let's head over to my machine
and actually dig into this. I'll start off first, I've got a query that I'm going to, or a table
that I'm gonna be looking at from the Power BI desktop perspective. In this case, I just did a select star and I want to see how long this takes. But this is our baseline, right? At the bottom of this,
you can see 497,314 rows. And right next to that,
you'll see 52 seconds, it took to execute this query, right? That's our baseline. It took 52 seconds to grab this data. So now I've created a
Power BI desktop file. It's a template file. So I want to get a clean
slate, bringing that data and see how long this is
actually going to take. So I'll add in my parameter. And then let's hit load. What I want you to do is pay attention to this internet sales
where it says loading data to the model, right? 'Cause this is the same. This is not the Internet Sales2. This is Internet Sales. All right, so that's done 497,314 rows. And if we look down here Internet Sales2, taking longer, what's going on? Why'd the other one return fast? And this one's taking a
little bit longer to run. Now it's done with that data, but now it's still taking some time to run and then bam, we're in. Alright, so let's break
this down a little bit. We're gonna go to power query. There's a couple of things
to realize as part of this. The first thing to consider here is your actual data
source performance, right? So say it's SQL Server, if you're issuing a query, how long does that take to come back? That's the first thing I
do when I break this down. Let's get a trace. Let's actually see the queries
that Power BI is issuing. How long does that take to respond? And then maybe there's
some tuning you can go do on your data source side. So that's the first potential bottleneck. That is your data source. In this case, we know
that when we just issue that select star, it comes back in 52 seconds
for that particular table. That's our baseline. So now that we know the data source let's move onto the second piece of that. And that is the actual
power query transformations and what you're doing
from that standpoint. So what do I mean by that? The power query
transformations and whatnot. If we go look at Internet Sales2 and we go to advanced, there's a couple of things that are a potential warning sign here. The first here we're gonna
see query equals select star and the actual item. This is what's referred
to as a native query or you're actually
supplying the sequel query that's going to be issued
to the data source. Now this is the M syntax. So it looks a little scary. Let me actually go click on the source of the
gear next to the source so you can see this. So when we're connecting to something like SQL server or like another relational database if you expand this advanced options area you'll see the SQL statement where you can supply your own query. That is native query. This blocks what's referred
to as query folding. And so the idea of query folding, a term I coined in one of
our live streams is think about it like query squishing. So if I have all of these transformations, like maybe I'm renaming
columns, I'm removing columns. I'm doing a calculated
column of some kind, those get written into the SQL query. that's gonna go back into the data source. When I issue this native
query, that doesn't happen. So let's go back and look, and we can see here that
in our applied steps we've got our query, which is the source. And then we're doing all of these items. I'm changing types, I'm replacing values. And I actually also created, or maybe not in this one, but if you create a calculated column that will also take play in this. All of these transformations happen after we execute that, right? So that's something to consider for how long something is taking. So the more transformations
you do after that native query, that's adding overhead to what
you're actually trying to do. Typically what I recommend is if you are gonna use a
native query you want to do everything. All of your transformations
in that SQL statement. You need to manually push
it back to the source. Try to avoid doing
transformations after the fact. The third bottleneck is going to be what actually happens
Vertipaq engine itself. So we're getting all of that data and we want to
put it into the data model. So the Vertipaq engine's
gonna come into play. It's gonna do all of its
crazy stuff with compression and sorting and, and whatnot. But the thing to re%alize here is if you're doing any type of calculated columns
or calculated tables, those happen after all
of the regular table with like what you're getting
from your power query, it all happens after that. So that can add additional
time onto your Refresh. Let me show you a good example of this. What I'm gonna do here is I'm
just gonna enter some data. And if you remember, in a previous video I had where I looked at like auto date time and how that can really
bloat your actual model size. I'm gonna actually do that again here just to illustrate what this is doing. Okay, so what I've got here
is I have a date table, and I've got two columns here, start and end with four rows, right? One table, two columns, four rows that I manually entered the data. So there's not even a data
source that we're going out to. So I don't even have to worry about the data source performance. So let's hit close and apply and see how long this is going to take. Four rows. And I'm not gonna cut any of this. So you can actually see that
this is gonna take awhile. What it's actually doing is building out this a calculated table that's getting generated for
both of those two columns. That's that large local date table. And then bam, we're up. That took a while, that
wasn't like, insanely fast. That's because it had to build out those two calculated tables that have like three plus
million rows in each one. It's got to build that out. That's gonna take time. That's another thing to consider is if you're doing a lot
of calculated columns, or calculated tables, that is added overhead. There's other factors there too and it relates to the
structure of your data. So one thing I would definitely
recommend you do is pick up something like this book here, Definitive Guide to DAX, and I'll have a link in
the description below. There's great stuff at
the end of this book that talk about how the
Vertipaq engine actually works, how that processing works. So if you really want to understand that, that book is gold from my perspective, highly recommended. All right, I want to hand this off to you. What do you think? Does that help you understand
what's actually happening when your data is being
refreshed inside of Power BI? Let me know in the comments
below, I want to hear it. If you like this video, be sure to hit that big thumbs up button. Smashed, if you so desire. If it's your first time here,
hit that subscribe button. And as always from both
Patrick and myself, thank you so much for
watching, keep being awesome, and we'll see you in the next video.