Why is my Power BI refresh so SLOW?!? 3 Bottlenecks for refresh performance

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

- Yooo! Adam Saxton with Guy in a Cube and in this video, we are gonna break down why your Refresh may be taking awhile even if your data source is fast, let's do this. (tech music) If you're finding us for the first time, be sure to hit that subscribe button to stay up-to-date with all the videos from both Patrick and myself, sometimes I just get things in waves. And I've had this question a bunch recently where folks are asking why is my refresh taking a while? My database is fast, what is going on? The refresh takes forever. So I wanted to dig into that and just think about a couple of things that may be coming into play, to be aware to where you can try and optimize, or at least know that it will have a potential impact on refresh time. Alright, enough of all that talking. Let's head over to my machine and actually dig into this. I'll start off first, I've got a query that I'm going to, or a table that I'm gonna be looking at from the Power BI desktop perspective. In this case, I just did a select star and I want to see how long this takes. But this is our baseline, right? At the bottom of this, you can see 497,314 rows. And right next to that, you'll see 52 seconds, it took to execute this query, right? That's our baseline. It took 52 seconds to grab this data. So now I've created a Power BI desktop file. It's a template file. So I want to get a clean slate, bringing that data and see how long this is actually going to take. So I'll add in my parameter. And then let's hit load. What I want you to do is pay attention to this internet sales where it says loading data to the model, right? 'Cause this is the same. This is not the Internet Sales2. This is Internet Sales. All right, so that's done 497,314 rows. And if we look down here Internet Sales2, taking longer, what's going on? Why'd the other one return fast? And this one's taking a little bit longer to run. Now it's done with that data, but now it's still taking some time to run and then bam, we're in. Alright, so let's break this down a little bit. We're gonna go to power query. There's a couple of things to realize as part of this. The first thing to consider here is your actual data source performance, right? So say it's SQL Server, if you're issuing a query, how long does that take to come back? That's the first thing I do when I break this down. Let's get a trace. Let's actually see the queries that Power BI is issuing. How long does that take to respond? And then maybe there's some tuning you can go do on your data source side. So that's the first potential bottleneck. That is your data source. In this case, we know that when we just issue that select star, it comes back in 52 seconds for that particular table. That's our baseline. So now that we know the data source let's move onto the second piece of that. And that is the actual power query transformations and what you're doing from that standpoint. So what do I mean by that? The power query transformations and whatnot. If we go look at Internet Sales2 and we go to advanced, there's a couple of things that are a potential warning sign here. The first here we're gonna see query equals select star and the actual item. This is what's referred to as a native query or you're actually supplying the sequel query that's going to be issued to the data source. Now this is the M syntax. So it looks a little scary. Let me actually go click on the source of the gear next to the source so you can see this. So when we're connecting to something like SQL server or like another relational database if you expand this advanced options area you'll see the SQL statement where you can supply your own query. That is native query. This blocks what's referred to as query folding. And so the idea of query folding, a term I coined in one of our live streams is think about it like query squishing. So if I have all of these transformations, like maybe I'm renaming columns, I'm removing columns. I'm doing a calculated column of some kind, those get written into the SQL query. that's gonna go back into the data source. When I issue this native query, that doesn't happen. So let's go back and look, and we can see here that in our applied steps we've got our query, which is the source. And then we're doing all of these items. I'm changing types, I'm replacing values. And I actually also created, or maybe not in this one, but if you create a calculated column that will also take play in this. All of these transformations happen after we execute that, right? So that's something to consider for how long something is taking. So the more transformations you do after that native query, that's adding overhead to what you're actually trying to do. Typically what I recommend is if you are gonna use a native query you want to do everything. All of your transformations in that SQL statement. You need to manually push it back to the source. Try to avoid doing transformations after the fact. The third bottleneck is going to be what actually happens Vertipaq engine itself. So we're getting all of that data and we want to put it into the data model. So the Vertipaq engine's gonna come into play. It's gonna do all of its crazy stuff with compression and sorting and, and whatnot. But the thing to re%alize here is if you're doing any type of calculated columns or calculated tables, those happen after all of the regular table with like what you're getting from your power query, it all happens after that. So that can add additional time onto your Refresh. Let me show you a good example of this. What I'm gonna do here is I'm just gonna enter some data. And if you remember, in a previous video I had where I looked at like auto date time and how that can really bloat your actual model size. I'm gonna actually do that again here just to illustrate what this is doing. Okay, so what I've got here is I have a date table, and I've got two columns here, start and end with four rows, right? One table, two columns, four rows that I manually entered the data. So there's not even a data source that we're going out to. So I don't even have to worry about the data source performance. So let's hit close and apply and see how long this is going to take. Four rows. And I'm not gonna cut any of this. So you can actually see that this is gonna take awhile. What it's actually doing is building out this a calculated table that's getting generated for both of those two columns. That's that large local date table. And then bam, we're up. That took a while, that wasn't like, insanely fast. That's because it had to build out those two calculated tables that have like three plus million rows in each one. It's got to build that out. That's gonna take time. That's another thing to consider is if you're doing a lot of calculated columns, or calculated tables, that is added overhead. There's other factors there too and it relates to the structure of your data. So one thing I would definitely recommend you do is pick up something like this book here, Definitive Guide to DAX, and I'll have a link in the description below. There's great stuff at the end of this book that talk about how the Vertipaq engine actually works, how that processing works. So if you really want to understand that, that book is gold from my perspective, highly recommended. All right, I want to hand this off to you. What do you think? Does that help you understand what's actually happening when your data is being refreshed inside of Power BI? Let me know in the comments below, I want to hear it. If you like this video, be sure to hit that big thumbs up button. Smashed, if you so desire. If it's your first time here, hit that subscribe button. And as always from both Patrick and myself, thank you so much for watching, keep being awesome, and we'll see you in the next video.

Info

Channel: Guy in a Cube

Views: 38,828

Rating: undefined out of 5

Keywords: power bi refresh, power bi refresh data, power bi refresh performance, power bi refresh time, power bi, power bi dataset, power bi dataset refresh, power bi desktop, power bi desktop tutorial, power bi for beginners, power bi reports, power bi tutorial, power bi tutorial for beginners, query folding in power bi, vertipaq engine power bi, vertipaq power bi, power query, query folding power query, vertipaq, vertipaq engine

Id: nP-8_st1wWk

Channel Id: undefined

Length: 8min 9sec (489 seconds)

Published: Thu Sep 17 2020