How do you even use VertiPaq Analyzer with Power BI???

Video Statistics and Information

Captions
What's up, this is Patrick, and in this video Marco and I are going to show you a practical use of VertiPaq Analyzer with Power BI. Stay tuned.

So Marco, I did a video not too long ago on VertiPaq Analyzer. It's a really popular video, but I got questions: okay Patrick, this is cool, but what do all these numbers mean? How do I use all these numbers to really help optimize my model?

The first thing we can do is simply reduce the size of the Power BI file, which also ties into performance: you will probably refresh faster, and you can also improve the performance of your measures. Can you show us? Enough of the talking head, let's head over to Marco's screen.

Here is the file I'm going to use, the Contoso sample. It's a large file, around 250 megabytes. I already opened it, and this is the report I have. What is interesting is that if I look at the number of rows in the Sales table, we have about 12 million rows. That's a pretty substantial amount, and you might say 250 megabytes for 12 million rows is not bad. It's not bad, but let's take a look at where we are spending our memory. I would say our money, because I paid for my laptop and I don't want memory to be wasted; I want to be really efficient with how it's used.

The first thing I do is open DAX Studio, which you have to download and install; we've already shown this. I connected to my model here, and it's very easy: you open Power BI, then you connect, then you go to the Advanced tab, which you may have to enable in the options screen, and you click on View Metrics. When you click on View Metrics you see these wonderful numbers (I just enlarged the columns a little bit), and you see we have a lot of memory used here.

So first: what are these columns? They basically tell you the size of each table and the size of the columns in the table, because part of the size of a table depends on structures, like relationships, that are not actually in the columns; but you see that most of the data is made up of columns. And is this megabytes, kilobytes, bytes? The memory is in bytes, but you have the thousands separator, so you can see we have around 340 megabytes, which is bigger than the file size we saw before. Why? Because the PBIX file is a further compression of the data that is already compressed in memory.

Now, if I drill down I can explain better what we have here. As you see, I sorted the content by the size of the columns in each table, which is the number of bytes every column is consuming, because the data is stored column by column, so every column has a cost. That column size is in reality the sum of three other metrics: the data size, the dictionary size, and the hierarchy size. Add them together and you get this number.

The dictionary size depends on how many unique values you have, because there is a list of the unique values in the column, unless you have value encoding, in which case the column doesn't have a dictionary at all and just stores the data in the column itself.
If there are multiple rows close to each other with the same value, that data can be compressed; of course, if you have many unique values, the compression is very low. The hierarchy size is another structure, one that is used for MDX queries. You may say Power BI doesn't need that, but if you use Analyze in Excel you could use it, and it is also used as an index for certain internal operations; I don't want to go too deep here. The point is that the more unique values you have in a column, the bigger these structures become: data, hierarchy, and dictionary. The sum of these three elements is what I care about; this is the memory I want to save.

For example, I see that Online Sales Key has 12 million unique values. The cardinality column tells you, for each column, how many unique values it has, and here that number is identical to the number of rows of the table. Is that an indication of a problem? Yes, because every row has a different number. But the real question is: do I care about this column? How do I use it in my analysis, in my report? Probably not, so I can just remove it. So I go here and say Online Sales Key, delete from the model, because you don't need it. The best optimization is to remove what you don't need. You heard what Marco said; I've been saying this forever: if you don't need it, get rid of it.

Now if I go back and click on View Metrics again, we refresh the view, I sort by column size again, and the second problem is the Order Number, because every order has two, three, four rows. Maybe you do want to see the order number. What if I do want to see it? No problem, you keep it, but you are paying 32 percent of the size of the entire database for just one column. So I ask you: do you really want this column? Maybe, I don't know; it's your money, it's your memory. Here we remove the Order Number and the Order Line Number as well.

If I now go back to DAX Studio and refresh View Metrics again, you see that, if you remember, we started from around 350 megabytes, and we have already removed 200 megabytes. Good job, right? Now let's see what happens next, because if you follow the suggestions Patrick has given in other videos you should already be at this point, but we can still do something.

Look at the column that is now the most expensive: Line Amount. Line Amount is 11 percent of the entire model. We already made a good optimization, and still around 10 percent of the size is just one column. Let's take a look at it. Line Amount is a calculated column computing Quantity times Net Price, row by row. This is abusing my memory. I mean, you do want this calculation, but why do I need to store it here?

The reason you have it is maybe that you put the Line Amount column in the report, or maybe you just created, like I did here, a Sales Amount measure that is simply the sum of the column; an implicit measure is always the sum of a column, and an explicit measure can do the same. But if I write a measure, I have a better way to do this: I can simply write SUMX, using an iterator, where I multiply the Quantity by the Net Price, which is the formula I had in Line Amount. Now I'm moving this logic into a measure, and the advantage is that I no longer need the Line Amount column; I can get rid of it completely, because I can compute this on the fly.
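As a rough illustration, this is what the change looks like in DAX. It is a minimal sketch assuming the Contoso-style names used in the video (a Sales table with Quantity and Net Price columns), not the exact formulas shown on screen:

    -- Calculated column stored row by row in the model (the expensive one being removed)
    -- Line Amount = Sales[Quantity] * Sales[Net Price]

    -- Measure that replaces it, evaluated on the fly at query time
    Sales Amount :=
    SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )

Because the measure iterates the two original columns, the model only has to store Quantity and Net Price, which have far fewer unique values than the stored product, and that is what compresses well.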
And by the way, I can do the same for the next most expensive column. Let me just check; actually it's not quite the second one, but it's very close: Line Cost. It has the same structure, and it is used by the Total Cost measure. So I do the same thing: I go here and I change the Total Cost measure. Changing the measure is just a step required before removing the column, because if I removed the column first I would break the report, and I don't want to break anything. So I write SUMX again, Quantity times Unit Cost this time, and then I can remove the column. As you see, I make these changes and my report continues to work. Now I go back to my table and I remove the column.
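The same pattern again, as a sketch with the same assumed Contoso-style names (Quantity and Unit Cost on the Sales table):

    -- Calculated column being removed
    -- Line Cost = Sales[Quantity] * Sales[Unit Cost]

    -- Rewritten measure, so nothing in the report depends on the stored column any more
    Total Cost :=
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )

Changing the measure first, and only then deleting the column, is what keeps the report working through the whole change.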
So you're removing the calculated columns and replacing them with measures. Before I remove anything, I use the measures in the report first. Rule number one: use measures, not implicit measures; don't just drag the column into the visual, create a measure so you can control what you're doing. Second, I make sure that the measures no longer depend on the columns I'm about to remove, and then I remove them. In this case these were calculated columns, but if the calculation had been done in Power Query, which could have been a good idea too, I could have removed those columns from the Power Query transformation, because I don't need them there either.

Now I go back to DAX Studio and refresh again. At this point, if you remember, we were around 140 megabytes, and now we are at 116 megabytes. We saved another 20 megabytes or so, which doesn't sound like much given we started above 300, but compared to where we were after the first optimization it is still another 20 percent or so.

At some point you say: I need these columns, I don't want to remove anything else, and that's fine. Now I save the file and go back to File Explorer; you just have to wait for it to finish, that's the compression. If you remember, we had 250 megabytes; now it's 50 megabytes. In maybe 10 minutes, explaining everything as we went, we did a wonderful tuning job: we removed 80 percent of the initial size of the model.

So we got rid of columns we didn't need, especially the ones with a lot of unique values, high cardinality. And then we found replacements for the calculated columns, because we can use SUMX instead; but the same would have been true if I had imported those columns, so the problem is not the calculated column by itself. The point is that if I have to compute the sales amount, it can be computed by summing one column or by summing the result of the product of two columns, and I try to store in the model the columns that have the smaller cardinality, because this is what enables me to get better compression.

So what do you guys think? This is amazing: now you have a practical use of VertiPaq Analyzer and you know how to use it to optimize your models. If you have any questions or comments for myself or Marco, you know what to do: post them in the comments below. If it's your first time visiting the Guy in a Cube channel, hit that subscribe button, and if you liked this video, give it a big thumbs up. As always, from Patrick and Marco, thanks for watching, and we'll see you in the next video.
Info
Channel: Guy in a Cube
Views: 20,508
Rating: 4.979764 out of 5
Keywords: power bi, power bi data modeling, power bi data modeling best practices, power bi dataset, power bi dax, power bi for beginners, power bi performance, power bi performance optimization, power bi performance tuning, power bi training, power bi tutorial, power bi tutorial for beginners, dax measures power bi, vertipaq analyzer power bi, vertipaq power bi, dax, dax measures, vertipaq analyzer
Id: m34IOHcNPPo
Length: 11min 59sec (719 seconds)
Published: Wed Nov 13 2019