Increasing compression in Power BI and Analysis Services - Unplugged #15

Captions
Ciao friends! Today I want to spend some time doing extreme optimization in VertiPaq. VertiPaq is the engine that runs the tabular model for Analysis Services and Power BI, and I will show you what you can do using two settings: the encoding hint and the Available in MDX setting. What I am going to show can be used in production for Analysis Services and for Power BI through the XMLA endpoint. However, I will run all the demos in Power BI Desktop, and in order to do that I have to enable an unsupported feature: what I am going to do is not supported for Power BI Desktop, but it is supported for the models you publish on the service, on Azure Analysis Services, or on Analysis Services on-premises. I use Power BI Desktop just because it is easier to test and to see the differences. I will also use Tabular Editor and DAX Studio. Finally, this is an Unplugged video, where we do not do any editing; but because I have to refresh a particularly large table several times, I will squeeze that time so that you do not have to wait a few minutes every time I refresh the model. Ready? Let's start.

I prepared a demo here: a Power BI Desktop file with a Contoso model that has the classical star schema. We focus on the Sales table, which has two hundred million rows. So far so good; you have probably seen other videos with the same model. Let's take a look at how the Sales table is consuming my memory. If I go to Advanced, View Metrics, I see in DAX Studio how VertiPaq Analyzer shows the information about the model. Let's start from the summary: it shows that the model is 750 megabytes, which is a lot. The classical, simple optimization is to remove the columns we do not need, which in this model are Online Sales Key, Order Line Number, and Order Number. You see that these three columns can be removed, and we will save a considerable amount of memory: 200 and something megabytes. So let's do that. I go to Power BI Desktop and get rid of Order Line Number and Order Number (I hope I don't break everything here; I don't know why it is not updating, this is funny), and then Online Sales Key, Delete from model. Here we go. Once that is done, and hoping the data is refreshed, let's go back to DAX Studio, run View Metrics again, and see what happened. You see that we reduced the size of the model: we are down to 540 megabytes. That is still a lot of memory, but at least we got rid of the columns we did not use.

Now we can focus on why we are spending so much memory, and you see two columns, Unit Price and Net Price, that take a lot of memory (look at the column size here); the reason is that they store a lot of unique values. The best optimization would be to reduce the cardinality, but let's assume we cannot: these numbers are about revenue, and we do not want to lose information for accounting, so we cannot simply round the values to the closest integer, which would be the easiest way to reduce the cardinality. So we spend our time looking at where all this memory goes. The column size is made of data size, dictionary size, and hierarchy size.
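By the way, if you prefer scripting this kind of cleanup instead of clicking through the user interface, a minimal sketch as a Tabular Editor advanced script (C#) could look like this. The column names are the ones from this Contoso demo, so adjust them to your own model:

```csharp
// Tabular Editor advanced script (C#): remove the unused columns from Sales.
// Column names are from this Contoso demo; adjust them to your model.
var sales = Model.Tables["Sales"];
foreach (var name in new[] { "Online Sales Key", "Order Line Number", "Order Number" })
{
    var column = sales.Columns.FirstOrDefault(c => c.Name == name);
    if (column != null)
        column.Delete();    // removes the column, and its data, from the model
}
// Remember to save the changes back to the model afterwards.
```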
Let's start from the dictionary. I take for granted that you know how the VertiPaq engine works; it is explained in other videos and in our Optimizing DAX video course, and our book has an entire chapter discussing it. So I also take for granted that you understand the difference between data size, dictionary size, and hierarchy size. I notice that the dictionary is bigger than the data size, which is expected: with so many unique values as cardinality, the dictionary is going to be big. Now, these columns contain just numbers, and the VertiPaq engine still decided to use the dictionary. I could try to suggest to the engine: look, these are numbers, you could skip the dictionary. Usually VertiPaq makes the right decision; maybe this time it did not. Let's see what we can do.

Now I open Tabular Editor: I go to Power BI Desktop and open Tabular Editor connected to this model. In the Tables list (remember, we are looking at two columns, so let's make sure we do the right thing) we have Unit Price and Net Price, the columns I want to examine. Let me remove my face from the screen so we can see better: Net Price has these settings, and Unit Price has these. Now I move the window so I can show my face again, and we look at the two properties I want to discuss. The first one is the encoding. As you see, I do not have all the properties here: if you use Tabular Editor or Visual Studio with models you publish on Analysis Services, you know there are many more settings. By default, when you connect Tabular Editor to Power BI Desktop, it does not show all the properties, because you cannot change them in a safe way. So we go to Preferences and change a setting. Let's see, where is the setting... come on... sorry, I will try this on another window. Here we go: this is the settings window, and the setting I want to change is Allow unsupported Power BI features (experimental). By enabling it, I am telling Tabular Editor: I want to see all the properties. And you see that when I now select Unit Price, I have options that were not available before. Once this setting is true, let me show you another thing: if I go back to Power BI Desktop and open Tabular Editor again, you get a warning dialog every time you edit a Power BI model with this setting enabled. The warning tells you that you are using the experimental feature and have access to properties that are not supported if you change them in a Power BI Desktop model. As I said, this is not supported; but what I want to do is relatively safe, or at least it will not crash the demo, let's say that.

Now I want to discuss a few of the options we have here. I reduce the window to a size we can examine; I think this is visible on screen. In the options I have two settings, and the first one is the encoding hint, this one. The encoding hint we see here is set to Default, which means we ask the VertiPaq engine to make the decision. We know that this time the engine chose the dictionary, which corresponds to the Hash setting: choosing Hash as the encoding hint would suggest using the dictionary as the best way to compress this column. We want the opposite: we want to skip the dictionary and try to compress using the value encoding of VertiPaq, so I set the encoding hint to Value, and I apply the same setting to the other column. I do this only on these two columns, because I know that here the dictionary is particularly expensive; otherwise, I usually trust the VertiPaq engine. I just want to do this for the purpose of this demo.
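If you want to apply the same hint without hunting through the property grid, here is a minimal sketch of the equivalent Tabular Editor script. Remember it is only a suggestion to the engine, not a guarantee:

```csharp
// Tabular Editor advanced script (C#): suggest value encoding for the two
// expensive columns. EncodingHint is only a hint; VertiPaq can ignore it.
foreach (var name in new[] { "Unit Price", "Net Price" })
{
    Model.Tables["Sales"].Columns[name].EncodingHint = EncodingHintType.Value;
}
// Nothing changes until the next data refresh of the Sales table.
```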
I have now modified these settings and saved them, and nothing happens yet, because the setting only applies to the next refresh operation. That is what I do now: I go to the Tables window, right-click Sales, and refresh the data of the Sales table. In a few seconds for you (minutes for me), this will be ready.

OK, the data has been compressed with the new settings, and we can go back to DAX Studio, which I still have open with the previous numbers: you see the total we had for the entire table, and more than 100 megabytes for each of the two columns. Remember, the data of these two columns is only around 40 megabytes each; most of the rest is the dictionary. When I click View Metrics again, I read the new situation. Let's take a look at what changed... and I notice that the dictionary is still there. My value encoding hint had no effect, and I still have the same situation. Why? The problem is that I tried to use the hint on a Double data type. I should have changed the data type of the column first, because with this data type the encoding hint can be ignored. It is not certain that it is ignored, but the encoding is evaluated a little bit differently when you have a Decimal Number rather than a Fixed Decimal Number: the Decimal Number, which is the floating point we have now, is actually pushing the engine towards the dictionary. So let's change the data type. I go to the Net Price column: you see it is stored as Decimal Number, a floating point. I switch it to Fixed Decimal Number; again I will squeeze the video, because as soon as I click, the data is recompressed. It did not take too long, and I do the same for Unit Price, which has the same problem: I set it to Fixed Decimal Number too.

Now I can go back to DAX Studio and see what happened. I click View Metrics again, and this time, because I re-encoded the two columns (I changed the data type, and I also had the hint to use value encoding), the compression of these two columns improved a lot. Sometimes using the Fixed Decimal Number alone would have been enough. As you see, one explanation for the improved compression and the reduced memory is that we reduced the number of unique values. Why did this happen? It is true that we had a large number of unique values, but remember, we were using a floating point, so I probably had values that differed only because of floating-point imprecision that was not relevant for my numbers. A Fixed Decimal Number stores four digits after the decimal point, which is more than enough for the US dollar values I want to store: whatever comes after that is just noise I want to ignore.
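The script version of this step would be something like the sketch below. In the tabular object model, Power BI's Fixed Decimal Number corresponds to the Decimal data type, while Decimal Number corresponds to Double:

```csharp
// Tabular Editor advanced script (C#): switch the columns to Fixed Decimal Number
// (DataType.Decimal: exact currency, four digits after the decimal point)
// and keep the value-encoding hint from the previous step.
foreach (var name in new[] { "Unit Price", "Net Price" })
{
    var column = Model.Tables["Sales"].Columns[name];
    column.DataType = DataType.Decimal;           // was DataType.Double (floating point)
    column.EncodingHint = EncodingHintType.Value; // value encoding is now viable
}
```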
I could have done this rounding in the data source, but when you use floating-point calculations, like I had in my views, that can be hard to do; I could cast the data type there, or simply choose the data type in the model. Nevertheless, even with a Fixed Decimal Number, the engine could still choose the dictionary to compress the data, because sometimes it does not make the right choice; it depends on the way the algorithm works. Long story short: if I notice a dictionary on a column that has a very large number of unique values, I can try the encoding hint to reduce the dictionary to zero. This actually increased the data size (if you remember, it was around 40 megabytes and now it is 50), but because the increase is small compared to the dictionary we removed, this type of encoding is an overall benefit for columns of this cardinality in a table of this size.

Can I do more? If I analyze the data, I see that I still have a hierarchy size of 24 megabytes, which is more than 10 percent of the size of the table; in certain conditions it could be much more, and if I extended this analysis to the entire table, I could find other columns with a large consumption. I want to keep the demo relatively easy, so I will show you what I can do and what the side effect is of changing something that affects the hierarchy size. I go back to Tabular Editor and change the setting Available in MDX to false (let me move this window so it is easier to read), and I do the same on the other column. So Available in MDX is now false for Unit Price and for Net Price. I could actually do this for all the columns of the Sales table, but for now let's do it only for these two. Again, I save, and nothing happens until I refresh the Sales table; and again I squeeze the video, so you wait just a few seconds.

OK, now we are ready to see the result. We go back to DAX Studio and refresh the View Metrics. Remember, we had the table at 200 megabytes, with 24 megabytes of hierarchy size. I click View Metrics, and I see that I saved another 20 megabytes of table size and reduced the hierarchy size of Net Price and Unit Price to zero. So this is how we increase the compression even further.

However, there are side effects. Everything we do can have a side effect, and there was a reason why that hierarchy size was there. Let me show you what we lost with this change. If I go back to the report, you see that all the columns are visible, which means that if I open this model in Excel, these columns are visible there too. Let me show you: I go to Sales in this area (let me remove my video so you can see), and in this list we have Date, and at a certain point we have Sales with all its columns. But notice that we no longer have the columns Unit Price and Net Price. Those columns, as individual columns, are no longer visible in Excel, and this means we cannot use them in a filter.
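Scripted, this change is one line per column; a sketch like the one below could also be extended to loop over every column of the fact table, if you decide that users should never browse it directly:

```csharp
// Tabular Editor advanced script (C#): drop the attribute hierarchies of the two
// columns. Their hierarchy size goes to zero at the next refresh of the table.
foreach (var name in new[] { "Unit Price", "Net Price" })
{
    Model.Tables["Sales"].Columns[name].IsAvailableInMDX = false;
}
```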
Imagine you want to filter Unit Price greater than 100: in Excel you can no longer do that, because we removed the visibility of those columns from MDX, and Excel talks to the tabular model through MDX. In the Power BI Desktop report, however, I still have Unit Price: I can apply a filter saying that Unit Price has to be greater than 100 (oops, something happened here; OK, greater than 100), I click on the function, the filter is applied, and my report refreshes. Now, there is a small price to pay in Power BI too, because the lack of these additional structures slows down the filters applied over those columns, especially a filter like this one; but it still works. So if you say, "I almost never use this column as a filter", it does not matter much.

However, both Excel and Power BI can still use a column that is not visible in MDX within a DAX measure. If I have a measure that uses Net Price, it still works. Let's take a look: we have a measure called Sales Amount, and you see that it uses Sales[Net Price]. If I go back to Excel, the Sales Amount measure is visible and can be used in my PivotTable without any problem; of course, I can browse the data by continent, country, and so on, and it just works. I did not lose anything in performance when aggregating the column that is no longer visible to Excel.

So I have a choice. If I removed the setting from all the columns, what would I get? I could save another 240 kilobytes, but those columns would no longer be visible in Excel, which could actually be a good idea: the columns of a fact table should usually be hidden from the user in both Power BI and Excel, so probably I would not spend too much time on this. But remember: if your measures apply filters that can take advantage of the sort order of a column, you will lose something in performance when you remove the visibility in MDX.

Let's go back to the properties in Tabular Editor, because I want to recap: we have seen the effect of the Available in MDX setting and of the encoding hint on the compression of your data. The final result is that my model, which was initially 750 megabytes, is now just 177 megabytes. Of course, this could be very different in your model: you have to try the techniques I have shown you, because I cannot share the large database I used here, and a small sample file would not provide much value in this case. My suggestion: use these techniques on a separate copy of your data in Power BI Desktop. Remember, I showed you something that is not supported, so you could break your Power BI Desktop file; do this on a copy, just to see whether you can get some improvement in compression, and then apply the change to the model you publish on the Power BI service or on Analysis Services, for example with Tabular Editor through the XMLA endpoint.

I hope you enjoyed this session. Remember, you can see how the VertiPaq engine works in other videos; in our book, The Definitive Guide to DAX, we have an entire chapter about it; and we have an entire video course, Optimizing DAX, where we explain in more detail why these settings have this behavior. Enjoy DAX!
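If you want to apply these settings to a published model through the XMLA endpoint, where all of this is supported, a rough sketch using the TOM client library could look like the following. The workspace URL and database name are hypothetical placeholders, and authentication (for example, an interactive Azure AD prompt) is left out for brevity:

```csharp
// A sketch using the TOM client library (Microsoft.AnalysisServices.Tabular).
// Workspace URL and database name are placeholders: replace with your own.
using Microsoft.AnalysisServices.Tabular;

class ApplyCompressionSettings
{
    static void Main()
    {
        var server = new Server();
        server.Connect("DataSource=powerbi://api.powerbi.com/v1.0/myorg/Contoso;");

        var model = server.Databases.GetByName("Contoso").Model;
        var sales = model.Tables["Sales"];

        foreach (var name in new[] { "Unit Price", "Net Price" })
        {
            var column = sales.Columns[name];
            column.EncodingHint = EncodingHintType.Value; // hint only; VertiPaq may ignore it
            column.IsAvailableInMDX = false;              // drops the attribute hierarchy
        }

        model.SaveChanges();                    // push the metadata changes
        sales.RequestRefresh(RefreshType.Full); // settings take effect at the next refresh
        model.SaveChanges();                    // execute the refresh request

        server.Disconnect();
    }
}
```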
Info
Channel: SQLBI
Views: 4,378
Id: XHmiAItzuwg
Length: 22min 56sec (1376 seconds)
Published: Sat Apr 17 2021