Understanding memory used by Power BI - Unplugged #7

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
ciao friends in this unplugged video i'm going to explore tools we can use to diagnose performance issues in power bi that could be caused by the lack of ram on your machine power bi consumes memory but you have to understand how this works and we have to explore a few concepts in windows to understand counters we can easily retrieve from task manager to understand which part of power bi is causing problem and what we can do to to solve them or at least how how and where to start during this video i will try to record everything in a single shot but because i will perform a potentially dangerous operation at a certain point in case i crash everything i have to stop the video and restart the system point but otherwise we will try to do no edit and no cut because this is the unplugged video so let's start with one example and let me show you here a power bi file i have nothing strange here what is interesting of this file is the following first of all the five sides of this pbx that already open i didn't want to wait just to open a file is one gigabyte and 435 megabytes so 1.4 gigabytes more or less now when i load this file in memory you you clearly have to consume ram and when i do some operation i could consume more ram too now how can we know which is how much ram are we consuming the way to do that the easiest way to do that is to use a tool that is called task manager the task manager is a tool that every windows system has and when you open it here on my windows 10 machine you see that by default it shows just the list of the high level tasks that you have available just because you can end them but in reality if we want to know more we have to click on this more details button and here one option i set is this always on top with which is a setting that stops me from doing the zoom because if i do the zoom locate what happens this window is still here you see the zoom there is confusing but this allows me to see what happens when i do something in the windows behind so when i execute something i want to see the task manager always on top now you see that just because i clicked on more details i can see that each app that is running on windows is displayed here with a list of items a list of elements which are called processes now the process in windows is the minimal unit where windows allocate resources and power bi runs using many processes there are several technical reasons for that but one of them is that when you run powerbit desktop you are actually running a complete client server system what does it mean let's go to the whiteboard to describe this a little bit better so here we have a situation where we have i mean we launched one file one executable called power bi desktop right this is what we made this top right my i am no longer able to write manually with a pen but this should be more interactive so let's try now when i run parabidesto i see a window right and this window is actually managed let's say the external process is actually a process called power bi desktop here but this is the process that actually has the icon the yellow icon let's let's do here this is this yellow icon this is the process of parabolic so if i go back one second to the demo machine you see that i have a file here i have a process here which is power bi desktop but in reality in order to display the content and do the evaluation and perform the evaluation this process uses other processes uh what does it mean well at the bottom we have analysis services even though i don't have analysis services on my machine what happens is when you run purvi desktop a local instance of this service is running on your machine this is a analysis services engine which is responsible of dax vertipack and so on what does it mean here when we load our model remember we had a 1.4 gigabytes file the pbx file actually contains several things one of them the largest one is the data compressed in memory that you import is a copy of this data and the data that was on disk on our file has been moved in memory so we know that the engine the analysis service engine has to consume some memory we will see this in a minute in a minute but when we see the report we also have other elements here actually more than one let's draw a few of them these processes are the browser subprocess browser subprocess what does it mean the user interface of power bi is mainly html5 and javascript and if i go back to this visualization you see that there are a number of processes here which are responsible for different parts of the ui that you see in power bi you have the report you have the diagram view you have many other elements the ribbon so there are processes that have to manage this which is the same thing that happens when you display the report in a browser so this is the reason why powerbi use this technique because they can share the code for the user interface between the experience you have when you display a report in a browser and the experience you have when you run power bi desktop there is something else what is this max mashup evaluation container well when you run your windows in power query and you import data and you execute the code the m code for power query you have here another element called the mesh up engine which let's say is our engine for power equity okay this is uh we could rename it to power query so let's say that this is a m power query and this is the this one is dax it could be also mdx right because when you run a query in excel this is the part of the engine that does the same so dax and dx everything is executed there each of these processes consume ram and when and how they consume ram could be different because of the action you do in power bi now let's see what happened now that we opened the file we just opened the file we didn't run any report what happened now if i go back here you can see that for every process we see two important numbers the cpu consumed and the memory as you see power bi desktop so the line with the group a parabolic desktop is simply the sum of the cpu of all the processes that are part of barbados the sum of all the some of all the memory for the processes and when i drill down i can see where i'm consuming more memory now you can clearly see that analysis services is consuming double the ram of the size of the file we had at the beginning and this is normal because the pbx file has a more compressed version of the data that we use when we load the data memory so the data memory is compressed but not at the level that the pbx file is and this trade-off is required because the memory the data memory has to be compressed but also fast to scan to provide a response to the query now the thing is that this is the minimal amount of memory i need because as soon as i do some other operation i will consume other memory and so we will see this number changing right we can do that i can move this here and i can show you that let's uh reduce this a little bit and let's go here and here even though we don't see this but we have here a column called order number and the table from which i use this column i can show you here is this table that has 125 million rows right so we are quitting a table that has 125 million rows and we are asking for the order number which is a column here which has 1 million 663 000 unique values now if you think for a moment just putting the order number in the rows here created a table that has one linear rows one million and six hundred one million six hundred thousand rows even though i see only a few of them right i see only maybe a few hundred throws here but when you scroll down you scroll down you see that this this um waiting time is showing that something is happening and you can see here something is moving cpu is consumed but the ram the memory didn't change much by just scrolling down so let's try to include here a measure i prepared a few measures like the quantity when it is just the sum of a column and i put the quantity here in this visualization what happens wait in time i have to wait but not too much you see that i already have my quantity displayed here right and once again if you look at the ram if you look at the memory it didn't change much now i can continue i can include the amount the amount is a measure that has just the sum of one column multiplied by another column for every row of the 127 million rows table still fast no changes in memory so far so good but now i have a measure that is not written well this measure let me highlight this measure one moment this measure is uh iterating 120 meters table and for each row is computing another measure this triggers a counter transition it is a particular expensive operation doing this for so many rows is expensive not only for the cpu but also for the ram look at what happens when i move when i sorry when i click this here so you see that this is spending time it is waiting you see the cpu here is spinning up no sorry i probably adjust the chain oh okay let me do this again i probably i didn't i didn't uh click the right thing so if i click here and i added this measure to the visual now this is causing not only analysis services to consume cpu but also is increasing the ram and this is an important information because the consuming ram i mean on this machine is not an issue guess what i have plenty of ram available not an infinite amount but i have 32 gigabyte of ram and what would have happened if i didn't have gigabyte but only 16 gigabyte now i'm still waiting but believe me you will wait more now i'm still waiting and there is still memory allocated but at certain point this stabilizes and the performance at certain point provide the answer you see that now the memory uh jumped down so it was up then down it is too long to explain what happens but the message here is when you run a query you might need temporary memory to create temporary tables to complete the computation now why the other two measures didn't need that again because depending on the dax measure this could change there are certain techniques index that are more efficient than other and i use the very bad one just to show you what could happen for the memory now everything worked so far but there is another element that i had to explain in advance before showing you the next example which is the following and let me go back to the white board my machine has so let's go let's move this here and let's see if i can draw this in another way okay my machine has 32 gigabyte of ram this is the ram i have however the way most of the operating systems work and in particular the way windows works is that every process has the illusion of having the entire memory of the machine available actually more than the memory that we actually have on my pc so let's call this this is the physical memory physical memory which is also called ram right random access memory but this is the physical memory of our machine every processor has the illusion of having a virtual memory much bigger than the physical memory usually what does it mean in windows if you run a 64-bit application the process image that there are many more gigabytes available let's say i don't remember the exact number but let's say one terabyte what is one terabyte one terabyte is let me delete this 1024 gigabytes 124 gigabytes and how is it possible that a process has more memory more visual memory than the memory available the physical memory available on the machine well the operating system performs some magic to do that but before i show you the magic the problem is that this is true for power bi desktop then we have let's say let's simplify i skip a few things i have analysis services here and analysis services has another 124 so let me move here so you can see it 124 gigabytes one tera one tera oh you we need two terabytes of ram to provide the entire memory so actually the operating system knows that not the memo not the entire memory will be used and so what is happening now if you think about what we have seen so far my analysis services process at a certain point requested let's right here let's write this here so requested let's say four gigabytes just to load the the data it was 3.5 let's say four just to to simplify and when analysis services requested this memory the operating system windows said oh you know what i have four gigabytes available here we go i have your four gigabytes so we can imagine that this was the memory the different memory mapped into the physical ram no problem so far but then analysis services requested other memory and in the meantime other processes for example this process could have asked for two gigabyte and we have other two gigabit and so on right so imagine that you have other processes using memory sooner or later windows will not have enough physical ram to satisfy all the requests coming from the processes now when this happens you might see out of memory error but most of the times windows tries to you know try to find a workaround and the workaround is well if analysis services now ask me for example 12 gigabytes this is not exactly what what happened because we had the additional 12 gigabytes but imagine that analysis services request 12 gigabytes and the 12 gigabytes will be here will be would exceed the amount of physical ram available at that point the windows operating system has two options first refuse the request and say no sorry no memory available or try to move out of the ram memory that was allocated by someone else because you know what if you're using power bi desktop maybe that you're not using the other big application i don't know autocad or something that was consuming a lot of ram so it could be temporary switch to a temporary file the ram that is not used so this is actually what happens there is something called page file that is used by the operating system automatically you don't have to do anything to create a copy of the content of the ram so if these two gigabytes for example are stored here and usually the the page file is stored on a mechanical hard drive or better on a solid state disk and this memory now is free so if this memory is free it means that we can imagine that we move this block down we move this block down and here oh sorry this one too and here we go now we are able to allocate our memory this is what windows does automatically but this now raises a question the number the task manager shows is so let's go back sorry let's go back to this situation right or sorry so this situation is good if you think about power bi desktop right power bi desktop asks for two gigabytes but at this particular moment has zero bytes in physical ram because the entire amount is here the entire amount of ram is now in the page file it's no longer in physical ram so there should be two numbers how much memory the process requested to the operating system how much memory is actually available in physical ram and the difference of course is paged outside and why this number is important because when powerblade desktop will have to use that ram it will have to wait that windows move to the page file something else and load the memory the pages of ram that was required by the the process and this requires time accessing the physical ram is an operation that requires a few microseconds when you go to the solid state disk is usually sorry nanoseconds for for the ram is nanoseconds whereas when you go to the mechanical hard drive is a few milliseconds for a solid stage this is a few microseconds so there are different orders of magnitude of difference that translates in you have to wait more when the ram that is used by a program is paged on disk how can i know whether the numbers i have seen here are one or the other so here this memory is something you cannot change you cannot configure but if you go in details details shows the list of the processes no longer grouped by window that you see on the screen which means that in order to understand where we are what is our our application so i know that this msmd srv.exe is analysis services but if you don't know that you can retrieve this information by going here you can right click here go in properties and you can see in the properties window somewhere probably here okay so let me see if i okay here you can see that we have in this properties window you have the name of the file and other information that allow you to identify the process now an easiest way to jump from this point the name of the process in the processes tab to the corresponding line in the details window is go to details go to details move me back move me there now by default the columns that are displayed in task manager are these columns oh by the way i forgot to mention i'm not sure we have to click on more details the first time you open task manager you have to click on more details to see this right probably i i don't remember if i told this now if i go here and i go to the details i can see that the number shown here by default is this long description memory for active private working set which correspond to the number we see here okay the number here the memory is identical to the number that we see here what is this working set the meaning of this sentence is this is the amount of physical ram physical memory that is currently available to the process so if i go back to my to my whiteboard the the working set is actually how much ram we have let me see if i have this here how much ram we have here how much part of these 32 gigabytes we have available which means that currently in in this situation i will see for my power bi desktop application here i would see zero right i would say zero because everything is in the page file at this point whereas as long as i have the entire allocated the ramp think it look at the analysis services here so if you look at this ram here the amount of ram we see here is the amount of ram we see here right the same thing so now i can go back i just wanted to highlight but the idea is that i can see in this number the amount of physical ram used but it would be interesting to see if there is a difference because if there is a difference i will be able to know how much memory has been moved to the page file and this could be the reason for slow performance when i go back and forth between different windows in my machine for example now in order to see this number you have to right click here in the task manager click on select columns and choose here commit size i already click it here but by default this is not enabled and commit size correspond to the amount of memory that the process requests sorry i have to go back to the demo machine i made an example without showing you this so let's uh let's let's repeat what i said because probably i i made a mistake here i didn't switch to the right monitor so the purview desktop here memory is the physical ram available we said if i go in details i see the same number but if i want to see the other number i have to go here select columns and choose commit size which is not enabled by default so this is the the flag you have to check when you check this flag you have this other column commit size now by comparing so let me move this closer here because probably what i would like to do when i have my task manager i would like to see this i would like to see for each process the amount of ram the amount of memory requested when the commit size is greater than the memory here i know that there is memory that there is a some data that has been moved to the page file and it will be slower when i activate the process to get this data back currently i see that my obs application which is a which i'm using to record this uh session is actually um allocated it requested 500 megabytes more or less that have been allocated but at certain point moved to the page file not a big deal but when this number starts to grow we could have a problem going getting back but more important this number tells me what the process actually requested so if you see that there is a huge gap you know that by increasing the physical ram of that amount you would improve the performance now this is not you not happening in machine because look my machine has 32 gigabytes of ram and with 32 gigabytes of ram i don't have actually the need to move data to the page in fact this could happen for other reasons but for small amounts so at the moment is not a real issue so in order to simulate the problem i have a small application here that when i run this application let me show you this we run this application this application is going to allocate all the memory and give me the memory allocated you see that the the memory is allocated now here we go and now we are in a critical condition from here and after windows can crash at any moment but don't do this at home what i can do at this point is in this condition if i go back here you see that my analysis service is still here but now we know that if we if another service will try to allocate more ram it will be in trouble right so let's try to do that so now i go here and i just have to scroll down because if i scroll down when i reach the end of the 500 rows that the power bi read from the query you see that now we are waiting and these numbers are changing you see that these numbers are changing at a certain point the number of commit size could grow now because analysis services let's say it needs physical ram we will not see a big difference but the thing is that in other condition we could see the two numbers diverge right and and the commit size is much bigger especially if you didn't use the program for y so imagine that you move your power bi window you know you recognize the powerball window you use something else and then you got you go back to power bi and you are in a situation of low memory condition so recognizing this that as i said for power bi doesn't happen often but it could happen more often for other processes like for example the mashup engine or the browser soft process looking at this difference could be important to recognize oh we have a low memory condition we want to do something for that right so at this point it worked so actually i'm able to work in this condition but as i said this is just to show you that the memory requested the memory actually used could be different because of the techniques that we have seen now let's go back to the whiteboard if i increase the ram i could reduce the problem of paging that we are not seeing now because i don't have something big enough and we are not actually measuring the seconds but i could show you that in this condition if i had more ram available i would have spent less time waiting for the result of the second query that i ran in power bi for example and in general this is true for every application not just power bi now as i said in increasing the ram could be a good idea but at the same time my problem with this particular power bi report is that i have a very bad measure so being able to identify which process is causing a large amount of ram allocated could be important to identify what is causing a slow down in performance why this there is another thing that is hard to see here let me go back here usually you look at the cpu but when a process starts to page data in and out the cpu is not used you only see the ram allocated the the gap between the virtual memory and the physical memory that increases or decreases but you don't see the cpu but you still wait right because the cpu does has nothing to do when it is waiting for the data coming back from the page file for this reason it's important to realize whether you are consuming memory in excess of the physical ram that is available and for digital looking at this context could be important once you identify which process is responsible for the slow performance you can you can decide to do something for example if i realize that let's say analysis services is responsible for the bad performance what could what can cause this dax mdx queries for example if the mesh up engine and large in memory during usually during an import when you import the data you might have some transformation that is consuming too much ram maybe you can try to create a to leverage on query folding to push the transformation over the data source or you can try to simplify the transformation but you have to solve the problem reducing the memory pressure now what happens if this is a problem of the browser sub-process and i draw a line here because if i have a memory issue in the browser's process now the issue is not really big four part of the desktop is bigger when you run in a browser let me explain you why when i run power bi desktop i am running everything on my machine but when you publish the report and you open the report in a browser what url what you are using at that point is only this part of the stack actually to be correct only this part of the stack the part that includes the browser subprocesses that you have seen here and sometimes not all of them because you don't have part of the editing experience of the model why we could have a problem in a processor process well because if you create a report that has many visuals every visual might allocate ram memory and memory could be paged out to the disk in the performance once again now that we understand that the number displayed by task manager the number we see here is not necessarily the amount of ram that is used that is sort of that is requested to display the page so sometimes you see that the cpu is is not very high this number is not very high but you still wait a lot of time just to render the report well this could be a problem uh when we see different numbers here in the processor process between this number and this number you can see here there is a difference of 70 megabits which is not much but in case you see one or two gigabytes of commit size and 200 megabytes of memory in a browser process you definitely have a performance issue because this paging especially on a machine that has maybe just two or four gigabytes of ram could happen very often and slowing down the performance now i want to show you this with another example i have so i can close this file and i can also close this window so let me close this so now i restore the memory and hopefully i don't crash everything okay and then i go back to my many visuals and i'm opening now another file to do this okay in a moment we will see this and let me recap so now we have seen that consuming the memory for analysis services for the engine the mesh up engine is something that usually happens when you process the model by the way also analysis services requires memory when you process the model because actually it has to import the data uncompressed compress them and then write them compressed but the the operation requires a large amount of uh ram to be completed but once you complete the the refresh and you publish the model you only have to consume the report which is still part of the power bi desktop experience but usually pyruvate desktop is executed on a machine that has a larger amount of ram whereas the visualization of the report could happen on your mobile device or on a machine that has a not so powerful device a powerful notebook for example and we want to understand whether we might have a problem for our visualization or we want to diagnose if the visualization is the problem so now we have here here we go we have this model here you see i saved this file with an empty page because this way when i open the file nothing is displayed the way powerday works only the page that is visible is actually consuming the resources including the memory to display the visuals and so now when i switch my tab here to dashboard what happens that when i click here what will happen is that we will see that here my power bi desktop will consume cpu and ram from one of these browser sub processes okay so let's go here look at what happens this is also another one but this is in particular this was the processor process that was consuming the largest amount of ram and cpu now like i did before right click go to details i can check whether the amount of ram i used is almost the same value of the commit size if there was if there is a small difference 10 20 megabits i don't care but if the difference was hundreds of megabytes why this could happen large map with thousands of points too many visuals in the ma in the report or very expensive custom visuals or visuals that have a problem for for some reason then we might see in this we might see this condition now usually the native visuals have a lower demand of memory compared to the custom visuals but in reality the most important thing you have to check out is the number of features you have in the page the more visuals the more memory required and so when you see a page that is low you can also take a look at the memory here because it's a very good leading indicator of a potential performance issue especially when you publish the model on the web because imagine what happens if i if i needed two gigabytes just for displaying this page on my pc is not a problem i have 32 gigabits but on a pc that has only four gigabytes of ram and is already running many other uh operations in other processes that would have been iu a huge issue right and we don't want this to happen so let me recap what we have seen today understanding how the memory works in windows this would be a very long topic but the most important thing to know is that the memory is virtual or physical and by default the task manager which is a tool that you have on any machine in windows by default shows you the physical memory not the virtual memory but the virtual memory is what actually the process requested to the operating system and looking at the difference between the two numbers tells you whether your process is affected by some memory lag lack of memory problem you can increase the ram or you can try to optimize that part of the report how many visuals the transformation in power query or the tax query the tax measure that you wrote because that behavior is causing an excessive demand of ram that results in bad performance so this is the takeaway for this session thanks for watching enjoy dax and power bi
Info
Channel: SQLBI
Views: 9,508
Rating: undefined out of 5
Keywords: DAX, PowerBI, Power BI, UNPLUGGED, Marco Russo
Id: kpAZD7AeRQw
Channel Id: undefined
Length: 41min 0sec (2460 seconds)
Published: Sat Feb 20 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.