DAX Optimization Examples with Alberto Ferrari

Captions
Hi everybody — or, as we say right now at SQLBI, ciao! What we are going to talk about in this session is DAX optimization examples. The idea is not to go too deep into optimization techniques, because we only have one hour available, which is not enough to go very deep, but I want to give you an idea of the kind of performance you should expect from a Power BI report. We start from scratch: we start from a report we know nothing about, and we investigate, one step at a time, the slow measures, trying to find out where the problems are and how to fix them.

Before I start, very quickly, who am I? I'm Alberto Ferrari; I work with Marco Russo at SQLBI (sqlbi.com). We do a lot of things. We write books — these slides show the old version of the book, I didn't update my slides — anyway, The Definitive Guide to DAX, second edition, is our DAX Bible: it contains everything about the DAX language and how to optimize it. Then we teach courses; we typically do that live, but we can't right now, so if you're interested in learning DAX, attending one of our courses might be an idea. The content I'm talking about today is included in the Optimizing DAX workshop. We also do consulting — we are BI experts. Anyway, I'm not here to sell myself; I'm here to run you through the answer to the typical question that a customer or a colleague might ask you at some point: "Do you mind taking a look at my report? There's something strange, it's slow. I have no idea why, but it will take you no more than a couple of minutes to fix it." When you hear that sentence, you know that the other person is lying and a lot of problems are coming.

So what I want to show you is what you do if you are in this scenario and you want to fix the problem. Oh no, I still have the agenda to show! The agenda is basically: understanding that optimization is a set of steps that you need to carry out. It's a technique — it's not art, it's not magic, it's just knowing what you are doing. In the process of optimizing the report, we will learn how to use the tools, how to measure the performance of a report and understand what the bottlenecks are, and we will understand a bit about the way the DAX engine solves your queries — the first steps in understanding the formula engine and the storage engine. Keep in mind: whenever you need to optimize something, the first thing you need to do is be able to reproduce the problem. If you cannot reproduce a performance problem, there is no way you can optimize it, because there is no way to try something, fix it, and make sure the problem will not appear again. I don't expect you to say "wow" at any point during my presentation — this is not marketing, it's a technical presentation. I want to show you how to do things in the real world; the results will be kind of amazing, but nothing strange, and nothing you should not expect from your everyday life with a report.

So let's start with a first look. I have a report; right now you see a blank page. That's totally expected, because we want to see what happens when you open the report. When I click on the first page of this report, you see that it's not fast — to be honest, it's damn slow. You see the visuals popping out one at a time, and this is not because it's a remote session;
it's actually a very slow report. Just by looking at the report you get the feeling that something is wrong: these measures are very heavy and there is some problem. But how do you know where the problem is? We have five different visuals in this report; some are slow, some are not that slow. There is a culprit somewhere among all these visuals, and we need to find it — and finding it is not easy, because you cannot just rely on your eyes. You cannot say "this is probably the slowest one, because it's the last one that came up"; that may actually be true, but you cannot rely on it. Moreover, the speed changes: if I click on a slicer, you see that it takes time to refresh everything — you probably can't see the small spinning circle here, but this part is still being updated and it will take a while. And because of the cache in the system, if I remove the filter it will be very fast, and if I apply the filter again it will be fast again; I need to change the query and go to something different to make the query run again and see the real speed. So — and this is the first important notion — by just working with the report you don't have a way to understand where the problem is, because you need to be able to reproduce your scenario, and if you only work inside Power BI there's no way to do that.

What we learn just by looking at the report is that we have five visuals and one of them is too slow, but we have no idea which one. So the point we have reached right now is: we have five visuals and we need to find the slowest one. How do I discover that? Well, I can start investigating the problem a bit. This is where I typically ask the audience what they would do; doing that online is not very effective, because I can't see you and I don't see what your answer would be, so I will just guess what you would probably say and go ahead.

We can start investigating the model. The model is kind of simple: it's a star schema. We have Orders, our main table, containing the orders; orders are placed by customers, so an order is placed by a customer on a specific date. Then we also have exchange rates. The reason is that all the orders are stored in US dollars and we want to show the values in two different currencies. If you look at the report, you see that we have the amount in US
dollars and the amount in euros; the conversion happens on the fly, and that's why we have an exchange rate table that tells me the exchange rate at the end of every month. From the size point of view, this model is not very large: the Orders table is the largest one, and it only contains 1.6 million rows. Now, 1.6 million rows is tiny — you should expect millisecond response times for any kind of query against such a table. Nevertheless, it is slow.

If we go a bit further, we see that we have the amount in USD and, as we said, the amount in euros, so we know that some kind of currency conversion is happening here, and that might be an issue. Then we have this measure, Open Orders. Open Orders tells me how many orders are open at any given point in time. An order is open in the time between when the order was placed and when the order was delivered; between those two dates, the order is considered open. So, year by year, how many orders were open — meaning they started at some point in 2009 and were not yet finished in 2009, or they were being handled in 2009. So we have three measures — the amount in USD, the amount in euros and the open orders — several visuals, and we don't know where the problem is. We can take a look at the measures: at Amount USD, or Open Orders, or Amount USD versus Amount EUR. You see that Amount EUR has a lot of code, but we still don't know where the problem is.

The first and most important lesson to learn is: measure. Whenever you have a problem, whenever you want to understand where a problem is, you need to measure it in order to understand it better. Power BI offers you a tool for measuring: if you go to the View menu, you have Performance Analyzer. Performance Analyzer is a great tool, even though the kind of results it produces is not easy to read. If you enable Performance Analyzer, this panel opens; you can start recording, and from the moment you start recording, every time a query is sent to the server, its speed is measured. Now, if I click on "Bachelors", you see the queries being executed, and here you see some numbers: it shows all the visuals — Open Orders by Education, Open Orders, the amount, the slicer, Open Orders by Continent and Year — and for each one we have the duration in milliseconds. If I click again, it will compute everything again, but actually the best thing you can do here is to clear everything and then refresh the visuals. If you refresh the visuals, all the visuals in your report are completely refreshed and the engine measures the time needed to execute everything; it will take around 10 seconds. Now you have the duration in milliseconds for each visual. One of the nice things is that if you click on a visual, it is highlighted in the report: if I click on the slicer, the slicer is selected; if I click on Open Orders, its visual is highlighted, so I have a clear understanding of what I'm clicking on. Now we have numbers: 127 milliseconds to refresh the Open Orders visual, 11 seconds to refresh this visual, which is the chart, 140 milliseconds for the slicer. These numbers are nice, but the first thing you need to learn about Performance Analyzer is that this number has a very clear meaning — which is nothing at all. It doesn't have any real meaning, because it is not the time that the engine needed to execute the query.
It is the time that the engine needed to produce the visual, and it is made up of three different numbers: the time to execute the DAX query, the time to render the visual, and Other. Other is my favorite, because it means nothing: Other is just the time needed to do "other things", and in the case of Power BI those other things mostly means waiting for other visuals. In this report we only have five visuals, so Other will be small nearly everywhere; but if you have one or two hundred visuals in a single report, Other will be the time that the engine had to wait for other visuals to finish before starting the current one, and it is not uncommon that Other accounts for the largest share of the time. You might see a visual that takes ten seconds to run where, out of those ten seconds, nine are in Other and only half a second is the DAX query. Why do I say that this number is meaningless? Because no matter how much you optimize the query of that visual, you will never reduce the time for Other; the only way to reduce Other is to reduce the number of visuals. In our specific scenario this is not a big problem, because with a small number of visuals, Other does not account for much.

So we have five visuals; let's analyze them one by one. The slicer takes only two milliseconds to run, so it's not even worth our time. Then we have Open Orders by Continent and Year: two seconds, and nearly all of those two seconds are spent on the DAX query; this is the visual showing the open orders sliced by continent and by year, so it looks like Open Orders is not our biggest problem there. Then we have Sales by Year in USD and EUR — here it is — which takes 2.4 seconds to execute. And the worst one is Open Orders, Amount USD and EUR: this chart takes a lot of time to run, 11 seconds, and it contains the open orders, the amount in USD and the amount in EUR. So if we want to optimize the code, we know the problem is somewhere in these three measures, and grabbing the query which feeds this visual is our best chance of reproducing the problem.

If I want to find the query that runs here, I just need to go here, under this visual, and click on Copy Query. This copies the query to the clipboard, so now in my clipboard I have the full query that feeds this visual. How do I run the query and measure performance? Well, I cannot do that from inside Power BI; I need DAX Studio for that. So I launch DAX Studio, connect it to my Power BI file, and from here I only need to paste the query. You see that the query is just a TOPN over a SUMMARIZECOLUMNS that summarizes by year, month and month number, and computes the three measures. Part of the query is totally useless for us: the TOPN is there because Power BI never grabs the entire set of values, only the top thousand, or top 500, or top whatever — you will see it every time you look at a query generated by Power BI — and the same goes for the ORDER BY. We are not interested in the result; we are interested in the timing. So if I run the query now, it will take around 10 seconds, and you can easily tell by looking at this number, which keeps growing up to 10 seconds; then I can also see the results. I'm not interested at all in the result, only in the time.
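For reference, the captured query has roughly this shape — a simplified sketch with assumed table and column names; the query Power BI actually generates also declares intermediate variables, and it typically asks for one extra row (hence 1001) to detect overflow:

EVALUATE
TOPN (
    1001,
    SUMMARIZECOLUMNS (
        'Date'[Year],
        'Date'[Month],
        'Date'[Month Number],
        "Open_Orders", [Open Orders],
        "Amount_USD", [Amount USD],
        "Amount_EUR", [Amount EUR]
    ),
    'Date'[Year], 1,
    'Date'[Month Number], 1
)
ORDER BY 'Date'[Year], 'Date'[Month Number]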
The problem is that this time is not very precise, and it does not give me clean information about the real time needed to execute the query. So when you need to optimize your code, what you need to do is enable Server Timings: here under Performance you have two buttons, Query Plan and Server Timings. The query plan is sometimes useful, but it goes really, really deep into the way the engine works; Server Timings gives you a precise measurement of how long it takes to run a query. You need to wait a while until you see this new panel, Server Timings, coming out — right now it is not showing anything; I need to run the query again to measure it. Keep in mind that if I run the query now, I will get some numbers, but not the correct ones, because there is a cache in Analysis Services — the engine running inside Power BI — that caches partial results. If I want to make sure I always get the correct numbers, I need to clear the cache before running a query. There is a button here in DAX Studio that clears the cache of Power BI, and there's an option, where I have the Run button, to choose between simply running the query or — which is much better — clearing the cache every time and then running the query. So I can simply click on Run, and it will clear the cache first and then run the query.

Let's do it now, so we see what happens in the performance pane. Right now it's running the query; we know it's going to take around 10 seconds, and at the end we will have the correct numbers: the total execution time, the formula engine time and the storage engine time. For now, all we are interested in is the 11.9 seconds: this is the time we need to beat. All the other numbers — what all these strange queries that appear everywhere are — we will get to a bit later.

We know it takes 11 seconds to run a query that contains three measures; but out of the three measures, what is the relative weight of each one? We have Open Orders, Amount USD and Amount EUR. I want to measure the three measures separately, so we can start testing them one by one. I can run Amount USD alone: it runs in ten milliseconds, so this is clearly not the culprit, and we can get rid of it — there is nothing interesting to see there. Let's take a look at Open Orders. Open Orders will take a bit longer; I honestly have no idea how long — around seven or eight seconds, I guess... a bit more: 9.5 seconds. That is the time spent computing the open orders. What about the euro? If I run it alone, this is going to take a bit less... it's 2.3 seconds. So we have three measures, and we know that this one is around 9.5 seconds and this one is around 2.3 seconds. What I should do at this point is focus my effort on the slowest one, Open Orders. Nevertheless, we are not going to start from Open Orders, because that's a bit harder; we start from Amount EUR.

Now, how do I fix it? Well, first I need to understand what the measure is computing; then we change the measure one step at a time until we fix the DAX code and make it work in a better way. How do I change the measure? One option is to go to Power BI, click on Amount EUR, look at the code, and start fixing it; but this requires me to jump back and forth between Power BI, where I change the DAX code, and DAX Studio, where I measure it.
There is another, more interesting way of achieving the same goal, which is to define the measure locally to the query. If I search here and click under Orders, you see that I have Amount EUR; I can right-click on Amount EUR, and I have Define Measure. If I click on Define Measure, what happens? The full code of the measure is copied into my query, with the same name and the same definition that it has in Power BI. What is the advantage of having the definition here? If I run this query now, the engine will not use the original definition of Amount EUR; instead, it will use this one. So I can make any change here, and the engine will use this definition. We can easily check it: if I define the measure as being just 1 and I run it, you see that it's obviously very fast and the result is 1 everywhere. So whatever change I make to the code here is reflected immediately in my query, and that's the easiest way to reproduce things again and again.

Another important thing: the measure Amount EUR uses Amount USD, and I'm not sure whether the problem is in the code here or maybe in Amount USD. Amount USD is another measure, so my measure calls another measure, and the problem might be in my first measure or in the inner one. What if I want to see Amount USD as well? There's an easy way of doing it, which is to use not Define Measure but Define Dependent Measures: if I click on Define Dependent Measures, DAX Studio will add not only the current measure but also all the measures used by Amount EUR. Because Amount EUR uses Amount USD, I also get the definition of Amount USD here, and again, whatever change I make to either of the two measures is reflected in this query immediately.

With that said, it's now time to start diving a bit deeper into the DAX code and understand where the problem might be. This requires a bit of analysis of the code. Amount USD is super simple — there's nothing that should surprise us here, it's just the sum of a column. Amount EUR does a conversion on the fly of the value in US dollars into a value in euros.
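To give an idea of what this looks like in the DAX Studio editor, here is a minimal sketch of a query with a locally defined measure. The SUMMARIZECOLUMNS part mirrors the captured query; the definition of Amount USD is the simple SUM described above, with assumed table and column names:

DEFINE
    -- A local definition overrides the model's measure for this query only
    MEASURE Orders[Amount USD] = SUM ( Orders[Amount] )
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    "Amount_USD", [Amount USD]
)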
How does it do it? It iterates over the orders; for each order it computes the amount in USD, and then it multiplies that value by this giant expression. Despite being very long, this expression is kind of simple: it first checks the average rate at the date which is the first of the month of the current order, in the year of the current order. It's searching for the currency key where the code is "EUR" — so it's searching for the exchange rate of the euro on the first day of the month — and then it computes the average rate. The rates are in the exchange rates table, which contains the rate, the date and the currency. So this LOOKUPVALUE is basically retrieving the exchange rate on the first day of the month when the order was placed. The reason it uses the first day of the month is that in this specific data model we don't have exchange rates for every day; we only have the exchange rates at the beginning of each month, so all the orders in a month use the exchange rate of the first of the month. And because of data quality, that exchange rate might be missing, for whatever reason. What happens if we don't find a value? If this expression returns a blank, it means we do not have an exchange rate; therefore we compute the average exchange rate of the euro over all time and we use that average to do the conversion. Otherwise, if we found a value, we use that value. So the amount in USD of each specific order is multiplied by the exchange rate on the first of the month, or by the average of all exchange rates, depending on whether we have the data or not. As I said, despite being somewhat convoluted, this code makes sense; it's only long, and that's its only problem.

Where do we start optimizing it? We have a lot of options, but whenever I ask in the classroom, one of the first things people notice is that we are doing a LOOKUPVALUE to search for an exchange rate, and then we are doing the very same LOOKUPVALUE again for the same exchange rate — and this is not very smart: we are doing the same operation twice. Let me run the query again, because I want a base measurement: this query runs in 2.3 seconds. So let's start modifying it. I can take this entire LOOKUPVALUE and store it in a variable: I define a variable, ExchangeRate, where I put my LOOKUPVALUE, and in the RETURN we say: if ExchangeRate is blank, then do the fallback calculation, otherwise just use ExchangeRate. Let me show you one of the best features of DAX Studio: you see that the code is now a mess, but DAX Studio is integrated with DAX Formatter, so you can click on Format Query and the query is formatted on the fly as you modify it, which makes it super easy to always have the code formatted the right way.

So I created a variable and avoided doing the calculation twice. We start from 2.3 seconds; if I run the query again, I expect it to be a bit faster, but not much faster. We saved 300 milliseconds, which is not bad, but it's still not fast. We can notice another thing: look at this CALCULATE. This CALCULATE computes the average exchange rate, over all time, of the currency we are converting to — US dollars into euros. Now, this number never changes — it is always the same — so there is no need to compute it inside the iteration; I can compute it outside the entire iteration. I define a variable here, ExchangeRateOverTime, where I compute my number, and then I format again... there is a problem somewhere, I'm missing something — there, fixed.
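Put together, the refactored measure looks roughly like this — a sketch with assumed table, column and currency-code names; the demo model may differ in the details:

Amount EUR :=
VAR ExchangeRateOverTime =
    -- never changes, so compute it once, outside the iteration
    CALCULATE (
        AVERAGE ( 'Exchange Rates'[Rate] ),
        'Exchange Rates'[Currency] = "EUR",
        REMOVEFILTERS ()
    )
RETURN
    SUMX (
        Orders,
        VAR ExchangeRate =
            -- stored in a variable so LOOKUPVALUE runs once per row, not twice
            LOOKUPVALUE (
                'Exchange Rates'[Rate],
                'Exchange Rates'[Date],
                    DATE ( YEAR ( Orders[Order Date] ), MONTH ( Orders[Order Date] ), 1 ),
                'Exchange Rates'[Currency], "EUR"
            )
        RETURN
            [Amount USD]    -- still a measure call here; that is the next thing to fix
                * IF ( ISBLANK ( ExchangeRate ), ExchangeRateOverTime, ExchangeRate )
    )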
So now I compute the exchange rate over all time outside of the iteration, and I use it here, inside the iteration; I compute this LOOKUPVALUE once and use its value here. Again we start from two seconds... moving the computation of ExchangeRateOverTime outside takes the query from two seconds to 1.978 seconds — still very large, so this saves nothing significant. What else can I do to make this code faster?

To make it faster, I need to dive a bit deeper into the execution — into the query plan. Whenever you run a DAX query, there are actually two engines running: one is the storage engine, the other is the formula engine. The storage engine is responsible for gathering values from the database; the formula engine computes values on top of them. The storage engine is extremely fast; the formula engine is extremely slow. If I look at these numbers, I see a bit more information: it takes two seconds to run the query; 1.5 seconds are spent in the storage engine, half a second in the formula engine; and this bar shows that 23% of my time is spent in the formula engine and 77% in the storage engine. And these are the queries executed by the storage engine. You see that I have two queries retrieving a lot of columns — the quantity, the customer key, the order number, the sales amount, the order date, the delivery date — and this one is retrieving 1.6 million rows, meaning that the storage engine needs to retrieve 1.6 million rows, pass them to the formula engine, and then the formula engine does some sort of calculation on them. 1.6 million is not a random number: it is exactly the number of rows in my Orders table. So this query is materializing — building an in-memory data structure containing — the entire Orders table.

Why does it need that? The reason is this call to Amount USD. This query is iterating over Orders, and for each order it calls Amount USD. Amount USD is a measure, defined here, that computes the sum of Orders[Amount]. Because of context transition — here there is an implicit CALCULATE around it; a CALCULATE, which always wraps any measure call, performs context transition — this calculation happens only for the current row. Context transition is fine, it works; it's just not very fast. You can rely on context transition if you need a small number of iterations, but if you need millions of iterations and you still rely on context transition, your measure will be slow. Besides, look at what I'm doing here: I'm calling Amount USD, and Amount USD is just summing Orders[Amount], the Amount column of the Orders table. Now, what is the sum of Amount over a table that, because of context transition, contains only one row? Well, the sum of Amount over one row is just the amount. So the best way to optimize this code is to put Orders[Amount] here, instead of Amount USD, and just rely on the row context, avoiding calling a measure from inside an iteration that happens millions of times. I make this small change — and now I no longer need the definition of that measure, because I no longer use Amount USD inside Amount EUR. What happens? We go from 1.9 seconds to a much better execution time: 230 milliseconds, ten times faster. Most of the time is spent in the storage engine; the formula engine time is still there, but it is much reduced, so the overall query runs much faster.
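Stripped of the exchange-rate logic, the essence of the change is this — a sketch; the measure reference behaves like the explicit CALCULATE shown:

-- Before: the measure call hides an implicit CALCULATE, so context transition
-- fires once per iterated row — 1.6 million times here:
SUMX ( Orders, CALCULATE ( SUM ( Orders[Amount] ) ) )

-- After: over a one-row filter context the sum of Amount is just the amount,
-- so reference the column directly and rely on the row context alone:
SUMX ( Orders, Orders[Amount] )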
We moved from 2.4 seconds to around 200 milliseconds — about 12 times faster. But we can still do better. Think about it: we are iterating over Orders; for each order we compute the exchange rate at that date, then we compute the amount, and we check whether the rate is blank. How many iterations will SUMX perform? 1.6 million: LOOKUPVALUE will be called 1.6 million times and this IF will be evaluated 1.6 million times. Both operations are fast, but if you execute them that many times, of course you reduce the speed of your query. The question is: how can we reduce the number of iterations executed by the iterator?

Actually, if you think a bit about it: do I really need to iterate this code order by order, line by line, over my Orders table? We know — because of this LOOKUPVALUE — that the exchange rate changes only once per month. So we can go for a different algorithm: we first compute the sales amount month by month, we compute the exchange rate month by month, and then we multiply them pair by pair. If I have a year of data and I group it by month, instead of 1.6 million rows I will have 12 rows per year — orders of magnitude smaller — so it makes a lot of sense to group earlier, by date. Instead of iterating over Orders, I can summarize Orders by Year Month, a column that contains the year and the month together; this returns only 12 rows per year. Then I add a column, which I call Amount USD, containing the value of the Amount USD measure — you see that I'm going back to calling the measure, which is not bad in this case, because I'm calling it only 12 times, no longer 1.6 million times. So now I have this table that returns the year-month and the amount in USD. I also need the first of the month, so let me retrieve the last day in the month, which is just CALCULATE of MAX of the date; this is needed to have a date I can use to compute the first of the month. Then I still do my LOOKUPVALUE for the euro, and finally I compute — no longer Orders[Amount], but Amount USD times: if the rate is blank, ExchangeRateOverTime, otherwise ExchangeRate. And that's it: with this change I have reduced the number of iterations from 1.6 million to only 12 or 24 — 12 times the number of years in my report.

Again, we have a baseline of 230 milliseconds, and if I run it... I have an error, of course: Amount EUR cannot be found. Why? Because I mistyped it — there, Amount USD. OK. It was 200 milliseconds, and now it's 33 milliseconds — again roughly ten times faster than my previous best. If you do the complete math, we moved from 2.4 seconds to 33 milliseconds, which is kind of awesome, just by fixing the DAX code and understanding where to make the correction. The last thing I need to do is copy this code to the clipboard and go to Power BI — this last step I have to do by hand — get rid of everything in the measure and paste the code that I wrote. This reduces the execution time of everything.

But I still have to optimize Open Orders. I told you earlier how the open orders are computed: by checking how many orders are open on a given day, where an order is considered open between the order date and the delivery date.
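The reshaped measure looks roughly like this — a sketch where 'Date'[Year Month], the @-prefixed extension columns and the first-of-month logic are my reconstruction of what the demo shows:

Amount EUR :=
VAR ExchangeRateOverTime =
    CALCULATE (
        AVERAGE ( 'Exchange Rates'[Rate] ),
        'Exchange Rates'[Currency] = "EUR",
        REMOVEFILTERS ()
    )
VAR AmountsByMonth =
    ADDCOLUMNS (
        SUMMARIZE ( Orders, 'Date'[Year Month] ),    -- ~12 rows per year instead of 1.6M orders
        "@AmountUSD", [Amount USD],                  -- calling the measure 12 times is fine
        "@LastDayInMonth", CALCULATE ( MAX ( 'Date'[Date] ) )
    )
RETURN
    SUMX (
        AmountsByMonth,
        VAR FirstOfMonth =
            DATE ( YEAR ( [@LastDayInMonth] ), MONTH ( [@LastDayInMonth] ), 1 )
        VAR ExchangeRate =
            LOOKUPVALUE (
                'Exchange Rates'[Rate],
                'Exchange Rates'[Date], FirstOfMonth,
                'Exchange Rates'[Currency], "EUR"
            )
        RETURN
            [@AmountUSD]
                * IF ( ISBLANK ( ExchangeRate ), ExchangeRateOverTime, ExchangeRate )
    )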
Let's take a quick look at our data set. The Orders table has two date columns, the order date and the delivery date: this one goes from the 8th of October to the 18th of October, so on any date between the 8th and the 18th the order needs to be considered open; after the 18th it is closed, and before the 8th of October it is non-existent — it's simply not there.

The Open Orders measure is actually much simpler than the previous one. Given a period of time, Open Orders takes the minimum date and the maximum date, and then it filters the orders where the order date is before the end of the period and the delivery date is after its start. Why do we need a period at all? Because if you have a single day, it's very easy to understand whether an order is open or closed; but if you have multiple dates in the selection, it's no longer so easy. I don't remember if I have a slide that explains this — let me see; as you're going to learn, I never follow the slides, I do everything on the fly — yes. Think of this: if I have one day, it's easy to tell whether an order is open or closed. But if I have a period, like October: I know that order one is not open, but what about order two? Order two was not open at the beginning and it is closed at the end, but it was open at some point during October — do I want to count it as open or not? That's a choice you need to make; in the version I wrote, I consider order two as being open in October. The same for order three: it was not open at the beginning and not closed before the end, so I consider it open. Order four, of course, is neither open nor closed, because it started after the end. And order five was started and closed within October, and again the question is whether to consider it open; in my algorithm, I do. So I want the result to be three. To compute three, I want all the orders that were not finished before the beginning of the period and were started before its end; this includes orders two, three and five, and skips orders one and four. And this is reflected in the code I wrote: I'm returning all the orders that were placed before the end and delivered after the start. This produces the result I need; I use this ALL to get rid of the effect of the relationship, which I want to ignore before doing the math.

The problem is that this measure is damn slow. Let me repeat the process I did earlier: I remove everything from here — now I'm focusing on Open Orders — and let me uncomment and put this back. I already know it takes 9.5 seconds to run, so I can save that time and avoid executing it; I define the Open Orders measure here, so that I can change it and see the effect of my changes. There's no need to run it — well, let's do it: if I run this query, I know it will take around 10 seconds. The real question is: how do I optimize this code? It's utterly simple; there doesn't seem to be much to optimize. Actually, there is a lot of work we can do here, even though the code looks too simple to touch — the reason is kind of tricky. What this code is doing is counting the orders where the order date is less than the end of the period and the delivery date is greater than the start of the period, and to do that it has to materialize the entire Orders table. You can see that from the number of rows
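As described, the original measure looks roughly like this — a sketch; table, column and variable names are assumed, as is the exact placement of ALL:

Open Orders :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR MaxDate = MAX ( 'Date'[Date] )
RETURN
    COUNTROWS (
        FILTER (
            ALL ( Orders ),    -- ALL removes the effect of the relationship with Date
            Orders[Order Date] <= MaxDate
                && Orders[Delivery Date] >= MinDate
        )
    )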
returned by this query: still 1.6 million. It is retrieving, for 1.6 million rows, the order date and the delivery date; generating a table with 1.6 million rows and scanning it takes time. There are multiple ways of optimizing this query. The easiest one is to split the condition. The condition involves the order date and the delivery date together, but you can write the very same expression using CALCULATE instead of FILTER, separating the two filters: one on the order date, one on the delivery date. You should always remember that Tabular is a columnar database, and being a columnar database means that operations executed on one column only are much faster than operations involving multiple columns together. So how can I write it in a way that works column by column? I can rewrite it saying: give me the COUNTROWS of Orders where the order date is earlier than the end of the period and the delivery date is later than its start. I just rewrote the expression using CALCULATE, counting the orders; I removed FILTER — so I removed the iteration from my expression — and, most importantly, I split the two conditions: one on the order date and one on the delivery date. This change, small as it is, moves the entire query from eleven seconds to 74 milliseconds, which is kind of awesome, because it generates much better storage engine queries. Again, what I need to do is take this piece of code and replace my measure with it, and at this point the result is pretty nice: with these two changes to the measures, I can click on any filter here, change it, clear it, and the report responds immediately. These are the kind of performance figures I would expect from Power BI. If I were Christian Wade, I would probably call this the clickety-clickety draggy-droppy experience; I just call it the normal experience of browsing a data model.

There's one last thing that I want to show you — I still have ten minutes — which is how to go a bit further with the optimization. Look at the execution time of this query: it takes 74 milliseconds, of which 31 milliseconds are storage engine CPU and 41 milliseconds are formula engine CPU. One important thing about the two engines is that storage engine results are cached: whatever happens in the storage engine is kept in a cache, and the engine will reuse that cache across multiple executions, while whatever happens in the formula engine has to be executed again every time. So if I run this query once more — I need to switch to plain Run Query, without clearing the cache — the first run took 77 milliseconds; if I run the query a second time, you see that the storage engine CPU is now zero and 39 milliseconds is the formula engine time. If I run this query again and again, it will always take around 30-40 milliseconds: 41 milliseconds, then around 40 milliseconds again. So I cannot save formula engine CPU between executions — I can only save storage engine CPU. The reason is that this code is still a bit heavy on the formula engine.

If I want to super-optimize this model, I need to go one step further, and going one step further requires changing the algorithm; to change the algorithm, I need to change the data model. Instead of keeping the orders structured this way, I can change the data model and create one row per day per order — one row for every day on which the order is open. Doing that, I move the complexity of the calculation into a calculated table, and computing the number of open orders becomes just a matter of counting rows.
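The rewritten measure, under the same assumptions as the sketch above:

Open Orders :=
VAR MinDate = MIN ( 'Date'[Date] )
VAR MaxDate = MAX ( 'Date'[Date] )
RETURN
    CALCULATE (
        COUNTROWS ( Orders ),
        -- two single-column predicates: each can be resolved by a fast,
        -- independent scan of one column in the columnar storage engine
        Orders[Order Date] <= MaxDate,
        Orders[Delivery Date] >= MinDate,
        REMOVEFILTERS ( 'Date' )    -- plays the role of the ALL in the FILTER version
    )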
Look at October 1st: on October 1st I have only one row, one value here; on October 2nd, because both this order and this order are open, I will have two rows. So what I need to do is generate a table that contains one row for every date on which each order is open. Again I go to DAX Studio; I need to create a table that uses GENERATE: I generate, for each order, the dates between the order date and the delivery date. This query is going to take a while to run, because it produces 16 million rows, but it returns, for each order, all the dates between the order date and the delivery date. While we are waiting for it to finish, let me fix the code: out of the entire result I'm only interested in the order number, the customer key and the date... it's actually taking a bit longer than I expected. OK — it's still not returning, but we don't mind: this query returns the order number, the customer key and the date, and we will see the result in Power BI, where it's a bit faster. So I go to Power BI and create a new table: Modeling, New Table... let me cancel... and we start now. It looks like I have been able to crash Power BI... here is the Open Orders table: it is now generating my table with 16 million rows. This table is much larger than the original one — the original table has 1.6 million rows; this one contains one row for each day on which an order is open, so depending on how long an order stays open, it contains several rows per order. On average, Contoso keeps an order open for around 10 days. So my open orders table contains the order number, the customer key and the date, with one row for every day on which the order is open; if I sort it by order number, you see that each order has several dates, one for each day it is open.

Once I have my table, I only need to build a couple of relationships: between its customer key and the customer, and... order number to order number — no, it doesn't like it — oh, that was not expected to work; sorry, my bad. I want to build the relationship with Date and the relationship with Customer; I didn't want to build the relationship with Orders — this is just another fact table. Once I have these open orders, how do I count the number of orders which are open? It is just a distinct count of the order number. So — let me copy the original measure before it is replaced — now I can replace my measure with just a DISTINCTCOUNT over the new table. Let me run this query again so we have a baseline: we start from 41 milliseconds, and as you are going to see, this is not going to be much faster — probably it's going to be a bit slower — but I expect the distribution between storage engine and formula engine to shift so that most of the time is in the storage engine. So, if I run this DISTINCTCOUNT of the open orders' order number... let me see: it's actually slower, it takes 233 milliseconds; but what I'm interested in, in this case, is that only 7 milliseconds are spent in the formula engine and 226 milliseconds are spent in the storage engine.
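A sketch of the calculated table and the new measure — GENERATE plus CALENDAR is my reconstruction of the "dates between order date and delivery date" step, and the table name 'Open Orders Days' and column names are assumptions:

Open Orders Days =
SELECTCOLUMNS (
    GENERATE (
        Orders,
        -- one row per date on which the current order is open
        CALENDAR ( Orders[Order Date], Orders[Delivery Date] )
    ),
    "Order Number", Orders[Order Number],
    "CustomerKey", Orders[CustomerKey],
    "Date", [Date]
)

Open Orders :=
-- an order appears once per day it is open, so count distinct order numbers
DISTINCTCOUNT ( 'Open Orders Days'[Order Number] )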
The good thing about writing the code this way is that if I don't clear the cache and I run it multiple times, you see that the execution time is now 1 or 2 milliseconds. Why? Because it is in the cache: it is not recomputed, it just reuses the value that was in the cache — and that benefit is available only if you spend most of your time in the storage engine; it is not available if most of your time is spent in the formula engine. In a multi-user environment, with thousands or hundreds of thousands of users, using the cache the right way makes a really huge difference between code written the right way and the wrong way.

So, let's go for a conclusion — I have used all of my time. If you want to do optimization well, you of course need to understand the basics of optimization; that's in the book, and in the courses we teach you learn the details. Then the steps you need to follow are: first, download a nice tool — DAX Studio — and spend time learning how to use it; then prepare to measure, test, and loop back; measure, test, and loop back; measure, test, and loop, multiple times. DAX Studio is your best friend. Optimization is a complex process, but it's also fascinating: it's beautiful to see how small changes in your code have a strong impact on the performance of your model. Now, I think I ate all of my time; if you have any questions, just ask, or if they come later, just send an email and I will be happy to answer.

[Music]
Info
Channel: Iteration Insights
Views: 17,387
Id: sCQQ-ZYBDAA
Length: 59min 39sec (3579 seconds)
Published: Thu May 14 2020