Comment review: RANKX performance - Unplugged #3

Captions
Ciao friends! Today I'm going to share how we analyze the performance of a DAX expression, by comparing the measure we wrote in an article with a suggestion we received in a comment, and looking at the pros and cons of the different approaches. There is no editing here: I will share exactly the process that I follow, including all the mistakes.

Let's start with the article. It is signed by Alberto, but we always review these articles together. The idea of the article is to explain how to solve a common problem: creating a measure that provides a ranking based on multiple columns. In the example we have different customers with different sales, plus a rounded version of the sales; we show Sales Amount just as a comparison, but imagine that you only have the rounded sales, and you have ties. A regular RANKX formula would return the same value for all the tied customers, whereas we want a measure that returns 6, 7, 8, 9, 10, 11 even when several customers have the same rounded sales. In those cases we want to use the alphabetical order of the customer name as an additional ranking condition, so that there are no ties. In other words, we do not have just one ranking expression but two (or three): as soon as there are ties on the first expression, the second expression is considered.

The article includes a solution; let me go straight to it. I think it is this measure. We will analyze it later; for the moment I just observe it. Then, in the comments, we received this request: why not consider the function SUBSTITUTEWITHINDEX as a way to rank on multiple columns? This is an idea we had not thought about, so we wondered how the ranking could be implemented with SUBSTITUTEWITHINDEX. Sergio, who helps us answering comments, asked for an example, and Josh provided one.

Before looking at anything else, I want to compare the performance of this approach with the performance of our original measure. So let's start with the original measure. I have the file we published, as is, and my starting point is capturing a query for the benchmark. The model has Ranking Code Measure and Ranking Name Measure; I think the latter is the one to analyze. I open DAX Studio, and I capture the query that produces the visual: in Power BI I open Performance Analyzer, start recording, and refresh the visuals. There is a slicer on the brand, with Northwind Traders selected; I will try to remove it later, don't worry. I copy the query of the table visual, paste it into DAX Studio, and run it, just to check that the result is correct: we must analyze the query that produces exactly the result we see in the report. We have 6, 7, 8, 9, and Ranking Code Measure shows 6, 8, 7; but that is the ranking based on the customer code, and a quick check in the article confirms that the published answer uses the customer name. So Ranking Name Measure is the one to analyze, and I comment out the other two measures. Be careful when commenting out a column in a captured query: sometimes the same column is also mentioned in
the TOPN or in the ORDER BY clause, and removing only one reference breaks the query. It is not the case here, so it's fine. I remove the measures I am not interested in, and for now I keep the slicer filter, which I can get rid of later. I run the query with Server Timings enabled, and of course I always clear the cache first. You see that I never use the query plan as a first option; I usually don't have to. The query runs in 109 milliseconds, then 100. Let me run it again: if the number is pretty much the same after two or three executions, I know the execution time is stable. A difference of 10 to 20 milliseconds is something I treat as noise. Now, what happens if I remove the filter? The slicer is a table filter, mentioned only in these two places, so I comment it out. Oh, and I have to remember to also run the report without any selection, for a fair comparison. Performance is pretty similar: there is no real difference with or without the slicer, so we continue without it, which keeps the query simpler. Finally, I include the definition of the measure: Ranking Name Measure, right-click, Define Measure, and I format the query. Now the query contains the measure we published in the article, and running it produces the same result. Let me enlarge the font a little so it is more readable, and remove the small window with my face. I think we can start.

So this is our starting point: this query runs in about 110 milliseconds. Before comparing, I want to make sure there is nothing needlessly slow in this query, because I remember that when we wrote this code there was something we could have optimized. For example, HASONEVALUE checks the customer name, and the customer name is also retrieved by SELECTEDVALUE: the two functions do pretty much the same work. I can call SELECTEDVALUE once, before the IF statement, store the result in a CurrentName variable, and turn the test into NOT ISBLANK. It is just a small optimization, and of course I then have to use CurrentName in the rest of the code. Let me format the code.

The rest of the calculation should be fine. The principle is the following: we create a table in memory with the customers visible in the visual (for this we use ALLSELECTED), adding a NameRanked column that ranks the customers by name. Then we build a lookup table that uses the maximum value of NameRanked as the interval for an internal ranking within the ranking by Rounded Sales. If I know that 1,000 customers are visible in the visual, I multiply the initial ranking value, the one that can contain ties, by 1,000; then, for the customers that share the same value, I add their rank in alphabetical order. The ranking with ties could have, say, the value 100 repeated many times; after multiplying by 1,000 and adding the alphabetical position, no two customers can collide, and I obtain a unique, correct ranking. If this sounds confusing, take a look at the article: there is a good explanation there. Now I want to focus on the calculation itself, and I don't expect this small change to make a big difference in performance.
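The structure just described can be sketched as follows. This is a reconstruction from my description above, not the exact code of the article: the Customer table, the Name column, and the [Rounded Sales] measure are assumed names.

```dax
Ranking Name Measure :=
VAR CurrentName = SELECTEDVALUE ( Customer[Name] )    -- replaces HASONEVALUE
RETURN IF (
    NOT ISBLANK ( CurrentName ),
    -- Customers visible in the visual, each with its alphabetical rank.
    -- DESC so that the alphabetically first name gets the largest
    -- tie-breaker, hence the best position in the final DESC ranking.
    VAR CustomersWithRankedName =
        ADDCOLUMNS (
            ALLSELECTED ( Customer ),
            "@NameRanked",
                RANKX ( ALLSELECTED ( Customer ), Customer[Name],, DESC, DENSE )
        )
    -- Gap large enough to separate ties in Rounded Sales
    VAR NumberOfNames = MAXX ( CustomersWithRankedName, [@NameRanked] )
    -- Combined value: sales stretched by the gap, plus the name rank
    VAR LookupTable =
        ADDCOLUMNS (
            CustomersWithRankedName,
            "@CustomerSales", [Rounded Sales] * NumberOfNames + [@NameRanked]
        )
    VAR CurrentNameRanked =
        MAXX (
            FILTER ( CustomersWithRankedName, Customer[Name] = CurrentName ),
            [@NameRanked]
        )
    RETURN
        RANKX (
            LookupTable,
            [@CustomerSales],
            [Rounded Sales] * NumberOfNames + CurrentNameRanked,
            DESC
        )
)
```

The exact measure is published in the article; this sketch only captures the shape discussed here.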
Let me check whether the result is the same: it is still valid, and Server Timings shows 111 milliseconds. I probably removed one storage engine query, but it is pretty much the same; I did not change much. This is my initial baseline.

Now let's look at the other measure. I go back to the comment, grab the code, and try it in the report: I create a new measure and paste the code as is. Hopefully it is correct... no, the column names are different. Why? You should never trust IntelliSense too much; sometimes it provides bad information. In our model the column is simply Name, whereas the pasted code uses Customer Name, and Rounded Sales is written with a space: the reader evidently wrote his own version of the code without using our exact names. Fine; I fix the references and move forward.

Now let's check whether the numbers are the same, and here we have a first problem: this code returns a result also in cases where our measure returns blank. There is no protection in the code, so something must be done to remove the value when there is a blank. Let's start with the simple fix: IF NOT ISBLANK of Rounded Sales, then compute the rest. I will optimize it later; right now I just want the two measures to return the same result. And indeed, with the guard in place the numbers are the same: a good starting point.

At this point I can start reading the code, but you know, if it is not formatted, it is not DAX! I copy the code into a new instance of DAX Studio, so that I can compare the two versions side by side if necessary. I remove the measure I had pasted into the model, add the definition of this measure to the query with Define Measure, and reference it instead of the original measure. I also write a comment, "new version", so that when I switch back and forth between the two DAX Studio windows I can tell which one I am looking at; in the other window I write "original version". It would be nice to have a feature in DAX Studio to change the color or style of a window; I know Darren is already working on a dark mode for DAX Studio, so when we have that, we will probably be able to change the color too.

Now I can format the query: now this is DAX, I can read it. The measure returns the same result, so let's look at the initial performance. Server Timings on, clear the cache, run... it is slower: 208 milliseconds. But I never base my analysis on the first impression, so I want to check what the differences are overall.
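The side-by-side setup boils down to a query like the following sketch (the real query is the one captured by Performance Analyzer; the measure body here is a placeholder and all names are assumptions):

```dax
DEFINE
    -- "new version": a query-scoped measure overrides the model measure,
    -- so the code can be edited here without touching the model
    MEASURE Sales[Ranking Name Measure] =
        VAR CurrentName = SELECTEDVALUE ( Customer[Name] )
        RETURN IF ( NOT ISBLANK ( CurrentName ), [Rounded Sales] )  -- placeholder body
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Name],
    "Rounded Sales", [Rounded Sales],
    "Ranking Name Measure", [Ranking Name Measure]
)
```

Keeping one such query per DAX Studio window, each with its own comment, is what makes the back-and-forth comparison manageable.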
I have the original version here; let me run both again to make sure, and compare them side by side, focusing on these two areas. The new version is pretty similar on the storage engine side: it executes eight storage engine queries. It is slower in the formula engine, about twice the time. Fine; let's see whether there is some optimization to do.

The idea of this approach is the following. ALLSELECTED retrieves the list of the customers visible in the visual, that is, in the filter context. Then there is another table where, for each customer, we just copy the customer name into a new column. Why? Because the copy does not have a data lineage, and that is the main effect: ADDCOLUMNS always creates a column without data lineage, whereas SELECTCOLUMNS keeps the original data lineage when you copy a column reference as is; this is not the case here. Then comes the join: SUBSTITUTEWITHINDEX takes the left join table and creates a MyRank column, replacing the columns it shares with the right join table, and using Rounded Sales to find the position of each customer.

I have to review SUBSTITUTEWITHINDEX, because I don't exactly remember what this function does. I always go to DAX Guide first, because sometimes we have more details there, but it is a good idea to also check the Microsoft documentation, just in case. So: this function returns a table which represents the semijoin of two tables, for which the common set of columns is replaced by a zero-based index column; the index is based on the rows of the second table, sorted by the specified order expressions. I get the point: imagine a join between two tables where the columns present in both, instead of being duplicated, are replaced by a number in the result. That is exactly why the customer name must be duplicated: the original customer name column is consumed by this technique, so we need a copy in order to end up with both the index and the name. And we can pass two order expressions: Rounded Sales in descending order and, when Rounded Sales is the same, the customer name in ascending order.

Wow, this is very smart, because a single function obtains what we had to write in a much more complex way in our measure. It is more elegant, shorter, and straight to the point. However, the result contains all the customers. Remember, we are writing a measure, and a measure is executed for each cell of the report, whereas ADDCOLUMNS and SUBSTITUTEWITHINDEX here operate on every customer visible in the visual. So with 20 customers visible, this temporary table is computed in memory for every cell. Still, the engine is smart enough to realize that, because the starting point (ALLSELECTED, or ALL) is always the same, the table has the same content for every cell.
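My guess at the shape of the commenter's measure, based on the description above (I have not copied it verbatim; the @-column names and variable names are mine):

```dax
Ranking with SUBSTITUTEWITHINDEX :=
VAR CurrentName = SELECTEDVALUE ( Customer[Name] )
VAR RoundedSales = [Rounded Sales]
RETURN IF (
    NOT ISBLANK ( RoundedSales ),
    -- Right table: one row per visible customer, with the value to rank by
    VAR RightJoinTable =
        ADDCOLUMNS ( ALLSELECTED ( Customer[Name] ), "@Sales", [Rounded Sales] )
    -- Left table: a lineage-free copy of the name, so that the name
    -- survives when the shared column is replaced by the index
    VAR LeftJoinTable =
        ADDCOLUMNS ( ALLSELECTED ( Customer[Name] ), "@Name", Customer[Name] )
    VAR Ranked =
        SUBSTITUTEWITHINDEX (
            LeftJoinTable,
            "@Rank", RightJoinTable,
            [@Sales], DESC,        -- first criterion: rounded sales
            Customer[Name], ASC    -- tie-breaker: alphabetical order
        )
    RETURN
        -- The index is zero-based, hence the + 1
        MAXX ( FILTER ( Ranked, [@Name] = CurrentName ), [@Rank] ) + 1
)
```

The appeal is clear: both ranking criteria live in one function call, instead of the gap arithmetic our measure needs.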
Because the starting table is identical for every cell, we expect the query plan to reuse it, computed once, for all the following evaluations. This is true for the storage engine, but not necessarily for the formula engine: SUBSTITUTEWITHINDEX is executed within the formula engine, so it might run for every cell. For now this is just a hypothesis.

What comes next? A filter over the result of SUBSTITUTEWITHINDEX: it keeps the rows whose customer name is in VALUES of the customer name for the current filter context; in a regular cell, VALUES returns exactly one value, the current customer name. This filter is evaluated for every row, and there are a couple of things we can improve. If you remember, my measure performs the calculation for the detail rows but not for the total, whereas this calculation also runs at the grand total: there, VALUES returns the full list of customers, and the expression takes the maximum number. By the way, I don't want any number in the total. Let me verify: yes, you see that the SUBSTITUTEWITHINDEX measure shows a value for the grand total, whereas we want it blank. Let me fix this; it could help performance too. Instead of using VALUES, I use the same technique as in the other code: I compute the current name in advance. The code also checks that Rounded Sales is greater than zero, so I can move that check to the beginning as well: instead of the inline IF NOT ISBLANK of Rounded Sales, I compute Rounded Sales once in a variable. Rounded Sales still has to be computed for every row of the temporary table later, that's fine; but CurrentName, yes, CurrentName I want to reuse, and the filter condition becomes equal to CurrentName instead of IN VALUES, because we only want one name.

Let's check whether this produces the same result. I focus on the result pane and execute: the numbers are 1, 2, 3, 4, and there is no grand total row anymore; the measure returns blank at the grand total, which is what we wanted. Now, Server Timings: did we improve? Oh yes! Just by using this technique and skipping the calculation at the grand total, I reduced both the number of storage engine queries and the execution time. Now it is almost on par with our approach, which is very good.

Can we do something better? Let me re-read the code. We do the calculation only when a single name is selected and Rounded Sales has a value for the current customer: fair enough. ALLSELECTED plus ADDCOLUMNS builds one table; the other table is derived from the same base, but the two tables must be different, because SUBSTITUTEWITHINDEX needs both: Rounded Sales lives in one and the copied customer name in the other. Pretty good. I just want to check whether Rounded Sales is computed twice.
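The grand-total fix just described, as a before/after sketch (names assumed):

```dax
-- Before: at the grand total VALUES ( Customer[Name] ) returns ALL the
-- names, so the filter keeps every row and a rank shows up in the total
FILTER ( Ranked, [@Name] IN VALUES ( Customer[Name] ) )

-- After: SELECTEDVALUE is computed once and is blank at the grand total,
-- so the outer IF ( NOT ISBLANK ( ... ) ) guard skips the calculation there
VAR CurrentName = SELECTEDVALUE ( Customer[Name] )
RETURN
    FILTER ( Ranked, [@Name] = CurrentName )
```

Blanking the total is both a correctness fix and a performance win, because an entire branch of the calculation disappears for that cell.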
Let me check the SUBSTITUTEWITHINDEX documentation again: the order-by argument can be any expression, so I wonder whether the measure reference is evaluated again even though the value is already in the table... no, the value is not there, because we start from the customer list, so Rounded Sales has to be computed for every row. Makes sense. I would probably have used a column with the value pre-calculated in advance, like the ADDCOLUMNS we wrote at the beginning, but I'm not sure it would help; it is already good, and it is probably not doing the calculation twice.

Let me remove my face from the screen, because now I need to analyze the xmSQL queries. I'm looking for duplicated storage engine requests, because a duplicate would be the signal of redundant code that we could try to optimize. Here we have the list of the customer names for the entire table: it is like an ALL, because there is no filter over country or anything else, so it retrieves all of them. Here we have the calculation of quantity multiplied by net price, the original Sales Amount, summed for each customer with a join: a table with one row per customer and a value. The estimated number of rows is 2,001.

Remember that this estimated number of rows might not be the real one; to see the real number we have to go to the query plan. I'm doing this because I want to make sure we actually have around 2,000 customers with a value. In the query plan, where the result of that storage engine query is consumed, I see 594... so maybe not 2,001? No, actually we do have 2,001 rows, so the estimate is correct, and 594 is the result of something else; there is some other filter somewhere. And this other storage engine query is very similar, but it groups by customer key. Oh, this is interesting: the customer name could have duplicate values! If two customers share the same name, they would get the same value for the same customer name. We had not thought about this, and it could be a problem.

Sorry, I had to make a cut here, because I moved on without actually showing the demo: I started talking and manipulating the screen, and I forgot to switch the recording back to it. Let me recap what I was doing before the cut. I wanted to understand why we have two storage engine queries with the same behavior: this query computes quantity times price grouped by customer name, and this other one computes the same expression grouped by customer key. Because they group by different columns, they are genuinely different executions of the calculation; you can see it here too, because we have four storage engine queries and zero resolved from the cache. Why does this happen? We have the customer name in the query, because the report is showing the customer name,
but in our calculation we use the whole Customer table in ALLSELECTED. Now, the Customer table and the customer name are effectively the same thing here: the column displayed in the report is Name, the customer name column of the table, and there are no duplicate values. I can make the assumption that the customer key and the customer name have the same number of unique values. So instead of referencing Customer, I can reference Customer[Name], because I know it is unique and it matches the granularity of my report; and, at the end of the day, I wanted to rank by customer name, not by customer key. With this change, the left join table and the right join table carry a single column instead of all the columns of the Customer table, and the query should run faster.

Let's verify. First the result: running the query again, the result is still the same. Then Server Timings: 84 milliseconds! It is faster, and one reason is that we now have four storage engine queries, two of which are resolved from the cache: this query is identical to this one, and lines 2, 6, 4, and 8 are identical too, because now the customer name appears in both. So we have reached a point where this version has been optimized to be actually faster than our initial query.

To be fair, let's go back to my code, the original code of the article. We use ALLSELECTED of Customer there too, even though we only need the customer name. This is a tricky consideration: to keep the calculation generic we could keep the whole Customer table, yes; but because we only use the customer name for the alphabetical tie-break when Rounded Sales has ties, and the customer name is unique, there is no reason to use the entire Customer table in this ALLSELECTED. So I think it is fair to use Customer[Name] here as well: I'm applying to our measure the same optimization we applied to the calculation suggested by the reader, replacing ALLSELECTED of Customer with ALLSELECTED of the customer name column, and using CurrentName. That should be good.

Okay, let's analyze this in more detail; I remove my face again, because I need to look at these queries closely. Before staring at Server Timings, let's check the result: 1, 2, 3, 4, 5, 6, 7, 8. Fine. Now, can we remove some redundant query? We have five storage engine queries, three of which come from the cache: we are actually issuing one more storage engine query than before, but only two are really executed and the other three are cache hits. The executed one retrieves the customer names with, for each name, the quantity-times-price aggregation; the others are identical, identical, identical. So at the storage engine level we spend the same amount of time, and the question becomes whether we can optimize the formula engine: can we reduce the cost of the full ranking calculation with this approach? Let me reformat the query and enlarge the window, so we have more real estate, and let's see whether I have some other idea.
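The granularity change applied to both measures, in isolation (valid under the assumption, stated above, that Customer[Name] is unique):

```dax
-- Before: every column of Customer travels inside the temporary tables
ALLSELECTED ( Customer )

-- After: a single column, matching the granularity shown in the visual,
-- so the temporary tables are narrower and cache hits become possible
ALLSELECTED ( Customer[Name] )
```

If names could be duplicated, the whole table (or the key column) would still be necessary; the optimization trades generality for speed.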
CustomersWithRankedName is a table in memory that computes a ranking for each name. One thing I can try: we use ALLSELECTED of Customer[Name] as a table expression several times, always the same expression; you can see that when I highlight it, DAX Studio highlights the matching occurrences, one, two, and three. So I define it once in a variable and reference the variable in all three places. Does this reduce anything? I press F5 to run the query and... wow, the number of storage engine queries drops to just one, because I am now reusing the same calculation explicitly. The time did not really change, because the cost of those duplicated queries was in consuming the cache, not in executing them; still, on a busy server, where the cache is not guaranteed to be available, this is a good idea for scalability.

Next, for each customer we compute the ranking over the selected customers. There is a double iteration here, because ADDCOLUMNS is an iterator and RANKX is another iterator, but I have no way to replace that: this is how I obtain the number that defines the position of each name across the entire table, and I need that number for the arithmetic that comes later. Then MAXX iterates the ranked-names table to get the maximum value of NameRanked. Let me measure its cost: what if I replace it with a constant, say 10,000, which I know is large enough? If the timing does not move, I can ignore the problem. Let me try... I'm saving about 10 milliseconds. Interesting: roughly ten percent of the cost of this measure goes into retrieving the value that defines the gap.

So I have an idea. The gap only has to be large enough to separate ties, and what is the largest value the name rank can reach? The number of customers: if the entire table has 10,000 customers, the rank by name can never exceed that. So what if I write COUNTROWS of ALL of Customer? It could be much worse... here we go: the additional execution is gone, and it works pretty well. Let me run it a couple of times: 102, then 97; with the constant 10,000, kept in a comment for comparison, we get 93. The point is that the number of rows of a table is information the engine already knows: it does not have to compute it, because it is in the metadata of the table. COUNTROWS of Customer with REMOVEFILTERS over all the filters does the same job. I simply don't want a fixed constant: I could write one million, but what happens when you have two million customers? So let's keep this optimization; it saves something, so why not. I format the code and move forward. Now we iterate the selected customers again, computing an additional column based on Rounded Sales.
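The two optimizations from this section, sketched (variable and column names are assumptions):

```dax
-- 1) One variable instead of three identical ALLSELECTED expressions
VAR SelectedNames = ALLSELECTED ( Customer[Name] )

-- 2) The gap between ties only needs to be "large enough".
--    Version A scans the ranked table (about 10% of the measure cost here):
VAR GapFromScan = MAXX ( CustomersWithRankedName, [@NameRanked] )
--    Version B asks for the row count of the whole table, which the
--    engine can answer from metadata instead of computing it:
VAR Gap = COUNTROWS ( ALL ( Customer ) )
```

Version B also stays correct as the table grows, which a hard-coded constant like 10,000 would not.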
Rounded Sales has not been evaluated before this point, so it is computed for each customer, with a context transition; the result is multiplied by the gap we obtained before, and we add NameRanked, the value computed earlier. A further idea could be to change the structure a little: instead of materializing the calculation in a column, use a variable, or not even a variable, just write the expression inline. It is less readable, but let's see what happens: I comment out the column and write the expression directly... no, wait: we end up with two rankings, because one ranking is needed just to get the position of the customer, and this version will never be faster. What saves the column-based version is that the calculation is identical for all the cells, because it starts from the ALLSELECTED table, which is computed only once. The previous solution is better, also from the performance point of view, so I go back.

Here we have CurrentName and CurrentValue, the value computed for the current row. One thing we could do is find the value in the lookup table, which is what the other formula is doing, instead of repeating the calculation twice: we just look up the current customer in the list, which is probably faster. Right now this runs in about 111 milliseconds; let me check that the numbers are correct... yes. So: my lookup table already has the value I want to use. I write a FILTER over the lookup table (x is enough as a variable name) where the customer name column of the lookup table equals CurrentName, the variable I saved at the beginning. Let me remove my webcam window so we have more space in the editor. If I am doing this right, the filter returns exactly one row, because the customer name is unique. The column to extract is CustomerSales, the combined value we used to create the ranking: remember, we multiplied Rounded Sales by the number of customers to widen the gap between ties, and right now we perform that expensive operation twice.

Wait... no, I confused myself for a moment: this has nothing to do with the other ranking; what we need to retrieve is the value of CustomerSales, because CustomerSales is the value that already incorporates the gap. So we can reuse it: CurrentValue becomes a MAXX over x, which has only one row, returning CustomerSales. I need an aggregation because I cannot use VALUES over a column of a temporary table; MINX or MAXX, either works on a single row. Let's see if I typed this right... where is my square bracket? Come on, keyboard! This is what happens when you have too many international settings on your machine. Okay, copy and paste, since I cannot type the square bracket directly.

So, if I did this right, CurrentValue is now equal to the value I already obtained before; that's the idea. I can remove these two lines, or just comment them for now. I still need the final RANKX, which ranks by Rounded Sales multiplied by the maximum number of customers plus the position of the customer within the table in alphabetical order, producing a unique value for the ranking. The final RANKX is the right one, and the lookup table is possibly computed only once when it is iterated, because its content is the same across all the cells of the report.

Let's try. First, the result: I run the query (did I run it? yes) and the result is the same. Server Timings says 77 milliseconds. Let me clean up: remove the commented lines, go to a new line, format the code; I can also move this code up here, it does not really matter much, but it is tidier, so I cut, paste, and run again. In any case, we now know that we have to fix the article, because the article does not show the more optimized code for our formula either. My formula now executes in 84 milliseconds, then 90, then 83; you see there is always some variance. There is a feature in DAX Studio we can use, not visible here, so I
have to go in options advanced and benchmark so if i use the benchmark button i go back here and i have here an advanced this benchmark that allows me to do several executions so i'm not interested in the warm caches so i just want to use the call cache execution even though for the former engine it could be a good idea to execute a worker so let's skip five here and run so this is my final test right so the final test is that okay let's run this code several times and see what happens looking at the different executions and now we do the same for the other formula so we will establish which one is better and let's go here one two three four five you have to wait because there is some time some some you know latency for each execution so let's take a look at the results here so let's usually what i prefer to look at is the minimum value because i'm executing the code in a safe in a stable environment because i don't have other users running on the same machine so rather than the average i want to see the best the best the best timing right because uh in the ideal condition we executed this in 71 milliseconds and 73 milliseconds when we were with a call cache so call cache is two milliseconds for a cache for the storage engine queries are just two milliseconds 71 milliseconds for the formal engine pretty good i know this is just the form ranges so okay anyway 71 73 these are the best timing i obtained so let's go to the other solution here and let's do the same so i go oh probably i have to opt because i already opened actually before so here in advanced show benchmark button go back and run the benchmark and five and five that's fine so let's run let's see what happens maybe i can take a second look at this code maybe there is still some possible optimization but to be honest the even though this is maybe slower a little bit slower it's actually much more elegant because uh the code is shorter and easier to read and you know if you but actually it is not even 
slower you see that if we look at the minimum value is 70 and 73. let's go back here this was 73 and 71. let's take a look at the average 7140 75 20 and here we have 75 60 74 40. you know it's very very close it's uh at this point i would i would use the solution that i prefer to read rather than the solution that i prefer to execute and nobody in the world will ever see the the difference maybe that there could be a difference with a larger number of customers by the way that's another possibility but now do we we optimize the code a lot and here we cannot do much because the entire logic we split it in several steps here is entirely managed by substitute with indexes so it's a very good it's a very good solution to be honest so it's the max at the end the filter is the same technique we use the in the other place here the filter is used to retrieve the final position whereas before we use the filter let me check we use this filter before executing a final rank x and i of course i could have used a better name here but nevertheless the goal was not to you know to obtain well maybe that we have to so let me just because i want to say this code to update the article so let me let me fix this so this is this is the lookup current customer and i replace this here and we retrieve the customer sales to get the current value and then we do the rank x okay good enough so now what can we do at this point first of all let's see what alberto answered and maybe i will add some other comment to the i have to retrieve the article here and alberto said that this is a good idea let's see if alberto made some optimizations he he wrote that he has one value like i did we we saved this value in a variable saving some time the approach here is using all selected customer name which is another optimization we already made so it's pretty much similar pretty much similar so yeah i mean this this answer is good i will check if i have to add something but let me let me check the code i 
have here so which one this one is the this one is the formula we have this one is the formula so one and two this is the substitute with index i want to check whether the solution provided by alberto is a identical to mine so the only difference is that we saved around the saves here once we avoided because we wanted to avoid having the number repeated multiple times okay yes this part is important this part also saves time so we have to fix this so good so let me copy this code okay so we can copy this code here and i prefer to write the code let me write a code this way so it is rank using oops rank using a substitute with the index equal to ops equal to and i paste my code i format the code and i format the code and i copy the code here in the reply and i can reply here just try that i just measured measured performance and this solution is as fast as an optimized version of our article which the published one has a couple of details that can run faster some optimization so when we have to paste the code keeping the format we just have to use this these two tags and here we have code now while i'm here uh just as a reference the code the optimize the code from the article as this and we can write here and this here so we do the same preparation of the code from the article so this one is the name of the measure i want to show this defined measure syntax is just for the measures that we define within a query but when you copy the code in a power bi you don't want to have this one so i format the code here and i paste the code here here we go oops no i didn't copy the code the right way copy and paste now it doesn't work today let it try again copy and paste here we go uh so excellent solution i have to say the code with the substitute with index is much more elegant once you understand how it works because to be honest i didn't i didn't realize that you could have used this for the resolution here we go then that's fine this is a good solution and i have seen 
here oh there is another approach made by amedeo uh faster with fewer customers when i filter in one or more brands but is way slower when no brand is selected of course of course because this in this case again the the performance is always a complex topic so let me go full screen the performance is always a complex topic because you can do the optimization for one report but you could have a completely different performance result with another report so we always have or something that is more stable and we do extreme optimizations only when we really need to optimize a specific report the reason why i prefer the substitute with index approaches because we were trying the query removing the filter so in the worst possible condition we got we got very good performance with all the customers visible so even though i could have a solution that is faster when i filter only a few customers i prefer the solution that is more flexible so even though this this solution will not be slower running with a smaller number of customers it could be not the more optimized one with a small number of customers which is a completely different story so because i get good enough performance i prefer to use the version that we analyzed today thanks for watching and enjoy dax you
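For reference, the optimized pattern walked through above can be sketched as a single measure. This is a reconstruction from the discussion, not the exact code published in the article; the table Customer, the column Customer[Name], and the measure [Rounded Sales] are assumed names.

```dax
Ranking On Multiple Columns :=
VAR CurrentName =
    SELECTEDVALUE ( Customer[Name] )
VAR NumOfCustomers =
    COUNTROWS ( ALLSELECTED ( Customer[Name] ) )
VAR CustomersAndSales =
    ADDCOLUMNS (
        ALLSELECTED ( Customer[Name] ),
        -- Widen the gap between ties of Rounded Sales by the number of
        -- customers, then add the alphabetical position of the name as a
        -- unique tie-breaker, so no two customers share the same value
        "@CustomerSales",
            [Rounded Sales] * NumOfCustomers
                + RANKX ( ALLSELECTED ( Customer[Name] ), Customer[Name],, ASC, DENSE )
    )
VAR CurrentCustomer =
    -- Look up the current customer in the table computed once, instead of
    -- repeating the expensive calculation for the current row
    FILTER ( CustomersAndSales, Customer[Name] = CurrentName )
VAR CurrentValue =
    -- CurrentCustomer has a single row (customer names are unique), so any
    -- aggregation retrieves the scalar; VALUES cannot be used over a
    -- column of a table variable
    MAXX ( CurrentCustomer, [@CustomerSales] )
VAR Result =
    RANKX ( CustomersAndSales, [@CustomerSales], CurrentValue )
RETURN
    Result
```

The FILTER plus MAXX step is the optimization made on camera: the tie-broken value is computed once per customer in CustomersAndSales and then retrieved for the current cell, rather than being evaluated a second time.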
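The alternative suggested in the comment relies on SUBSTITUTEWITHINDEX, which performs a semijoin of two tables on their common columns and replaces those columns with a single zero-based index into the second table under a given sort order. The following is a rough sketch of the idea only, using the same assumed names as above; the exact code posted in the comment and in Alberto's reply may differ in details such as column matching.

```dax
Rank Using SubstituteWithIndex :=
VAR CurrentName =
    SELECTEDVALUE ( Customer[Name] )
VAR CustomersAndSales =
    ADDCOLUMNS (
        ALLSELECTED ( Customer[Name] ),
        "@Sales", [Rounded Sales]
    )
VAR CurrentCustomer =
    FILTER ( CustomersAndSales, Customer[Name] = CurrentName )
VAR Substituted =
    -- The columns shared by the two tables are replaced with a 0-based
    -- index into CustomersAndSales, sorted by sales (descending) and then
    -- by customer name (ascending) to break ties
    SUBSTITUTEWITHINDEX (
        CurrentCustomer,
        "@Position",
        CustomersAndSales,
        [@Sales], DESC,
        Customer[Name], ASC
    )
VAR Result =
    -- The index is 0-based, so add 1 to obtain the ranking
    MAXX ( Substituted, [@Position] ) + 1
RETURN
    Result
```

The final MAXX over the filtered single-row table is the same scalar-retrieval technique used in the other measure, which is why the two versions end up so close in performance.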
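The DEFINE MEASURE syntax mentioned above is what DAX Studio benchmarks run against: the measure is defined within the query itself, and only the expression after the equals sign is what you would paste into Power BI. A minimal harness of that kind might look like the following, here with a plain single-column RANKX as a placeholder body and the same assumed names:

```dax
DEFINE
    -- Query-scoped measure: convenient for benchmarking in DAX Studio;
    -- when copying into Power BI, keep only the expression
    MEASURE Sales[Ranking] =
        RANKX ( ALLSELECTED ( Customer[Name] ), [Rounded Sales] )
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Name],
    "Rounded Sales", [Rounded Sales],
    "Ranking", [Ranking]
)
ORDER BY [Ranking]
```

Running this query from Server Timings, or through the Benchmark dialog with several cold-cache executions, reproduces the kind of measurements compared in the video.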
Info
Channel: SQLBI
Views: 3,608
Keywords: DAX, PowerBI, Power BI, Marco Russo, UNPLUGGED
Id: 5PzVofgxSBY
Length: 62min 49sec (3769 seconds)
Published: Sat Jan 23 2021