Exploring new composite models in Power BI Desktop - Unplugged #1

Captions
Ciao, friends. In this video I want to explore some of the new features of the composite models released as a preview in the December 2020 release of Power BI Desktop. This feature has an official name, "DirectQuery for Power BI datasets and Analysis Services", but we prefer the friendlier name "composite models", because that is what it is: the ability to extend a model by including elements coming from different data sources. The big difference is that, instead of just importing tables as DirectQuery allowed before December 2020, you can now connect to existing models published as Power BI datasets or in Analysis Services databases, getting not just the tables but also the relationships, the measures, and many other metadata you might have included in those semantic models.

This video is part of an Unplugged series, which means we don't have a script and we don't have a particular goal; we just want to share how we investigate a feature to find what works and what doesn't, or to explore the elements we have to study for future articles, videos, and maybe courses. The goal for this video is to take two models that I will show you in a moment and combine them in a single model to create a single report. I have different options for obtaining this with the new composite model feature, and I will try to evaluate the pros and cons of each one, also taking a look at what happens from a performance standpoint using DAX Studio and analyzing the queries.

Okay, are you ready? Let me show you the models I prepared. The first model is Sales, a model with the sales of products: the classical Contoso example we have used many times. We have a report here that shows the sales amount for each brand. If I take a look at the model, we see a few tables: Product, Sales, Customer, and Date. The Sales table has a transaction for every sale made in an online sales system — on the web — and in this case we know everything about the customer.

The second model has other information, even though it looks similar: you see we have an amount by brand, but this time the measure is different. It is called Receipts Amount, because we have sales made in brick-and-mortar stores, and we don't have information about the customer. Indeed, if we go to the diagram view — come on, this is strange... okay — we can see that we have only three tables: Product, Date, and Receipts. We don't have the Customer table, because we don't know who the customer is; but more important, because of different refresh requirements we have two different data models, even though the structure of the two models is identical for two tables, Product and Date. It could be a common scenario: you import the same tables, or maybe the result of the same dataflows, in different models.

Now, what happens when I connect to these two models? Take a look at the two reports side by side: we have Brand in both, and two measures. My goal is to create a single visualization, in a single report, where I can compare the two measures, and maybe create a third measure that sums the two existing measures for each brand. An additional element: even though the structure of the two models is the same, the content is not. You see that in the Sales model we have sales for a brand called Litware, whereas in the Receipts model we have sales for a brand called Fabrikam, and each of these two brands exists in only one of the two models. This could be an additional problem: what happens if the content of the two tables is not the same? The decisions we make can produce different results. So keep in mind, our final goal is
to create a single grid or a single chart where we compare the two measures, and maybe create a third measure with a total amount.

So what should I do at this point? Let's start by creating an empty new Power BI file and connecting. I published the two .pbix files to a demo workspace I have on my Power BI tenant, and I connect to the first model, Sales: I go here, search for the model I created before, and — here we go — after a few seconds I have my model. So far I just have a live connection to an existing model, nothing new. But as you see, in the ribbon I have a "Make changes to this model" button that I can press to create a local model, which will allow me to connect to additional models within the same report. This option appears only if you enabled the preview feature called — let me show you — "DirectQuery for Power BI datasets and Analysis Services". This is the feature you have to enable to see that option.

I could start by clicking that button, or I could just try to add a second dataset to the same model. At this point there is a warning saying that, to move forward, I have to create a local model — which is exactly what I would have obtained by clicking the button. What do I get when I click "Add a local model"? A local model is created that contains a shallow copy of all the tables and relationships of the remote model; we have no data locally, so the behavior is the same. At this point I can add another connection, for example to Receipts, and this creates in my model a shallow copy of the tables of the Receipts model.

There is also a security warning. I don't have any problem here, because these are two simple files with no sensitive data, and the two models come from the same company, so I'm not actually risking anything. But the idea is that when you combine two models, the query sent to one model could contain data, for a filter, coming from the other model. This could expose a potential security risk the moment you connect databases owned by different people, maybe different organizations. That's not the case here, but it is the reason for the warning. So I click OK, I move forward, and the connection is created.

What happens at this point? Let me rearrange the model so we see the new visualization — give me a second to move a few tables around, and I will enlarge the font so we can see better what we have. First of all, at 10,000 feet, you see that we have tables with different colors: the blue tables come from the Sales model, the red tables come from the Receipts model. If I investigate here — oh, I made a mistake, sorry, I wanted to put the Date tables at the bottom, just because I already prepared another diagram with a similar structure — you see that I have the same table, Product, coming from two different data sources, so the tables must have different names: Product is the first one, Product 2 is the second one; a number is added automatically. But I want to make it easier to identify which table comes from which data source, so I rename them: I rename Product to Product Sales, I rename Product 2 to Product Receipts, and I do the same for the Date tables, renaming them Date Sales and Date Receipts. I'm doing this just to make it easier to identify the tables later. These are renamed tables in my local model; the remote models are still the same.

At this point I can go to my report and try to create my grid. I include in the grid, from Product Sales, the Brand column — we actually want to create a report by brand, so for each brand we want to see the value of Sales Amount. So we go to Sales and include Sales Amount. I also enlarge the font a little bit so that we have a better view — let's use 18, it should be good.
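Expressed as a query, what we are after is a matrix by brand with both measures, plus a third measure summing them. Here is a minimal DAX sketch, assuming the table and measure names shown in the demo; the query a real visual generates is more complex, and the combined Total Amount measure is something we only talked about wanting, not something that exists in the models:

```dax
DEFINE
    -- Hypothetical combined measure: assumes both remote measures
    -- are reachable from the local composite model
    MEASURE Sales[Total Amount] = [Sales Amount] + [Receipts Amount]
EVALUATE
SUMMARIZECOLUMNS (
    'Product Sales'[Brand],
    "Sales Amount", [Sales Amount],
    "Receipts Amount", [Receipts Amount],
    "Total Amount", [Total Amount]
)
```

Of course, with no relationship yet between the two models, [Receipts Amount] would return the same grand total on every row — exactly the problem we are about to see.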
Okay. You also see that the measure lost some of its formatting; this is a bug of the December release, and it will be fixed in an upcoming release for sure. What happens if I now include the other measure, Receipts Amount? This is what I want to get: a report that I could transform into a bar chart, for example, where I see the two measures for each brand. But as you see, while Sales Amount makes sense, Receipts Amount doesn't make any sense, because it's always the same number. Why does this happen? Because this is the brand coming from the Product Sales table, and in this model I have a filter on that table; but the measure is aggregating the Receipts table, and I'm not propagating the filter.

Before I move forward, I want to spend a minute thinking about my options: how can I transfer the filter between the two models? I prepared a small diagram where I copied the same diagram you have seen in Power BI, and a list of four options. Actually, the number of options we have is much bigger than this, but let's start with these four ideas and explore what we can do.

The first idea: I could create a Product table that is local to my model — I use a green color for it — and create relationships that propagate the filter, in a single direction, to the two different models; and I would do the same for the Date. This is option number one. With this option I have a single Product table, I could hide the two remote tables so the model is simpler to use, and the only concern is the cardinality of the relationships: one value for each unique product key, because I would use ProductKey, the primary key of the tables.

Option number two: wait a minute — do I need to do this by product? If you think about my requirement, I just want to create a report by brand, and I have thousands of products but only 16 brands. So I could do something similar, creating a table with the brands, one row for each brand, and create a one-to-many relationship with single-direction propagation here and here. No, actually, I cannot do this because... no — yes, I can, because this side is unique and this side is many. Yes, I can do this. So this would work, and it would reduce the cardinality: the relationship has a low number of unique values. This is the second option. But these two options require me to create a table in the local model.

The third option: why don't I connect these two columns directly? These are the two primary keys of the two tables, so I could create a relationship here. This would be a one-to-one relationship, and the bidirectional filter is mandatory when you have a one-to-one relationship. It should work, but again, we have a high-cardinality relationship between two different remote datasets.

So another option could be — let's explore the fourth option now; let me delete this — to do something similar by brand. If I connect Brand between the two Product tables it works, but this is a many-to-many cardinality relationship, and in that case we know the bidirectional filter may or may not be an issue. If I create a bidirectional filter, it could be an issue; a single-direction filter is easier, but then, for example, I would have to hide one side from the user, because otherwise they don't know which column to use.

There could be many other options — I could connect a table to the fact table in the other model — but I just want to
simplify this, because what I want to do is see what happens, also in terms of DAX and local and remote queries, for each of these options.

Let's start from the third case, the simplest one: I just connect the two Product tables through a one-to-one relationship. Let's go back to our model and do something relatively simple: I take ProductKey and connect it to ProductKey on the other side. When I do this, you see in a moment that the cardinality has been detected as one-to-one, the cross-filter direction is Both, and the relationship can be active. Usually, when you see a bidirectional filter, you are scared that you cannot use it in more than one relationship; but actually, for this particular model, I can use the same technique also for the Date tables. I'm not saying it's a good thing to do — I'm just saying we can do it, and we can keep these relationships active, because this is not creating any ambiguity between the models; otherwise I would have been asked to disable the active relationship in one of these cases. As long as we connect dimensions between the models, and we don't have bidirectional filters between the fact tables and the dimensions, everything works. Maybe it's not a good idea, but let's examine it in a moment.

What happens now if I go back to my report? Wow — it worked, right? Apparently it worked. Let's see what is happening, though, because there are a couple of things I may not like. One is that for Litware I see no Receipts Amount — and if you remember, Litware exists in Sales, not in Receipts. But there is something strange here. Let me put the Receipts report on this side of the screen — come on, I don't know why I'm not able to move it... okay, here we go — and the composite model on the right side, so I think everything is visible.

Pay attention: the total for Receipts Amount is right, and the numbers for each row are the same you have here, so the relationship is actually working. But I see Fabrikam here, which is not displayed there — because we are using the brands from the Product table coming from Sales, and in Sales we have no Fabrikam. Fabrikam is a brand sold only in brick-and-mortar stores, whereas Litware is sold only online; this is the reason why we have different content in the two tables, and if I use one of these tables, the other doesn't have this information. And look: we don't even have a blank row, because when you have these weak relationships — or whatever the name is; they renamed this technical name — when you have a relationship between two different models, the blank row for unknown members is not propagated. That's a problem. This is something that affects the feature, and it suggests that if I know I could have this problem, this might not be a viable option, even though it works.

If I want to see both brands, I have to create a table that has all the products; using either of these two tables doesn't work. If I replace the brand I have now, from the Sales table, with the brand from the Receipts model — let's go here and use Brand from Receipts — you see that once we have a result, we now see Fabrikam, but we no longer see Litware. That's the same problem on the other side.

Okay, let's go back to the version where we use the Sales brand, with Litware, and let's start to analyze what is happening under the cover. Let's open DAX Studio, where I can execute the code that is executed by Power BI. First of all, in Power BI I enable the Performance Analyzer and capture the query generated for this visual: you see the matrix has a query, I copy the query, and I paste the code of that query here. I now have the latest
version of DAX Studio here, 2.13.2, still in preview at the moment of recording, but I hope it will be released by the time you watch this video. The code I captured has not only the DAX query generated by the visual against the local model, but also the queries sent remotely. Let me show you: the paste dialog suggests copying only the DAX query, but I just want to show you what happens if you choose "Both queries". It means that in this list I see not only the DAX query generated by the visual to retrieve the data for the matrix, but, when you scroll down, also a "SQL query", which is the query sent to the remote dataset. Wait a minute — this is not SQL. If you look at the code after "SQL query", it is another DAX query: one of the DAX queries sent to the remote data sources. The reason it is called a SQL query is something I hope they will change — it should be called "remote query", or "DAX query", something like that. So I can see these queries, but the problem is that, looking at them here, I don't know exactly where each query is sent and what the cost of each one is. If I want to do a performance analysis, I prefer to paste the DAX query only.

So I click this — it is a single query — and I start Query Plan and Server Timings, because I want to see what happens in terms of execution time when I execute this query. If I click Run, I see that the result has the same content as the matrix you have seen in the report, just without the formatting. So far so good. And if I look at what happens in Server Timings, I see this — let me enlarge it a little and change the layout so we have more space to see the content.

You see that this query ran in two and a half seconds: a lot of time has been spent in the formula engine, and half a second in the storage engine for these queries. The queries are labeled with a SQL subclass, but they are actually DAX queries, because the remote models are still models that accept DAX. Looking at the code: this query is getting the list of the brands and, for each brand, the value of Sales Amount from one of the two data sources. Here we have another SUMMARIZE that gives me just the list of the brands that exist in the Product table — all the brands, no matter what the value of Sales Amount is. Because the table names are the same, I'm not able to recognize which query is sent to Sales and which one to Receipts, unless I see the name of a measure: Sales Amount here means this is the query sent to Sales. So we have a list of brands with the value of Sales Amount, the list of the brands, and then another query that, for each brand, retrieves the list of the products. Wow, this is interesting: for each brand, give me the list of the products, in terms of ProductKey. And guess what — we are looking at the brand in one table, we have a relationship connecting ProductKey, so the engine has to retrieve the list of the product keys for each brand and apply that list as a filter to the remote model. And this list is very long: there is a long, long variable with a list of numbers, and these numbers correspond to the product keys. The brand is identified with just a local number, not with the name, but the idea is that this is the list of the products for the brand identified with a local number — 12, and so on for the other brands — and since we have thousands of products, we have a very long list of items here. When this finishes, the list is used to apply the filter, with TREATAS on ProductKey, to get Receipts Amount. And imagine: this query takes 100 milliseconds of execution, but I think we have to spend more time just to create the string with this
description, because in this case there is a difference: xmSQL is a textual representation of an in-memory object tree, but this is the real query text sent to another engine, so it has to be built as a string, then parsed to create the query plan, and then executed. I suspect that this formula engine time is partly the time required to generate the query and partly the time spent consuming its result — even just reading the thousands of product keys and building the following query. Then we have just the grand total for Sales Amount and the grand total for Receipts Amount. So you see how my local model is getting data from the other models, and in two and a half seconds it provides the result. I understand that this technique is expensive because of this cost.

Let me save this and open Excel — oops, not this one, sorry; let's see if I can open an actual Excel file — because I want to write down the numbers. So let's note the technique we used — the relationship on ProductKey — with columns for Technique, Formula Engine, and Storage Engine. Let's copy the values: we have 1,786 for the formula engine and 555 for the storage engine, and the total is the sum of the two numbers. Maybe we can create a nice table and a formula, so we don't have to type this again later. The idea is that I want to compare what is more expensive. Fair enough — I don't need the decimal points — and we can move forward.

Let's go back to my whiteboard: we tested version number three, where we created a connection between Product Sales and Product Receipts using the product key. Now I want to do the same with number four: I want to connect Brand to Brand. Using the model I have, I can just go back to the demo machine, open the Untitled file, and change this relationship: it does not have to connect ProductKey; I connect Brand to Brand. Because Brand is not unique in the two tables, we have a many-to-many cardinality with a bidirectional filter, still active. Let's move forward; I don't change the Date relationships at the moment, because I'm not interested in them — I just want to see what changes when I use this approach.

First of all, the numbers in the report are the same — though of course at this point I would see completely different numbers if I went below the brand level; I don't want to enter into that topic now. If I just want the brand, I can use this technique: I connect Brand to Brand, many-to-many, and it works. The query should be the same, so I can just run the same DAX query with the new structure — I didn't change anything in the query, because the report is the same — and I just hit Run again to see what happens.

At this point the execution should be different — and actually the execution time didn't change much, looking at the numbers. Let me try again; maybe the first time takes longer. Here we also have to think about the cost: you see that the second time was much faster, and not because of the cache, but probably because we are communicating with the remote model, and communicating with the remote model we have to include the network latency in the equation. This also means that when I publish the model, the performance could be different, because at that point we have models communicating in the cloud, within the same tenant — it should be much, much faster.
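The expensive pattern we saw in the server timings for the ProductKey relationship — one remote request per brand, each carrying that brand's list of product keys — is roughly equivalent to a TREATAS filter. This is a sketch of the shape of the remote request, with made-up key values, not the engine's literal query:

```dax
-- Relationship on ProductKey: the engine pushes the product keys of a
-- brand into the remote model; with thousands of products, these
-- in-query lists get very long (keys below are illustrative)
EVALUATE
ROW (
    "Receipts Amount",
    CALCULATE (
        [Receipts Amount],
        TREATAS ( { 101, 102, 103 }, 'Product Receipts'[ProductKey] )
    )
)
```

The cardinality of the column used by the cross-model relationship drives the size of these generated filter lists, which is why moving the relationship to Brand should shorten the remote queries.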
But just to understand what is happening in this case, let's take a look at the content here. Every query is faster, and I think the reason is clear. This one is simple — just give me the list of the brands — and you can imagine the list of the brands is much shorter: we have 16 brands, so it's a small table. Here we have another query that, for each brand, retrieves the list of the brand values to propagate — which is obviously only one per brand — and here we have the filter applied to the remote table. You see that this time the list relating the local brands in the report to the values that have to be propagated through the relationship contains only the remote brand to filter, and the remote brand is identified by a string: we don't have a brand key in this case, the relationship is defined by Brand, and because this is effectively one value per brand, for each local brand we have one remote brand. And here is something interesting: Litware exists in the Sales table. We know Litware doesn't exist in the remote Receipts model, but the engine doesn't know that, so the engine sends Litware in the request. And because the engine only knows about the brands in Sales, and Fabrikam — let me check, I don't remember... Fabrikam — doesn't exist in Sales, it is not included in this list: you see that Fabrikam is not here. However, each query is relatively small, has a small result, and is short in terms of text, and this contributes to a faster execution time. Just to make sure the cache doesn't pollute my numbers, let me try again — to be honest, it's too fast to be true, so let's see what happens now. You see that, again... probably... even though we don't see them, we don't have the diagnostics.
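The brand-based propagation just described can be sketched the same way, now with brand names instead of keys — and only the brands the local model knows about. Again, this is an illustration, not the engine's literal query:

```dax
-- At most 16 strings travel to the remote model. Litware is sent even
-- though Receipts has no Litware rows (the engine cannot know that);
-- Fabrikam, absent from the local side, is never requested, so it
-- cannot appear in the result -- and no blank row is added for it
EVALUATE
ROW (
    "Receipts Amount",
    CALCULATE (
        [Receipts Amount],
        TREATAS ( { "Contoso", "Litware" }, 'Product Receipts'[Brand] )
    )
)
```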
This is something we could improve in DAX Studio: I think DAX Studio today doesn't intercept additional events that could help us understand what happens when DirectQuery is involved. So this is something we have to improve, and I suggest you always run the query using "Run with clear cache". And it is very unstable, as you see — but again, that is probably because the remote model itself may or may not be in memory. We are connecting to Power BI in a shared environment, so if I wait a few minutes, maybe Power BI just unloads my model from memory and loads something else; it's really complex to evaluate the performance. Let's keep an eye more on the complexity of the query that is generated: at the moment the performance is something we cannot reliably test, at least with a local model. We should probably repeat the same test after publishing the model remotely, which is something I can do later, maybe, if we have time.

So let's go back to our diagram. The problem now is that I'm not confident about the number I could publish. One thing I could do is run this several times and get the best number: there is a Run Benchmark feature I could try. Let me try this with 0 warm-cache and 3 cold-cache executions and see what happens. This is a feature of DAX Studio that runs the same query several times so I can see what happens. As you see, when the executions run close to each other, even though I clear the cache, I get results that are very similar. So let's use the minimum — the best result in my benchmark — hoping this removes the noise from other queries running at the same time, or something like that. So here we have "relationship by brand" — and of course I'll have to fix the number for the other execution as well — and we want to use the minimum, which is 421: the total duration is 421, and the
storage engine is 395 395. so we can compute the form ranging as a difference between the two each oops not this one but the uh storage engine this one okay and i can use this here but of course i have to fix these two numbers and now i have to so i have to go back to the model untitled i restore the relationship using product key and i run the same benchmark this is the right thing to do so let's change this to product key one and two one to one bi-directional okay let's go back to dark studio here we go and we run the query sorry run the benchmark again and i think i use the five here and zero i don't i'm not interested in the warm cache execution i just want to see the worst case scenario and we see what happens okay almost done you see that it is lower okay so far so good but now we have more reliable numbers for our comparison we have for this 566 as total and storage engine is 5 through 34 5 34 okay good so now we have our numbers and we see that as we expected brand the brand solution is a faster at the moment because we have a smaller number here so the execution was faster even though the difference is not so big as we have seen the first time so let's go back to the whiteboard and let's examine the other options so we made this test now now we have to go to the other solution which is interesting because we want to see all the brands not just the ones that exist in either one or the other model but in order to do that we have to create a table here products and we have to connect this table to the other tables so let's do that let's go to the demo machine and let's go to our untitled here so how can i do this well first of all i have to remove this relationship so let me delete this and i had to create a table product and what is the easiest way to do this i will use a calculated table so i can create a table here and i can create my table now i can call this table product product is equal to what well i could say it is it is equal to product sales but if 
i use only product sales i get only the brands and the products that exist for the sales made in online sales because i want to include also the other brands i need to get to two of these tables so i do the union between product sales and product receipts now i'm lacking because in this case i have two tables that are that have the same identical structure so i don't have to worry about different columns uh different names or whatever if i had different structures i should have spent more time writing the right select columns of the right dax code to get the right result now the problem is only that union will include duplicated values and i need a table that has only one row for each product key so i want to remove the duplicated values so i write distinct here we go so this is my final calculate table and when i create a calculate table here it takes a few seconds because of course it's querying the two models and generating this table but you see that now first of all i have a table view and i can see the content of the product table i cannot see the other tables that are you know defined remotely but i can see the content of this table and i can use this table in my model let's see where it is here we go so i moved the table here now i can enlarge this and i can create my relationships so here let's see where we have product key for the key here i want to connect product key to product key here and now here you see that the relationship proposed is one to one but actually this time i don't want a bi-directional filter i just want a single direction filter so i have to say even though it is a one-to-one i say i want to say these are one-to-many i pretend produce sales to be on the many side and so i say okay one too many and single direction what does it mean if i go back to the diagram view you see that the relationship i have is an active one and propagates the filter from product to product sales because i want my local model to filter the remote modules not 
So far so good. So I do the same for the other side, the Receipts side, and I connect the key to Product Key; again, one-to-many, single direction. Okay, let me check, because it's easy to make a mistake here: it says many-to-one, the other way around. I didn't pay attention, when I created the relationship, to the position of the tables; it was probably Product Receipts at the top and Product at the bottom. You see it is many-to-one, but if I change it to one-to-many, it goes from Product to Product Receipts, which is what I want, and now I have the right arrow direction. So far so good.

Now, of course, at this point the query has to be different, because so far I used the Brand column from the Product Sales table; what I have to do now is use the Brand column from the Product table. If I use this approach, I probably want to hide both the Product Sales and the Product Receipts tables from the user. So, in my semantic model, I go here in the local model and I say: this is hidden, hidden in report view, the entire table, and the same for Product Sales. The idea is that, if I have a local Product table, I just want to show this table, and whatever selection has to be made goes through this table. This also simplifies the visualization: you see that I have only one Brand column now. But more important, look at the result: you see both Fabrikam and Litware, and this local table is what allows me to get the right result. So in terms of results, I prefer this solution; but what about performance? Let's take a look.

I already have DAX Studio open, but I have to capture the new query, because the report changed: I changed the Brand column I'm using, so the query has to be different. I start recording, I refresh the visual, and I capture this query again; then I go back to DAX Studio, select everything, and paste my code: that is the DAX query only.
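The query captured from Performance Analyzer for this visual should have roughly the following shape; the exact measure names, such as [Sales Amount] and [Receipts Amount], are assumptions, since only the visual's results are shown in the video:

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Brand],                     -- brand column from the local Product table
    "Sales Amount", [Sales Amount],       -- measure defined in the Sales remote model
    "Receipts Amount", [Receipts Amount]  -- measure defined in the Receipts remote model
)
```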
We know that we will get the storage engine queries later, and this is what I'm going to do: I'm going to use Product[Brand], where Product is the local table. This time, before doing the benchmark, let's execute this query once and take a look. Oh, I have to remove the selection; run the query again, and you see this is the result we obtained. So far so good. But if I look at the server timings, what happened? Well, this time we have one simple query: give me the list of the names of the brands. Then... oops, this is something that happens; this is another problem we have to fix in DAX Studio. Sometimes, when we keep the connection open with the profiler, with Server Timings and Query Plan, everything disappears; probably there is some event we don't manage correctly, and we have to fix it. Anyway, as I was saying: here we have the list of the brands, and we know that we have 14 rows. You see that, because the query is an xmSQL query, we have the estimated number of rows and the estimated size of the data cache obtained as a result of this local query. So we know we have 14 brands and 2,500 products; this second table has the list of product keys for each brand, where product key and brand are its two columns.

Now, the model has to send two queries to the remote models. The first query contains, for all the brands we display locally, the combination of the product keys for each brand; this is used as a filter to retrieve, in this case, the sales amount. So a list of all the products is sent to the Sales model. The following query takes a very similar approach: you see that the list is the same, but this time we retrieve the receipts amount. So, basically, we don't have just one query with a long list of product keys, we have two, because the local model has to transfer the list of the product keys to both remote models. Okay, so let's do some benchmarking, because as usual we don't
know which one is better until we measure. So we run the benchmark, using just the cold cache executions, and we click Run; then we retrieve the numbers so that I can fill my small Excel file. Even though we don't have hundreds of thousands of products, there is some difference, but it's not huge. Here we go: we have 402 and 380, so I write down 402 and 380; this row is the local product keys, and the next one will be the local brands. Okay, let me copy this here and here... oops... okay, let's do this later. So, if I did everything correctly so far: even though we transfer the list of the products over two relationships, the local Product table is faster because, guess what, we don't have to retrieve the list of the product keys from a remote data source. It's true that we send two long lists of product keys, but we don't have to retrieve one of them from a remote dataset first. So it's not so much better than using the brand, but it is around 20% better than the first technique, which used a direct relationship between the two models.

Now, we can do the same with the brands, right? We can apply a similar approach, but this time the change is bigger, because instead of using the Product table we will use another table. I could keep this one, just in case, and create another table. I have to go here, because a table cannot be created in the diagram view, and I type: Brands is equal to DISTINCT of UNION of... no, let me use the right function: ALLNOBLANKROW of the Brand column from Product Sales, comma, ALLNOBLANKROW of the Brand column from Product Receipts. I have to use ALLNOBLANKROW to avoid circular dependencies once I create the relationships between this calculated table and the rest of the model. So this is what I'm going to do. Okay, just formatting the code, you know; when the code is not formatted, I cannot help myself. Here we go.
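A sketch of the Brands calculated table just dictated; the table and column names are my transcription from the audio:

```dax
Brands =
DISTINCT (
    UNION (
        ALLNOBLANKROW ( 'Product Sales'[Brand] ),    -- brands from the Sales remote model
        ALLNOBLANKROW ( 'Product Receipts'[Brand] )  -- brands from the Receipts remote model
    )
)
-- ALLNOBLANKROW, rather than DISTINCT or VALUES over the columns, avoids the
-- circular dependency that would appear once this table is connected through
-- relationships to the rest of the model
```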
Now we have the Brands table; it is a small table. If we go to the table view, we have the list of all the brands, and you see that both Litware and Proseware are here. So far so good. Now we can go back to the diagram view: I connect Brand to Brand here, and this is a one-to-many, yes, because Brands is unique, so one-to-many, single direction, no problem; and we do the same on the other side, Brand to Brand. Here we go.

Now, of course, I have to change the column in the report: instead of using Product[Brand], I remove that Brand and I use the Brand column from the Brands table. Once again, I go to Performance Analyzer, start recording, and refresh the visuals; I copy the query, go back to DAX Studio, and paste the code. You see that Brands[Brand] is what we get now. I execute this query here... oh, as usual, I have to remember not to keep something highlighted when I start the query, because otherwise you execute only the highlighted part. Let's take a look at the result: the result is fine. And let's take a look at the server timings. What do we have here? Don't pay too much attention to the first run, because, as we have seen, we have to execute the same query multiple times to get reliable numbers. But now you see... oops, I have to run this again, sorry, one second. Okay: retrieve the list of the brands from the local storage engine; then retrieve the list of the brands from the local storage engine another time. It seems redundant, I don't know why it happens, but it's very fast, so I don't care. Then we have the interesting part of this technique: a list of the local brands, identified just by a number, paired with the name of the remote brand. And you see that now we have Litware and Fabrikam both in the query that goes to Sales and in the following query that goes to Receipts: we have
Litware and Fabrikam to get the receipts amount, and of course we have the grand total for the sales amount and the grand total for the receipts amount. So, once again, run the benchmark: I just want the cold cache executions, and let's see what happens. As soon as we have the result... it seems faster than the other techniques: 310 and 288, so I write down 310 and 288. And remember, I always kept the fastest execution time of all the options we examined. So, as you see, just in terms of performance, the table with the local brands was the fastest, and I can make the assumption that once I publish the model to the cloud, the performance should be even better, because I reduce the latency of the communication between the models.

Now, I'm not sure this will work, so let me try. Let's save this one as Mashup Demo: I already made this demo before, but I never tried it remotely, and now I will publish it. So, publish to Demo SQLBI. Okay. Why am I doing this? Because at the end of this test I want to run the same query with DAX Studio, analyzing the profiling of the remote execution. Let's see if this works. First of all, let's check whether the model has been published correctly and works... and I think it works; hopefully, yes, we have the numbers. We don't like the format, but that's fine. Now we go to DAX Studio and we try to change the connection. I can remove the Server Timings and the Query Plan now, because we have to connect to another model. I have to go to my workspace, Demo SQLBI; then I go to Settings, Premium, and Workspace Connection; I get this string and I use it here. Okay, I just remove this last part: this is necessary because, when you use the connection string, there is an Initial Catalog property, which is the name of the database used before, and you have to remove it. I connect with my credentials... it didn't work? What happened? Okay, probably that's... no, it worked. Wow, it works.
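For reference, the workspace connection string copied from the Premium workspace settings is the XMLA endpoint of the workspace; its general shape is the following, where the workspace name used here is an assumption, and any Initial Catalog property appended to it must be removed before connecting:

```
Data Source=powerbi://api.powerbi.com/v1.0/myorg/Demo SQLBI
```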
So, you see we have... Mashup Demo is what I published, so I have to connect to the right one. Oh, it seems I kept the auto date/time table: that's my bad, I never make this mistake, I don't know why I did it, but let's move forward. So, here I have the same query I used locally, and I can run this query remotely. First of all, let's see if it works: it does work. So let's activate Server Timings and Query Plan... actually, just Server Timings is enough... and let's start as soon as it is ready. Now, you see that this could take a while: sometimes I have to increase the connection timeout of DAX Studio when I need to start a trace remotely. That can be frustrating, but I hope Microsoft is aware of this problem and will improve the time required to activate a remote trace. And it's ready, hopefully. So let's try... it says it's ready, but I don't see the Run button, so let's check the output... no: timeout, timeout. Okay, so let's do this: I close DAX Studio and I open DAX Studio again, so I can also show you the timeout settings that we have to change. Before making the connection, I go to File, Options, and I think it's in the connection timeouts here; so this one we bring to 120.
Usually 60 is enough, but 90 is better; and also this one, 120. I change these two timeouts just because I want to make sure I am able to connect. So: connect to the remote workspace, use this account, and once I see the metadata is correct, I select the right database and then I click on Server Timings; this could take a while to run. As you see, after almost one minute, or more than one minute, we finally have our query trace started; I cut the previous step so that you don't have to wait one minute watching the video. Now I can run my query: I just have to clear the cache before, and I run. Let's see what happens... the query started; it takes some time because of the remote execution, and I hope it will not take another minute, because this is not the time for executing the query, this is the time to communicate with the remote model. Here we go. Let's try to run this again: 20 seconds is definitely too much, so let's try again. And as you see, we had some connection issue, but the query ran in a much faster time than any execution we have seen before: the total execution time was 280 milliseconds. If I go back to my Excel file, we are measuring the local brands option, published; this time the storage engine is 255 out of 280, and the formula engine is just 25 milliseconds. So, as I could imagine, the reduction is mainly in the storage engine, which is the time required to retrieve the data when the local engine executes the query over the remote data sources; and the reduction in latency between my connection and the remote dataset matters, because now everything runs in the cloud, where the latency is probably smaller. We saved around 10% of the execution time, and there could be a larger saving with more queries and larger datasets. So, let's go back to the window here. What have we seen? We made the connection with the Product calculated table, connecting the products
here, and we also made the Brands calculated table, where we basically reduced this to just a list of brands; this reduces the cardinality of the two relationships, so that a smaller number of elements has to be transferred in each query.

So, let me recap what we have seen. As you see, it took a long time, we didn't explore other options, and for each of the options we tried there are probably many more details we could investigate. But, as we could have expected, when we have a relationship with a smaller number of unique values, performance is better. If we try to connect tables between remote models directly, it works, but we have a problem with the consistency of the results: you have seen that the management of the blank row doesn't work in remote models the way it works in local models. And if we want the list of all the products when we have to merge products coming from two different datasets, we have to create a local Product table that contains the combination of all the products that have been defined at least once, no matter where; this could require additional work. We didn't explore the cost of creating the local calculated table, nor the problem of refreshing it: what happens if I don't refresh my local table and, in the meantime, the remote models add new products, or something like that? So there are many, many things we still have to explore. But I hope that, by looking at what we explored in this video, you have seen how to investigate the options available and the behavior of these queries. When we run something locally or remotely, we have to consider all the side effects this could generate: they affect the performance, but also the results which, as you have seen, are not the same depending on the data model solution we use. However, I think this feature is great; we just have to learn how to use it properly, and in the coming months we will publish more articles, more videos, and
hopefully, best practices coming from the use of this feature in the real world. Enjoy DAX!
Info
Channel: SQLBI
Views: 19,381
Rating: 4.9909091 out of 5
Keywords: DAX, PowerBI, Power BI, Marco Russo, UNPLUGGED
Id: 9lkVk4t2qL0
Length: 63min 20sec (3800 seconds)
Published: Sat Jan 09 2021