Dataflow Vs Dataset: What are the Differences of these two Power BI Components?

Video Statistics and Information

Captions
Hello everyone, this is Reza from RADACAD, and today I'm going to talk about the difference between two very important components in Power BI: dataflows and datasets. What are the differences, and in which scenarios should you use each of them? Let's see.

I have done many presentations about multi-developer architecture in Power BI, which uses both dataflows and shared datasets, and one of the questions I often get at the end is: what is the difference between a dataflow and a dataset? That's why I'm going to talk about it in this video. I also have a blog post about this video with all the links to study more, and I highly recommend you go and read it; there is much more information there.

Let's start with the dataflow. I'll go to my blog, because I have all of these images over there. A dataflow is a set of Power Query scripts that run in the cloud, independent of Power BI datasets or reports. It stores the data in Azure Data Lake, in CDM (Common Data Model) folders, as CSV files.

The dataset, on the other hand, is different. The dataset is where the data of Power BI is actually stored in memory: all the relationships, all the calculations, the DAX expressions, the connection to the data source. That is the dataset. It's usually not that easy to see the dataset when you are in Power BI Desktop; you can go to Task Manager and see that there's a task with SQL Server Analysis Services running the dataset. But on the website, in the service, you can easily see this difference: there's a report and there's a dataset. The report is the visualization element; the dataset is everything but the visualization: data, relationships, connection to the source, expressions, calculations.

As I said, there are links to study more, and I highly recommend you go and learn more about the dataflow and the dataset. In this video we are going to talk about the differences. There are several differences between
these two. One of the main differences is that the dataflow is the replacement for your Power Query: in a dataflow environment, similar to Power Query, you can get data from different places and do the data transformation. The dataset, by contrast, is the replacement for your modeling: you can create relationships between your tables, write a DAX expression, and use it in visualizations. So the dataset is for modeling, DAX expressions, and relationships; the dataflow is Power Query.

The dataflow is the ETL layer. ETL stands for extract, transform, load. If we consider three layers, with ETL, or let's say data transformation, as one layer and modeling and visualization as the other two, the dataflow is the data transformation layer. The dataset, on the other hand, is the data modeling layer, the layer where you add calculations. You might have multiple data marts, a dataset for purchasing, a dataset for inventory, and things like that, with multiple visualizations connected to those.

The dataflow feeds data into the dataset, because the dataflow result is usually not in memory; it's not ready for visualization. You need to get it using a dataset: in Power BI, we get data from the Power BI dataflow, build the relationships, and add some DAX expressions, so that it becomes ready for visualization. The dataset result, however, can be fed into visualizations, because all the calculations, everything, is ready and can be used for visualization.

The dataflow usually accesses the data source directly. I say usually because there is an option for a dataflow to get data from another dataflow, which is called a linked entity, or, inside the dataflow, to get data from another query, which is called a computed entity. If you don't have those scenarios, the dataflow usually gets data directly from the data source. The dataset, by contrast, can also get data from a dataflow, and the best practice for building a multi-developer architecture is that your dataset gets data from a dataflow, not directly from a data source, and that way
you are decoupling these multiple layers from each other, and you can of course have different developers on each side.

The dataflow developer should be a person with good Power Query skills, who understands a little bit of M, how data transformation works, how to build a star schema, how to create dimension tables and fact tables, and what types of transformations are needed. That is the dataflow developer's skill set. The dataset developer needs to know more about DAX and modeling: all the types of relationships, what type of relationship is needed here, how to write complex DAX expressions. That is the dataset developer's requirement. The dataset developer might also know Power Query, but that is not his or her primary skill set.

The users of a dataflow are data modelers. If I build a dataflow, the end user is not going to use it, and the report visualizer is not going to use it; my data modelers, the people who are going to build calculations on top of it, are going to use the result of the dataflow. The result of a dataset, however, is something my report visualizers can use, because it is ready for visualization. They can use a Power BI live connection to the source and build visualizations.

The dataflow is built to solve the problem of having one table used in multiple files: instead of copying that table to multiple places, you create it once using a dataflow and then get data from it in different Power BI files, without duplicating your Power Query script. So the dataflow solves the problem of needing one table in multiple files, and the dataset, on the other hand, solves the problem of having multiple versions of the same DAX expression. Let's say you spent time writing DAX code for a rolling 12 months calculation for sales, and you want to reuse it in multiple visualizations. You build it in your shared dataset and reuse it in multiple places.

So, in summary, these are some of the differences. As I mentioned already, the main point is that dataflow and
dataset are not replacements for each other; they complement each other. You need both of them to build a proper Power BI solution architecture that works with multiple developers and has the least duplication of your code, because with them you get the two very important layers of a Power BI architecture: modeling, and ETL or data transformation.

As I mentioned, make sure you go and check my blog post (the link is in the description down below) and read more from the other links I provided there. Thank you for watching this video. If you liked it, go ahead and subscribe to our YouTube channel; we have weekly videos on Power BI. [Music]
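The rolling 12 months sales calculation mentioned in the captions can be sketched in Python to show the kind of logic a shared dataset would define once and reuse. This is only an illustration, not the DAX from the video: the function name `rolling_12_month_sales`, the `(year, month)` keying, and the sample figures are all assumptions made for the example.

```python
from datetime import date


def rolling_12_month_sales(monthly_sales, as_of):
    """Sum sales for the 12 calendar months ending with the month of `as_of`.

    `monthly_sales` maps (year, month) tuples to a sales amount.
    (Hypothetical data shape, chosen only for this sketch.)
    """
    # Turn each (year, month) into a single month counter so the
    # 12-month window is a simple integer range.
    end_index = as_of.year * 12 + (as_of.month - 1)
    start_index = end_index - 11

    total = 0.0
    for (year, month), amount in monthly_sales.items():
        month_index = year * 12 + (month - 1)
        if start_index <= month_index <= end_index:
            total += amount
    return total


# Hypothetical sample data: 100 in sales for every month of 2019 and 2020.
monthly_sales = {(y, m): 100.0 for y in (2019, 2020) for m in range(1, 13)}

# Two different "reports" call the same shared definition instead of
# each keeping its own copy of the calculation.
report_a_total = rolling_12_month_sales(monthly_sales, date(2020, 3, 31))
report_b_total = rolling_12_month_sales(monthly_sales, date(2019, 6, 1))
```

Both totals come from one function, which mirrors the point made in the video: the calculation lives in one place, so a fix or change does not have to be repeated in every report.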
Info
Channel: RADACAD
Views: 14,112
Keywords: Power BI, Power BI from Rookie to Rock Star, Power BI Architecture, Power BI Desktop, Architecture, Dataset, Dataflow, RADACAD
Id: eZW-DJCaq3o
Length: 8min 36sec (516 seconds)
Published: Tue Apr 07 2020