What is a Power BI dataflow? A deep dive on Dataflows for Power BI

Captions
Okay, so in this video we're going to go through what dataflows are. We are going to do a deep dive, but for those of you who just want to know what it is without all the details, I am going to do a quick explanation at the beginning, and then we will fill in the gaps as we go along. To do this I am going to go through the "Dataflows in Power BI" white paper, which I'm going to link down below in case you haven't seen it. I think it is a great explanation of what dataflows are. I've been hearing about dataflows for a long time and I haven't been able to understand what they are, but I think I've got it now. Let's see. I will be writing here on the tablet and also on the computer so I can do some diagrams and flows, and you will see them below, so if you see me looking down here, you know what I'm doing.

So, dataflows: what are they? Very shortly explained, dataflows are like Power Query online. Then you might wonder: okay, so why hasn't Microsoft just called it Power Query online? Well, it's not just Power Query. Dataflows use Power Query, but they do a lot of things that Power Query does not do, and this is one of the things that we will see when we go through the white paper. But, shortly explained, the way I understand dataflows is as follows. Let's pick a pen. We have a bunch of data sources: for example Dynamics as a data source, then an Excel budget in a workbook, and then Salesforce. Those are the data sources; the data lives in there, in its own proprietary source, as usual. What we've been doing up to now is this: we have the Power BI service here, the cloud service that lives up there, and (let's put it in another color) we've been consuming these sources in it. Now dataflows, because they are Power Query online, will allow you to consume these sources through them, so here
we will have dataflow one. A dataflow lives in a workspace, so here you will have workspace one and here workspace two; here we will have one dataflow, here another dataflow, and here, for example, another dataflow. Okay, so a dataflow lives in a workspace, and it pulls data from the different data sources using Power Query M. Great. So you can combine the sources in the flows, and the output of the dataflow (let's write it next to it) is Azure Data Lake Storage Gen2. So your data gets moved from the source to a data lake, and it gets stored in what they call a Common Data Model folder, a CDM folder. These CDM folders are composed of small entities, or tables as they call them in the white paper, which are like CSV files with metadata. So here we have the tables that we have pulled through the dataflows into Azure Data Lake.

Something else that is available, because everything is now in Azure and in the cloud: there is a bunch of services in Azure meant for managing data, for data analysts and data scientists. Because everything is in the cloud, you actually have the possibility to access that data lake through, for example, Azure Machine Learning; you have SQL Data Warehouse, you have Databricks, Data Factory, all kinds of services that can go in and create their own CDM folders that can then be consumed by Power BI.

Okay, so looking at this picture, what is a dataflow? A dataflow allows you to connect to data sources. It pulls the data, transforms it, and then pushes it to Azure Data Lake, where it gets stored and refreshed, and from there it can be reused through Power BI, or you can use all the data tools that are available in Azure to create entities. Suddenly, within the Microsoft ecosystem, data can be easily used and the tools can easily pull it too, so the data
is not just limited to Power BI. So what do you think? That is basically what a dataflow is. It does not store any data by itself; it just pulls, transforms, and sends the data, and orchestrates the actual pulling of data so it gets done correctly. And because everything is moved to Azure Data Lake, you get other benefits from that. So that is the short explanation of dataflows.

Now let's go into the white paper. I think it's a great white paper: it is not technical at all, it is easy to understand, and it is very well written, so I recommend you read it even though I'm going to go through it here, so you can also draw your own conclusions, because I'm obviously interpreting it through my own point of view.

It says here that data preparation is considered the most difficult, expensive, and time-consuming task in a typical analytics and BI project, and that this new dataflow capability can change the time, cost, and expertise required for data preparation in a fundamental way. It says a dataflow has a model-driven calculation engine that will cut the cost, time, and expertise required for cleaning data, and we will talk a little bit more about what that means. Then it talks about the self-service revolution: how vendors started to move from IT-centric BI to self-service BI, where users can just go, grab the data, and use it instead of having to contact IT for their business intelligence needs. It says that Microsoft, through Power Pivot and the VertiPaq engine, actually responded to that self-service market, and that a decade later this allows tens of millions of Excel and Power BI users to enjoy BI technologies, work with massive amounts of data, and build large, sophisticated models without any formal BI training. I don't necessarily agree with that. I think you need training in order to work with massive amounts of data and create sophisticated models; it did
require training, but you can do a lot of things without knowing that much; I give them that. So it says, okay, that is great, but for the actual data preparation there are not a lot of vendors with tools for that. And then it says: now we have dataflows. As I said, dataflows are created and managed in the Power BI service, directly in the browser, so it is the Power Query online of Power BI, where you clean the data. And it goes on to say: yes, you now have Power Query, but there are some limitations to the Power Query that is in Power BI. The first one is that the data you import into Power BI is locked in Power BI, so you cannot reuse the cleaned data elsewhere. I'm not sure I really agree with that, because you can export Power BI data through different tools. I know there are limitations: if you are working with billions or trillions of rows, then it is true that it's just a Power BI tool, but there are ways to do it if you're not working with that much data. This is another indication that dataflows are for enterprises and companies that have a lot of data; okay, that is what this is meant for. Second, managing complex data transformations at scale is difficult. What this means is that when you have a big BI project where you're pulling data from a lot of sources, you have to make sure that the data is synchronized and the steps are done in the correct order, so that the data at the end is correct. You can do a mini version of that in Power BI, but Power BI is not the tool for it. It also says that all data needs to be ingested through Power Query connectors, which makes for challenging work with large volumes of data. So, okay, it goes on to say that dataflows are for large-scale, complex, and reusable data preparation projects. Those are the things, right: this is again for large-scale, complex projects where you would have reusable
data, so note that. And it says dataflows are to ETL (ETL is basically the cleaning process) what Excel is to programming. It says that extracting data from one table or one database, putting everything together, and doing that in Power BI is very easy, but when you are building a system that extracts from multiple systems at the same time and does a lot of transformations, that can be challenging, and at large scale Power BI does not work. It says that data warehouse experts create ETL packages that handle extraction and transformation tasks, and then they have to orchestrate them, as they call it, which is basically syncing the flows so things get done in the right order. For example, if you have a table that you load with fresh data every day and then you move that data to a historical table, you have to make sure that if you are going to delete the daily table, it doesn't get deleted before the data is actually moved to the historical table, so that has to be synchronized. And it says that a significant ETL project includes hundreds or even thousands of ETL packages that have to be chained together with complex logic, and this is something that dataflows can help you with. It says dataflows are for self-service ETL by business analysts. Self-service ETL by business analysts: I don't know the definition of a business analyst, I don't know if there is one or if people just use it differently, but the business analysts I know don't do this type of work; from what I've seen it's more the data warehouse people (let me know your comments in the comment box). The ones I mean are the people that build data warehouses and put big databases together to make them accessible for different reporting, that kind of work. Maybe that is a business analyst, I don't know. So it says dataflows have been designed around five principles: intuitive
and familiar authoring through Power Query; automatic orchestration of the flows; big data, which allows you to work with a lot of data; the Common Data Model, which we'll talk about in a second; and complete native integration with the rest of the Power BI system, with datasets, dashboards, and reports, and also with all the other bits.

It says dataflows are like Excel, and the analogy goes like this: in Excel every cell holds either a value or a formula, and when you write a lot of formulas, every time you write a new formula that references other formulas you always get the right data, even if it references a reference that references another reference, in these massive Excel files. The way that works is that Excel has a kind of dependency diagram that maps all the formulas in the file, and it tells Excel: first calculate this formula, then this one, then that one, and then the one under that one. It does all of that in the background; you never need to worry about whether Excel is going to give you the right values. It always does, and you don't even have to think about how that works (maybe some of you have). What is happening is basically that it maps everything and then follows the map to refresh everything and give you the right values every time. And they say that dataflows work exactly the same way as Excel; they have been designed to work exactly the same way. A dataflow creates a map and then follows the map to update everything correctly without you having to do anything, exactly like in Excel, which is fantastic for those who do that type of job. Then, as in Excel, dataflows have... what do they call it? Okay, let's start with that. They say dataflows define a collection of entities, and that entities are like tables. So a dataflow will generate tables, and these are shown here in this Common Data Model (CDM) folder, so here it is creating
those small things, which are the entities, or the tables, I mean. Okay, so one dataflow can generate many entities, or tables; I think table is a better word, actually. And each entity in a dataflow has a formula associated with it: each of these small things has an M expression behind it, so it uses the Power Query language, and each one has one M expression that tells it how the data needs to be cleaned. It says this is more powerful than Excel formulas because you can do more things: you can join tables, pivot tables, filter and aggregate tables, and do pretty much anything you need. Then it goes into the magic of M expressions. It says that learning M might seem like a significant burden and a high bar for most analysts, but M is very powerful, and you are already authoring very complex M expressions through Power Query. In all the demos I have seen with dataflows, what people do is create the query in Power Query, copy the M code, and paste it into the dataflow, because I don't think people are used to writing M; you just use the interface and then tinker with the code. By the way, I have a course on M and Power Query; go and check it out in case you want to learn more. So it says, okay, we're already doing a lot of M, and when authoring dataflows you have all the magic of M and Power Query within the Power BI service, in a Power Query online experience. So you're using Power Query online to create those transformations, but a lot more is happening than that, and that's why they are not calling it Power Query online. It says the dataflow calculation engine is not the same as the M engine: we established that every dataflow table has an M expression, and that M expression can refer to other tables in the dataflow, so you can have tables
referring to other tables in the dataflow. It says that, like Excel, the dataflow calculation engine analyzes the M expression of each table, finds the references to other tables in that expression, and uses that information to build a dependency graph. That's what we were talking about with Excel, and you have seen this in Power BI before: I'm sure you recognize that diagram, it's just the query dependencies view that you use in Power Query. This is how the dataflow knows the order in which to execute the different refreshes, which is great. It decides what can be done in parallel and what has to be done in sequence; it is just orchestrating how everything should be refreshed, and it does everything behind the scenes, like in Excel, so you don't have to worry about it. It says that for projects with dozens of tables or more, multiple dataflows are created and managed in a single Power BI workspace, and that for large and complex projects this is the way to do it. Then it goes through how the actual execution is done and what types of refresh are available. Let's see. One of the things they mention here is that you can have different refresh schedules: for example, this dataflow can refresh once a day while this one refreshes once a month. The budget, for example: you do budgeting once a month, so you only need to refresh it once a month, while the other one refreshes once a day, and the dataflow engine will know how to refresh them so that you always get accurate data stored in the data lake, which is great. It says that if a failure occurs while processing, it will pick up from where it failed, so it will not redo everything. What else? It says dataflows can refer to tables owned by dataflows managed in other workspaces, so this dataflow can refer to entities produced by that dataflow, and it says that in many ways this works basically like Excel, when you are in a cell
and you are referring to other cells, so you have a mental picture of how that works. The types of refresh that are allowed: you have manual refresh, you have scheduled refresh, and there are different scheduled refresh limits for a Pro license and a Premium license (we'll see the table at the end), and you even have refreshes triggered by external data. It says a dataflow can also refer to data that is not owned by the workspace, and in that case the dataflow periodically looks for changes in the external data, and if it finds changes, it triggers a refresh. Okay, and then you have incremental updates. It says that with incremental updates, dataflows process only new or changed data instead of processing everything, which, as you can imagine, saves a lot of time. Then it talks about dataflows and the data lake. As I mentioned before, dataflows store the tables in Azure Data Lake, and here's the thing: you have 100 terabytes of free data storage with Premium, and only 10 gigabytes free with Pro. So basically, if you have Pro, I'm sure you are going to use more than 10 gigabytes of data storage, and after that you pay. It doesn't say here how much it costs; I guess there is a calculator somewhere, but I'm not sure, and I don't know if they have released any pricing. The way I read it before, it was free, but it's not free, it's just the initial storage. It says many users will never have to be aware of the underlying data lake. I guess it's good to be aware that your data is being moved, and then you have data privacy: knowing where that data lives, making sure European data stays in Europe, all that kind of stuff. I'm guessing the data stays in the same region, but I can only imagine that; it doesn't say here, so I don't know. Then it talks about the open approach, which is what we mentioned here about
the Azure packages, which is great: it allows Azure Data Factory, Databricks, and SQL to create their own CDM folders, and then you can access them through Power BI. We've talked about that already. They say that these small entity tables are actually CSV files with metadata, and they have an entire appendix about the CDM folder. I'm not going to touch on that in this video; let me know if you want me to do a review of it, but I don't want to make this way too long, it's already quite long as it is.

Then security, which is quite important. It says Power BI uses an ACL (access control list) standard that allows you to secure not only the dataset but also individual entities. What is not supported is row-level security: row-level security is not supported by Azure Data Lake yet. So it is highly recommended to restrict access to dataflow entities to only the analysts whose job is to create the Power BI datasets. Okay, so give access to the data lake to the people that are creating the dataflows and nobody else, because you won't be able to restrict what they see in there, and why risk it, right? That is what they say, just so you know.

Then it talks about dataflows in Power BI Pro versus Premium. It says dataflows are available with both Pro and Premium, but there are differences, and one difference I already mentioned is 100 terabytes versus 10 gigabytes of data lake storage. It says there is a table on page 16 that you might want to check out (I will show it on the side too), and that table shows the differences between Power BI Pro and Premium when it comes to dataflows. Connectivity: all connectors, for both. Storage we already talked about; for Premium, the 100 terabytes applies to P1 or greater nodes. Data ingestion: serial for Pro, while with Premium it is parallel. Incremental updates: not available for Pro, so you will
load everything every time. Then references to entities in the same workspace: not available for Pro, only for Premium. References to entities across workspaces: not available for Pro. Calculation engine: not available for Pro, and that, I mean, is the part of dataflows that is not Power Query. So basically, for Pro users you don't have the calculation engine that orchestrates how things get executed, cleaned, transformed, and moved to the data lake. You have a Power Query online experience up there, but your data is moved to a data lake, so it's not only Power Query: you're still moving your data somewhere else, and Microsoft will charge you after 10 gigabytes. And then the refresh rates: eight times a day for Pro or 48 times a day for Premium.

Then it talks about Power Query for the web enhancements, as they call it. When you are defining entities, that is, when you're cleaning the data in the dataflow, they call it Power Query for the web (PQW), and as you have probably seen in the demos, it is a very limited Power Query; it is nowhere near what Power Query can do in Excel or Power BI. So they say here: it is limited, we are going to continue developing it, and they will close the gap between Power Query for the web and Power Query in the coming months. It also says there are not a lot of connectors available yet and that they will close that gap too. And it says most new Power Query features will be released simultaneously for both Power BI Desktop and Power Query for the web, the online version that dataflows use. Then it talks about dataflow experience enhancements: they are going to improve the visual dataflow experience, so you can actually see what's going on when the different flows are computed, with a kind of diagram view in there, and there will be more enhancements to the calculation
engine too. So I really hope this sheds some light on what a dataflow is and makes it clearer. If you have questions, let me know in the comments, and tell me what you think about it. For me, this is great for people who are building the fundamentals of data management, the architecture. I wish that data scientists, business analysts, business users, and power users will not have to do this, so they can concentrate on the other parts of being a business analyst, which is actually creating valuable reports. So this is more about the management of data, and before this I would rather have seen data catalogs: a data catalog that can actually be used, where you can discover sources more easily. Maybe this will be the future of data catalogs, I don't know, but I think that is more needed than this, probably because I don't do this type of data work. So anyhow, let me know in the comment box what you think of this, and I'll see you again on Wednesday. If you want me to, I can make a dataflow demonstration, creating a dataflow from scratch to see how it works, and maybe I can do that for Wednesday. Okay, I will do it at some point, but if you tell me that you need it, I can do it earlier. So have a great Monday, and I'll see you on Wednesday. Bye.
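The video describes the dataflow calculation engine reading each table's M expression, finding references to other tables, and building a dependency graph that decides refresh order, the same way Excel orders its recalculation. The sketch below illustrates that idea with Python's standard `graphlib` module. It is not Microsoft's engine: the table names and the `dependencies` mapping are hypothetical, grouping tables into "waves" is just one simple way to show which refreshes could run in parallel (as with Premium) versus one after another (as with Pro's serial ingestion), and `already_refreshed` is an assumed stand-in for "pick up from where it failed".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dataflow: each table maps to the tables its M expression references.
dependencies = {
    "Sales": set(),
    "Budget": set(),
    "Customers": set(),
    "SalesByCustomer": {"Sales", "Customers"},
    "SalesVsBudget": {"SalesByCustomer", "Budget"},
}


def refresh_waves(deps):
    """Group tables into waves. Every table in a wave depends only on tables
    from earlier waves, so the tables inside one wave could refresh in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if tables reference each other in a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


def serial_refresh_order(deps, already_refreshed=()):
    """Flatten the waves into one serial order and skip tables that finished
    before a failure, so a retry resumes where the previous run stopped."""
    finished = set(already_refreshed)
    return [table for wave in refresh_waves(deps) for table in wave
            if table not in finished]


if __name__ == "__main__":
    print(refresh_waves(dependencies))
    # [['Budget', 'Customers', 'Sales'], ['SalesByCustomer'], ['SalesVsBudget']]
    print(serial_refresh_order(dependencies, ["Budget", "Customers", "Sales"]))
    # ['SalesByCustomer', 'SalesVsBudget']
```

As in Excel, nobody writes the order down by hand: it falls out of the references, and adding a new table that references `SalesVsBudget` would simply add a fourth wave.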
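The video only says that with incremental updates a dataflow processes new or changed data instead of everything. One common, generic way to get that behavior is a watermark: remember the latest modification date seen in the previous refresh and merge only rows changed after it. The sketch below assumes a hypothetical `modified` date column and an `order_id` key; it illustrates the idea only and is not how Power BI configures or implements incremental refresh.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    order_id: int   # hypothetical key column
    modified: date  # hypothetical "last modified" column used to detect changes
    amount: float


def incremental_refresh(stored, source_rows, watermark):
    """Merge only rows modified after `watermark` into the stored table.

    `stored` maps order_id -> Row. Returns the updated table and the new
    watermark (the latest modification date processed so far).
    """
    changed = [row for row in source_rows if row.modified > watermark]
    updated = dict(stored)  # leave the caller's table untouched
    for row in changed:
        updated[row.order_id] = row  # insert new rows, overwrite changed ones
    new_watermark = max((row.modified for row in changed), default=watermark)
    return updated, new_watermark


if __name__ == "__main__":
    stored = {1: Row(1, date(2018, 11, 1), 10.0), 3: Row(3, date(2018, 10, 1), 7.0)}
    source = [
        Row(1, date(2018, 11, 25), 12.0),  # changed since the last refresh
        Row(2, date(2018, 11, 20), 5.0),   # new
        Row(3, date(2018, 10, 1), 7.0),    # unchanged, skipped
    ]
    table, watermark = incremental_refresh(stored, source, date(2018, 11, 10))
    print(sorted(table), watermark)  # [1, 2, 3] 2018-11-25
```

Only two of the three source rows are touched in this run, and running it again with the returned watermark changes nothing, which is the saving the video points to.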
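The white paper, as summarized in the video, describes entities in a CDM folder as CSV files plus metadata. As a rough illustration of that idea, the sketch below reads one entity from a metadata document shaped loosely like the `model.json` file of a CDM folder (entities with `attributes` and `partitions`). The trimmed-down JSON, the headerless CSV content, the relative `location` path, the small type mapping, and the `read_entity` helper are all assumptions for this example; a real CDM folder carries more metadata, and partition locations point at files in Azure Data Lake Storage Gen2 rather than an in-memory dict.

```python
import csv
import io
import json

# Simplified, hypothetical stand-in for a CDM folder's model.json.
MODEL_JSON = """
{
  "name": "SalesDataflow",
  "entities": [
    {
      "name": "Budget",
      "attributes": [
        {"name": "Month", "dataType": "string"},
        {"name": "Amount", "dataType": "double"}
      ],
      "partitions": [{"name": "Budget-part1", "location": "Budget/part1.csv"}]
    }
  ]
}
"""

# Hypothetical partition file contents (headerless CSV), keyed by location.
FILES = {"Budget/part1.csv": "2018-10,1000.5\r\n2018-11,1200\r\n"}

# Assumed mapping from metadata data types to Python conversions.
CASTS = {"string": str, "double": float, "int64": int}


def read_entity(model, entity_name, files):
    """Return an entity's rows as dicts, using the metadata for column names
    and types and reading every partition file listed for the entity."""
    matches = [e for e in model["entities"] if e["name"] == entity_name]
    if not matches:
        raise KeyError(f"no entity named {entity_name!r}")
    entity = matches[0]
    columns = [(a["name"], CASTS[a["dataType"]]) for a in entity["attributes"]]
    rows = []
    for partition in entity["partitions"]:
        reader = csv.reader(io.StringIO(files[partition["location"]]))
        for values in reader:
            rows.append({name: cast(v) for (name, cast), v in zip(columns, values)})
    return rows


if __name__ == "__main__":
    print(read_entity(json.loads(MODEL_JSON), "Budget", FILES))
    # [{'Month': '2018-10', 'Amount': 1000.5}, {'Month': '2018-11', 'Amount': 1200.0}]
```

Keeping the schema in one metadata file and the data in plain CSV is what lets other tools, not just Power BI, read the same entities.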
Info
Channel: Curbal
Views: 26,737
Rating: 4.9014373 out of 5
Keywords: Power bi, powerbi, Curbal, Curbal.com, excel, excel bi, power bi desktop, power bi designer, cubal, power bi video tutorial, dataflows power bi, dataflows public preview, dataflows preview, dataflows preview power bi, dataflows azure, deep dive dataflows power bi, data flow power bi, data flow, power bi dataflows, dataflow, bi, what is a dataflow?, what is a power bi dataflow?
Id: bkFG8s_9sGE
Length: 31min 59sec (1919 seconds)
Published: Mon Nov 26 2018