Dataflows end-to-end project (Microsoft Fabric) + Lakehouse + Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and welcome back for another video and welcome to learn data fabric I'm will and in today's video we've got a very special one for you now we're going to be deep diving into Data flows and I'm going to be building an end-to-end project that shows you exactly how to use data flows within Microsoft fabric and we're going to be seeing some new ways that you can use data flows within the fabric context that you can't really use in the Microsoft power bi data flows and what I'm going to be doing is using data from our world and data specifically it's this one so it's the which sources does our Global energy come from how much is low carbon and you haven't used our world and data before it's a really interesting website I recommend you have a look it basically collates lots of different data sets about our world and you can Browse by topic and it describes what's going on in the world through data [Music] so on this page we've got this graph here that I'm going to be trying to recreate within power bi and what you can do with any of these charts in our world in data is download the data that goes behind it so we're going to be downloading the data I've put it into an Azure storage account and our goal for this project is to First extract the data from azure so we're going to bring it into a data flow we're going to have a look at it and transform it so that we can recreate this chart here so there's going to be lots of steps to this make sure you follow along and yeah let's get going with the real world project using data flows [Music] okay so here we are we are in the data Factory experience and I've set up a workspace creatively named data Factory and we're going to start by creating a new data flow so we're going to be using this to extract our data from azure so we're going to be going to get data and then more looking for the connections and in Azure we've got lots of different things to select from my data's in an Azure blob so I've already set up a connection to my Azure blob storage and it's using the account key and to set that up you just go into your storage account and scroll down to access keys and then you can find this key here that you just copy into the connection when you're creating it within fabric within your data flow so let's click on next that will all go through here we've got this container here and we can see right this is our container and these are our blobs we've got the CSV files there's two of them we only need one of them but let's click on create [Music] okay so now we've got our data within our data flow so we can start to transform it and extract this data so first of all I'm going to just filter out this file here because this is the only one I want then we can extract the content of that CSV file by clicking on this little button here and it transforms it from a CSV into the actual data contained within the CSV that's what that looks like now let's click OK that looks good and we're going to be doing some Transformations on this firstly we just want to load it into our lake house because what I'm going to be doing I don't know much about this data really I want to explore the data a bit and it's difficult to do that within a data flow but to do that first one of the limitations that you have to bear in mind is that we can't write data specifically column names that have spaces in them into a lake house so the first step is to rename our columns okay so we can begin to see the structure of the data now we've got the source name that's come through from our data flow we've got various entities so these are for example consonants or countries in some cases it's a bit of a mix which is a bit annoying we can't really do country by country comparisons very easily we've got the year and then we've got the amount of energy produced by each source so coal oil gas in terrible hours I think it was [Music] so now we've had a bit of a look at our data now what I want to do is not do much more Transformations with it yet I'm just going to store it in a bronze lake house so I'm just going to be getting the raw data I've made very minimal Transformations enough for me to save it into a lake house and I'm going to save it into our lake house like this so in the bottom right hand corner we can add a data destination add it to a lake house which already I've already set up within this workspace and it's called Data Factory and I've got this bronze lake house here and I'm going to call my table energy from data flow let's click next and we can do replace just check that the column mapping is correct that looks fine it might struggle with this but we'll see with the source.name so I might have to remove that but what we'll do is we'll publish and we see how we get on okay so I haven't given it a very good name though it's just called data flow three it's this one and when you publish it it will run for the first time so it shouldn't take very long I don't think this data sets are very large I think it's about nine thousand rows or something okay so we're back in the refresh history for our data flow 3 and we can see that it has succeeded it took one minute and 12 seconds to run this pipeline so let's open the SQL endpoint for the bronze lake house and just have a look at the data we've got here okay so you can see our data has now arrived into this lake house you know we've got very similar or exactly the same structure as we had in our power query in our data flow [Music] the reason I've done this is because I want to just click on new report here and just have a look at the data do some exploratory data analysis just to understand it a bit better because if you want to recreate this chart here then we need to understand the data in a bit more detail and just have a little little bit of a play with it first so we've got our data set here this is one of the benefits of fabric and power bi integrated into our environment we can quickly kind of use data flows visualize it in power bi all of this in one kind of seamless way so what I want to do is just explore the data a little bit perhaps I want to have a look at the entities here and I want to have a look at just what accounts are really so how many records are there for each one is it are they all the same is there some records or some entities that have less data we can see that there is now so some of these countries have less records which makes me think okay maybe it's uh different years obviously USSR has less records so I'm just going to copy that and have a look at okay is it the year okay so for it looks like the data goes up to 2021 and down early as far as 1965. so that gives us a bit of an idea about the types of data we've got there now I notice in the one that we're trying to copy we have this world so we're only really interested in the world category so what happens if we just add a filter to this page and only look at the world okay so now we've got our world and so with this chart here we're nearly all of the way there really we are I've got one bar and we want to categorize it by the different energy types So currently that's not really possible with the structure of our data we've got different energy types in different columns in our data set and really we want to have one column for all of these different energy types or one column called Energy types and one column with their values because then that will enables us to use the legend we can split this big bar up into the different energy types so it's a bit of an exploration of our data let's go and do that now [Music] so I'm going to do is I'm going to create a new data flow now you could just add this logic into the old data flow but I'm showing some different ways of working with data flows here so in this data flow we're going to do get data and this time we're going to read it from Microsoft fabric itself we're going to be looking into that lake house the bronze lake house where we've stored this data and so here you go this is the Bronze Lake House let's open this up and this is the table we want energy from data flow okay so there's over 300 Transformations that we can do within data flows thanks to power query all of these different options of removing columns keeping moving rows filtering split columns merging appending queries all these different things which we won't be going into in this lesson but we are going to be tidying up this data a bit so this Source column we don't really need so I'm going to be removing this column we want to keep the entity and we want to keep the code and obviously the year is very important but what we want to do is transform these columns here so there's only two columns are attribute which is energy type and our values and the way of doing that in power query in these data flows is if we go to the transform tab then you have this really useful function called unpivot columns and this is probably my favorite part about data flows is to unpivot columns because you can do it obviously in Python in a synapse data engineering notebook it's actually more difficult than it should be to do this in SQL so this specific transformation is a really good one to do in power query in data flows so what you can see I've just unpivoted these columns and now we've got for each year and each country we've got eight rows in the data set previously it was one row and so there's a row for each of these different energy types with their responding values so let's just change this to something a bit more user friendly we'll leave that one as value okay and now we want to set the data destination and what I'm going to do is I'm going to save this to a different lake house so I'm using a kind of Medallion architecture here whereby in the bronze lake house we're just going to be capturing the raw data as it comes in from our source in the silver lake house we're going to be putting our transformed and cleaned and curated data sets so again we're creating the connection to a lake house but this time we're going to be choosing our Silver Lake House and we're going to be calling it transformed energy from data flow obviously we've got different columns here now let's save this and we can publish so let's just wait for that one to load and I'll come back in a second [Music] okay so here we are we are in our silver lake house now and we've got our transform data and you can see what I've actually done is added another filter since the last step and I've just brought in the world data set because we're only really interested in the world stuff so we've got our data now what I'm going to do is I'm going to go to the SQL endpoint and I'm going to find our table so what I'm going to do is I'm going to create a report now and try and build this visual that we have in our world of data within our power bi report so we brought in our transform data now and what we want is to grab the entity which is only world at this moment and now we can use the values column as our x-axis okay so I've brought in the data and we're just doing the entity on the y-axis and the sum of the value on the x-axis which I'll go into in a bit more detail in a moment and what we'll do is add in the legend for the energy type so now we've got coal gas geohydro nuclear and solar and wind Okay so we've brought in all of our data now we've got our entity which is world and we've broken it up using the legend into the different energy types coal gas Geo Hydro we haven't got the same order as this visual maybe that can be an extension to this project for a future rendition but you can see we haven't quite got it right because currently this is only showing 2021 data so it's about 1002 or 162 000 terawatt hours so the slight modification is we need to use this year and let's just grab this so yeah there you go so now we're at 162. and perhaps this isn't the best filter type what we can do is edit this slicer to I don't know power bi doesn't really give us the option to do a Slicer in the same way very easily as we've got there but perhaps a better way of visualizing this to see how this is developed through time would actually be to use something like this and this isn't really a power bi tutorial it's mainly about the data flow but if you see now we can if we visualize this by our year then there we go we see a bit of a Time series just to summarize what we've done here we've taken the core data from an Azure blob storage account we've pulled it into a Bronze Lake House we've done some very minimal Transformations first just to get it into that lake house then we visualized it in power bi found out what Transformations we need to make to recreate this our world in data graph and then finally we've done some um pivots and some tidying up the columns just to show some of the functionality of a data flow and then we visualized it again and improved the visualization in power bi so that's just an end-to-end project using data flows and power bi and fabric and I hope you can begin to see the power of data flows within Microsoft fabric now I think they're much more useful than they used to be in power bi because we can load data into any fabric item so that's really powerful so I hope you enjoyed this end-to-end project hopefully it's given you some ideas of how to use data flows in the future and there'll be many more videos like this coming so if you're not subscribed already then make sure you subscribe like this video and leave a comment on what you thought about this mini project in the next video we'll be looking at the data pipelines so we'll be looking at the other side of data Factory and we'll be looking at the high level overview and I'll be doing another project similar to this one but using data pipelines so see you then
Info
Channel: Learn Microsoft Fabric with Will
Views: 205
Rating: undefined out of 5
Keywords: Dataflow, Dataflow Gen2, Microsoft Fabric, Power BI, Azure Blob Storage, Our World in Data, Terrawatt hour, PowerQuery
Id: ZmgZPhl2LRU
Channel Id: undefined
Length: 17min 21sec (1041 seconds)
Published: Wed Jul 05 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.