Power BI Dataflows Tutorial and Best Practices [Full Course] 📊

Captions
So, welcome everyone. I see you all getting in; as I said, we have a small timer before we begin, and then we'll start with Power BI Dataflows. We're just doing a quick audio check here in three, two, one... and hopefully that worked, it muted, and you couldn't hear me. Excited to have you all here; in about eight minutes we're going to get going. [Music]

Let's make sure we're unmuted here. Hello and welcome, everyone, to another Learn with the Nerds event. Hopefully everyone is excited and ready to go, because today is all about Power BI Dataflows. Let me go ahead and get this slide deck open; we're going to put a link in the chat where you can download the deck, actually the PDF representation of what we'll be going through, because we have a lot to cover in these three hours. We're starting now and we'll go until 2, and if we end a little early, that gives us some opportunity for discussion, because there's a lot that goes on when we talk about Power BI Dataflows, and that's what we're gathered here to talk about.

If you've been with us at previous Learn with the Nerds events, a few things will be familiar. We'll start here and go until 2, and we'll take a 15-minute break around the 12:20 to 12:30 mark, stopping wherever it feels natural and organic. Yes, this is a live event; I've been seeing a lot of messages, and I just saw one from David Lee: hello from sunny Florida! But naturally, as soon as we finish this event, as soon as it closes up, we will of course be putting it right up on our YouTube channel, so it will be there and available for you to follow up on.

That segues into the next point: questions. There are lots of you in the chat from so many different places; I saw a message saying it's going to be dinner time for someone, so more than likely they're over the pond. The chat is going to be active. Jeremiah and a couple of other people are in the background going through that chat, and they'll try to answer any questions you might have, but with everything we have to cover and how many people are on this call, we won't have the opportunity to stop and troubleshoot every issue. That's why we put up the recording.

As far as the materials: we'll be working through different URLs and online access, and that's why there are no files. We'll be in the Power BI cloud service, which is awesome because we can leverage various URLs, and I've tried to make it as easy and user-friendly as possible. Basically only one example will be done on my side that you won't be able to follow identically, though you'll be able to recreate it on your own, probably after the recording: that's when I connect to my Azure SQL database, because I wanted to showcase something there with dataflows. For the most part, the first example will just connect to a URL, and I can provide that URL in the chat. Post-class we'll have a link for the PDF download, and I'll make sure there's a Word document, or probably just a text file, which will have all the URLs used in today's class. So about following along: if you want to, certainly, but that's really not the focal point today. The idea is to give you this information on what exactly Power BI Dataflows is.

If you've been with any of our Learn with the Nerds events in the past, this is pretty standard stuff; if you haven't seen them, make sure you check out the playlist of those previous events. On to a couple of new things. First and foremost, myself: you haven't seen me yet on Learn with the Nerds, though maybe you've seen me in some of the Pragmatic Works YouTube videos, or on our on-demand learning platform. My name is Manuel Quintana, I'm a trainer here at Pragmatic Works, and I love what I do, straight up. I'm pretty sure there are a couple of familiar names in the chat; I've either met you, talked with you, or you've seen some of my videos, so it's great to see you coming back.

Also, naturally, the topic: Power BI Dataflows. That word "dataflows" is always fun for me, because there are so many different variations. If you're in the Power Apps platform, you have dataflows there; in Azure Data Factory there are data flows too. It's a lot of different things, but we're going to refine this down and focus on Power BI Dataflows. There's technically a little spillover into Power Apps dataflows, but we won't be focusing on that. And last but not least, you may have noticed a couple of new options in the chat menu, Super Sticker and Super Chat, just another way to get involved with the community if you want to add a little support. But just you being here, and the support you've given us over the many Learn with the Nerds events, is amazing; we appreciate it so much. From what we're seeing in the comments, the information we're sharing seems to be helpful, and at the end of the day that's our goal: to provide you information and insights you can bring back to work, so you can be that superstar in your organization: "Hey, guess what I learned about dataflows today."

With all that fun stuff out of the way, let's see what else we've got for today. Once again, the only file is a PDF of these slides, so you don't need to jot down everything on the slide deck; once this is put up as a recording, there will be a link to that PDF, and I'll include a text document with the links used today if you want to attempt to follow along.

As we go through these stages I'll make sure I'm explaining this, but we are going to traverse various different areas when it comes to licensing for dataflows. That's a pretty big conversation: how can I use dataflows, and what features are available to me? Naturally we'll cover those topics as we go; they're in the slide deck as a reminder. Many of the things we're looking at today require either a Premium Per User license or Premium capacity, what's known as dedicated capacity. But dataflows can also be used with Pro, and we'll discuss the constraints under which you can use dataflows if you have what's called shared capacity, meaning your organization has only free or Pro licenses, with no Premium Per User or dedicated Premium. You can still use dataflows, so you're definitely going to get value regardless of what type of licensing you have.

Also, by the nature of today's demonstrations, to keep things nice, clean, and streamlined, we will not require a data gateway. The gateway is something that's used across Power BI, Power Apps, the whole Power Platform, and Azure uses it as well. If you're ever going to use dataflows with any on-premises sources, an on-prem SQL Server, or maybe an Excel or CSV file on a path on your local network, that requires a data gateway. We won't be going through that today, but we have a ton of content on it for the other technologies, and it's basically the same usage: for Power BI or Power Apps, if you're going on-prem, you need that data gateway. It should be known that it is required if you're going that route with your sources.

As far as the content we're covering today: we'll start with a general overview of what's going on inside dataflows, and then move on to storage. We won't hop right into creating dataflows, because there are some pretty important decisions you can make before you start. This is actually critical: the option has to be defined before any dataflows exist in your workspace, and if you later want to undo it, you have to detach any dataflows as well. So storage is an interesting conversation that I think sometimes gets missed, and it's a decision you make when you create the workspace itself. Very, very important.

After we talk about storage, we'll start creating dataflows, and I'm going to use two different environments to show you this. In the first environment I'll be logged in without Premium: it isn't Premium Per User and it doesn't have dedicated capacity, it's just me as a Pro user, so you can see that yes, in a workspace, I have access to dataflows. I'll use that as the example to tie into Azure Data Lake Storage, so you can see what that's all about and why you might want to consider using Azure Data Lake Storage instead of the default storage, which we'll be discussing. Then I'll switch over to a tenant that has Premium capacity, and that's where I'll go through the majority of the examples. That's why following along might be a little difficult: some of you may not have the option of using dedicated or Premium capacity, so that is something we need to look at and discuss.

From there we'll also go into integration, which is really about consumption: I've created some dataflows, so how do I use them, and what options do I have? The most obvious and natural one is Power BI Desktop, but as we go through the options you'll see there are other methods, and depending on what we decided for storage, it opens up even more. And last but not least, depending on how much time we have, we'll take a look at AI Insights. We'll definitely get started on it; some of it goes a bit deeper, but we'll look into some of the AI Insights capabilities that exist within dataflows. Hopefully that sounds exciting.

There's really only one more announcement. He's not in the chat right now, he'll probably join later, but if you know him on Twitter or have him on Facebook, let's make sure we wish Mr. Devin Knight a happy birthday, because it's today! Let's just assume he's the graceful age of 30.
We'll give him that little nickname. But yes, Devin Knight: if you've been to any Learn with the Nerds, he's our commander-in-chief here in training, so let's wish him a happy birthday.

With no further ado, let's dive in and begin the conversation around dataflows. You're going to see me popping in and out of frame as I focus on the conversational pieces, so I don't get too distracting and you can really hone in on what's going on. We're not going to spend too much time on these slides, because you have them at your disposal and can revisit them later; we want to dive in and see how this works.

So, Power BI as a whole. We know it arrived as a major wave of popularity years ago, and with it came the growth of what's known as self-service BI. It's more popular than ever, and Power BI has grown immensely over the last few years. One of the gaps in self-service business intelligence is really in the data preparation process. Power BI does have, for those of you who've worked with it, the Power Query Editor: you go into Power BI, hit Transform Data, and boom, you've got the Power Query Editor. In the background this uses M, the mashup query language, an extremely flexible and dynamic language that lets us create business rules for our data. But it really isn't an enterprise-level ETL tool. If ETL is a new term for you, it stands for Extract, Transform, and Load. Power Query can do this: we bring data in, we transform it, and technically, if we're importing, we're loading it into the Power BI model, so it fits that idea, but it isn't an enterprise tool. The more we jam into our Power BI reports inside the Power Query Editor, the more impact that naturally has on the overall efficiency of the model.

At the same time, if we're in a Power BI Desktop report creating all these business rules, that's me setting them up; there's no real form of collaboration, because if I publish out to the Power BI service, all we have is the dataset and the reports themselves. We can't access that back end; we can't access the Power Query Editor behind reports that have been published to the service. This is really where Power BI Dataflows comes into play.

Now, the greater extension of data capabilities, ML and AI: this has actually grown, and we're seeing it in Power BI Desktop too; it's been there for a couple of months now, so they've started to add that extensibility at the self-service desktop level. But the whole purpose of the cloud environment, the Power BI service, is to have workspaces so we can organize, work with our colleagues, and collaborate. It's very popular right now for people to create Power BI datasets, publish them to the service, and have users consume them; a very effective way to bring governance to your Power BI service. But when we need to make adjustments or amendments to the data cleansing, the business rules we created, we have to download that report, get into Power BI Desktop, and make those changes. The collaborative side is not that straightforward. This is where Power BI Dataflows comes into the mix, and it's pretty awesome. That's what it's meant to do: fill in the gap. We have whatever sources we have; dataflows connect to those sources, clean them up, and then effectively create them as datasets, and we're going to talk about what that looks like.

In Power BI directly, we have a Power BI Dataflows connector, and that's it: you connect to it and log in with the same credentials you use for the service, you see whatever dataflows you want, you import them into your Power BI report, and off to the races you go, writing reports, creating dashboards, all that fun stuff. A dataflow becomes a consumable, usable source like any other source inside Power BI. But now, when we need to make changes and update what's going on in that dataset, we simply go to the Power BI service: myself, any of my colleagues, anybody in that workspace has all of that logic, all of those rules, right there in the service, where we can access, manage, and maintain them, and refresh things. It provides an end-to-end solution for that ETL process we're looking for, which is pretty amazing. That's where dataflows fit in.

So, at the end of the day, a quick answer to "what are dataflows?": effectively, the Power Query Editor online. If you're familiar with Power BI and you've used the Power Query Editor, you're going to find the user interface here very familiar, and we're going to check it out together.

Now, who should be using this? We're seeing an extension of the responsibilities of the business analyst; responsibilities around data consumption and citizen development are becoming more and more widespread. Quite often this is going to be something that's managed and maintained by users inside a workspace, and I think that naturally dictates who's going to be leveraging it: anybody already in that space of report writing and report management, because they're the ones who are in these workspaces already. This isn't going to be extended out to end-user consumers; these are the managers of those workspaces who have that data wrangler knowledge: where does the data exist, and how can we bring it in?

One aspect of dataflows that should be understood is that it's focused on the data shaping side; it's the Power Query Editor. We don't touch the modeling side, creating relationships and DAX and so on; that's still relegated to Power BI Desktop itself. So really we're talking about the data wranglers here, but once again, the roles and responsibilities of analysts keep expanding, so we're seeing this come into play more and more. The idea is not to have a bunch of different Power BI datasets, with individuals each creating their own business rules. We create one dataflow, which has, say, the three tables we know our users are going to need, with a standard set of logic, and everyone who needs it can access and use it; if we need to adjust, amend, or add logic, it's in one central location.

Now, here we see a very loaded topic: licensing strategy. Something missing from this representation is Premium Per User. For the most part, Premium Per User mimics what you can do with dedicated capacity, except for some of the performance increases: you have more throughput with dedicated capacity. But when we get into linked entities, calculated entities, and the AI items, the AI Insights in dataflows, those are available to Premium Per User as well. Like I said, it's a little newer; I think there's a bit of confusion in the air when it comes to what Premium Per User licenses can do, and they have
restructured workspaces, and we're going to see that first and foremost. This is important: what kind of licensing structure are we going to have, and what capabilities will it grant us when it comes to dataflows? That's what we'll be discussing here. So that's a little bit of setup on the overview.

The Common Data Model is something we're going to touch on very briefly; it's a much deeper topic, but it should be known that when we create dataflows, we are effectively storing them: we're creating something that has business rules, effectively creating a table, and loading data into it. Of course there have naturally been some verbiage changes here: the Common Data Model, in Microsoft's current terms, is Dataverse, and entities are now tables, but in the Power BI service we actually still see the word "entity" in use, so it hasn't been translated in all avenues. At the end of the day, that's what's happening: by default, and we're going to get into this, when you create a dataflow you're storing it in some default Azure storage, in what's called the Common Data Model structure. It stores the metadata and the data itself, but by default, unless you tie it into your own storage, it's just stored in the background, somewhere that's really only accessible using Power BI Desktop and dataflows in the service. It's kind of limited, and that's what we're going to talk about: bringing your own storage, and what that opens up for you.

As a whole, the Common Data Service, which is leveraged across the entire Power Platform, is based on the Common Data Model, which is part of an open data initiative: IBM, Microsoft, SAP, and a bunch of other organizations are part of this initiative to create a common set of objects, common tables and common columns, so that you can work across various different technologies. That's the idea behind it. Dataverse is Microsoft's flavor of the Common Data Model, and at the end of the day, when we store dataflows, it's done in that structure. That doesn't have a big role in what we're talking about today, but it's going to come up, so I didn't want it to be a strange topic. The Common Data Model and Dataverse are their own big, deep topic; Jeremiah actually just did a brand-new intro to Dataverse class, so make sure you check that out. On this slide we can see the schema: you have the model at the top and the common apps, but this is meant to extend even beyond the Microsoft Power Platform; that's where the Common Data Model fits into that open data initiative. We won't spend too much time on this; it's just meant to be a logical representation of what that model looks like.

And here's where it fits in the scope of Power BI. We have our sources; in the service we have our dataflows, which connect to the data and clean it up; and when we store this data, we have two avenues: either the standard default Power BI storage, or we bring our own. Either way, it's all effectively using Azure Data Lake Storage on the back end; it creates its own structures and its own files, and I'll show you that. I've actually already created a workspace, connected it to my Azure Data Lake Storage, and created a quick dataflow, just so we can see the structure of everything. I think that's coming up in a slide or two, so let me not skip ahead too fast, but we'll see that structure in a moment. That's how it's stored.

Once again, if we use the default storage that Power BI offers us, it's really only accessible from the service and Power BI Desktop. So hopefully I'm accenting how cool it is if you bring your own data lake into the scenario. There are quite a few requirements when it comes to using Azure Data Lake itself if you want to leverage this piece. We're not going to go through every item, or through actually setting up the storage account; I've put some of the core requirements in the PDF so you have them available, but if you're going this route, the Microsoft documentation will keep you on course. There are just a couple of key bullet points: in Azure, if you create a storage account, here are the minimum specs so that you can connect your Power BI workspace to Azure Data Lake Storage. Once you do, when you create dataflows you have access to this Common Data Model structure, with the metadata and everything, and what that means for us is that we can consume it from other Azure services. We literally have access to this data, sitting in an Azure Data Lake Storage account, so we can take advantage of it in, say, Azure Machine Learning, or do some work with it in Azure Data Factory. That's really the main reason to do it. It should also be noted that you can tie specific workspaces to specific storage accounts, and there's an option at the organizational level to say: here's the organizational storage account, if we want to use that. So there are actually some different tiers at which you can leverage this, and it's a really cool option.

This slide is just a quick look at how this information gets stored inside the storage account itself. Like I said, I've already created a workspace and tied it to my storage account, so you can see what that actually looks like. So let's take a gander; let's dive into a little demonstration. If you want to follow along with the beginning, you can, but obviously I've already created a workspace, pardon me, I've already created an Azure storage account, and I've connected it to the workspace you're about to see. Let me go ahead and bring this up so we can take a look.

Right here you can see I'm in my workspace, and let me bring this other window up as well... I had this open; I must have accidentally closed it. Perfect, we want to be right here. One second, I need to sneeze... all right, sneeze averted, so you're not deafened. You can see I'm logged into powerbi.com, and this is an account that does not have Premium. To indicate Premium, as you may or may not know, there's a pretty little diamond icon, and as you can see, there are no diamond icons here. I mentioned that a couple of important decisions need to be made before you even start creating dataflows, so let's look at what that means when I go to create a workspace. Once again, you may not be able to do this; these are permissions that can be turned off by your organization, which can limit who can create workspaces, so you might just have to watch. I'm just going to name this "quick dataflows demo"; we're not even going to use it, but as I go through this process you can see there are some critical choices under the Advanced section, and this is where you must make these decisions. Now, you can switch a workspace's license mode; that is something you can adjust after
You create the workspace, but if you're going to take advantage of premium capabilities, you need either Premium Per User or premium capacity. If you weren't aware, they've made adjustments here: before, you just made a workspace, and licenses were effectively associated with users. That's still the case, licenses are associated with users, but now a Premium Per User license is the only type that can be in a Premium Per User workspace, while in a premium capacity workspace you can technically have both. It gets a little muddy and we don't want to focus on those details, but the point is: to use premium-related features you must have either Premium Per User or premium capacity. And as I'll show you in a moment, my scenario doesn't have premium, yet I was still able to create a dataflow and it worked. So the first step is considering the license mode, but it's not the end of the world: if you choose Pro and happen to have Premium Per User or premium capacity, you can make this change after creation. You can see that right here: if I locate the dataflow I created, it's in a non-premium workspace, and if we look at the settings, I could potentially make adjustments if the licensing were available to us; the option would appear down below.
Now here's the fun thing: Azure connections. By default this storage setting just says attach or connect to Azure, and by default you are not connected to Azure; you use the default Power BI storage. But I've gone through and connected this workspace, as you can see, to a storage account called dataflow adls. If I find that storage account, just so you can see what it has done for us: there was nothing in it before, and now you can see it creates backups, because versions of our dataflows are saved in there. If we drill in, there's the name of the workspace, there's the name of the dataflow, and within it are snapshots of the data and some JSON files, things we'll see later and can bring into play. This is also where we can actually see the data, access it, and leverage it from other Azure services, which is one of the reasons I'm bringing this up.
So let's talk about the defaults: they're great, they work, and if you know you'll only be using dataflows within the confines of the Power BI service, or only consuming them via the dataflows connector in Power BI Desktop, then the standard storage is fine, no problems, it works fantastically. But when you choose to connect to Azure, there's an option at the very bottom: you can either connect this specific workspace to a storage account, or use a storage account that's been set at the tenant level of the Power BI tenant itself. I've tied this workspace to a storage account which I defined and created, so technically not an organizational storage account, but as I add more workspaces and create more dataflows, they're going to be stored here. The metadata of each dataflow will be there, and we'll see how that comes into play later; the data itself will be in there too. So now I can use it with various Azure services; all we need to do is grant the appropriate access to this storage account.
Here's another pretty cool thing: if I were to go through and delete all of these dataflows I've worked on, the data in the Azure storage account stays there. It doesn't auto-delete; you would have to explicitly remove those items. So it's great, it gives us a kind of redundancy we can take advantage of.
This is the beginning, something you need to define before you start creating dataflows, because once you have created dataflows in a workspace, the option to create an Azure connection currently becomes unavailable; the workspace will just keep using the default Power BI storage in the background. If you later decide, now is the time, I want to connect this to my storage account, you would actually have to delete all of the dataflows in the workspace first and then connect it, which is a little bit icky. Granted, there are ways to export dataflows, so you can export them all, delete them, and import them later, so you don't have to recreate things from scratch, but this is why I wanted to show you workspaces first.
One more thing, naturally: if you use the default storage, if you don't connect to Azure, there's no additional cost. When you bring your own storage, it's just the cost of maintaining the storage account itself. Storage account pricing varies; it does create hot storage, but honestly the costs here are pretty minimal. With ADLS storage accounts you can enable additional features that increase the cost, but for the most part, given the amount of data stored, it's pretty effective from a cost-analysis perspective. It's definitely something to check into, that's a whole other billing scenario, but I wanted to bring it to your attention. And once again, if you're not going to be leveraging this data from other Azure services, then you don't need to do this, and as we've discussed, you can make this decision at a later point.
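To make the storage piece more concrete, the folder layout Power BI writes into an attached ADLS Gen2 account looks roughly like the sketch below. This is illustrative only: the exact names come from your workspace and dataflow names, and the layout has changed over time, so treat it as an orientation aid rather than a contract.

```
powerbi/                            -- filesystem (container) Power BI creates
  <workspace name>/
    <dataflow name>/
      model.json                    -- dataflow definition: queries, schema
      model.json.snapshots/         -- saved versions (the "backup" above)
      <table name>.csv.snapshots/   -- snapshot data files for each table
```

Because the data lands as CSV snapshots plus a CDM model.json, anything that can read ADLS Gen2 (Databricks, Synapse, Azure ML, and so on) can pick it up once you grant access to the storage account.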
So that is workspaces, decision number one. We'll wave bye-bye to this workspace, I'll just minimize it for the moment, because we're going to be working over here, which as you can see is a premium workspace. Do note I have a couple of dataflows already pre-created; I'll explain those when we get into it, but don't worry, we're going to create some from scratch momentarily. First we have to do a little level-setting on what exactly we'll be doing, and that's where creating dataflows comes in.
One thing that whole conversation about workspaces and licensing modes brings up is My Workspace. Hopefully everyone's aware that each of us has something called My Workspace, and its features are limited. One of the features you do not have available in My Workspace is the ability to create dataflows. Regardless, My Workspace should be treated like a dev area; you should never have anything production in there anyway. Dataflows are only available in app workspaces: you have to create a workspace, make all those decisions we just went through, and that is where you'll have the ability to create dataflows.
When we go to create a new dataflow in a moment, it's going to grant us a lot of capabilities. When you hit Get Started and then Transform Data, it's going to look like Power BI, but as you'll see, it doesn't have 100% parity with what's available inside Power BI Desktop itself. We'll talk about that in a moment.
Here's the first screen when we get to our workspace and say, let's go, let's create a dataflow: these are the four options currently presented to us. Rewind the clock a couple of months and the import model option wasn't there, so who knows, these things can grow. You'll get a little behind-the-curtains glimpse when I show you my environment; it has access to beta features, so you may see one feature on my screen that we won't go into, and from an NDA perspective I can't really discuss it. But there are some exciting things on the roadmap. If anyone was at the previous Microsoft Business Applications Summit, or the most recent event that talked a lot about the Power Platform: dataflows are going to get some performance boosts, and we're going to see something called streaming dataflows, which I think will add some pretty cool elements to this conversation. There's no ETA on streaming dataflows; just keep an eye out. Like I said, you'll see it on my screen, we just won't be able to go through it given the state that feature is in right now.
So we hit New Dataflow and we have to make a choice, and this is our first area where pricing comes up. The first option, Define New Tables, tells us exactly what it handles: you can either create your own custom table or connect to a data source and map it to standard entities. Remember the term "entities": it's still being used in places, but "tables" is the correct term now; you'll see "entities" show up elsewhere, but these are tables. So: I'm going to connect to a source. Do I want to put it in a new custom table, which will be stored in the Common Data Model format, or do I want to map the incoming data to a standard table inside the Dataverse? You actually have both of those options. This is available for all license structures: Pro, Premium Per User, dedicated capacity, you'll be able to do this. There is a small nuance, though: there's something called a computed table, sometimes referred to as a computed entity, which can be created in the online designer, and those are premium only. So yes, you can hit Add New Tables, it launches the web designer, and off to the races you go whether you're Pro or premium; you can finish it and refresh it as long as computed tables, or, to the right of them, linked tables, are not in play. This is the most common option, the one everyone on this call will have access to, and it gets the job done: tell me what to connect to, bring it in, define the rules for cleaning it up, and save it into either our own storage or the default Power BI storage.
Then we have Link Tables From Other Dataflows. Linked tables are effectively reference tables, and there are some interesting things we can accomplish with them. The common pattern is reuse: if you want to use a table multiple times, that's a great scenario for a linked table, because you keep it in one location and reference it from other dataflows, mitigating or streamlining the redundancy of refreshing a table that hits the same source over and over. This happens a lot
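Since both the import model and attach-CDM-folder options revolve around the same CDM format, it helps to know roughly what the exported JSON contains. Below is a heavily trimmed, illustrative sketch of a dataflow's model.json, following the public CDM model.json shape; the names are made up, and real exports carry additional metadata (partitions, mashup annotations, refresh settings) that is omitted here.

```json
{
  "name": "Our First Dataflow",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "FailedBanks",
      "attributes": [
        { "name": "Bank Name",    "dataType": "string" },
        { "name": "Closing Date", "dataType": "date" }
      ]
    }
  ]
}
```

Exporting from one workspace and importing into another simply round-trips a file like this, which is why nothing has to be recreated from scratch.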
It should also be noted that you can have as many dataflows in a workspace as you want, and within a dataflow you can have as many tables, or queries, as you want, so the sky is pretty much the limit in that realm.
Now, once we've decided what we want, let's assume we hit that first option, the one that lets everybody use it, creating a new table. This is where the nostalgia comes in, the familiar UI: the Power Query editor. This is just like Power BI, or Tabular Analysis Services 2017 or higher, the same UI. But think about it: if you've ever been to one of our Power BI classes, we always say there are over 80 connectors in Power BI, and it's true; there are actually more connectors in the Power Query editor in Power BI Desktop than there are here. The interesting thing is that this list keeps growing, and there are some more natural integrations with cloud services here, since this is already in the cloud. Interestingly enough, I haven't tried it on many examples personally, but I've seen many a blog that says: if you don't see the source you're looking for in Power Query editor online when creating a dataflow, you can build the query inside Power BI Desktop and just copy the M code out of there and paste it in here. I'd definitely describe that as a workaround, so it's not going to work in all scenarios, but it's something I ran across when doing some examples: you can use Power BI Desktop and move that code from one place to another, which is pretty cool. I think organically, over time, we're going to see 100% parity between what you expect in Power BI Desktop and what we have over here in Power Query editor online, which, once again, is the official name.
For us, we're going to connect to a source, bring it in, and create a new table, but remember, in the background this is all touching what's called the Common Data Model. Something you'll find here that's unique to Power Query editor online, which you won't find in Power BI Desktop, is the Map To Standard option. It's asking: do you want to take your sources and, instead of creating brand new tables, simply map your source columns to some of the existing tables that are part of the default Common Data Model? There are a bunch of standard tables already defined, accounts, events, customers, with many, many columns, and that could be a route you leverage. So it is something unique to Power Query editor online, but nothing too crazy.
Now let's go create our first dataflow. Once again, for this first example all you need is a workspace of any type; it doesn't have to be premium. In my case I do have one, and I'm going to switch over to it; actually, it's right here, this is the exact one I want. We're going to be connecting to a source, so let me get that to you right now; some of you might be familiar with it. I'll put the link temporarily in the chat, just look for the Pragmatic Works sign.
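As an aside, that copy-the-M-code workaround mentioned a moment ago is literal: in Power BI Desktop you open the Advanced Editor, copy the whole `let … in` expression, and paste it into a blank query online. Assuming a connector whose functions the online mashup engine supports, the pasted script might look like this; the OData feed below is just a public sample service used for illustration, not part of this demo.

```m
let
    // Connect to a public sample OData feed; Desktop generates this
    // navigation pattern automatically when you pick a table.
    Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
    Customers = Source{[Name = "Customers", Signature = "table"]}[Data],
    // Keep only a few columns so the pasted query is easy to verify
    Trimmed = Table.SelectColumns(Customers, {"CustomerID", "CompanyName", "Country"})
in
    Trimmed
```

If the online engine doesn't support one of the functions in the pasted script, the query simply errors, which is why this is a workaround rather than a guarantee.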
Let me actually post this here; you might have to repost it for me. Specifically, I'm going to use a URL. If you're not familiar with this, it's obviously a web page, but there are some fun things you can do with it. This is an example we use with Power BI Desktop: I can go to the download link, copy the link address, which is pretty fun and pretty cool, and use that directly in what I'm about to do next. I'll leave it in the background in case we need to revisit it, but I've got my connection, I've got my source. What this is, effectively, is a CSV hosted online that provides a list of failed banks, I think from 2000 to 2020. We use it in other examples, but it's perfect here because it's an anonymous, open URL; everyone should have the potential to connect to it.
Next decision: we're in our workspace, whether it's Pro, Premium Per User, or premium capacity. Right here at the main level, there are technically a couple of ways to access this, but I can just click New in the upper left of the workspace, which is where you have access to create every object available to us: reports, dashboards, all that fun stuff. There are some options you may not see, because, like I said, I'm leveraging some roadmap items; you'll see streaming dataflows there, that's coming and it's going to be pretty cool. But right here we can see Dataflow, and it tells us: prep, clean, and transform data. (The URL itself, by the way, won't do anything special in the browser; it'll just download a CSV. All we need to do is feed it to the dataflow.)
So when I click Dataflow, we get those four options we saw; let me close this out so we have more room, but there are our four choices: define new tables, link tables, and so on. We'll just start on the left-hand side and say Add New Tables. When we click that, it gives us all of these source choices, Excel workbooks and the like, and you can filter and browse by File, Database, and other categories. What we're interested in is under the File option. You might think Web; when we usually do this example we use the web connector, but this time we're going to use the Text/CSV connector, because the link I provided points directly to a CSV file. I can paste the URL in, and although this field is a file path, you can see it accepts a URL; it's going to collect the information from that CSV and preview the data for us. When I hit Next, it connects to the location and we get a small preview, more nostalgia for anyone who's worked with Power BI, because it's the same kind of preview screen. It's figured out that this data uses a comma delimiter, and you can make adjustments to things like file origin if you so choose, but it's worked this out for us, so I don't need to change anything. Notice that, coming from the web, it's picked up an interesting character at the end of a column name; it's not some reserved symbol or anything like that. We can now hit Transform Data; remember, in Power BI Desktop you would also have the option to simply load at this point, but here we're creating a new table.
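Behind that Text/CSV step, the generated M looks something like the sketch below. The URL is a stand-in, use the actual download link from the chat, and the options record is what the preview screen's choices (delimiter, file origin) translate into.

```m
let
    // Web.Contents fetches the file over HTTP; Csv.Document parses it.
    // 65001 is the UTF-8 code page, i.e. the "file origin" setting.
    Source = Csv.Document(
        Web.Contents("https://example.com/banklist.csv"),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]
    ),
    // The first row of the file holds the column names
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    PromotedHeaders
```

This is exactly why the "paste M from Desktop" workaround works for this source: the same two functions run in both engines.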
I'm going to close this pane down on the left so we have more space. At this point we're creating a new table, and we're working in what's called the Power Query editor. Let me shrink this down so we can take in the UI. On the left-hand side, since we're currently connecting to just one table, one CSV, you can see it's been brought in with the default name of Query. As we start to tinker with different sources, say we connect to something like SQL Server and bring in five different tables, you'll see all five tables on the left. So it's very important to label your tables correctly, so that anyone collaborating with you knows exactly where to go. That's usually my first little step: let me change the name here to FailedBanks. You can see that updates on the left, and whatever is selected on the left dictates the preview shown in the middle; if I have multiple tables, the preview follows my choice.
One of the most important pieces of the puzzle is on the right-hand side, in the query settings: the applied steps. I always say this when we talk about Power BI Desktop, but these are our breadcrumbs, the trail of what has happened to our table. We haven't done anything yet, and you'll notice there are already three steps. There are also some interesting icons here which we'll discuss later; you can see one says "this step will be evaluated outside the data source," and in a moment, when I connect to SQL Server, this will be a bit different.
So it's gone ahead and set the column names, and I'm going to fix this: you see that little question mark? Like I said, it's just an extra character. I can click into the column name and delete it, and you'll see it adds a step called Renamed Columns. You can always remove a step or go back in time: I can click at different stages and see how the data looked at that point. This is your breadcrumb trail; it's critical, and it's how we keep track of everything. I'm going to go through and get rid of these extra special characters; we don't need those. Let me move to the right, remove this one, and I believe there's only one left at the very end, on Closing Date; the last column, Fund, looks okay. So you can see Renamed Columns is there as a step.
And just so you know, once again, this is a code generator. We're not going to dive into M and how it works, but everything we do here is translated into the mashup query language; that's what the little formula bar at the top is showcasing. They always have it on there, which I appreciate, and it's good; we're not going to focus on it, but do note there is code being written. We can rely on the UI, though, and that's what makes this so successful.
For the most part, we can do a ton of different operations in here, and most are readily available to us: choose columns, remove columns, split columns, all of these are available. A couple of things, though, do move into the realm of what's called a computed table, and we'll look at that in the next example. But let's say I want to remove some columns, just like you would in Power BI.
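Before moving on to column selection: that rename fix from a moment ago is itself just a generated M step. Sketched with illustrative column names, where the "?" stands in for the stray character picked up from the web source, the step looks like this.

```m
// Each pair is {old name, new name}. Editing a column header in the UI
// generates exactly one Table.RenameColumns step like this.
RenamedColumns = Table.RenameColumns(
    PromotedHeaders,
    {
        {"Bank Name?", "Bank Name"},
        {"Closing Date?", "Closing Date"}
    }
)
```

Clicking an earlier step in the applied-steps list simply evaluates the query up to that step, which is why you can "go back in time" without losing anything.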
Here's Choose Columns, and I can say: you know what, I don't need Cert, I don't need Fund, these are columns I don't think I'll need, so I eliminate them with that very quick decision. You can see on the right-hand side it says Choose Columns, that's the step, and Fund and Cert are now gone. The fun part is, if you're like, wait, what did I get rid of, what did I do? you can always hit the little cog next to the step, and it brings you right back to what was done so you can modify that step in line. So there are some cool options: we can manipulate, we can revisit what we've done, and we can add other steps too.
Now, we're not going to go wild and crazy with transformations, but one of my favorite things, if we want to enrich and extend this data, is adding columns; there are a lot of options on the Add Column ribbon. Here's what we'll do: I see Closing Date right here, which is down to the day level, but I'd also like a column that gives me the year. I'm going to right-click the column and hit Add Column From Examples, and type the year 2020. Basically, it looks at each row of the columns I have highlighted: for this first record, the date is October 23, 2020, so Add Column From Examples is saying, give me an example of what you'd like based on the data in this record, and the service will try to figure out what the pattern is. When I give it 2020, it realizes we want to extract the year from this column. I said we're not going to focus on M and the code language itself, but you can also very easily validate this visually: look, the first row shows 2020, the second 2020, then when we get down the list, 2019, 2017; it's figuring this out. So just by typing 2020, boom, I've created a new column that extracts the year, and it's already relabeled it for me. There's our little step on the right-hand side applying it, and now I have a table where I got rid of two columns but added a third. Manipulating and adding these items is very easy to accomplish.
I'll show you something later; you can see Map To Entities is there too. We're not going to dive into it, but it literally brings up the list of standard tables, and there are a ton of tables in there, guys, so if you want, you can acquaint yourself with the Common Data Model and map things. But once you're set and you've cleaned this up, you know this is what your users are going to want and need: when they connect to it, they'll see a table named FailedBanks and have access to these six columns. They'll never see Cert, they'll never see Fund, we got rid of those, and they will get Year because we added it. There really is a lot we can do here; explore the Transform ribbon, there's a ton of capability.
In the very next example I'm going to showcase something that uses a linked table, and as part of it I'll also show some additional logic that demonstrates what a computed table is. I think I saw something in the chat about combining data, and that's exactly it: when we start combining data, merge operations, appends, or, in our case, creating a reference table and then doing a Group By on it, these create computed tables. There's a special little symbol, a little lightning bolt visible on the tables on the left, that lets us know a table is computed, and this is where we start moving into premium territory.
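The two transformations just applied to FailedBanks, choosing columns and adding the year, boil down to two M steps along these lines. Column From Examples typically emits a `Date.Year` call once it detects the pattern; the column names follow the demo.

```m
let
    // Drop the columns report consumers will never need
    ChosenColumns = Table.SelectColumns(
        FailedBanks,
        {"Bank Name", "City", "State", "Closing Date"}
    ),
    // Typing "2020" as an example resolves to extracting the year
    WithYear = Table.AddColumn(
        ChosenColumns, "Year",
        each Date.Year([Closing Date]), Int64.Type
    )
in
    WithYear
```

Worth noting the design choice here: because the cleanup lives in the dataflow, every dataset that consumes FailedBanks inherits the same six columns, instead of each report author repeating these steps.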
That premium territory is where you need either Premium Per User or premium capacity. But right now there's no lightning bolt, nothing computed, so I can go ahead and hit Save & Close. This stores the dataflow in this tenant; I haven't tied this workspace to Azure Data Lake Storage, so it's going into the default Power BI storage. I can give it whatever dataflow name I'd like, so let's say Our First Dataflow, something of that nature, and you can add a description, say, information about FDIC failed banks, something more informative for the users. Now it's present and available, and we'll see in a moment how it shows itself to us. Right away it asks whether we want to refresh now. You can of course do manual refreshes, and this source does not require a data gateway, so I'll just let that happen in the background, because we're going to want to see this data when we consume it.
Now if we go back and revisit the UI: here is our first dataflow. You have a bunch of options: you can manually refresh it, you can schedule a refresh (we'll look at that momentarily), and there are other options to delete, edit, and export. If you just click on the dataflow, you get back to the view we just saw: I'm in the dataflow and can now see all the tables that are part of it. And if I say, you know what, I actually need to update this, maybe I want a new column or to bring a column back, I can always hit Edit Table, which brings us right back into the Power Query editor where we left off, and we can update, adjust, amend, delete, and add things as we see fit. So you always have this capability, and this is what we meant by having centralized logic within a workspace: whoever I invite to this workspace will see these dataflows and can take advantage of them, and potentially update, amend, and adjust them, any of those sorts of things. That's a pretty important piece of the puzzle.
What we can do now, having created our first dataflow, is look at linked tables. You'll notice in this case I pre-created a date dimension. This could be anything from any source for any of you; I created it in a very simple fashion using a blank query, but let's imagine it's a table coming from SQL Server. We can't go into all the nuances of why you'd use a date table, that goes much deeper into modeling, but when we talk about things like modeling and DAX, a date table is a pretty impactful topic: you generally want a date dimension in your Power BI models if you're going to do anything with time intelligence. So I have this table, and it's a common thing that exists across various datasets used in various reports; nothing really changes, it's the same table over and over. I've gone ahead and connected to wherever that lives, SQL, Oracle, maybe an Excel spreadsheet in OneDrive, whatever it might be; it doesn't really matter, it's just connecting to a source.
Well, rather than creating new dataflows that each connect to that source, so that every scheduled refresh hits the source over and over, I can create a dataflow based on a linked table. Once again, linked tables are where you start moving into Premium Per User or premium dedicated capacity, and you'll see how the refresh is handled. In this scenario I'm going to walk through the process.
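The date dimension mentioned above, built from a blank query, is only a few lines of M. This is a minimal sketch: the start and end dates are arbitrary, and a real version would carry the fiscal columns referenced later in the demo.

```m
let
    StartDate = #date(2015, 1, 1),
    EndDate   = #date(2025, 12, 31),
    // One date per day across the range, inclusive
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    // Derive the attributes time-intelligence DAX typically needs
    WithYear  = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month Name", each Date.MonthName([Date]), type text)
in
    WithMonth
```

Building this once in a dataflow, then linking to it everywhere, is exactly the reuse pattern the linked-table feature is for.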
I'm going to go ahead and create a new dataflow. This one you definitely won't be able to follow along with, I just want to show it, because I want to tie in another piece of the puzzle, a small cool thing: I'm going to tie this into an Azure SQL server. So I'll create a new dataflow that starts with a linked table, this date dimension, bring in additional data from SQL, and we'll see how it represents itself in dataflows and, really, what we want to know: how do we refresh it?
As we just did, we start the same way: go to New, make our way down to Dataflow, and get the classic four little choices. This time around, though, we hit Link Tables From Other Dataflows. When we do, we're presented with how we want to connect to our Power BI dataflows. I think I'm already connected, but just in case, let me make sure, let me connect using my credentials... okay, looks like I already am, perfect. This is designed this way to allow for what's called multi-tenant authorization: you could technically connect with a different account tied to a different organization. Connected with the appropriate account for my tenant, I literally get a list of effectively all of my workspaces that have dataflows. In my case, on this one, we have the new FDIC one, Our First Dataflow, the one we just made, and when I expand it I can choose which table I want from it. That isn't actually the one we want to use, though; if I go over to the date dimension dataflow, the one I created ahead of time, here is that table. I can check it and say: this is where I want to start, I want to reference this table, which lives in a different dataflow, and I go right into the same normal Power Query editor online.
But you'll see it tells us linked tables can't be modified; this is read-only. Any changes made to this table won't be saved, they're simply removed, and when we finish creating this dataflow, it basically says that if you want to refresh this table, you need to go to the dataflow that owns the original. This is just a linked table, and you can see it's got that nice little icon representing a linked table, plus a nice tooltip telling you exactly what's going on: maintain consistency across your organization. Everybody wants to create new reports and do time intelligence with DAX, so let's make sure everybody's using the same date dimension. That table has already been set up with all of our fiscal details, all our fiscal information; it's already there, it's already in play, and that's what we can design and set here.
So that's taken care of; now it's all about bringing in some more data, because there's something else I want to showcase. We mentioned those little icons that say "this step will be evaluated outside the data source." What does that mean, and what other icons can be present? Let me make a quick connection, this should be my Azure SQL database; like I said, this one you just want to watch. (I have a little friend visiting me here.) I'm connecting to this database because I want to show combining this data with a linked table. You know how live demonstrations go, hopefully it behaves itself and lets me in, because what we're going to showcase is that icon, and it's something pretty cool from an efficiency perspective.
it's going to be something called folding all right so i'm going to go in here and i'm just going to grab a couple of tables i'm going to go ahead and grab dim customer i'm going to grab dim product just a couple of items here for me to tinker with and i'm going to hit create so this is going to bring these three tables into this data flow alongside of my link table and you'll see it it's going to you know the link table has its own little icon and these ones are just making that connection to the azure sql db it's going a little bit slower than i would like but it's okay we'll give it a moment so we can see exactly what's going on here on the right hand side so maybe maybe maybe it's thinking about it okay there it goes notice the icon that you have right there on the right hand side see how it's green even though it's loading once it kind of finishes here there's going to be some a different indication here pardon the delay part in the weight while i'm looking i just happened to see alberto wrote that he's in his my workspace and he doesn't see the ability of creating data flows once again as a reminder in the my workspace dataflows is unavailable you can only create data flows inside of an app workspace so you need to create either a pro a premium per user or a premium per capacity workspace these are the only workspaces that allow that creation and sorry guys it's taking a little bit longer to load this up this is you know how it goes when you do a live demonstration like that there we go i just had to click off that other tab so you can see it says this step will be evaluated by the data source okay if i go ahead and i move over there's some additional columns here this is due to the nature of the relational database that i'm connecting to because there's been established relationships it's kind of neat it's like allowing me to do lookups but let me just decide you know what i don't want these columns so i select them and i remove it we know that another step 
gets added on the right-hand side, and we see it continues with that little green icon with the little lightning bolt. This is called query folding, and here's a great part: you can look up all those icons in the official documentation, which I'll put in the chat for you, give me one second. You can see it talks about usage: whenever we see that lightning bolt, whatever steps, whatever business logic we're employing here, actually goes to the source system. That is awesome from a performance perspective: the request gets sent, Azure SQL Database itself manages and processes it, and we only bring over the items we find necessary. Now, there are some nuances when it comes to folding: only certain sources support it. In the last example, the failed-banks one, we were going to a CSV hosted on a website, so naturally there's no native query language there, nothing we can translate our M to in order to push the workload down; that source just isn't capable of handling it. But relational databases such as Oracle or SQL Server will have this capability, so you can leverage folding. The idea is: can the work I'm doing in the Power Query editor online be translated to whatever the native query language is? Since I'm using Azure SQL Database, the language in question is SQL, so this is all translated and done in the background for us, and you know folding is going on when you see that little green database icon with the lightning bolt. The key thing to remember, though, is that if we go through the process and create some sort of logic that isn't supported, that cannot be translated, the folding ends. As an example, if I go over to the First Name column, the Power Query editor offers a lot of transformation options (I didn't mean to click that one; the best part is I can just hit Cancel), and one of those options is Capitalize Each Word. Capitalize Each Word isn't something that's available to us within SQL, so when I apply this step, once it's finished processing, let's look at how it presents itself in the steps on the right-hand side. We'll give it a moment, it's still thinking about it... notice the change in the icon: it now says "this step will be evaluated outside the data source." This step is now being processed on the Power BI service itself. Like I said, a lot of sources you use will work this way, and the service is capable of handling it, and handling it in a performant way, but if we can push work out to the source side, that is going to be fantastic. So when working with sources that support folding, which is pretty much anything that's a relational database, you want to stick to doing rules and logic that you know your source system can handle. Capitalize Each Word doesn't fold; removing columns is just adjusting a SELECT statement; if I want to filter this down, that's using a WHERE clause.
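The folding chain just described can be sketched in a few lines. This is my own illustration, not Power BI's actual engine; the step names are hypothetical labels for transformations, and the only point is that folding stops at the first unsupported step and never resumes:

```python
# Toy model of query folding: foldable steps are pushed to the source system,
# but the chain breaks at the first step the source's SQL can't express,
# and every step after that runs in the Power BI service instead.

FOLDABLE = {"remove_columns", "filter_rows", "rename_columns"}  # SELECT list / WHERE / AS

def split_folding(steps):
    """Return (steps evaluated by the data source, steps evaluated outside it)."""
    for i, step in enumerate(steps):
        if step not in FOLDABLE:
            return steps[:i], steps[i:]
    return steps, []

applied = ["remove_columns", "filter_rows", "capitalize_each_word", "remove_columns"]
folded, local = split_folding(applied)
# folded -> ["remove_columns", "filter_rows"]
# local  -> ["capitalize_each_word", "remove_columns"]
# Note the second remove_columns runs locally even though it is foldable on its
# own, because the chain was already broken by capitalize_each_word.
```

This is why layering the foldable steps first, as suggested above, maximizes the work the source system does for you.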
The editor will visibly show you whether you're folding or not, and it works like a chain that has to stay unbroken: as soon as you introduce some sort of transformation that doesn't allow folding to occur, you can't do any more folding, so you want to layer anything that helps you from that perspective first and foremost. Now, just before our little break: remember we're already in a space that requires Premium Per User or Premium, because we're using a linked table in here; that's what DimDate started as. Let me make sure All Products is all cleaned up, because it's giving me a warning about it, so give me a moment; since you guys don't have access to this I want to make sure it's taken care of, and there's actually a photo image column here too that we don't want. While we're here, let's assume we have our customer table, and we also have a table that records a bunch of transactions; that's what's happening here. Let's say I want to create an enriched version of the customer table: I'd like my customer table to have a new column telling me how many transactions each customer is attributed with. Now granted, we could just bring this data into Power BI and do this easily with visualizations, but from a data perspective, a transformational perspective, I want to enrich my customer table itself, so I need to figure out some grouping. What I can do is use our Internet Sales table, where, once again, every row represents a transaction, and we have a bunch of details: who bought it, where it got purchased, what got bought, when it got purchased. I'm just going to do a little bit of cleanup here at the end to
remove these columns; once again, they're just record data types and we won't need them here. Since each row represents a transaction, if I group this table by my customer ID and get a count, that gives me customer ID and how many transactions per customer ID. Granted, it'll be in a new table, but then I can combine it with DimCustomer and make a new, enriched customer table. There are many different scenarios where you'll want this kind of logic, where you're combining data, and this is where we see the introduction of calculated tables; once again, a feature only available with Premium Per User or Premium dedicated capacity. So here's the plan: I've got Internet Sales, and I want a count grouped by customer ID. To achieve this I'll choose my Internet Sales table, and because I like its details and don't want to modify or transform the original, I'll create a reference table. Check out the symbol next to the new table: a little lightning bolt. Whether you duplicate, whether you reference, when you start to merge things, this is going to come about. This introduces a calculated table, which again is only available for Premium Per User or Premium Per Capacity, so it's very easy to step into this realm once we start to adjust this. The purpose of this table, of course, is to give us our number of transactions, and I'll simply do a Group By, over here in the Transform ribbon. It pops up asking what you want to group by; I'll say my customer key, and I simply want a count of the rows. The moment I do this, it's already a calculated table, and you can see my customer ID and how many transactions. Obviously this isn't valuable yet; it's just an ID. It's our customer table over here that has the name, the email address, all the good info, so we now need to combine our new calculated table with that one to get a better, more enriched customer table. That can be found in the Home ribbon under Combine; we can simply hit Merge. Actually, I'm going to do Merge as New: I'll create a brand new query and call it "Enriched Customer Data", combining the data from Customer with my new number-of-transactions table, joined on the customer key. You can see it presenting how we're setting this up (my UI is a little zoomed in), and we also get the option of what kind of join we want to do. There are definitely a lot of nuances when it comes to joins, so if you're unfamiliar with them, that's something you'll want to look into, because this has an impact. These are options; there's not a single right choice to use every time; it depends on what you want to achieve. In my case I'll leave it as a left join, which effectively says: keep every record in my customer table. I have 18,484 customers, that's how many rows are in my customer table, and I don't want to lose any of them; I'll bring back only the matching records, which come from my tailored-down, grouped-by version of the fact table, the Internet Sales table. There's a chance some customers haven't made any purchases, they have zero transactions, so they won't exist in that table below, and even if they don't have transactions I don't want to get rid of them. That's where the joins come into play. That was a quick description of what a left outer join does, but as you can see there are many other choices, so definitely examine those options and go further. I'll go ahead and make my join, and we end up with a new table which is effectively my customer table with another column: my number of transactions. This would be my enriched customer table, so I'll name it something that describes what's going on, so when my users go to pick what tables they want to use, they understand it. If we compared it with the customer table they would be exactly the same, except this one has the option, when I expand it, to include my count. I'll remove the customer key that came from the merge, since we already have it, and rename "Count" to "Number of Transactions", something that makes a little more sense. So now it's there, it's built in: an additional piece of detail in this dimension table that my users can take advantage of. Maybe they're not as savvy or as versed with the Power Query editor, they don't know as much DAX, so they might not realize how to come up with this themselves; I can provide it right here. Like I said, if I wanted to do this in Power BI there are a ton of ways we could accomplish it, but now I have a table with some additional facts, details, and attributes that I think my users will benefit from, and we can close this out. I have a new dataflow, so it asks me to name it; we'll call it our Sales Information dataflow, since we've got sales, enriched customers, and our products. That's what we'll call it.
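A minimal sketch of that group-by-and-merge, in plain Python rather than M, may make the shape of the result clearer. The column names mirror the demo, but the rows are invented for illustration:

```python
# Enriching a customer table: count transactions per customer (the "calculated
# table" step), then left-join the count back onto the customers so every
# customer survives, with 0 for customers who never purchased anything.

customers = [
    {"customer_key": 1, "name": "Ana"},
    {"customer_key": 2, "name": "Ben"},
    {"customer_key": 3, "name": "Cai"},   # no purchases yet
]
internet_sales = [
    {"customer_key": 1, "amount": 10.0},
    {"customer_key": 1, "amount": 25.0},
    {"customer_key": 2, "amount": 5.0},
]

# Group by customer_key, count rows
counts = {}
for sale in internet_sales:
    counts[sale["customer_key"]] = counts.get(sale["customer_key"], 0) + 1

# Left join: keep all customers, matching counts where they exist
enriched = [
    {**c, "number_of_transactions": counts.get(c["customer_key"], 0)}
    for c in customers
]
# Ana -> 2 transactions, Ben -> 1, Cai -> 0 (kept by the left join)
```

An inner join here would silently drop Cai, which is exactly the behavior the left join is chosen to avoid.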
Now we have another dataflow inside of this workspace, but this one uses a linked table and a calculated table: two elements which, once again, require Premium Per User or Premium Per Capacity. It's important to understand this, and you can see from the nature of the example how easy it is to venture into leveraging a calculated table; it's not too far off at all. So this is my dataflow, with all its tables behind it, our Sales Information dataflow. When you rewatch this and go through this example again: I deliberately used an Azure SQL database to showcase folding, but the key piece of understanding here is that any table that exists in any dataflow can be used as a linked table, so you can reuse tables in new dataflows. As you can see, it tells me this is a linked table that can be viewed and edited in the source dataflow, and (it's a little hard to read at my resolution, let me expand it) it gives the exact path: the name of the workspace, the name of the actual dataflow, and then the name of the table, so it tells you exactly where to go to make updates and changes. You'll also notice there's no ability for me to apply an AutoML model or an incremental refresh on the linked table; for that kind of thing I have to go back to my list of dataflows, find my date dimension, the same one, and set up those refreshes there. But the best part is that when I set the refresh up there, every dataflow leveraging this linked table gets refreshed all in one go. Standardizing this logic across all of your dataflows: that's the idea behind a linked table and how we can take advantage of it. But do be aware.
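One way to picture why a single refresh propagates everywhere: a linked table behaves like a reference to the source table, not a copy. This is a toy model of mine, not how the service is actually implemented:

```python
# The source dataflow owns the table; consuming dataflows hold references
# to it rather than their own copies, so one refresh is visible everywhere.

dim_date = {"refreshed_version": 1}        # table in the source dataflow

sales_flow   = {"dim_date": dim_date}      # linked table (a reference)
finance_flow = {"dim_date": dim_date}      # linked table (a reference)

# Refresh the source dataflow once...
dim_date["refreshed_version"] += 1

# ...and every dataflow that links to it sees the refreshed data:
# sales_flow["dim_date"]["refreshed_version"]   -> 2
# finance_flow["dim_date"]["refreshed_version"] -> 2
```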
Remember the licensing structure: we're now in that Premium Per User, Premium Per Capacity space. So we've got those two items covered. We've only been doing manual refreshes so far, so let's take a quick glance at what scheduled refreshes are all about. Let me bring this up; it's actually very straightforward, as you saw Refresh Now is right there, but one important discussion needs to be had first: if you are using on-premises sources, don't forget the conversation we visited earlier in the class. If your sources aren't cloud-native, the data gateway comes into play, so that's one of the prerequisites for setting up a scheduled refresh. Now, scheduled refreshes, and the manual refreshes we've seen popping up, can happen whether you're a Pro user or on any of the premium licenses, as long as the dataflow in question fits within those criteria: if you create a new dataflow that doesn't use calculated tables or linked tables, then yes, with only Pro you can manually refresh it and you can schedule it, which we'll see in a moment very easily. The incremental refresh option, which is also present, is once again only available for Premium or Premium Per User. The reasoning behind incremental refresh, if you're unfamiliar with this data-loading pattern (it's not exclusive to dataflows; it's been around for a while), is this: you map and define predetermined date columns so that you can say "I only want to refresh items within X amount of time." You specify the length of time, and a date column used to recognize whether a row falls within that threshold, that criteria. Instead of refreshing the entire dataset, which is what happens with a manual refresh or a standard schedule (imagine removing all the data and then reloading it all in, which takes more and more time as your tables get larger), the idea with incremental refresh is that you can reduce that. It should be known, though, that the first time you run an incremental refresh we of course have to load all the data in, so the initial load won't save you any time; it'll be the same as everything else. But after that, whenever the incremental refresh runs, it only updates records that fall within whatever threshold we define, and we'll take a quick look at that user interface so you can see it; once again we're falling into that premium realm here. Refreshing itself, if you've done anything with Power BI, literally mimics the exact same user interface: you toggle it on and specify the cadence, the frequency, and those time frames. Let's not forget, though, that we have limitations on the number of refreshes we can do per day, and this also relates to licensing. With what's called shared capacity, which is just Pro and Free, that's eight refreshes per dataflow per day: if I have seven dataflows, each one can be refreshed eight times per day. With Premium Per User or Premium dedicated capacity, each can be refreshed up to 48 times per day, and if that number seems a little weird, it'll make sense in just one moment. So let's go right back to where I was: we have a couple of dataflows here, we've seen there's our Refresh and here's our Schedule Refresh, and if we access the very one we just built, it brings us to a very familiar location; literally, if I had Power BI datasets to refresh, they would show up right here as well.
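The per-day limits just quoted can be captured in a couple of lines. The numbers come from the talk itself (8 for shared Pro/Free capacity, 48 for premium), so treat them as accurate as of the time of recording:

```python
# Scheduled-refresh limits per dataflow per day, by capacity type
# (figures as stated in the session, not queried from any API).
REFRESH_LIMITS_PER_DAY = {"shared": 8, "premium": 48}

def can_refresh(capacity: str, refreshes_so_far_today: int) -> bool:
    """True if one more refresh of this dataflow is still allowed today."""
    return refreshes_so_far_today < REFRESH_LIMITS_PER_DAY[capacity]

# can_refresh("shared", 7)   -> True   (the 8th refresh is the last allowed)
# can_refresh("shared", 8)   -> False
# can_refresh("premium", 47) -> True
```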
If we needed a gateway, it would show up here and let us know we'd have to define it, but for this one, which uses Azure SQL Database and in our case just a web source, it says you don't need a gateway because everything is in the cloud, which is awesome and why I wanted to go this route. And you can see there's a Scheduled Refresh option. It really is this straightforward: I turn it on, choose daily or weekly, the time zone, and what time I want to refresh. But notice how I can set this up: I can choose an hour, and then the minute section can only be set to the half hour and the top of the hour. This is why we said premium allows 48 per day: technically that lets you set a refresh for every potential interval available in the UI, because there are only 48 of these periods throughout a day. It should be noted, though, that there are also external methods for refreshing Power BI datasets as well as dataflows. You can use things like XMLA endpoints, or Logic Apps and Power Automate to make HTTP calls to the back-end Power BI API, so there are ways to refresh your standard Power BI datasets and your Power BI dataflows without leaning on this UI. But regardless, even if you go that route, the limitations of eight times per day for Pro and 48 for premium and the like are still in play. Really cool ways to do it, though; I really like Power Automate and Logic Apps, both of which are based on triggers. Say a table in SQL Server gets updated, because maybe that's what my dataflow is pulling data from: I can make Power Automate watch for that, and when there's an update it triggers, and boom, I can send a refresh request for the dataflow that's pulling data from that location.
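That "only 48 periods in a day" claim is easy to verify: the schedule picker offers only on-the-hour and half-past times, so enumerating them gives exactly 48 slots:

```python
# Enumerate every time the scheduled-refresh UI can target:
# minutes 00 and 30 only, for each of the 24 hours.
slots = [f"{hour:02d}:{minute:02d}" for hour in range(24) for minute in (0, 30)]

print(len(slots))        # 48 — one slot per allowed refresh, hence the premium limit
print(slots[:3])         # ['00:00', '00:30', '01:00']
print("13:15" in slots)  # False — quarter-hour times aren't offered in the UI
```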
Pretty awesome. I can't go into much more detail on it here, but don't forget that in the future we're going to have access to what will be called Power BI streaming dataflows. That's something you'll want to look into: a way to get around needing a scheduled refresh at all, and a really cool addition to this entire Power BI dataflows scenario. So here's what we'll do, guys: we've talked about the Common Data Model, we broke down workspaces and the licensing requirements and all those fun elements. We're going to take our 15-minute break, and when we come back we'll have some conversations around ingesting this. What are the different options? We've actually already seen one, ingesting dataflows via linked tables, and of course we'll cover how to access this through Power BI Desktop and take advantage of it. That's what we'll examine when we come back from the break. ... All right guys, welcome back. Over the quick break I was perusing the chat, and there were a couple of questions of interest. I think I mentioned it earlier, but dataflows are not available in My Workspace, so you always have to create a workspace; don't forget all the nuances of the workspace types. There were also items about modeling: "I see you're doing the shaping side of things; how can I create relationships between my dimension tables and my fact tables?" You guys are on point, I just saw the chat, everybody's getting involved, I love it. The answers given in the chat are correct: dataflows are basically just the shaping portion. We connect to sources, create business rules, and store the results in tables; then, in Power BI Desktop, which we'll do momentarily, we connect to the corresponding dataflow, choose the tables we'd like, and once we're in Power BI Desktop we can take advantage of the Modeling tab, add some DAX, and create those relationships. It's there that we accomplish the modeling phase. Now, don't take this as absolute, and don't quote me on it; think of me as just a user of Power BI Desktop, disconnected from Pragmatic Works or Microsoft. But if we look at the direction Power BI and the service have been going, a lot of people are saying that premium, with the new Gen2 capabilities and performance, is like a superset of Analysis Services; the grade and degree of features we're getting with Power BI is amazing. So I wouldn't be surprised to see extensions continuing into the service, maybe some aspect of modeling eventually becoming available there, but as of right now that is not available, and there's nothing on the roadmap I'm aware of. Who knows, though: with how fast, how frequently, and how much Microsoft has invested in this tool, it's only going to continue to grow, so just wait and you'll see. So we have the dimension-and-fact modeling on the desktop, and we'll see that momentarily. I do
believe I saw another question that was already answered as well, about incremental refresh. I'll show that really quickly before we get back to the slides. Basically, incremental refresh is set at a table level: the Scheduled Refresh we saw was at the dataflow level, for everything, but you can go into individual tables and set incremental refresh, which, remember, is premium only. You have to have a table designed in a certain way, because it asks you to point to a column with a date, and then you define a threshold: only records that fall within a certain amount of time are the ones we'll look to update. And the second portion of incremental refresh is that you can optionally detect data changes: you specify a column, and if the maximum value of that column changes, those will be the only records that actually get updated. There are definitely some nuances; Jeremiah put links in the chat about incremental refresh that cover these items, so definitely check those out. It's a cool way, especially if you're working with larger datasets, to reduce the amount of time taken to refresh. Also, I think we had a question: can we consume multiple dataflows from a single PBIX file, and can we mix dataflows and datasets in the same PBIX? We'll look at this momentarily; there are some preview features in play with Power BI Desktop, but the way I have it set up, it's going to be import mode, and with import mode you can basically bring in whatever you want into Power BI: this dataflow, that dataflow, Excel, SQL Server, as many things as you want. If you're less savvy on the topic of import versus DirectQuery, we don't have time to go too deep into it, but right now when I connect to dataflows it's an import, so I can just bring in whatever I want. There is, however, a preview option you'd have to set up in your tenant, in the service area, using the preview Gen2 enhanced compute engine; there are a couple of things you have to flip on in the back end, but that will allow you to leverage DirectQuery with dataflows. DirectQuery is awesome, a super powerful feature; you don't have to do any sort of refreshing, so if you're unfamiliar with it, definitely check it out. But in the current iteration of that preview (and we're not showing off preview features today, because they're apt to change very rapidly), DirectQuery with dataflows cannot leverage composite models: you can't mix DirectQuery and import, so that would be a scenario where you could only point to one dataflow. That's a long-winded answer to the question I just saw, but when you're doing import you can bring in whatever you like, which is a really cool strength; it just means you're loading all the data from your sources into the Power BI model, which requires a refresh of that Power BI dataset itself. But like I said, if the dataflow that's getting fed right into the Power BI report is being refreshed, we just need to make sure the dataset is refreshed too. It's a really cool thing, and like I said, when streaming dataflows come into play, I think that's going to add a whole new kind of avenue when it comes to Power BI, so I think
it's going to be a really cool thing. Now let me bring this up: it's been mentioned, but I want to put it out there formally so we don't miss it, guys. This is available right now, for a limited time: check out the On-Demand Learning platform. I know some of you are already members; we love you, we appreciate you, and hopefully you're enjoying the updated content. We're excited because we'll be revealing something new in the not-too-far-off future, and we're continuing to add new content and update what we have. If you haven't checked it out, sign up at least for a trial: you get seven days to view most of the material, and there are, I think, seven or eight "in a Day" classes that you get free for life: App in a Day, Dashboard in a Day, Robotic Process Automation in a Day, and more. So many good classes, guys; definitely go check it out. This promo runs until next Tuesday, ending at midnight, so make sure you take advantage of it. We're talking 50-plus classes, and for Power BI alone there are over 12, so it's definitely an exciting little resource you can tap into and leverage. Another thing of note, which I'll bring up again at the end of the session: we're going to keep these Learn with the Nerds events moving onward and forward with different topics. Next time, the man, the myth, the legend himself, Mr. Brian Knight, will be joining us on August 12th, 11 to 2, the same East Coast time frame, and it's going to be all about Power Apps portals. Dataflows is still a little baby in terms of how long it's been in existence, still relatively young, and Power Apps portals is even younger: a relatively new interface, and it's awesome in the Power Apps space, giving us the capability of sharing the awesome apps we've created with external customers. That's the idea behind it; there's a programmatic element in play, but that's what we'll look at on August 12th. So if you haven't signed up already, when that opportunity comes around, get situated, get ready to pop that popcorn, and enjoy another three hours, this time with Mr. Brian Knight himself. Like I said, I'm bringing this back up in case anyone was still on break and missed it, so you don't miss out on that additional free event in a couple of months. With that being said, let's return to where I left off. Right now I'm living inside my workspace, and I've created two dataflows. The first one is available for any licensing type: Pro, Premium, Premium Per User, all of them good to go, which is amazing. From there I introduced a new dataflow, this one right down here, which went through the process of using a linked entity: basically, I took a table that was part of the first dataflow and built a new dataflow that brought it in as a linked table, and I added more to it. We saw how that manifests itself inside the dataflow: it says you can't make any changes, this is read-only, and if you want to make changes you have to go back to the source dataflow itself; it's got a little link that takes you right to it so you can make your adjustments as you see fit. But this was when we started into the foray of creating a Power BI dataflow that must be Premium Per User or dedicated capacity, and since we were already in that realm of premium, we decided to go
ahead and add in these calculated tables right this is the symbol which represents the calculated table itself here so that's also premium per user premium capacity we saw how very easy it was to kind of stumble into this area also i did note and saw the comments earlier hopefully that example kind of answered that question there was something around hey i've been trying to do combined data inside of power query there were errors occurring um you know in that scenario i went ahead and did a merge so you know that's an example of combining data the other would be appending data so i'm not certain if that answered that question i do believe historically in the past there were some oddities with doing combination of data and one of the ways you would go around it is by basically copying that code like i mentioned you can go to power bi desktop do what you need to do right and i'll show this in a moment in the power query editor whenever you're doing this you can just take and copy that code and you can just put it into what's called a blank query in a data flow and that's kind of like a very fast way to take what you've done in power bi desktop and just turn it into a data flow just take the code and copy it over it's an interesting approach i do believe that was a method from before but as you saw i did just use the combine data capability and all was well and good so hopefully that kind of helped answer that question now here we are um we just saw when we're talking about the standard scheduled refresh that's at the data flow level but when we get into the data flow we do have options here for refreshing at the individual table level you can incorporate incremental refreshes here basically you're going to be setting these guidelines and you can see the ui pops up and it's just telling you do you want to use this yes or no you have to specify a date time column that you're going to be looking at so you have to have a design that's meant to support incremental
refresh you must be thinking about this even before you get into data flows we need a column that is going to be kind of like a created on or a modified on something that is tracking when a record is being inserted or updated because remember this is coming from azure sql database that's where this source resides so something is responsible for loading that azure sql database and we're going to want to track and keep an eye on when that's occurring what records are being updated so we simply point and indicate a column in question we say how long we want to store this i want to go back and store rows in the past quarters months you just choose what's the increment and what is the period and then you say what you want to refresh and you specify how much so you definitely want the stored range to be the larger one you don't want to be refreshing the whole last year let's say or something like that you can use this effectively as a method of partitioning and say i only want to be refreshing the last five months you kind of just have to figure out what makes sense definitely check out the article that we posted in the chat it talks about some of the kind of best practices and understanding efficient incremental refreshes so that you can make sure that we're not getting duplicates we're not missing items deletes can be a little bit odd if you're setting this up so you have to be wary of deletes so if a source does handle deletes you have to take some additional steps here but that's more on like the back end or as well as having these increments set up these thresholds established above you can also specify a column like a modified on you know this technique i mean it has to be a date or date time but it could be something a little bit different than what's above and it's literally going to be caching this data this one column it's going to cache for each row what the maximum value is there and when that increases
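the store-versus-refresh window idea just described can be sketched in plain python — this is purely illustrative the function and parameter names like store_months and refresh_months are made up for this sketch and power bi manages the real partitions internally:

```python
from datetime import date

def partition_plan(today, store_months, refresh_months):
    """Sketch of incremental-refresh windowing: keep `store_months` of
    monthly partitions, but only re-query the most recent `refresh_months`."""
    def shift(d, back):
        # step a date back `back` whole months (day clamped to the 1st)
        total = d.year * 12 + (d.month - 1) - back
        return date(total // 12, total % 12 + 1, 1)

    stored = [shift(today, m) for m in range(store_months)]  # newest first
    refreshed = stored[:refresh_months]                      # these get re-queried
    kept_as_is = stored[refresh_months:]                     # static partitions
    return refreshed, kept_as_is

# store a year of monthly partitions but only refresh the last five months
refreshed, static = partition_plan(date(2021, 6, 15),
                                   store_months=12, refresh_months=5)
print(len(refreshed), len(static))  # 5 partitions re-queried, 7 left untouched
```

the payoff is exactly what the refresh settings above are getting at — the older partitions just sit there compressed and stored while only the recent window hits the source again on each refresh
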
when that maximum value increases so as a newer date is put in there like a modified on and that increases that triggers that we're going to be doing an update so you know there's an option once again you can see you can only refresh complete months so there are a lot of considerations when you're going into this option so um also now i'm thinking there's one last question i could get caught up in so many of these guys because you guys are being so active and involved i think i did see a question about updating like going back to the source system can it be used to update the source system um and that is not the case right we are collecting so in the etl process our load is effectively limited that would be the idea right in a traditional etl tool where you choose your source and you choose your destination here we effectively don't have a variety of destination choices you either load to the default power bi storage or if you've chosen to attach your azure data lake storage it'll go there um i mean effectively that's it that's where it's going to be landed in one of those two areas that's all so we can't choose to kind of update a sql table right so it isn't going to impact other sources we're not going to be adjusting them we're just taking from various sources and we're loading it into these two areas whether it's a bring your own storage type of item or not so i think i did see that you're going to have your own separate etl process that updates and handles those source systems this would be something different but this is meant to serve as you know in effect a centralized reporting layer you know you hear these talks about uh enterprise data warehouses and to a degree we're getting to the point of functionality here where we can start bringing in stuff from so many different sources and creating these entities right here that users can connect to now that's from a shaping perspective but as
we're going to see in a moment we can still do the modeling and handle that in power bi desktop the relationships and the model enhancements with dax and things of that nature so i know there are so many more questions those are just ones that stuck out in my mind like i said once the class is said and done we kind of scrutinize and go through all those questions and i'll pick out some that i think weren't answered in the class and we'll look to have like a follow-up video or potentially a blog that covers those as well so keep on asking those don't think they're gonna get lost in the wake we're gonna scroll through it there's just only so much time and that time is three hours guys so let's take advantage of the hour that we have left all right right over here i'm going to dive back into the mix very quickly a very short integration right we've already seen it we've already witnessed how we can consume this inside of power bi service right using the linked entities that was one quick way of consuming it technically the whole import and export that's kind of touching them too and technically also azure data lake is in play if we combine it and bring our own storage solution that's at the workspace level but one of the natural ways we're going to also take advantage of this is going to be through power bi desktop and i think i saw the conversation already in the chat it was brought up there that there is a data flow connector and actually there was another question on how users can see this the permissions as you can see right here are based on workspace access so it's a bit different than power bi data sets you can literally go to a power bi data set and just individually grant users permissions to that data set by itself so it has kind of a more fine-grained level of access with data flows it's going to be based on workspace permissions but of course that's really one of the main reasons why the read only access is
available to us for that because there are going to be certain users that we may want to have access to this but we don't want them to have the ability of editing the data flow so you grant them read access it's just the viewer access role inside of the workspace itself of course if someone is a contributor or a member or an admin they as well of course will have that capability of seeing the various data flows inside of power bi desktop right but as far as editing it basically the author edits the data flow itself and that's all done in the service itself but once again we can grant users administrative access to the workspace so that users can go in because that's what it's all about it's all about that collaborative kind of interface that's what we want the capability of doing so let's go ahead i'm going to launch power bi desktop and i'm going to connect to effectively my power bi service and see all of the data flows that we have there and we're going to bring that in to power bi so let me go ahead i'm going to close out of this guy here i'm going to launch power bi desktop and let me go back to kind of the generic screen so we can see all of our data flows here so we can get an idea right this is the data flows learning with the nerds workspace we have all these various data flows and let's not forget within those data flows we have various different tables right that's the whole idea actually i think i already had it open let me go ahead and close this extra one all right let's make sure it's the only one i have open it is there we go all right so i'm here naturally interestingly enough um you technically don't even have to be logged in over here on the right hand side that is how we you know can connect as far as publishing and getting various different items for custom visualizations but when we choose the option and let me make sure i don't have this already kind of cached power bi natively caches data
connections cool i don't have it cached but i would say all right word on the street right they've told me that there's a data flow that i have access to and you can see they are promoting here the power bi data sets i think once the direct query option of data flows becomes ga once streaming data flows become ga if i was a betting man i wouldn't be surprised if you saw power bi data flows right here front and center i think it's going to be a really cool and very very real option for many users but it's not here now but i can hit get data it is listed under common data sources we can see it right here and when we select this it's going to ask me and this is a cool thing right it's not using my login here at this point i could be logged in and there are situational times and some of you in this very call might be part of what's called a multi-tenant environment where maybe you have you know the main company's power bi tenant and maybe we uh work with another client and there's a second one that i'm a part of right maybe i work in two different areas this just lets you sign in to whatever organizational account you need so it basically supports multi-tenant capability which is pretty cool and just in case i just happened to see dirk you're spot on with that answer so looking good my friend thank you for that so let me go ahead and get signed in i'm using the same credentials to match the power bi tenant that i know my data flows exist in so let me go ahead and sign in on this guy all right being signed in i can go ahead and hit connect and what we should see here is my tenant and we can see look at all these fun guys and it's there guess what data flows learning with the nerds i expand it there are my various items the date dimension here's our first one here's the one with the linked table right here and i can expand it and these just look like tables right now also it should be noted something i kind of
didn't showcase um there are methods inside of the dataflow itself where you can actually like not load a table into the model itself because in this case i did label this correctly where i can say hey you know what i want to bring in the date table i have customer but hey there's an enriched customer right i could do that one here's products and i'll grab internet sales but there's actually a method in which on the data flow side we can actually make it so a table isn't actually loaded it's part of the metadata because obviously we need it right number of transactions is basically derived from the fact internet sales table and we did some stuff there and then we merged it into enriched customers so you can actually make it so if you didn't want your users to see these items you can actually go ahead and disable them from being loaded into the actual model so in this case it's all visible so it's good that i did some labeling so users can see i've got two customer tables which do i want i see enriched that's going to draw my attention so absolutely this is what i'm going to use and then i can go ahead and load this in right or we could hit transform because right now as mentioned i don't have the direct query option turned on it's in preview but we could potentially now at this stage continue even though i used data flows to kind of create standardized tables this still allows my users to introduce their own business logic on top of it because that very well may be the scenario right steps one through 25 right these are the seven tables i know my organization needs to use and these are going to be the 27 steps that i know everybody's going to need done for this data set boom i take care of it i do it in power bi data flows now i make it available to them and maybe the finance individuals they're going to go ahead and add their own little you know maybe they want to change and do something in the power query editor for themselves and then the operations does
something for themselves and of course the logistics whatever it is they can further and extend this logic and as you can see if you've been in power bi desktop and you've used it it's pretty deja vu here right home ribbon you got the formula bar you got the applied steps that's what we're talking about data flows is the power query editor online and this is the power query editor in the desktop so just a very very familiar interface and we can make changes right you know i don't want this you know you can kind of make these updates maybe i want to get rid of you know this title this is representing like a dimension table maybe i don't want that anymore makes sense from a naming convention let's make these updates it should be noted um you know i'll call this like customer i know it's enriched that's okay i don't need to put that out there um it should be noted as well right we can go ahead and make all these changes we have full access to the power query editor here it's pretty awesome and amazing the one thing to note i only want to mention this because i've kind of been promoting it and kind of saying how cool it's going to be for anyone who's worked with direct query there are pros and cons there are definitely considerations if you want to go that route but now that we know data flows is going to have this ability it's going to eventually be generally available so that you can leverage direct query do note that it limits so right here inside of power bi desktop if i use direct query to connect to a data flow my capabilities of working in the power query editor become limited you can't do everything here that's just the nature of using direct query because nothing will be stored in the local power bi model basically i think in the documentation the phrasing they use is um you cannot use complex m query transformations something that falls into this category which is very common is any
form of combining data really causes a lot of issues and there's other various transformations too that you just can't take advantage of so direct query there's a time and a place definitely read up on it if it's an unfamiliar term an unfamiliar connection method but once direct query is there and they add the capability of composite models right if you're going to write anything down from this webinar direct query composite models data flows when all three of those worlds come together it's gonna open up some new horizons very exciting stuff very exciting stuff but you can see i can make whatever changes i want i can remove columns and i say you know what um let's go ahead here give me a second i just realized i want to go ahead this also is how you can go ahead and kind of go into that direct query mode and i'm going to access now it's cached my connection so it already knows what user credentials i want to use but i say you know what i forgot i actually also want to go ahead and bring in that failed banks information right go ahead over here hit okay and there we go i know there was a direct question about this saying can i bring in data from multiple data flows boom there you go reminder right now with direct query this would not be able to be accomplished with direct query you basically can only make one singular connection here so you're limited to one data flow only the tables in that data flow but when composite models show up when that comes into play sky's the limit sky's the limit ladies and gentlemen so it'll be very exciting now this really doesn't make much sense failed banks is a completely disconnected data set it's okay though we're not gonna like dive into the realm of visualizations but honestly at this point i'm here i'm inside of the power bi model it's going to load this in like normal i could access over here you know the modeling view and i can start defining relationships but we're going to see as power bi does it tries to
automatically detect it so it's going to create a couple of these relationships for me which is beautiful but i can adjust this we are in what's now called the semantic layer right i'm in the power bi model i can define these relationships as i see fit right so very nice easy clean to access i can do relationship stuff i can start adding calculated columns i can take advantage of our dax don't forget we've got a learn with the nerds three hours of dax maybe you went through that you got all that skill set you can take advantage of all those capabilities right here all the while it's coming from a centralized standardized source which was done and created using power bi data flows that's its purpose that's its ideal that's what it's meant to be used for and you can now kind of you know roll this out for the entire organization i do feel like there's a lot of parallels here between how you can leverage power bi data sets and how we're discussing you can leverage or roll out power bi data flows there are some parallels but do remember data flows it's all in the service significantly better for managing and collaborating with fellow users within the organization that's one of those key things so real quick we aren't going to be able to show a composite model but um a composite model is simply just some tables are direct query some tables are import that is literally a very simple definition but a 100 percent accurate definition it's that simple of course it relies on what sources you're working with but that's all the composite is some are direct some are import that's all definitely check it out it's a really cool topic supported in various areas right now not supported with data flows though so do keep that in consideration so that's how easy it is so that our users can do this and as a reminder right this is my workspace i just need to go here and give access to whoever i want to be able to see this i simply add them right
here who do i want to add simply use that email address this is tied into my organization's azure active directory and i assign them the role in this case all roles viewer and above if we assign viewer those users would have the ability of seeing a data flow right and that's more than likely going to be what you're going to use possibly contributor a very popular option as well but users must be in this list in order to be able to see your power bi data flows and not to be remiss or to leave this out because i did mention it at the beginning but it is very critical it's very important don't forget right we talked about linked tables that's a way to ingest we just showed power bi desktop that's a way to ingest and let's not forget if you decide to go the route of bringing your own storage you tie it to your azure data lake you've just opened up the world to whatever azure services you'd like to use now we have data stored inside of an azure data lake gen 2 storage account azure data factory spark notebooks azure machine learning services they all give you the ability of connecting to blob storage accounts and actually technically power bi gives you access to connect to blob storage accounts too so with the correct established permissions because you'd have to do a little bit extra you could use that connector in power bi desktop and then connect to the csvs in the data lake itself now granted it's a far more organic experience using the power bi dataflow connector um especially you know since you can import so it's kind of odd that you would go that route but you know maybe but by using the bring your own storage option it just opens up a whole new world that has its own ramifications that has its own nuances once again about permissions there are now additional costs from an azure perspective in my opinion you know azure billing is a whole conversation on its own but storage accounts are pretty darn efficient as far as cost goes and what we store as
far as how the data flows are structurally stored it's pretty small in scope and size they do a fair amount of compression there so definitely something to take into consideration it is a pretty robust consumption side but you have to attach your own storage so you may or may not go that route now let's adventure into the world which some of you might have been waiting for this topic and conversation let's adventure into the world of ai insights artificial intelligence definitely one of those buzzwords that users use you know quite a lot this is one of those things that everyone loves ai ai ai and you might have noticed ai insights was showing up when i was creating a data flow it was there so we're going to look at a couple of these options of course it should be noted right off the bat all of these features will require premium per user or premium capacity so it's falling in that realm you know i've said it a lot of times i apologize if i sound like a broken record yes you can use this if you don't have premium it is somewhat of a linear road you can connect to tables clean those tables and then store those tables in the service and now they're available to your users in the organization but linked tables calculated tables and these insights are going to be premium only so you just kind of have to stay in that area but if this is something you're interested in then you may have to consider you know and you know what the introduction of premium per user is a very interesting thing right i was very excited when they announced that price point i know spending money is never exciting you know you always kind of say oh why can't they just give it to us for free but the price range and the capability of what can be found in premium per user is pretty darn attractive definitely look at the nuances of how the premium per user license is leveraged because it's a little different than you might expect but you can use all of these things that we're
talking about and by default it's going to be twenty dollars per user if you have pro it's technically ten dollars more per month but definitely check out what ppu is all about how you can share things with people because that is its own conversation i just wanted to turn you guys on to that topic since it's so fresh and so new i want to say it just became generally available either at the end of march or the beginning of april so it is a baby still and who knows where we're going to go with what features are there and what they might add to it but it is an interesting option a very interesting option but ai right when we go into the insights we're going to see we have things like language detection we're going to check out here an example of sentiment scoring which is pretty awesome key phrase extraction all these really cool things and really when we're talking about the world of cognitive services right these little features and all these different items there's a couple of different fun conversations that can be looked at in this area so like sentiment scoring we're literally going to be working in the world of machine learning here but when we go into sentiment scoring basically this is an already created model you're taking advantage of what azure calls cognitive services and they've already done this right they use this in the background now trust me i could go on for ages on this i love talking about this so i'm going to try to talk about this in the most kind of linear way i can because when we start talking about ai when we start talking about data science when we talk about machine learning this is a whole other world and i don't want to throw things out there that will cause confusion but you should get excited these are cool things and the services themselves are built on this technology so when we choose to use let's say any sort of text analytics in the back end azure has already kind of created a machine learning classification algorithm and it
is what generates the sentiment score we're going to see it's going to give us a score between 0 and 1 items that are closer to 1 indicate a positive score those closer to 0 mean negative but the model we use the model that azure has created has already been pre-trained with an extensive body of text and various different sentiment associations they've already done these items where they say you know these words represent good these represent bad they've done all of this work and it's very extensive at the moment right at the moment we cannot create or train our own model for sentiment analysis we can't provide what's called training data more on that later on we can just leverage what azure has already done but as mentioned it's very extensive and it uses a combination of techniques while doing text analysis things like text processing part of speech analysis word placement word associations tons of different things guys a lot of fun with the cognitive services the same can be said if we go over into the realm of tag images basically it's a pre-created machine learning model which is based off of thousands of recognizable objects living beings scenery and actions and it uses all of this in combination with what's known as cv or computer vision algorithms to output tags based off the objects themselves pretty cool stuff and last but definitely not least we'll have to see because this goes into a realm far more advanced but azure machine learning integration is an option that's available to us within data flows that we can look at and that we can examine so premium only this is going to be premium if you have azure embedded technically that is another way that you can create a workspace but premium per user premium capacity and embedded capacity workspaces those are going to be the only ones it does only support online data sources so right now we can't tie this with a data gateway but that
should be changing and basically you just have to make sure that you've turned on these features inside of the admin portal basically for everything that you can do in the power bi service there are switches in the admin portal to turn things off and to turn things on now this last item permission and access to the azure portal um this basically coincides with what we talked about with azure data lake storage if we wanted to be able to access a machine learning experiment so if we create something in azure through azure machine learning services we would have to turn on and grant access to our power bi tenant so we could see it but we're going to go a different route data flows allows us the capability to create a machine learning experiment specifically an automl experiment inside of the power bi service itself so we don't actually have to step foot into the azure portal this is one of those kind of advancements that they're doing to make it so that the power bi service and the desktop like this ecosystem of power bi is going to be this very robust set of features that really kind of gets into that enterprise realm it's really exciting to see where power bi is going right now so that's where we're going to take a gander guys we're going to dive in and we're going to take a look at some examples from some cognitive services right so to begin let's go ahead and go into the realm of text analytics specifically we're going to look at sentiment analysis we talked about how it uses pre-trained models let's go ahead and import some data into the data flow let's score the sentiment and we're going to go ahead and get those scores back to us and we can observe and we can see and if we want we can connect to it with power bi and see exactly what's going on with it so first things first we also have to make sure that you know we've turned these features on but we're going to go ahead and connect and create a new data flow it's going to be a new table so technically this is
where it gets kind of funny we technically are going to start with an option that we've talked about that can use just pro you don't need premium but those ai features are going to require that the data flow we're creating be in a workspace that is either premium per user or premium capacity so i'm going to go ahead we're going to go to new as we've done in the past right i'm going to go to a new data flow which brings us to our favorite four little options and i'm going to choose that option on the left hand side right here add new tables all right as soon as we go into this little space into this little area here we are going to end up using the text csv option once again and to try to make your life easier i have a link that just points to a csv let me see if i can get this in the chat i think it was working it looked like it came up a little bit odd last time so hopefully it comes out okay here um basically what you need to do in there is simply uh right click and copy that link address the chat kind of like shortens it so make sure you copy that out of there so that's going to come into play once you've collected that information like i said you don't need to follow along i'm going to provide all of these urls in a text file that will be able to be downloaded after this class so you can watch the recording at your leisure at the pace that you want but we're going to use the text csv option again right go in and paste my url here with just some kind of standard uh you know options nothing too crazy here and we're going to go in and look at what this is all about so i'm going to hit next let it connect bring in that data let it do its thing here there we go and we can take a moment to examine this right this is just some data that has as you can see a comments section so once again it's this company called fabrikam you can see it has like this uh this topic its email subject you know the name the
user id some basic stuff in here what we're really interested in is right here right we have this comment box and the idea is we're going to go ahead and use this box we're going to feed this comments column to a trained model which is going to return us a score which as we remember sentiment analysis basically is what's called a classification model and it returns values that fall between 0 and 1 it's going to fall in that range things that are closer to one positive things that are closer to zero negative items that fall in the middle things that are basically found to be objective where there isn't enough information for it to figure out whether it falls positive or negative will actually just fall right smack dab in the middle so a lot of times you'll just see straight up a 0.5 or something of that nature here so this is just the preview so we can go ahead and of course hit transform data that actually brings us into the ui i'm going to go ahead and hit this little guy over here to you know give myself a little more room and it's taking its moment and loading there it is but over here is my ai insights and when i click on this it provides me the options if i had a machine learning model that was available to me it would actually show up here and interestingly enough as i mentioned i already created one ahead of time and i'll explain why i did that um it'll make much more sense once that's the case so it's thinking it's looking it's connecting effectively to the azure services associated with my login and we should see various different options under some cognitive services so of course it's taking a moment we'll give it a second it's taking its time with this you know live demonstration as it is there we go now in my case because i have a trained model this is showing up here right we'll assume like i said i'll explain this later that's going to be
the last example we do. What I'm most interested in is this: if you're following along and you have Premium and all that, but you didn't have a model, this is all you'd see, the folder of the default Cognitive Services options. You can see there are a few. Tag Images, we'll use that one next. Extract Key Phrases sifts through text and surfaces important phrases. Detect Language literally returns, is it English, is it Russian, is it Chinese; pretty cool. And then we have the bottom one, the one we're going to be leveraging: Sentiment Analysis. You can see it really doesn't have many parameters to fill out. The only required field, indicated by the asterisk, is the text we're trying to score, whether it's positive or negative. When you see the pencil icon, that means literal text, so I could type whatever I want, but obviously that's not going to work in this scenario. What we want is to hit the drop-down and specify a column from the table we're using in the background, and that column is called Comments. Once I've said I want to use a column, I hit the drop-down and find Comments, and that's basically it; we don't have to do any more. Now, you'll notice at the bottom there's this Culture info, a language ISO code. It's optional: if we knew the text was all English, we could put in the value en. Specifying the language can make the process a bit more performant, because the service no longer has to figure out what language it's dealing with. It uses the ISO format, so en is English, and so on; you just look up the ISO code for the language you want. For us, we're just going to
leave it blank, and Power BI will first detect the language, so we can see how that works, then feed in the value and give us the score; that's what it's called, the sentiment score. So I hit Apply. We can see it's gone through and is trying to invoke a function, and a little message came up, I'll show you where you can manage this. It says: hey, we canceled this because combining data from multiple sources may reveal data from one source to another. There's just a question around data privacy, which makes sense. We can simply hit Continue to allow it, and I'll show you where this option lives in the UI, because you can say, you know what, I know I'll be working with multiple sources, let me turn this on by default (by default it's off). It's right here under Options; under Project options we're talking about this specific dataflow, and you'll notice the checkbox: allow combining data from multiple sources. In this case, the idea is we're effectively combining the source of this online CSV with the source in our Azure Cognitive Services; that's what's going on. Technically, if we had two different sources, say this CSV from here and an Excel file coming from somewhere else, we'd still get this same message. You can manually turn this on if you choose, again under Options, and there are technically some global options you can tinker with as well. So we've got the same table, but if I scroll to the right-hand side we see a new column available to us: on the far right we get our Cognitive Services score sentiment. I'm going to clean this up, and I'm just
going to call this My Score. Formatting-wise, I'm also going to change the data type, because this is something I haven't really talked about. When we talk about data shaping in the Power Query editor, whether you're in Power BI Desktop or in dataflows, there are some basics that should be considered for performance, and one of those is proper data typing. When you see ABC123, that's the Any data type, and it can have performance impacts. In fact, notice this little warning icon: it says the query has an untyped column, so it's actually warning me about this. You always want to strongly type your columns; this has performance ramifications and can affect which transformation options are available to you. So don't leave any ABC123 (Any) data types. I'm going to switch this over to Decimal Number, which fixes that warning. And I almost forgot my own advice: look at the name of this query, it's just called Query, which is pretty terrible. Let's call it something like Customer Comments; nice, simple, straightforward. I can hit Save and Close, and we now have a dataflow that we just have to name. As soon as we exit, it asks me to name it; I'm going to use the company name, so my dataflow becomes Fabrikam Customer Comments (I think that was the name of the company), with a description like "doing sentiment analysis on my customer comments," something like that. We could then use Power BI Desktop to connect to this again and see how it's actually working. Once we save, it does ask me to refresh.
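The rename and retype steps just described come down to two M transformations. A minimal sketch, using a made-up stand-in table mimicking the Cognitive Services output:

```powerquery-m
// Illustrative only: sample values stand in for the real sentiment output.
let
    Source = #table(
        type table [Comment = text, #"score sentiment" = any],
        {{"great service", "0.99"}, {"very disappointed", "0.03"}}
    ),
    // Step 1: give the generated column a friendlier name
    Renamed = Table.RenameColumns(Source, {{"score sentiment", "Score"}}),
    // Step 2: strongly type it so no ABC123 "Any" column is left behind
    Typed = Table.TransformColumnTypes(Renamed, {{"Score", type number}})
in
    Typed
```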
It's important to note, and this is something I haven't mentioned: when you finish loading a dataflow and pointing it at a table, like we just have, by default there is zero data in that table. If I connected to it with Power BI right now, nothing would be inside it. The takeaway is that the Power Query editor, whether in the desktop tool or inside dataflows, shows merely a preview of the data, a sample we build our business rules around. Right now we've stored the rules and the connections, but we technically don't have any data loaded, so we need to make sure this gets refreshed. That pop-up disappeared, but there are options: you could set up incremental refresh, you could edit, but if I just go back into my workspace (remember, I showed this) we have our new Fabrikam Customer Comments, and I can click here. I don't want to schedule something and wait for it; let me just refresh it now. It should be noted as well that you can always observe what's going on: if you hit the ellipsis you'll see a refresh history showing in progress or completed, and there are some very interesting metrics you can collect if you hit the download button, which gives you a CSV you can go through with various details about the refresh. So let's see, it's still in progress; let me refresh the view. It took 40 seconds, and we want to make sure there's data in here when we connect. Let's open Power BI Desktop, go through the same method of connecting, and see exactly what's going on. I'm just reading the comments real quick; not really certain, Nickel, I'm not certain
exactly why that link wouldn't be working; it's just a public one. And that's a relatively correct statement there, Anthony; there are some nuances to it, but for the most part that's correct. So, like we did before: Get Data, Dataflows. I'm hoping it's been refreshed in the background, but the nice thing is I can just refresh my connection. Same thing: I go to the same workspace, we see my Fabrikam Customer Comments, and we can connect to it. There's Customer Comments; let's see if it refreshed in that small moment. It did, good timing. If it hadn't, that's okay; I could still technically connect and just hit the refresh button in Power BI once it was done. And if we wanted, we could transform this further: I'm importing the data here, and if I wanted to make additional changes I could. We can see the information on the back end, but let's just examine what this has actually come up with. I'm going to load this in and pop it into a table real quick so we can look at the results. Once again, something important to understand in this scenario: we can't train this model ourselves, and although it's extensive, when you're working with machine learning it's all about predictions. We're working in a realm of probability; we train machine learning algorithms to figure out and understand patterns so they can take data and predict results. It's not going to be 100% correct; we strive for the best machine learning model we can get so it helps us in the decision-making process. So let's take a look at how well this model works. I'm just going to grab a table, and I'm going to keep it pretty
simple: who made the comment, what the comment is, and the score (I called it Score). Let me stretch this out, increase the size, fix the grid so you can see it a little better, and fix the word wrapping; let me shrink this up so I don't have to do any scrolling. There we go. So if we look at this: "I had no idea..." (they used some jargon words in here to try to throw the model off), "I paid twice, they weren't nearly as smooth an operation, field ground delivery was a plus." You can see these are positive items, and the score was 0.99; approaching 1 means positive. If we go down, there's a 0.17: "I've had similar problems tracking down another source of thingamabobs," and "maybe if it starts getting more orders that would improve." Overall that veers negative, and it shows in the score. And you can see the very bottom one, "very disappointed," scored low as a result. You could go through this and look for "unhappy" and so on. Here's the trick: this is being fed into a machine learning algorithm, and we can't really see what's going on in there. To a degree, some would describe this as a black-box operation: we know what's going in, we can see what's coming out, but we don't get to see the process. That's the one thing that's missing when we leverage the out-of-the-box Cognitive Services; they're using pre-trained models. Getting more advanced, getting into the nitty-gritty of machine learning, you can create your own models and train them to deliver this; that's supervised training, and you're in control of it. But right out of the box, check it out: looking at text and
telling if it's good or bad. It's a cool little way to go about this, and it didn't take much. Once again, it is locked behind that Premium paywall, but it's there, available to us. Let me close out of this and load up a new Power BI Desktop. That right there was an example of using sentiment analysis; now what about the realm of image tagging? I saved some code for this; it might not copy that well into chat, so let me put it in a Notepad and see if it goes a little better. I'm going to use the Blank Query option here. We're going to do another dataflow, and the idea is this one will use image tagging. You technically can provide this data in binary format (you can have explicit images), but this demo is going to use URLs that point to two images. Let me actually bring up those images. I'll put the code in the chat; like I said, it may look funky because it's beyond the character limit here, so if it doesn't come through, just watch along; in the resources you can download after the class, I'll make sure it's included. So this is image number one (I'll leave these up in the background), and here is image number two. I just found some images, nothing special. These are what we're going to feed to Power BI, putting them through a CV (computer vision) algorithm that looks at the image for objects, persons, themes, and backgrounds.
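The blank-query code being pasted here has roughly this shape: a hand-built table whose single binary column holds the downloaded contents of the two image URLs. The URLs below are placeholders, not the ones shared in the chat:

```powerquery-m
// Sketch of a blank query feeding two images, as binaries, into a table.
// Both URLs are placeholders.
let
    Source = #table(
        type table [Image = binary],
        {
            {Web.Contents("https://example.com/flower.jpg")},
            {Web.Contents("https://example.com/forest.jpg")}
        }
    )
in
    Source
```

Web.Contents returns the raw bytes, which is the binary format the Tag Images function expects as its input column.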
That's the idea behind this. So let's see, based on what we're looking at right here, how well this does. We'll start the same way: create a new dataflow, and use Define New Tables, which once again is kind of interesting, because we start in an area we've said is technically available for all licensing, but as soon as we take advantage of the AI Insights, that part is going to require Premium. This time around we're going to leverage the option called Blank Query; it's down at the very bottom of the options. This is one of those methods I talked about earlier: it literally just asks you to provide the code, so if you're less familiar with M, this is probably an area you won't find yourself in too often. But as I mentioned, if you did work in Power BI Desktop, you could technically copy the M code from there, and this is where you'd paste it. So for me, I'm going to paste in my code; it's oddly formatted, but it should be fine. Because we're connecting to a public web URL, it asks how we want to connect: anonymous authentication, basic, organizational. In this case anonymous should be perfectly fine, and it asks for both URLs. Then it's trying to combine two different URLs, which we talked about, so I'm going to allow that. It looks pretty unexciting, it's just storing binary data in here, the images, but now we can take advantage of the AI services, the Cognitive Services, and do some image tagging. So we go back to our AI Insights, let it load, and it accesses those default ones, and we saw it already has some machine
learning in there, but we're looking for the Tag Images option. Give it a second... there it is, Tag Images. It's already chosen the column option by default and picked the column that meets the criteria (it's in binary format), and all we need to do is hit Apply. It's going to feed this to the algorithm and return two columns for us: the tags themselves, and a JSON representation that breaks them down; just a little extra detail. So we wait: right now we're literally feeding this information to a trained model and waiting for it to process and return the results; in our case, tags from that computer vision algorithm, what it has figured out. While it thinks, we'll also rename this query; we'll call it Image Tagging. And while I was doing that, it finished in the background. For our first binary image it came back with: flower, plant, yellow, colorful, and colored. It didn't bring up "butterfly", which would have been cool, but that's fitting pretty well. Our other one comes up with: tree, outdoor, forest, and it actually got really specific here: conifer, nature, wooded, plant, hill, all this fun stuff. And over here it gets a little more into machine learning territory with confidence levels; that's what's held in this JSON, the confidence it has for each respective tag it gave. So maybe you want to drop anything lower than, say, 0.5. This one says "wood" at 0.27 confidence, but there is wood in there, so that works for us. Still, this can be a place where you decide to filter out records that are below 0.5 or so.
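Once the JSON has been expanded into tag/confidence pairs, that 0.5 cutoff is a single Table.SelectRows step. An illustrative sketch using the sample values from the demo:

```powerquery-m
// Illustrative: drop tags whose confidence falls below 0.5.
let
    Tags = #table(
        type table [Tag = text, Confidence = number],
        {{"tree", 0.99}, {"conifer", 0.87}, {"wood", 0.27}}
    ),
    // Keep only the rows that clear the threshold
    HighConfidence = Table.SelectRows(Tags, each [Confidence] >= 0.5)
in
    HighConfidence
```

With these sample rows, "tree" and "conifer" survive and "wood" at 0.27 is filtered out.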
You can eliminate image tags you find aren't sufficient or are a bit too ambiguous. But right here, from this example, from this scenario, it's there: we can take advantage of these CV algorithms right inside dataflows, and then potentially leverage the results in, say, Power BI Desktop, or wherever we want to use them. And remember, if we've tied this into our Azure Data Lake Storage, this output is now technically available for consumption by various Azure services, so we really have a lot of options to take advantage of these items. Now, we're getting close to the end game, close to the buzzer, and this next one is a pretty loaded topic; I was uncertain whether we'd get into it. Let me give this dataflow a name first: we'll call it AI Insights Image Tagging. This next one we're going to look at is the AutoML capabilities. We are not going to be able to create one fully: I'm going to walk through the beginning stages of creating one from scratch, but we're not actually going to submit it and train the model (that's what the process is called), and you'll see why. We'll start the process so you can see what it's all about, because when you properly set this up, you want to give the model enough time to be trained; you'll see there's literally a slider where you decide how much time it can take. The longer you allow a model to train, the more variations, the more of what are called featurizations, can be run against the dataset, maximizing the chance of getting the best model possible. So we'll walk through this wizard (which is pretty awesome), and at the very end I'll cancel it, because I can show you the output: I actually
have this Online Shoppers Intent, which is our trained model that I made earlier, and you'll see there's a report in there and all this stuff. But let's look at the beginning pieces. When it comes to machine learning, there's a lot, guys, and like I said this was a topic I was uncertain about bringing up, because there's a fair amount of conversation that can be had here. Machine learning and data science is its own realm of possibilities, and it can get crazy when you talk about the science behind it, the mathematics behind it; it's deep. But this AutoML capability really makes it cool to take advantage of: we can literally feed data into a model and it brings back results, which is kind of interesting. So let me start this so you can see what it's all about; definitely something I recommend if you want to go further, there are some cool examples online. We go into New as always, stick with the basics, and dive into the process of creating a new dataflow. Like I said, I won't hit Submit at the very end, and you'll see why. The idea here is that if I know I want to create an AutoML experiment, I'm basically creating a dataflow that will supply the data used for that experiment. This is called training data: the tables we define and establish here are what get used for training and testing data. In our scenario we're going to use a dataset about shoppers' intentions; it's data collected about their traffic on a website, how long they were there, and you're going to see various different areas here. Power BI is going to do all the work of figuring out which columns make sense, which will be helpful in predicting; things like click-through rates, bounce rates, various different
terminologies used in things like Google Analytics. So we go ahead with our new item, and I choose Text/CSV. Let me find my chat real quick and put this link in, but for this one I'd recommend sitting back and watching, or waiting until the recording is up so you can take your time, because ML gets deep. I'm going to paste this shoppers' intent URL in; like I said, this will probably bring us right up to the wire. You can see various data: a bunch of different items, some of them internal metrics they calculated, but you can see things like how much time was spent on product-related pages, bounce rates, exit rates, page values; all things that can come from something like Google Analytics. The only thing I'm going to do to transform this data is this final column over here, called Revenue. It's true or false; it represents whether someone, based on all this activity, generated any revenue: basically, did they buy something after all this time. So I'm just going to update the data type to True/False, that's the category I'm going to set here. I'm also going to update the name, because Query is a pretty poor name; let's call this Online Visitors, or something of that nature. And that's pretty much all we need to do: since we've set the column to True/False, this dataset is technically set and ready to go, and I can simply hit Save and Close. Now, this Revenue column is what's known as a target column, also called the label column: that's the column we want to predict. The idea is that based on all these details, I can feed a row to this machine learning model and it'll tell me whether they're likely to buy or not. Let me go ahead and call this my Online
Shoppers Intent Demo, since I think I already have one with the original name. All we've done at this point is bring in a table from a CSV and load the data into it, so we haven't done anything in the machine learning realm yet. But this is where we get introduced to how we can create and train a machine learning model. This is where we say: hey, take the data inside this Online Visitors table I just created, and use it in a model to predict this column; use this data to train and test it. The dataset we have will basically be split into two parts; I believe by default 80% of the data is used to train the model. A very quick definition: training means we let the machine learning algorithm look at all the rows and see the answer; based on all of this, they did have revenue, it's true; based on that, it's false. It goes through all the columns and figures out patterns, using statistical correlations and various featurization methods to see which columns give the most influence: if someone is high in this category, maybe they're low in something else, or this column is a high predictor. It takes the cumulative result of all that; it's doing a ton of work, that's why it's machine learning, that's AI put to work, and then it returns a score. That's what happens in training. Then we show it the remaining 20% of the dataset, which it doesn't know the answers to, and it predicts the results; that's what we call scoring: how good did my model do, on data it hasn't seen, at predicting those results? That's the goal. So very quickly, guys, there's a lot in here, but I just wanted to show this aha feature; don't worry about following along, check the recording out. You can extend
this and go further. So: Online Visitors, I'm going to use this data to train and test the model. Right off the bat it says: okay, looking at your table, which column would you like to predict? If there were multiple tables, they'd show up here, but we only have one. The column whose outcome we want to know is Revenue, so I select it and hit Next. On its own, it actually does a preview: it analyzes the values in that outcome column we identified and suggests the type of machine learning model to create. It has chosen a binary prediction, true or false; technically this is known as binary classification (binary meaning two, true or false, that's what it represents). If it chose this and you wanted a different model, you can very easily click the link here; it shows all the model options and you can make the corresponding choice. Next it asks: in the target outcome column, Revenue, with true and false, which outcome are you most interested in? For us that's true, because people spending money is what we want to predict. And we can apply some friendly naming: if it's true, that represents "made a purchase" (this shows up in the report after the fact), and if it's false, a mismatch, the only other option, that means "no purchase." So we're just giving it friendly labels. Oddly enough it's not letting me scroll down here, and I can barely see the Next button; that's a weird little UI thing, it's acting a bit odd, usually I can scroll. Hopefully this continues to the next page.
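The friendly labels chosen in the wizard ("made a purchase" / "no purchase") amount to a simple mapping over the Boolean target column. This is a hypothetical M illustration of that idea, not something the wizard actually generates:

```powerquery-m
// Illustrative mapping of the Revenue label to the wizard's friendly names.
let
    Source = #table(
        type table [Revenue = logical],
        {{true}, {false}, {true}}
    ),
    // Add a text column translating true/false into the report-friendly labels
    Labeled = Table.AddColumn(
        Source,
        "Outcome",
        each if [Revenue] then "Made a purchase" else "No purchase",
        type text
    )
in
    Labeled
```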
What's happening here, I think, as it goes forward, is that Power BI is doing a preliminary scan of some sample data and is going to suggest which columns we should use when training this model. It's effectively using various statistical methods to understand correlation between columns, highly correlated versus not, and doing that work for us. It will say: we recommend using these columns. You can see it here: Administrative, low correlation; Informational Duration, low correlation. These are only, simply, recommendations; we can decide: you know what, I understand this data, maybe in the sample it looked at it came up with this, but I know this column should have a higher level of correlation, and I want to include it. So if you know and understand your data, you can make adjustments here and change which columns get used in the process. Now, in the final step (and this is why I'm not going to just start it running), we give our model a name; you could call it whatever you want, Purchase Intent Prediction, something like that, plus a description. And notice down below: training time. The longer you train your model, the more accurate the results; train for a short time if you just want to make sure you selected the right data. The lowest amount of time is 5 minutes, and as you can see, we can go all the way up to 360 minutes.
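Conceptually, the train/test split the wizard is about to describe looks like the sketch below. AutoML does the sampling for you (and will typically sample rather than simply take the first rows); this is just an M picture of the 80/20 idea:

```powerquery-m
// Conceptual 80/20 split; the real sampling is handled by the AutoML service.
let
    // Ten stand-in rows numbered 1..10
    Source = #table(type table [RowId = Int64.Type], List.Transform({1..10}, each {_})),
    TrainCount = Number.RoundDown(Table.RowCount(Source) * 0.8),
    Training = Table.FirstN(Source, TrainCount), // 8 rows: the model sees the answers
    Testing  = Table.Skip(Source, TrainCount)    // 2 rows: held out for scoring
in
    Testing
```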
It also tells us what's going to happen next: it takes the data in that table and splits it. 80% is sent to train the model, so it gets the answers: it sees the values in the Revenue column and says, okay, let's figure out the patterns. The other 20% is where it doesn't see Revenue, doesn't know if it's true or false, and gives us predicted values so we can see how close it came. And the perfect part is that it provides the statistical metrics used to understand the predictive power of models, and it actually produces a pretty cool Power BI report you can examine on your own; pretty interesting. At this juncture, if you're following along, you can hit Save and Train if you'd like; obviously I'm not going to have you sit here and wait 120 minutes, and even at 5 minutes, we can't do anything until it's done. So at this point I'm going to hit Cancel, because back in my dataflow I've already got Online Shoppers Intent, which is a trained model. When you hit Save and Train, it creates these two tables, the training data and the testing data (remember that 80/20 split), and then it creates these outputs. When you look at the model, which is right here (notice it's been here the whole time, hiding, I just haven't drawn your attention to it), you can see that it's been trained, and when it was last trained. You may also want to retrain the model: maybe there's more data; that Online Visitors table we loaded can be refreshed, new data gets put in, so you retrain; maybe there are new patterns in the data that need to be recognized. Times change; users will have different
behaviors and patterns, so machine learning is an ongoing, revolving process: you need to keep retraining those models to make sure they're giving you the best possible option. We can also see there's a nice report available to me: this is from after I let it train and applied it, and it brought back this information so I could look at the results. Not to go too deep, but this is a classification model, as we saw, a binary prediction, a binary classification. One of the main metrics for examining the predictive power of this kind of model is called the AUC, the area under the curve, and you can see it right here. This measurement goes from 0 to 1; the closer we approach 1, the higher the predictive power. This one was 0.94, so it was a very effective predictive model. There's so much you can examine here; it does go further into the realm of data science, so it would require a significantly deeper understanding to appreciate fully, but it provides a ton of stuff. Want to know the top predictor columns? There's a little drop-down you can look at. Apparently Page Values was one of the feature columns with the biggest impact, which makes sense: Page Values is something we indicated where the value of a page correlates to purchasing, so if it was the purchase screen or the product detail screen, we assigned values to it, and naturally those will have an impact. Bounce rates too, obviously: how often are they going in and just leaving? That's obviously going to affect whether they're purchasing or not. All of this is right here, and it's pretty cool that we can see it. The sizing of the report here is a little crazy, but you've got tons of cool options you can look into. You can check out the accuracy
You can check out the accuracy report, the training details; all of it's in here. That's what it's all about on the machine learning side: a nice wizard that walks you through end to end. And now that the model exists, when you're in AI Insights you can literally just use it and say: I've got a machine learning model, here's some data, pass it through and give me back those predictive results, tell me whether each row is going to be true or false. The fun part, for those familiar with the machine learning world: it doesn't just do classification models, it can also do regression models, which means predicting numeric values rather than a true/false outcome. Like I said, I know this is coming at the end and it's more advanced; it moves into a new area of understanding, machine learning and data science. It's awesome, and we have some great classes on it on our on-demand learning platform. And don't forget, that's a big part of why we're here, to showcase that right now you can get access to it; if you're looking into AI and you're interested in those machine learning services, it's definitely something to consider and check out. We've got a great discount running from now until midnight on the 15th. Also don't forget that coming up on August 12th, Brian Knight is going to be with us going over Power Apps portals, another event to sign up for. So we are getting down to the wire; we're getting to the end. If you had questions in there, I greatly appreciate that; we'll take the opportunity to dive into them and scrutinize the process, and I'll look at whether we go with more of a video presentation, maybe a blog, or something of that nature. But the main thing I need to say after three hours is thank you.
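One last technical recap before the sign-off: the AI Insights step described above, where new rows are passed through the trained model to get back a predicted outcome plus a score, can be sketched roughly like this. The scoring function, threshold, and column names are illustrative stand-ins, not the dataflow's real output schema.

```python
def apply_model(score_fn, rows, threshold=0.5):
    """Score each new row and derive a true/false prediction,
    mirroring the outcome/score pair a binary-prediction model returns."""
    results = []
    for row in rows:
        p = score_fn(row)  # probability-like score from the trained model
        results.append({"score": p, "prediction": p >= threshold})
    return results

# Stand-in for a trained model: scores sessions by their page value
toy_score = lambda row: min(row["page_value"] / 100.0, 1.0)

new_sessions = [{"page_value": 80}, {"page_value": 10}]
for r in apply_model(toy_score, new_sessions):
    print(r)
```

The same shape covers regression models: drop the threshold and keep the raw numeric score as the prediction itself.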
To everyone from all around the world, in all different time zones: thank you for being with us throughout this entire duration. I hope we see you in more Learn with the Nerds events, and I also hope I see you inside the on-demand learning platform, asking me questions about the video you just watched, getting more insight into machine learning and data science, or digging into the various other Power BI topics, because we're there waiting for you, and we have so many other cool things available to the community. So check it out: on-demand learning, virtual mentoring, the works. I had a great time, I hope you did too, and like I say in my videos, I'll see you in the next one.
Info
Channel: Pragmatic Works
Views: 78,617
Keywords: power bi, bi, power bi dataflows, microsoft, learn with the nerds, pragmatic works, Manuel Quintana, etl, Azure Data Lake Gen2, bi dataflows, Common Data Model, complete power bi course, Data Model, dataflows, dataflows for beginners, how to create dataflows in power bi, intro to dataflows, intro to power bi dataflows, Power bi basics, power bi data flows, power bi dataflows api, power bi dataflows data lake, power bi dataflows databricks
Id: GuU6_cqu8FE
Length: 189min 10sec (11350 seconds)
Published: Thu Jun 10 2021