Microsoft Fabric Data Engineering [Full Course]

Captions
[Music] Hackathons can effectively make learning relatable for your team by providing an interactive and engaging learning experience. The process starts with a planning session where we identify a small to medium-sized business need for a prototype. Following a one-day training session, we collaborate with your team in a two to three-day build session where everyone is actively involved, and by the end of your hackathon your team has developed new skills and you have a working prototype to show for it.

Hello friends and welcome! My name is Austin Libal, and we are in the Learn with the Nerds studio, where we are going to be discussing data engineering in Microsoft Fabric today. I'm excited to spend the next hour and a half demonstrating the data engineering persona inside Microsoft Fabric, so we can see some of the capabilities Microsoft is introducing in the Power BI service that allow data engineers, data scientists, data analysts, whatever data persona you might be, to work together collaboratively in one environment. Before we get started I'll give a quick introduction to the topics we're covering today, but I don't want to spend too much time on that; I want to get to as many demonstrations as possible and leave room for some question-and-answer conversation in the chat. One thing to say up front: this live session is being recorded, so if you have to step out at any point during the next hour and a half, you can come back later and watch the entire event again. If you can't follow along today for any reason, you can always come back and watch at any time.

Let me give you a little introduction to me, Austin Libal, your resident Learn with the Nerds presenter today. I'm a trainer for Pragmatic Works, a training company that specializes in Microsoft analytical products. We do training on products such as Azure, Power BI, and the Power Platform, meaning things like Power Apps and Power Automate. Since you're here on our channel, maybe you've seen some of our content before; we produce lots of YouTube content and other live training events, and throughout the day we'll talk about ways you can interact with the Pragmatic Works team. I'm a trainer on what we call our Azure data engineering team, so I come from a background of working in tools like Synapse Analytics and Data Factory, and I work with Power BI quite extensively as well. If you want to reach out, connect, or find more information about me, either email me at the address on screen, which I'll also share in the chat a little later, or find me on LinkedIn. One fun fact about me is that I absolutely love to watch movies. I am a movie snob, a cinephile if you will, so I go to the theaters all the time to watch the latest and greatest movies, and I love watching movies at home as well. I've also learned some of the topics we're discussing today, things like data lakes, virtual networks, and monitoring storage, while earning the Microsoft certification called Azure Solutions Architect, and I'll talk a little more about certification exams later in the class.

Well, enough about me; this is not the Learn with the Nerds session about Austin Libal, this is about data engineering in Microsoft Fabric. One of the questions you might have at the very beginning is: Austin, what even is Microsoft Fabric? I've heard of it, but I don't really have a great understanding of it. We have already done a couple of Learn with the Nerds sessions on Fabric, and if you've seen those, welcome back, but if not, here is a brief overview of why Microsoft Fabric exists and why it is the newest, hottest topic coming out of Microsoft in 2023. Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. It offers a comprehensive suite of services, including data lake storage, data engineering with Spark capabilities, and data integration, all in one place. With Fabric you don't have to piece together services from multiple vendors; you can integrate them all into an end-to-end analytical solution with an easy-to-use product that gives anyone who is familiar with the Power BI service the ability to work with these technologies, some of which you may have used elsewhere, now integrated together in one environment. Fabric covers things like data warehouses and lakehouses, which we'll discuss throughout the day, business intelligence using Power BI reports and models to visualize and better understand our data, and storing data directly in what we call OneLake, which we'll also talk about in a moment.

The other question you might have is: Austin, what is data engineering? I work in data, but I'm nowhere near an engineer; engineering makes me think of buildings. Data engineering is a little different: it's the behind-the-scenes magic that makes all of the data work together. It's the process of designing, building, and ultimately managing the architecture of a data system, including its structure and organization. I told you I like movies, and if you saw my preview for this session I used this analogy there too, so let's compare data engineering to a film director: someone responsible for orchestrating all the different people involved in a film, production companies, actors, sound editing, all of those moving parts. The director carefully plans each scene, selects the right actors, and designs the overall structure of the movie. That's a lot like data engineering: orchestrating the behind-the-scenes work to ensure that data is well prepared and ready to play a starring role in the analysis process.
That work involves tasks like collecting, storing, and processing data so that it's easily accessible and usable downstream, for consumption in things like a Power BI report. Data engineers set up the infrastructure, create data pipelines (which we'll build today), handle data transformation using different tools, and ultimately make sure data is clean, organized, and ready for analysis by business intelligence analysts. The data integration workload inside Fabric gives data engineers the ability to unify hybrid or multicloud estates in an experience that combines the ease of use of the Power Query editor from Power BI Desktop with the scale and power of Data Factory, so you can integrate and orchestrate all of your data movement, whether you're connecting on premises, in the cloud, or to third-party sources.

Inside the Data Engineering persona in Microsoft Fabric there are a few different technologies you can work with. One we'll focus on heavily in the demonstration is data pipelines, using Data Factory to build pipelines so we can orchestrate data and run it on a schedule. You can also use Dataflow Gen2, which is essentially the Power Query editor inside the Power BI service, so that anyone with familiarity with Power BI Desktop can clean, modify, and transform their data into the shape they want for analysis. We'll also be using a lakehouse, which is where we're going to store and access our data inside Fabric; there are a couple of different offerings for where data can live, but the lakehouse is the main place we'll work in this session. And there's Spark, which we'll get to toward the end of the course, using languages like Python and SQL to access millions of records very efficiently and very quickly.

There's a conversation that comes along with Fabric: I keep hearing people talk about the lakehouse, so what is a lakehouse? The lakehouse is a combination of the data warehouse, a centralized repository where you store data in structured form from various sources specifically for business intelligence and reporting, and the data lake, a repository that lets organizations store many types of data, whether structured (tables), semi-structured (CSV files, Parquet files), or unstructured. This architecture lets you store a large amount of data for a very inexpensive bill compared to some other options, which is why data lakes are becoming so popular: organizations can store all of their data without necessarily racking up a very large Azure bill. With the lakehouse we're essentially storing our data on a data lake but keeping the structure of a traditional data warehouse on top of it, so you can write SQL operations against it and get ACID properties, meaning your transactions and your data are reliable.
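To make that lakehouse idea concrete, here is a minimal sketch of what "data lake storage with warehouse behavior" looks like in practice. It assumes a Fabric notebook attached to a lakehouse (something we only get to later in the course), and the table and column names are made up for illustration. Tables saved this way land in the lakehouse as Delta tables, which is where the ACID guarantees come from.

```python
# Minimal sketch: run inside a Fabric notebook attached to a lakehouse.
# The spark session is pre-defined in Fabric notebooks; names are hypothetical.
from pyspark.sql import Row

orders = spark.createDataFrame([
    Row(order_id=1, city="Seattle", amount=120.50),
    Row(order_id=2, city="Denver",  amount=75.00),
])

# Files live on the lake, but are written in Delta format -> ACID transactions,
# schema enforcement, and a table you can query like a warehouse table.
orders.write.format("delta").mode("overwrite").saveAsTable("demo_orders")

# Warehouse-style SQL over data that physically sits in OneLake.
spark.sql("SELECT city, SUM(amount) AS total FROM demo_orders GROUP BY city").show()
```

The point is the same one made above: one copy of the data on the lake, with warehouse-style reliability and SQL on top.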
The data lake, and specifically OneLake, which we'll talk about more as we go, is something like a OneDrive for your organizational data. Think of this lake as the single location where your organization stores and accesses its data, so that every job you run operates against a single source of truth instead of the many different versions of the truth you can end up with once data gets copied and accessed by many different people in many different places. Microsoft Fabric's OneLake lets you store one copy of your organizational data in a storage account, again something like a OneDrive, but accessed from inside the Power BI service.

Before we get too far, I have an exciting topic to bring up. I mentioned earlier that I have the Microsoft Azure Solutions Architect certification. Anyone here have a certification? Put it in the chat, whether it's the Power BI Data Analyst or one of the others; I'm interested. Pragmatic Works is always coming up with new ways to make learning enjoyable, and we've recently built an offering to help you study for Microsoft certification exams. You're getting a sneak peek today at a brand new offering that's going to change the way you prepare for Microsoft certifications; let's watch the video: "The Pragmatic Works team is excited to introduce CertXP. CertXP is not just a learning platform; it's a new horizon in technical exam preparation. Experience learning like never before, with elements of gameplay that make studying not just effective but incredibly engaging. With our preloaded journeys you can easily navigate the vast array of certification options and choose the ones that align with your career goals. CertXP will be in beta for our seasoned Learning Pass subscribers in December, and we hope to give access to all subscribers in the new year. Stay ahead of the curve with CertXP's exceptional training programs, and join the wait list to be notified when CertXP is available." [Music] Wow, that is awesome. I am so excited for that and for you to experience it, so definitely look at signing up; I'll show you how to get on the wait list in a few minutes.

Now I think it's time to start working through the demonstrations we're going to use today to show data engineering inside Microsoft Fabric. If you'd like to follow along with me, which isn't a requirement but is available should you wish, make sure you've downloaded the class files,
because there's something in them you're going to need a little later on. Then you can join me inside the Power BI service, where we're going to work through some examples of data engineering in Fabric together.

Here is my Power BI service; you can go to either app.powerbi.com or just powerbi.com to access it. If you want to follow along, you'll need to provision a Fabric-enabled workspace. If you aren't sure whether you have that ability, I'll show you in a moment where to find the option, and if you can't follow along right now, again, this is recorded, so you can come back and watch later once you do have it provisioned. The Power BI service has had a big facelift over the last six to nine months, with things like the navigation pane integrated on the left-hand side. Down at the bottom of my screen you'll see an option labeled Power BI, and clicking it opens the persona switcher, where I can see all of the different personas Microsoft Fabric enables for an organization. Just because you work in Power BI does not mean you can't access the Data Factory persona; it's enabled for everyone, so you can learn something new and help your data team. The persona I'm mainly working in today is the Data Engineering persona, and by switching to it I can see all the different data engineering items I can create as part of this Fabric environment.

The first thing I want to do is create a Fabric-enabled workspace. I'll click the Workspaces tab on the left-hand side of my screen, go to the bottom of my list of available workspaces, and create a new workspace. This opens a flyout on the right-hand side where I need to give the workspace a name, so I'll call it something simple like LWTN (Learn with the Nerds) Data Engineering, since that's the session we're doing today. By the way, if you want to know what Fabric means for the Power BI data analyst, check out Manuel Quintana's video from last month, where he talks about the Power BI developer of the future, or check out my session from about six months ago where we built an end-to-end solution in Fabric. If I expand the Advanced options, you can see the different license modes available for the workspace. The one we want is the Trial license mode, so I'm making sure that's enabled. If you don't see it, you may be able to enable the free trial inside your Microsoft environment, but that can be restricted by your organization.

So I'm naming it Learn with the Nerds Data Engineering, making sure the Trial license mode is selected, and creating my Fabric-enabled workspace. You'll know it's Fabric-enabled because it has the little diamond symbol next to it; we're shining like a diamond here in Fabric today. Next I want to provision a lakehouse. Again, there are a couple of different options for how you store and access data in Fabric, and ultimately it's all stored on OneLake, the data lake managed by Microsoft in your Fabric environment, but we're going with the lakehouse. Because I'm in the Data Engineering persona, clicking the New option gives me a drop-down list of items related to data engineering: a data pipeline, a dataflow, an environment, an experiment if I wanted to do some data science, and, about halfway down, a Lakehouse. Selecting New Lakehouse lets me name it, and the dataset we're working with today is a Microsoft-provided sample called the Worldwide Importers data, so I'm calling this my Worldwide Importers lakehouse. It fits what we're doing: this is our organizational lakehouse, where all of my tables will be stored so my data analysts can access them in Power BI, with SQL, or with Spark notebooks. When we click Create, the lakehouse gets provisioned, which takes just a few moments.

While that happens, do we have any questions in the chat? "I don't see the Power BI icon": some of those limitations will be removed depending on what you can do inside your Fabric environment. "Is OneLake just a Microsoft name for their lakehouse?" It's really their name for their managed data lake. Inside your Microsoft Fabric environment you're provisioned with a storage account, which has its own billing separate from your Microsoft licensing, so if you want to store data in the lakehouse you can, and ultimately all of that data lands in OneLake, which is, again, just a storage account, a place where you can access data. Okay, we got a nice little error message there; let me refresh my screen in case it was already provisioned and just didn't show me. Nothing here yet, so let's try another lakehouse name; maybe that name is already taken somewhere in my environment. I'll call this one Worldwide Importers with my three-letter initials at the end. Hey, that one's taken too, so the earlier error was probably because I already have a lakehouse somewhere else called Worldwide Importers and it wouldn't let me overwrite it. We're essentially creating a container, a folder, inside of my storage account,
and just like the folders in your on-premises File Explorer, you can't have multiple folders or items with the same name, so that's probably what I was experiencing; apologies for that. Now I'm inside the lakehouse I was able to create, so we're good to go, no major delays.

What I want to do next is get access to some data. As an organization, you probably already have data that exists somewhere, either in the cloud or on premises. You might be thinking, "Okay, so I need to move all of that data into OneLake now, Austin? Is that what you're telling me?" No, you do not. Microsoft has made it possible to integrate your other data storage solutions into your Fabric lakehouse with something known as a shortcut, and shortcuts are going to be a data engineer's best friend inside Microsoft Fabric. Shortcuts allow Microsoft OneLake to unify your data across domains, clouds, and accounts by creating a single virtual data lake; it isn't technically one data lake, but it acts like one for your entire enterprise, so analytical engines can connect directly to existing data sources, whether they're in Azure, in Amazon, or in OneLake itself, with more sources coming over the next few months.

If you want to follow along, the place to add a shortcut is inside the Lakehouse Explorer: go to your Files node, click the ellipsis, and choose New shortcut, which lets you access files that exist in another data lake. The reason I wanted you to download the class files is that they contain a text document explaining how to get access to a publicly available Pragmatic Works data lake with sample data you can use for this demonstration. So select the New shortcut option. The sources you can connect to are currently a little limited, but the reach will expand as Fabric evolves. We're pointing to Azure Data Lake Storage Gen2; normally this would be a data lake provisioned inside Azure that your organization owns and that you've been given permission to based on your role in the company. Selecting the Azure Data Lake Storage connector brings up a menu asking for some information: who are you, and how do we know you can connect to this? Security is still important as we work inside Fabric. What I'm going to paste in here, and what you should find in that text document, is a URL, the endpoint for the location where the data lives in one of those Microsoft data centers around the world; PW ADLS Fabric is the name of that data lake. Then we choose how to authenticate. Most of you don't work for Pragmatic Works, so the method we're using is not your organizational account; it's a SAS, a shared access signature, and that SAS token should also be in the document. You can paste it in; it's a long string of seemingly random numbers, letters, and dates, but by choosing to use that token you should be able to click Next and make the connection to the Pragmatic Works data lake.
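If you're wondering what goes in which box, the two pieces look roughly like the sketch below. These are hypothetical placeholder values, not the real ones; the actual URL and SAS token are in the class-files text document.

```python
# Hypothetical placeholders only -- use the real values from the class-files document.
# ADLS Gen2 endpoints use the storage account's "dfs" URL:
adls_gen2_url = "https://<storage-account-name>.dfs.core.windows.net"

# A SAS (shared access signature) token is a long query string of permissions,
# expiry dates, and a signature, shaped something like this:
sas_token = "sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2024-12-31T00:00:00Z&sig=<signature>"
```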
Once that connection is made, you'll see the kind of menu that's on my screen now, with a shortcut name and a sub path; if you're seeing it, you're in and you'll be able to follow along. If not, you might have to disconnect from a VPN or work through some other security issue as we go; email me after class and I can help you get set up if you want to follow along later.

I'm going to name this shortcut External Data Lake. It could probably be given a better name, but it reinforces what we're doing: this is a data lake that is external to my Fabric environment, and we're integrating it into OneLake. The target location you cannot change. For the sub path, enter a forward slash and then the word worldwide (/worldwide); that connects directly to a folder inside the data lake that gives you access to five or six files. Then confirm that you want to create the shortcut. Once you do, you should see it in the Explorer, which works a bit like File Explorer or the Object Explorer in Management Studio: there's the shortcut with its little link icon, and clicking it shows the files coming from the Pragmatic Works data lake. Now we can start to work with these files and integrate them with the rest of our organization's data. That's great.
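As a side note, once the shortcut exists it behaves like any other folder in the lakehouse, so a Spark notebook attached to this lakehouse could read those files directly. A minimal sketch, assuming the shortcut and file names used in this session; adjust the path to whatever you see in your own Lakehouse Explorer.

```python
# Minimal sketch: a Fabric notebook attached to the Worldwide Importers lakehouse
# sees shortcuts under its Files/ root just like local folders.
# The shortcut and file names below are the ones assumed in this session.
dim_city = spark.read.parquet("Files/ExternalDataLake/dim_city.snappy.parquet")
dim_city.printSchema()
dim_city.show(5)
```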
What I want to do now is start focusing on something called a data pipeline. What we have so far is my raw layer, the data coming from my organization, but I want to actually land it in the lakehouse so my analysts downstream can connect to it, either through a Power BI semantic model or through a SQL, database-like structure. So I'm going back to my workspace, Learn with the Nerds Data Engineering, and you'll notice a couple of items were created along with the lakehouse: the lakehouse itself, plus a couple of linked items, a SQL analytics endpoint and a semantic model, which allow SQL access to the lakehouse tables I create (the tables are stored on the data lake but can be queried via SQL operations). Both of those are used heavily in our other Learn with the Nerds sessions on Fabric, so if you want to know more about them, check those out.

Inside my workspace, because I'm in the Data Engineering persona, I can choose New and then Data pipeline. Before we go too far, let me give you an overview of the data pipeline experience and of Data Factory, which is where a lot of this technology comes from. I'll click New data pipeline first, but here's some background. Data pipelines enable powerful workflow capabilities at cloud scale: you can build complex workflows that refresh your dataflows in Fabric, move petabyte-scale data, define sophisticated control flow, and ultimately build complex ETL and Data Factory workflows that perform different tasks at scale. The control-flow capabilities built into data pipelines give you workflow logic such as loops and conditionals, a similar concept to some of what you could do in SQL Server Integration Services if you've ever worked with it. We're going to focus primarily on the Copy data activity, which takes data from a source and loads it to a destination. I have data in my data lake and I want to load it into my lakehouse; even though they're integrated through the shortcut, there may be things we want to set up or transform, columns we want or don't want, and this is a tool we can use to do that.

Within a Data Factory pipeline there are many connectors, probably around a hundred, and the operations you can perform depend on the connector. For example, with the data warehouse I can do a lot, including a Lookup activity, a Get Metadata activity, or authoring a SQL script, whether it's hand-written or a stored procedure, because it's more of a SQL-based source. With Dataverse I don't really have that capability: you can take data from Dataverse and write data to Dataverse, but you can't easily query it with SQL inside this tool the way you can a data warehouse, at least as of right now; maybe that will change in the future. So the connectors differ in some pretty interesting ways.

What we'll walk through first is the Data Factory pipeline copy assistant, which gives you an almost wizard-like experience (not wizard as in spells, although it is pretty cool): you walk through the steps of authoring your own data pipeline without having to understand all of the orchestration happening behind the scenes; you just say where you want to take your data from and where you want to write it to. So let's go back into Fabric and create our first data pipeline. This is the pipeline I'll use for my fact table: I want to grab data from my fact file in the data lake I created the shortcut to, and move it into my lakehouse. I'll call this pipeline "fact sales" and create it. Whenever you create a new pipeline, it brings you into the Data Factory pipeline experience, and we'll dig into more of it as we go, but before I get into the nitty-gritty I just want to walk you through the copy assistant, that wizard-like experience, because right now it's mostly a blank screen and you might be wondering, "What am I supposed to do, Austin?" Let's walk through it step by step.
In the Home ribbon you'll see several different things. As part of this orchestration we have our toolbar at the top and our pipeline canvas below it. On the canvas you use different activities to paint this ETL portrait; again, think like a director: you're pulling data from here and from there, joining it together, transforming it. You're the author, and this is your canvas to paint a beautiful data picture. From Copy data in the Home ribbon I choose the drop-down option Use copy assistant, and that brings up the wizard-like experience that walks through the process step by step.

To begin, we choose a data source: where are we getting our data from, and how do we know we can connect to it? There are many connection types, workspace connections for your Fabric workspace, Azure connections, database connections, NoSQL databases, lots of options. The one we want is Azure Data Lake Storage Gen2; if you can't find it easily, search "gen 2" in the search bar and it comes up as the option for connecting to a data lake. Choose it as the data source and click Next, and then we define a connection. Because we created that shortcut earlier, you should see a connection to it here; we've essentially created something like a linked service, if you're familiar with traditional Data Factory, but Fabric doesn't have the same linked service and dataset structure, it's all wrapped up in the connection. If I choose this connection it can already authenticate, which I can verify by clicking Test connection and confirming the connection is successful. Once connected, I pick which data I actually want to work with, so I click Next and choose a data source. Based on how this data lake is set up, I see three folders; worldwide is the one we want, and inside it, specifically, the fact_sale.snappy.parquet file.
Parquet is a compressed data format that works great on data lakes because it minimizes your storage footprint compared with something like a comma-separated values file: it's column-compressed rather than row-based. When I choose the fact_sale parquet file, the assistant automatically detects the parquet format and gives me a nice preview of what the data looks like. I'm happy with that, so the next step is the data destination: where am I going to write this data? I'm pulling it from my data lake and I want to put it in my lakehouse, so from the available options I choose the Lakehouse connector and click Next, then decide whether to create a new lakehouse right here or connect to an existing one. Because I already have one, I choose the Worldwide Importers lakehouse, and that's all there is on that screen; we're walking through this without writing any code at all. Click Next.

Here I decide how to store the data: as a table or as a file. I'm going to store it under the root folder called Tables, which is where all of my database or data warehouse style tables live inside the lakehouse, and load it into a new table, since I don't have any tables yet, called fact_sale. You do have the ability to change column mappings, data types, and so on here, but we're not going to worry about any of that for this one; I just want to get the basics down. Once the name is in, click Next one more time, and you get a final overview of what this pipeline will do: take data from the data lake and load it to the lakehouse, from this connection to that connection. There's an option that says "Start data transfer immediately," and we want that enabled, so we click Save + Run. That kicks off the data pipeline, which executes against that file, pulls all the records from the external data lake, and moves them into my Fabric environment.
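For comparison, the same load the copy assistant just configured could be written by hand in a Spark notebook. This is only a sketch under the assumptions used earlier (a shortcut named ExternalDataLake pointing at the worldwide folder, and a file named fact_sale.snappy.parquet); the pipeline approach above is what the session actually uses.

```python
# Sketch of the equivalent load from a Fabric notebook attached to the lakehouse.
# Assumes the shortcut/file names used in this session; adjust to match your Explorer.
fact_sale = spark.read.parquet("Files/ExternalDataLake/fact_sale.snappy.parquet")

# Writing with saveAsTable lands the data under the Tables root as a Delta table,
# the same destination the copy assistant writes to.
fact_sale.write.format("delta").mode("overwrite").saveAsTable("fact_sale")
```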
We'll walk through the more traditional copy experience in a moment, but while this runs, let's have a quick conversation about some of the offerings Pragmatic Works has if you want to learn more about working in Data Factory. First, go to pragmaticworks.com; if you do that right now you should see a popup introducing CertXP, with the ability to sign up for the beta trial of the offering we just introduced. Click Learn more on that popup and it lets you join the wait list, so make sure you join and get in as fast as possible. If you haven't taken the Power BI Data Analyst certification yet, we already have courses for it on the Pragmatic Works platform, but CertXP will help there too: put your email in, join the wait list, and you'll be able to learn as quickly as possible and pick up certifications that can further your career. There are going to be awesome things like gamifying your learning journey, which makes it fun, and there's Mr. Percy, the star of the Pragmatic Works platform, who will be your little assistant as you work through it.

I also want to introduce you to the Pragmatic Works on-demand learning platform. Let me sign in really quickly (oh, I'm in an incognito browser, let me switch to a different one) and show you some of the courses available if you're interested in learning more. Some of these courses are completely free, and there's a free trial subscription you can sign up for; the link should be in the chat right now, so if you don't have a subscription already you can get free access to a lot of our videos, and we have paid-access content as well. If you thought that pipeline was cool and want to know what else you can do, take the Introduction to Data Factory course; it's going to blow your mind, it is so fun. There's also the Advanced Data Factory class (hey, look at that good-looking guy on the course card) for once you've learned the basics and want to see what else pipelines can do. Those classes are built on Data Factory outside of Fabric, but the topics map almost one-to-one to working inside Microsoft Fabric. And if you've never taken Dashboard in a Day, definitely check that out too; it's a nice walkthrough of working inside Power BI. If you sign up for a free trial subscription and go to the categories, the Free for Life category lists all the free courses available to you, about 25 at a glance, including a lot of our Learn with the Nerds sessions and recorded in-a-day sessions. So check that out, sign up if you haven't already, and get on that CertXP wait list, because it's going to be a game changer for learning and for taking certifications. Sometimes it's also good to learn about Azure itself: if you're wondering what I mean by data lakes and virtual networks, go take the AZ-900 CertXP course and you'll learn a lot more about that.

All right, let's get back to the show. The pipeline is still running, and it will run for a little while, but I can tell it's running
by clicking in the background of the pipeline canvas and looking at the output, where I can see it's still in progress. It will be running for a few more minutes, so in the meantime I want to walk you through a different way of working with Data Factory pipelines. I'm going back to my workspace to create another pipeline using the more traditional method that you might recognize if you've worked in Data Factory before. I'll go to New, select Data pipeline, and give it a name. What I ultimately want to walk through here is the parent-child design pattern. If you don't want to follow along with this part and would rather just watch, feel free, and if you fall behind at any point, you can rewatch this section later as many times as you need. I'm going to call this one "child pipeline" and create it. It's a pipeline object inside my Fabric environment, and I see the same canvas as before, but this time, instead of using the copy data assistant, I'm just going to add a Copy data activity directly to my canvas.

When I do this the experience changes a bit: the activity I'm working on sits on the canvas, and I have to define all of the settings myself, not through the wizard, based on what I want to do. For this copy I start with my source. Where's my source? I'll use that same connection; a connection you've already created can be reused over and over again. Then I need to point to where the data lives, and here I'll point you to the nice little Browse icon: clicking it opens a flyout, a graphical interface where I can point to the files I want. I'll choose the worldwide root folder, and this time go to the dim_city file; we did fact_sale before, and now we're doing the dimension file that holds our city data. Select it and click OK, and that automatically fills out the file path, the location of the file inside that external data lake. I'll change the file format from Binary to Parquet so I can read the file efficiently, and then I can click Preview data, which brings up a sample of the data so I can verify everything looks right for extracting it from the data lake. That's our source, the same result as before, just a slightly different way of configuring it. For the destination, I go to the data store type and choose a workspace data store; you could also choose External, meaning you can write data to Dataverse, an Azure SQL database, or many other destinations. We're going to use the Lakehouse
as the workspace item type, and again point it at Worldwide Importers, which connects me to the same lakehouse I'm currently writing the fact_sale file to. Then I just need to decide the table name; I'll use dim_city, matching the name of the file I'm extracting. I'm also going to expand one advanced option and choose Overwrite; that will make sense a little later on. So this is the same outcome as the copy assistant, just built through the traditional Data Factory experience. Let's run this one as well: the Home ribbon has a Run option that executes the pipeline to take the data and write it to my lakehouse. When you run it, it asks you to save first, so choose Save and run. This one won't take nearly as much time.

Do we have any other questions while that runs? "How does the cost of Fabric compare to Synapse?" I'm going to give you the classic "it depends," unfortunately. Your organization will have to purchase a Fabric capacity license that's enabled across the organization and gives you a specified amount of compute; depending on the selection you make, the cost can vary greatly with the size of your organization and what you want to do with your Fabric workspace. Compared to Synapse, where you have the dedicated SQL pool or serverless SQL on demand, it can vary a lot, so you'd probably need to talk to someone about your specific case, but in general Fabric requires a capacity license, either an F SKU or a P SKU, and if you already have Premium per capacity, that also gives you the ability to work with Fabric-enabled items.

All right, this one has run successfully: Copy data succeeded in about 23 seconds, so not long at all. The other one is still running in the background, I think. Let's go over to the lakehouse and see if we can view the results. At first the tables show as unidentified; they'll map over in a second, and after clicking the Refresh button I can see my dim_city table and my fact_sale table stored in the lakehouse. If I expand dim_city, I see all of its columns and data types; all of that was written for us in the background by the data pipeline. You don't have to know SQL, you don't have to know Python; you can use this graphical interface and still get a lot of functionality inside Fabric. Here's my data and a nice view of it, and from here I could technically also query this data with T-SQL operations. We're not going to do that in this session, but if you want to know more about that, check out some of those previous Learn with the Nerds sessions.
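As an aside, the same tables can also be inspected from a Spark notebook attached to the lakehouse. A minimal sketch, assuming the table names created above:

```python
# Minimal sketch: inspect the tables the pipelines just loaded.
# Assumes a Fabric notebook attached to the Worldwide Importers lakehouse
# and the table names created above (dim_city, fact_sale).
spark.sql("SHOW TABLES").show()

spark.sql("DESCRIBE TABLE dim_city").show(truncate=False)

spark.sql("SELECT COUNT(*) AS row_count FROM dim_city").show()
```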
Now, we have our tables here, but that's two tables, and I have five or so files; what happens when you have a hundred tables? Are you going to create a pipeline for every single table and file? That's going to be hard to manage and hard to monitor, so let's look at a way to use pipelines dynamically and build a metadata-driven pipeline, where we extract information about our data and use it to drive the pipeline orchestration. I'm going back to my Learn with the Nerds workspace to create one more data pipeline, but before I do that I need to come back and edit the child pipeline, because we're going to use a hierarchical, parent-child pipeline structure.

So let's open the child pipeline first and edit a couple of its settings. What we're going to enable is a parameter. Parameters are not a new idea in the Power BI and Microsoft ecosystem: you might use parameters in paginated reports, or even in an interactive report, and you can use parameters on pipelines as well. I'll click once in the gray area of the pipeline canvas to pull up the pipeline-level settings and add a parameter. I'll name it FileName, and I can give it a default value, a value that applies by default; I'll just use "placeholder." Notice it's a string data type, which is perfect for what we're doing here.

With that parameter in place, I go back into my Copy data activity by clicking it once (you'll know it's selected because it turns that nice shade of Fabric green), and now I'm going to map the parameter into the source and the destination of the copy activity. In the source settings, instead of the hard-coded file path, I click once inside the file name box and choose Add dynamic content. This lets the activity point at the parameter and accept different values: maybe the file name it receives is dim_city, maybe it's fact_sale, maybe it's dim_product, dim_person, dim_customer, whatever it might be, and the pipeline should handle whichever one it's given. In the pipeline expression builder I choose the FileName parameter. You might look at this and say, "Austin, I work with Power Automate a lot, this looks a lot like the Power Automate expression language," and it is very, very similar; there are some differences here and there, but many of the same functions and use cases. So I pick the FileName parameter from the expression builder and click OK, and that maps in the file name I'll receive a little later on. The other thing I need to do is map the destination: what should the table that gets created be called? For that, I click the X on the hard-coded table name dim_city
and then hover over or click inside that box and choose Add dynamic content, which launches the pipeline expression builder again. You might think, "Okay, let's just use FileName; that could work." The problem is how the file names are structured: dim_city.snappy.parquet. I don't want ".snappy.parquet" in my table name; that doesn't make sense, so we need to get rid of it. We do that with a dynamic expression using the replace function, and I'll put the expression in the chat as well if you just want to copy it. Replace expects three values: the thing you're searching in, the text you want to replace, and what you want to replace it with. So inside the replace parentheses I reference the FileName parameter, add a comma and '.snappy.parquet' as the text to remove, then one more comma and two single quotes with nothing between them, not even a space, because the function has to replace it with something and I want to replace it with nothing. Assembled in the expression builder it comes out as @replace(pipeline().parameters.FileName, '.snappy.parquet', ''). Some pretty cool stuff you can do with the expression builder. Click OK, and everything is in place for the parent-child design pattern; we just have to go build the parent pipeline, which is what makes the metadata-driven style of ETL possible. First, let's follow a best practice and save this child pipeline with the normal Save icon.
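If the expression syntax feels opaque, the same file-name-to-table-name cleanup written as plain Python is just a string replace; a trivial sketch for comparison, with a hypothetical file name value.

```python
# The destination-table expression above does the equivalent of this string cleanup.
file_name = "dim_city.snappy.parquet"           # hypothetical value the parent will pass in
table_name = file_name.replace(".snappy.parquet", "")
print(table_name)                               # -> dim_city
```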
Once that's saved, go back into the Fabric workspace one more time and create the other pipeline: click New, choose Data pipeline, and give it a name. The other one was called child pipeline, so you guessed it, this one is parent pipeline. Naming objects clearly means that if I'm working collaboratively with others in this workspace, they know what it is, I know what it is, and everyone's happy. Click Create and you get the blank canvas again. This time we're going to work with a few different activities, not just Copy data. Copy data is still the star of the show, the star of the movie we're putting on with this canvas, but we can bring in some character actors to help alongside it. If you go from the Home ribbon to the Activities ribbon, you'll see the full list of activities you can work with in Data Factory pipelines inside Fabric. A lot of them come from the traditional Data Factory experience, and there are some new ones too, like the Office 365 Outlook activity; look for a YouTube video walking through that one on our channel in the next few weeks, and if you're not subscribed with notifications on, what are you doing? Turn them on.

The first activity I'm going to add is Get Metadata. This is a great activity because it returns information about the files on my data lake: their names, their types, and so on. When I select it, notice it doesn't have a source and a destination; it's a different activity with a different purpose. In its Settings I specify a connection, choosing the same connection again, and then I need to choose a file path. This time we're not drilling all the way down to a file name; we're just pointing at a folder. Click the Browse icon and choose the worldwide folder; don't click any of the file names, just the folder, and click OK. That populates the folder name in the first box, our container, the highest level of folders inside our data lake. Then we add what's called a field list: which pieces of metadata do I want returned about this folder? The one I specifically want is the Child items argument, which essentially asks, "What are the files, or possibly nested folders, that exist inside this folder?" I want them returned so I can use them as part of the Get Metadata activity in this metadata-driven pipeline.

With that selected, I go back to the Home ribbon and test it; let's just see what we get. I choose Run, then Save and run, and this one takes only a few seconds once it makes the connection, because it isn't moving any data; it's just gathering information about the data on the data lake. You can see it succeeded in the output pane, and, similar to Power Automate or Logic Apps if you've worked with those, activities produce outputs we can use downstream in future activities. If I look at the output of Get Metadata, I can see the list of items, and I want to use those items to drive a copy data activity within this architecture.

There is one thing I want to get rid of, though: that fact_sale file. I'll tell you, it can take 10, 15, 20 minutes to run, and I already have it loading, so I don't want to copy it again right now. I want to filter it out.
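For the notebook-minded, the Get Metadata activity's child items output is roughly what a file listing gives you in Spark. A small sketch, assuming the shortcut name used earlier and the mssparkutils helper that Fabric notebooks provide by default:

```python
# Rough notebook analogue of Get Metadata -> child items:
# list what's in the folder behind the shortcut.
# mssparkutils is available by default in Fabric notebooks;
# the shortcut name is the one assumed earlier in this session.
files = mssparkutils.fs.ls("Files/ExternalDataLake")
for f in files:
    print(f.name)   # e.g. dim_city.snappy.parquet, fact_sale.snappy.parquet, ...
```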
Now, there's one item I want to get rid of before we copy anything: that FactSale file. I'll tell you, it takes ten, fifteen, sometimes twenty minutes to load, and I already have it running, so I don't want to copy it again right now. I want to filter it out. Go back to the Activities ribbon and choose the Filter activity; depending on how big your screen is you may not see every activity, but on the far side there's an ellipsis (the little dot dot dot) you can click to see the ones that don't fit on the pane, and Filter is in there. This activity lets us filter the results, the child items, coming out of that Get Metadata activity, so we can drop FactSale.

Part of this is also how we logically move from one pipeline activity to another. That's done with the green check mark you may have already noticed a couple of times; it creates a dependency, or what I sometimes call a precedence constraint. It lets me pass values from one activity to another and tells Data Factory the order these activities should run in: step one, step two, step three. Click once on the green check mark on Get Metadata, drag the arrow over, and touch the Filter activity; that makes the connection, so the Filter activity only executes after the Get Metadata activity has run successfully.

The Filter activity has its own settings: the items I want to filter and the condition to filter them on. Let's talk about the items first. Click into that box, and this is where you start learning a bit more about the pipeline expression builder. If you're thinking, Austin, I want to build something similar in my environment but I don't know how to simulate what you're doing, go check out our On-Demand Learning platform; courses like Intro to Data Factory, Advanced Data Factory, and Intro to Azure Synapse Analytics cover exactly this, because this technology existed before Fabric and is simply being integrated into it now. Choose Add dynamic content, and Microsoft did us a solid here with an easy way to point and pick: a few items down the list you'll see Get Metadata1 child items. Selecting it writes the expression for you, pointing at the output we looked at earlier, that whole list of file names in child items, as the thing we want to filter. Say OK; that one was easy, not too hard at all.
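The expression that picker generates looks like this, assuming your Get Metadata activity kept its default name of Get Metadata1:

    @activity('Get Metadata1').output.childItems

If you renamed the activity, the name inside activity('...') has to match the new name exactly.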
The condition is a little more nitty-gritty, but I'll give you the code for it. Click into the Condition box and add dynamic content for it as well. We're looking at each item being filtered and deciding whether to keep it, and what I want to express is: get rid of anything whose name contains fact. So I start with not, then contains, which is just a function that checks whether one value contains another. Inside the contains parentheses I point to the filter item with item() and go to its .name property, those file names from the child items, the FactSale, the DimCity, and so on. After that comes a comma and then the word fact: if the name contains fact, it's no good to me and I don't want to work with it. There are a lot of parentheses here, and sure enough I missed one somewhere when I typed it out; things happen, roast me in the chat if you like, but I've pasted the correct expression in the chat for anyone following along. Say OK; green means good, red means bad, just like normal, and we've got green here, so we're good to go.
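Written out, the condition is the following; keep in mind the comparison is case-sensitive, so make sure 'fact' matches the casing your file names actually use:

    @not(contains(item().name, 'fact'))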
Now I want to run this again and see what happens. Choose Save and run; there's also a Run ribbon here where you can view run history and schedule the pipeline on a weekly, daily, or hourly occurrence if you want. The Get Metadata activity runs, passes its values into the Filter, and then we can look at the Filter's output: it looked over six items but only brought back five, and the one you won't see in that list is the fact file. We've filtered it out. Pretty cool, but we're still not done, because we haven't actually landed any data in the Lakehouse yet, and that's the whole point.

We need to add one, technically two, more activities. From the Activities ribbon add the ForEach activity, and just as before, drag the green check mark from the Filter so that when the Filter succeeds its values pass into the ForEach. The ForEach isn't a loop, even though a lot of people call it one; it's an iterator. It receives five items in this case and iterates over each of them, performing the same actions one, two, three, four, five times, but with a different value passed in each time: DimCity this time, DimProduct the next, and so on, while our worker, the child pipeline, does the actual copy.

There's something to configure on the ForEach itself (I'll nudge it over a little so it's easier to read): the items it should iterate over. This one is slightly tricky, so I'll talk you through it. Open the dynamic content expression builder and use the Filter activity's output, but this time Microsoft doesn't give us an exact item to click, because technically this output can point to several things, so we type .value on the end ourselves. The Filter's output contains a value array, a list of items, and that array is what we want to iterate over. I'll copy this expression into the chat for everyone as well; say OK, and those are the items we'll pass to the child pipeline.

Now for how the child pipeline gets called. I told you I like movies; my favorite of all time is Interstellar, but one you might have seen is Inception, because we're about to go layers deep into this pipeline: canvases within canvases, dreams within dreams (technically you can only go one layer deep here). Click the pencil icon on the ForEach and it takes you inside the ForEach activity canvas, where you define the nested activities that run for each file name passed in. In there I'm going to use the Invoke Pipeline activity, which again might be hiding behind the ellipsis, and call my child pipeline from my parent pipeline. For each file name the ForEach hands it, the child takes that file and loads it into a Lakehouse table, and the pattern works whether you point it at two files or two hundred files to tables. Really there's only one thing to set: which pipeline to invoke, and from the list of available pipelines we pick the child pipeline. The child has a parameterized value inside it, so we also have to supply what we're passing in: click the value box, open dynamic content again, choose the first option, the ForEach item, which gives you item(), and then add .name, because the name of each item we're iterating over is exactly the value we want to feed into the child pipeline's parameter.
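Putting those two settings together, here's a minimal sketch, assuming the Filter activity kept its default name of Filter1 and that the child pipeline's parameter receives the file name:

    ForEach > Settings > Items:
        @activity('Filter1').output.value

    Invoke Pipeline > parameter value:
        @item().name

Inside a ForEach, item() refers to the current element of whatever array you supplied in Items, so item().name is the file name for the current iteration.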
Say OK, then let's get out of the dream within a dream and back up to the main canvas by selecting it up top. Do the best-practice thing and save this pipeline, then run it and see if all our hard work paid off. Choosing Run kicks off the whole chain: it gets the metadata out of the data lake, filters out the fact table (we don't want to copy that again), passes all of those values into the ForEach, and for every one of those five files it invokes the child pipeline, which copies the file into the Lakehouse. This ability to create metadata-driven pipelines is something we focus on in many of the boot camps Pragmatic Works teaches, because once you set a pattern like this up it's far easier to monitor and it pretty much does the work for you every single day. If you're interested in learning more about Data Factory or Fabric, check out our boot camp offerings; the Fabric boot camps are already filling up a couple of months down the line, so sign up now if you're interested.

The run succeeded, so let's go back over to the Lakehouse and see if everything worked out. You might have to click the refresh button again, but over in the Lakehouse every dimensional table, everything we brought over from the data lake, is now there. We've taken data lake files, whether they were CSV or Parquet or whatever, and integrated them into a Lakehouse that acts like a data warehouse (there is also a dedicated Data Warehouse item in Fabric with a slightly different structure). Downstream I could hand this to a Power BI analyst or a SQL analyst to gain business intelligence insights; we can all connect to it together and have a single source of truth. I love this functionality.

We've done a lot here, so let me check for questions before we dive into the next part. Phil, one thing I did earlier that might not have worked for you, and I apologize for not calling it out again: on my child pipeline I enabled the advanced option to overwrite the table. If your follow-along failed on DimCity, that's why. I like to explain why things fail, because failures and pipelines not executing correctly can actually be very helpful for understanding what's going on. So make sure the table action on your child pipeline is set to overwrite; I needed it specifically because that table had already been loaded earlier, and while I could have filtered it out instead, I'm trying to keep this as simple to follow along with as possible.
Can tables created in the Lakehouse be accessed by Databricks? Yes; the files and tables in your Lakehouse can be reached from external tools. I can log into SQL Server Management Studio and query my Lakehouse, or connect to it from the Power BI Report Builder tool. If you click the ellipsis on the worldwide importers Lakehouse and open its settings, you'll find the SQL analytics endpoint; that's the logical SQL endpoint you can use in external tools to make connections. You can see mine on screen, and there's nothing too private in here for what we're doing today. Let's see, any other questions... I have some starred ones I can't actually see. "You're a wizard, Harry": thank you.

One more thing before we move on. Earlier we ran a pipeline from the copy assistant, and I told you it would take some time. Let's see exactly how long: back in my list of open items, my fact sales pipeline took 10 minutes and 36 seconds. You might be asking why, when some of the others ran in almost no time at all. Let me show you a different way to understand what's going on with this file: a Spark notebook. The data engineering persona in Fabric assumes you may want to work with large amounts of data, or get your data integrated into the Lakehouse (or wherever you're sending it) very quickly. So in Fabric you're not limited to Data Factory pipelines; you can also work with a Databricks-style notebook in this same environment. Whenever you provision a Fabric workspace you get Fabric compute, and part of that compute lets you work with Apache Spark. If you've never worked with Spark, here's a quick overview: it's an open-source distributed computing system, clusters of computers that are great for big data processing and analytics, and you can use several programming languages with it, whether that's SQL, Python, R, or Scala, all inside a Spark notebook in Fabric. Data Factory pipelines are great as a graphical user interface and a great introduction, but they can be a little slow; in a nutshell, Apache Spark focuses on distributed computing, those clusters of machines working across vast amounts of data very quickly. So I'm going to go over to my data engineering workspace and create a new notebook; it's about halfway down the list of new items, so create one for yourself in this workspace.
To work with the Lakehouse tables and the shortcut files from the notebook, we attach a Lakehouse to it. Choose Add, then add an existing Lakehouse, and click the Add button. That opens the OneLake data hub, and this is where you really start to see the benefits of working in Fabric: I can look at data across my entire organization if I need to, lakehouses and warehouses in workspaces other than mine. You can be given access to just the data, without access to the Power BI workspace itself, and still connect to the endpoint from the OneLake data hub and build Power BI reports, visualizations, or dashboards without ever interacting with the Lakehouse item directly. I'll choose my worldwide importers Lakehouse and add it to the notebook. That gives me something like the Object Explorer experience from SQL Server Management Studio: a list of tables and a list of files. Under my external data lake shortcut I can see all of those files, and the one we extracted earlier was fact_sale.snappy.parquet, so I'll simply drag and drop it into the notebook. Fabric generates PySpark (Python for Spark) code to create the connection for me. You don't have to be a Python expert; knowing Python helps, of course, and I'm not an expert myself, but I can use this functionality to get access to all of this compute power. The drag and drop creates something called a DataFrame. A DataFrame is something like a stored table; it isn't really a table, it's the way data is held inside the Spark cluster, that cluster of computers that's running, so that we can actually work with it. The generated code uses spark.read.parquet to go into my Files folder, into the external data lake folder, down to that file location, and build a way for me to connect to it.
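The generated cell looks roughly like this; the folder and file names under Files/ depend on how your shortcut is named, so treat the path here as an assumption:

    # Generated-style cell: read the shortcut parquet file into a Spark DataFrame.
    # "external_data_lake" and the file name below are placeholders for your shortcut path.
    df = spark.read.parquet("Files/external_data_lake/fact_sale.snappy.parquet")
    display(df)  # show a sample of the rows in the notebook output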
With that in place I can click Run cell. The first run sends a request to Microsoft to provision a Spark session for me; it starts up fairly quickly, makes the connection, and then pulls back the data very fast. This is another integral part of being a data engineer in Fabric: learning to work with Python, or, as I'll show you shortly, just SQL; you don't actually need to know Python at all to use this same compute.

While the session starts, let's take a few more questions. Can we load the data back to an Azure SQL database after doing some transformations? Absolutely, 100 percent; as long as you can connect to the destination from this notebook, you can orchestrate that kind of movement. Are the tables Delta tables? Great question, and yes; we didn't dig into it, but that little triangle icon means these are Delta tables, and if I went to the Lakehouse's underlying files I'd find a Delta log folder storing all the version changes to my data alongside the data itself.
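If you want to confirm that from code, here's a quick hedged check; the table name is just an example, so use one of the tables that actually exists in your Lakehouse:

    # Illustrative only: DESCRIBE DETAIL on a Delta table reports its format,
    # storage location, and file count. The table name "dimcity" is assumed.
    spark.sql("DESCRIBE DETAIL dimcity").select("format", "location", "numFiles").show(truncate=False)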
Can we implement row-level security on the Lakehouse? Yes; I cover that in my Fabric boot camp. It's not as easy to set up as you might think, there are a couple of moving pieces, and it would take too long to walk through today, but Microsoft has good documentation on it and maybe I'll do a YouTube video on it in the future. Do we strictly need a Lakehouse plus Azure SQL database setup? No, that's not a requirement; you can load data from an on-premises database using something like Dataflows to connect to on-premises sources. There's still a gap where Data Factory pipelines in Fabric can't work with on-premises data directly, but according to Microsoft that should hopefully arrive in the first quarter of 2024.

Alright, back over here. The session took a little longer to start than I expected, about two minutes and nine seconds, but once the Apache Spark session was up it connected to my data and brought back a sample of it in about six or seven seconds. Scroll down a little and hover just beneath the output to add another code cell. A notebook really contains three kinds of things: markdown cells, which are like commented-out, explanatory text; live code cells; and visualizations you can author by pulling in code libraries such as matplotlib or seaborn, which are very popular when working with Spark. The DataFrame we created lives in memory in the Spark session, so by referencing the variable name it was given by default, df, I can run df.count() with open and close parentheses and see how many records we're actually working against. Run it and... whoa. That's 50 million records, and that's why the pipeline took a while; 50 million records takes some time to move from one location to another.

So what if we could speed that up? What if we used this same technology to write that file from the data lake into my Lakehouse and see how fast Spark handles it? That's what we'll do. Add another code cell and create a variable; I get to name it, so I'll call it table_name, and set it equal to, in double quotes (single quotes work too), WWI_Sales, worldwide importers sales. It's a slightly different name, and yes, technically we're making a second copy of the same data, but it's a demo. Run that cell and the variable is available downstream in the notebook. Then one more code cell; type it out yourself or grab it from the chat in a moment. I take the DataFrame, the one created from my shortcut, and write it with the mode set to overwrite, just so re-runs don't cause issues, and the format set to delta. Delta, which our contributor in the chat brought up earlier, is essentially what gives a Lakehouse ACID-style transactions: it keeps the data reliable instead of letting it become a data swamp, which is what we call data dumped everywhere with no governance and no way to tell what's valid from what isn't; Delta is how a data lake avoids turning into that. Finally I save it to Tables/ plus my table_name variable, which lands it in the Tables area of my Lakehouse under that name.
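Put together, the three cells from this section look roughly like this; df is the DataFrame created by the drag-and-drop earlier, and WWI_Sales is simply the demo table name chosen above:

    # Cell 1: how many records are we dealing with? (about 50 million here)
    df.count()

    # Cell 2: the table name we'll reuse downstream
    table_name = "WWI_Sales"

    # Cell 3: write the DataFrame into the Lakehouse's managed Tables area
    # as a Delta table, overwriting the table if it already exists.
    df.write.mode("overwrite").format("delta").save("Tables/" + table_name)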
That's everything we need to write this straight from my data lake into my Lakehouse, so let's kick it off and run it. It won't finish in five seconds, but let's see how much quicker it is. I've run that fact sales pipeline several times over the last couple of weeks for different events, and it has taken anywhere from 10 to 25 minutes depending on how much compute was available, which in turn depends on the Fabric license you purchase: less compute and it takes longer, more compute and it goes quicker; that capacity is really what you're buying with a Fabric license.

A few more questions while this runs. Can some of these data operations be integrated with Snowflake? Yes, you can connect to Snowflake from your pipelines, and maybe someday there will be a shortcut that points to Snowflake; I hope so, because I want shortcuts everywhere, making it very easy to connect to data and unify it across different cloud architectures in one environment. Can we build this using an on-premises server or files? You can connect to on-premises systems, again with some limitations today, and if you're trying to do something like my parent pipeline, a metadata-driven pattern against an on-premises SQL database, then instead of a Get Metadata activity you could use a Lookup activity that runs a SQL query, perhaps against a control table or against the system tables, to work out which data you want to extract.
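As a hedged illustration of that Lookup idea (not something built in the video), a query against SQL Server's system views could drive the same pattern, returning one row per table to copy while skipping the fact tables; the naming filter is an assumption:

    -- Illustrative control query for a Lookup activity
    SELECT s.name AS schema_name, t.name AS table_name
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE t.name NOT LIKE 'Fact%';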
Accessing Azure Key Vault is not currently available; that's one of the big items Microsoft is pushing toward, I believe in the first quarter of 2024, and I understand why it matters for security, so we're not exposing secrets or access keys. It's coming, but it isn't in Fabric today. Migrating from Azure Synapse into Fabric is possible as well; you'd probably use a Copy Data activity or an orchestration like the one we built, and depending on scale, if you have hundreds of millions or a billion records in a Synapse dedicated SQL pool, a pipeline could take a long time, so Spark is a good option there too; either way you can connect and load into a Lakehouse much like we've done today. Can you bring data into the Spark canvas using SQL? Absolutely, and that's exactly what we'll do in a moment... oh, it just finished running.

So where the pipeline took about 10 minutes to perform this operation, this run went through in three minutes; we've cut the runtime to roughly a third. Now imagine executing against even more than 50 million records. Some of you are saying, Austin, I don't even have 50 million records, but there are people out here working with billions from time to time, and this is the optimal tool for those big data scenarios because it really speeds things up: instead of a pipeline running for hours, this can potentially finish in minutes. That's the big use case for Spark.

Someone asked earlier: I don't really know Python, this is all PySpark, you've lost me. What if I told you that you can use SQL here as well? Add another code cell and, on the DataFrame, call createOrReplaceTempView; I was hoping IntelliSense would pop up, and it does in a moment. That creates a temporary view the DataFrame sits behind, so it can be accessed from the other languages available in Fabric notebooks. I'll name the view fact. This happens almost instantly because we're not moving any data, just adding another way to reach it. Then, one code cell down, I use a magic command, %%sql, and even though this notebook's primary language is PySpark, in that cell I can simply write SELECT * FROM fact, run it, and bring back results. It's that easy to start using SQL; you don't necessarily have to go learn Python, although it will help you do more and more over time. And it doesn't stop at SELECT *: with %%sql I can select my city key and a sum of the unit price, plain SQL, from fact, with a GROUP BY on the city key because we're aggregating, and get aggregated results back to study right inside the Spark environment. The awesome part, even if this particular query isn't the perfect showcase, is that you can view the result as a table or as a chart, and I absolutely love being able to visualize data that quickly with nothing but SQL, even if this isn't the exact chart I'd choose for it.
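A minimal sketch of those two cells; the column names CityKey and UnitPrice are assumptions based on the sample data described here, so swap in whatever your fact file actually contains:

    # Cell 1 (PySpark): expose the DataFrame to Spark SQL under the name "fact".
    df.createOrReplaceTempView("fact")

    %%sql
    -- Cell 2 (SQL cell magic): aggregate unit price by city key; column names assumed.
    SELECT CityKey, SUM(UnitPrice) AS TotalUnitPrice
    FROM fact
    GROUP BY CityKey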
Is a gateway required to connect to data? No; a gateway is specifically for on-premises data. Right now Azure Data Lake Storage Gen2 and Amazon S3 buckets are the only external sources you can shortcut to, and more will be added. Someone asked about CI/CD best practices: we cover that in the Fabric boot camp too, continuous integration and continuous deployment (or delivery, your mileage may vary on the term), and how to integrate DevOps with Power BI, so definitely check that out.

As we start to wrap up this Learn with the Nerds session, I first want to say thank you for attending; hopefully you've had an awesome time. I'm putting a couple of links in the chat: my email address, if you want to talk about any of our offerings, set up a meeting, or sign up for a boot camp or any of the other Pragmatic Works offerings, I can help you with that; my LinkedIn, where I post a lot of Fabric content and link to our YouTube videos, and I always love seeing who's been in our courses; and the free trial sign-up, so you can get access to the Pragmatic Works on-demand learning library and see what's available to you there. Yes, Michelle, we did have to move pretty quickly today, and I'm sorry if it felt too fast; remember you can come back and watch this at any time, and we only had an hour and a half. Look for more videos in the future where we talk about Fabric, because it isn't going away any time soon. Coming up next we have the Excel Beginner to Pro Learn with the Nerds session, taught by Allison in about a month, in January, and we have an Excel boot camp coming as well, so if Excel is more your speed, go check that out. Thank you so much, hopefully you all had a great day, and I'll see you in the next one. [Music]
Info
Channel: Pragmatic Works
Views: 33,951
Keywords: microsoft fabric, fabric, data engineering, full course, microsoft fabric demo, microsoft fabric lakehouse, microsoft fabric power bi, microsoft fabric pipeline, microsoft fabric for beginners, microsoft fabric launch, microsoft fabric review, pragmatic works, austin libal, fabric live, data engineering tutorials, data engineer, fabric microsoft, ms fabric, microsoft fabric tutorial, azure fabric tutorial, azure data engineer, data science, data analytics, fabric training
Id: e9CN96Y9PcA
Length: 88min 57sec (5337 seconds)
Published: Thu Dec 07 2023