Azure Data Factory: Beginner to Pro [Live Event]

Captions
All right guys, sorry about that — I think we're back in order now. For some reason you were seeing a static image, maybe a purple screen for a second, but after a quick restart things should look better. Always fun in the tech world. Sorry for the delay, but Data Factory is what we're here for.

As I mentioned, we're going to move pretty quickly, but this will all be uploaded for you to review later, and we may update the downloadable files to better assist you, because you need multiple resources to actually work through this: a resource group, a storage account, an Azure SQL server with a database in it, and naturally an Azure Data Factory. That's a lot of moving pieces, which I've already pre-provisioned, but we'll talk through each of them so you're aware of what's involved, and if you're watching this as a recording you'll be able to create these resources yourself. Keep in mind that Azure is always reinventing itself — buttons get moved or renamed — which is why we rerun these Learn with the Nerds sessions every so often with updated material.

As far as an agenda for the event: we'll talk about what Azure Data Factory is in general, then the hierarchy of resources — integration runtimes, linked services, datasets, data flows — and at the end, if we have time, a little extra section on Synapse pipelines and data flows. We have a scheduled break at 12:25 (any times I mention are East Coast); we'll take a short 15-minute break and then continue. After the class is over, the recording will be encoded and made available to view on demand. I also have my commander-in-chief, my partner in crime, Mitchell Pearson, monitoring the chat — give him a quick hello if you like. He'll watch for questions, answer what he can, and pass a few my way, maybe at the break. And Devin Knight's birthday is tomorrow, if you want to wish him a happy birthday. So there's a lot on the menu — let's get started.

Azure Data Factory is effectively a resource that we provision inside the Azure portal, and it gives us the ability to create pipelines and data flows.
Along the way we'll use supporting resources like integration runtimes and linked services, which is what we're going to learn about, but the headline features are pipelines and data flows. Pipelines are commonly known for orchestration, integration with many other applications, and data movement, while data flows give us a more traditional cloud-native ETL process — extract, transform, and load. You connect to sources (the extract), apply transformations (the T), and load the results into a destination (the L). We get both pipelines and data flows inside an Azure Data Factory instance, which you have to provision yourself.

If you're following along, you need an Azure subscription and some form of role — you can see a couple of roles listed here — that lets you create resources in that Azure account. A trial account gives you around $200 of credit and 30 days of full access, but one way or another you have to be able to create resources.

Our approach to creating these resources is strictly for demonstration purposes. For example, when we create the SQL server we'll be the administrators by default, and in my case I'm going to open up the firewall rule just for myself. The idea is that we're setting this up only for working with Azure Data Factory within the scope of this class. If you want to go deeper into topics like storage accounts or Azure SQL Server, check out Mitchell Pearson's older Azure Data Services video on our YouTube channel, which focuses on those elements — and we should cover some of these topics again in the future, so keep your eyes out.

To start with, what's a resource group? For those of you new to Azure, it's just an organizational container. You need a resource group to store basically any service. Technically there's one parent resource above it, the subscription — everything you create is tied to a subscription, which is effectively an organizational container for billing, while the resource group is an organizational container for services. Generally, if you're working on an end-to-end project, you keep all of the assets of that project in one resource group. It keeps things simple: you can apply policies across the entire resource group, set up permissions that trickle down the hierarchy to the objects inside it, and update, maintain, or move all the assets of the project together. So you do want to keep it organized.

From there we'll also have a storage account, where we'll use the service known as the Blob service. Many of you are probably familiar with the term Azure Data Lake Store — that's actually how this storage account was created; it's just an option we'll look at as we go through the process.
Effectively, we'll use this blob storage system for hosting and storing the various files we'll use in our demonstrations — whether that's a file we use as a source system, or a location inside the data lake that we write to. For those newer to it, you can think of the data lake like a root directory: you create containers, which act as root directories, and then you store folders and files within them.

We'll also use an Azure SQL database. Between the data lake and the database, one or the other can serve as a source and/or a destination, and we need that capability. Lastly, the whole reason we're here: the Data Factory itself. This is, once again, a resource you provision individually, with its own governance, access, and security. These are all separate services which I've already created, so we'll look at them quickly in case you want to stand this up for yourself. As mentioned, there should be a follow-up to this class — I'll probably update the resources folder with an ARM template for you. (And yes, Mitchell, I'll try turning off the camera and see if that helps.)

So let's go through the resources I created. I'm going to open my browser and log into portal.azure.com. You can see I already have a resource group that contains all of the resources we'll be using today. Nothing has actually been loaded into the Data Factory yet — it's completely blank — and the SQL database is blank too, but we'll walk through what these resources look like.

We won't go through the resource group itself; that setup is very basic. First, let's look at how to create a storage account, which hosts multiple services — the one we'll focus on is the Blob service. If you hit Get Started, Create an Item, or Create a Resource in the upper left, all of those take you to the Marketplace, and Storage Account is visible right there. When you select the storage account option, you're asked which subscription this should live under — pick the corresponding organizational billing container — and then which resource group you want to place it in. If you already have a resource group, you can choose it from the list, or you can create a brand new one at this point. It's really your choice, and you can name it whatever you want.
The resource group I'm using is this very one here, called azure data factory - learn with the nerds. From there you just give the storage account a name. Everything in Azure has its own naming convention — this one, for example, has to be all lowercase letters and numbers, no special characters — so keep an eye on the validation messages as you configure it.

I haven't done anything out of the ordinary for this setup except for one item: on the Advanced tab there's a section for Data Lake Storage Gen2 with a checkbox asking whether you want this storage account to use hierarchical namespaces. Enabling it lets us work with more traditional big-data analytics workloads while keeping the same basic idea of a file storage system: you still have a storage account, you still create containers, and within those containers you have your folders and files. There's a longer discussion about the differences between traditional blob storage and ADLS — we have videos and information on our website, and the Azure Data Services class also goes into it — but this checkbox is the only thing I changed before hitting Review + Create. Is it a requirement? Technically no, but it's the route I went, and it's more in line with what I've done. For those of you watching after the fact, that's it: give the storage account a name, put it in the same resource group as everything else, and check Enable hierarchical namespace.

Creating the storage account is a fairly quick process. The only other thing I've done is create a container. Once the account is available, go to Storage Accounts and locate the resource group in question — I've pinned my resource group to my dashboard so I can access it easily — and in the Containers section you create what I like to think of as root directories. I've created one container called data, and once you're inside a container it's like navigating a folder directory: I made a folder called source and a folder called destination, and we can upload items directly into them. If you look at the resources download linked in the description, you'll see a small file in there that we'll bring into the data lake. As the class goes on we'll upload these items into their respective locations, as they'll serve as resources for us.
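If you'd rather script that storage account setup than click through the portal, here is a minimal sketch using the Azure management SDKs for Python. The subscription ID, region, and resource names are placeholders standing in for the ones described above, and is_hns_enabled=True is the programmatic equivalent of the "Enable hierarchical namespace" checkbox on the Advanced tab — treat this as illustrative, not the exact setup used in class.

# Sketch: create the resource group, an ADLS Gen2-enabled storage account, and the "data" container.
# pip install azure-identity azure-mgmt-resource azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group name
account_name = "adflearnwiththenerdssa"               # lowercase letters and numbers only

cred = DefaultAzureCredential()
ResourceManagementClient(cred, subscription_id).resource_groups.create_or_update(
    rg_name, {"location": "eastus"})

storage = StorageManagementClient(cred, subscription_id)
poller = storage.storage_accounts.begin_create(
    rg_name,
    account_name,
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",
        location="eastus",
        is_hns_enabled=True,   # the hierarchical namespace checkbox
    ),
)
print(poller.result().provisioning_state)

# Create the "data" container; the source/destination folders can be created on upload.
storage.blob_containers.create(rg_name, account_name, "data", {})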
I believe the first file is the one we'll load into our source directory, so why not do it right now. Under the class files, in resources (demo 2), I'm going to locate it and upload it into my data lake. I'm in my source location, I hit Upload, and it just shows a local file dialog. Quick question from the chat while I do this: does it cost anything to create a resource group? The answer is no — it's a super simple organizational container with no cost of its own, though the services inside it can obviously carry costs. So I'll go find that file: on my desktop, in my Azure Data Factory class files, I grab the input emp file and upload it. It's a super small file, 52 bytes, a very basic .txt — nothing much going on in it. If we click on it we can look at some details and even view it; because it's a text-delimited file I can see the contents and make changes if I wanted to. The moral of the story is that I now have a file inside the data lake, inside our Azure environment, that we can interact with. That's the data lake side taken care of.
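If you want to script that upload instead of using the portal's Upload button, here is a small sketch with the azure-storage-file-datalake package. The account name, key, and file name are placeholders standing in for the ones used in the demo.

# Sketch: upload a local file into the data/source folder of the ADLS Gen2 account.
# pip install azure-storage-file-datalake
from azure.storage.filedatalake import DataLakeServiceClient

account_name = "adflearnwiththenerdssa"   # placeholder storage account
account_key = "<storage-account-key>"     # placeholder credential

service = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=account_key,
)
file_system = service.get_file_system_client("data")     # the container
directory = file_system.get_directory_client("source")   # the folder

with open("input_emp.txt", "rb") as data:                 # the small demo text file
    directory.create_file("input_emp.txt").upload_data(data, overwrite=True)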
The other resource to look at quickly is the Azure SQL database, and I developed the most basic one you can imagine. Hit Create, type in "sql db" — be careful, some similarly named entries are third-party marketplace items — and select Azure SQL Database. This one definitely has more configuration options, which is why having the ARM template to create all of these resources automatically will be helpful. It asks the same kinds of questions: which subscription, which resource group — again keeping everything together — and then a name. You do need a server, and the nice part is that while creating the database you can also create the server, so you can either point to one that exists or create a new one right there. These were all done with the basic default settings, because we're not focused on SQL Database or the data lake here. The only thing you do want to consider — because it has the biggest potential impact on cost — is the compute configuration of the database. For our example I'm using the Standard S0 provisioning, which is roughly $14.72 per month, but the Basic option at an estimated $4.90 per month would be sufficient for this kind of demonstration. So you configure a SQL database with essentially no custom settings, and a SQL server with no special details, and that puts you in a spot where you now have a SQL server and a SQL database. It takes a few more moments to provision than the data lake does.

The only remaining step once it's provisioned — because I'm going to use the query editor built into the Azure portal — is to give ourselves the ability to connect. In the SQL server itself there's an area under Settings called Networking where you can essentially whitelist IP addresses. By default, an Azure SQL server is effectively locked down; you just can't access it. So you go into the network settings for the server and add your specific local IP address, which means that while you work through this demonstration only machines from that IP can reach it — still pretty locked down. That's everything I've done: a standard data lake, a standard SQL server, a standard database, and a firewall rule opened for my IP. I wanted to put those basic instructions out there for those following along, but in the future you should have a nice ARM template that creates all of these resources for you; you'll just populate the relevant details.
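For reference, that firewall step can also be scripted. This is a rough sketch with azure-mgmt-sql — the server and rule names are placeholders, and the exact operation signature can vary between SDK versions, so treat it as a starting point rather than the method used in class.

# Sketch: whitelist a single client IP on the Azure SQL logical server.
# pip install azure-identity azure-mgmt-sql
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import FirewallRule

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group
server_name = "adflearnwiththenerdsserver"            # placeholder server name
my_ip = "203.0.113.42"                                # your public IP address

sql_client = SqlManagementClient(DefaultAzureCredential(), subscription_id)
sql_client.firewall_rules.create_or_update(
    rg_name,
    server_name,
    "AllowMyIp",
    FirewallRule(start_ip_address=my_ip, end_ip_address=my_ip),
)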
Lastly, of course, there's the Data Factory — I've seen some people talking about ADF in the chat, and someone wrote something that captures it well. We go to create a resource again, and the description calls it a hybrid integration service. Someone in the chat wrote "SSIS," which is a very common comparison. SSIS is Microsoft's on-premises solution for ETL work — extract, transform, and load — so as soon as Data Factory was announced, people asked whether this was going to be SSIS in the cloud. Those who looked at Data Factory way back in its v1 days know it was quite different; it really wasn't giving us that SSIS look and feel. But the Data Factory we have now, which we're going to explore, very much provides similar capabilities to Integration Services — it's just a different designer. Pipelines, in my opinion, relate more to the control flow in SSIS, and, as the naming suggests, data flows work a little closer to the data flow task in SSIS. So there are some parallels, and it was a good description I saw in the chat.

If you're going to follow along once we put the recording up, you of course have to create the Data Factory itself, which has its own process and many configuration options. You start the same as everything else — which subscription, which resource group, keeping everything in the same container — and then you give your Azure Data Factory instance a name. There's some additional fun stuff in here that you could get into, but we've done a default provisioning: we did not set up CI/CD (continuous integration and continuous delivery) with Git. You can definitely do that — you can see the Azure DevOps and GitHub options, and it's always a great thing for production-ready work — but you can also say "I just want to provision Data Factory and set that up later." There are also the standard options you'd expect when provisioning an Azure resource: you can configure it to work with a managed virtual network if you're leveraging one for security within your organization, and if you have a virtual network in play you can enable that here. The nice part is you can do much of this after the fact, so even if choices are made here, you can revisit them. There's also some discussion on that screen about the self-hosted integration runtime — I think I already saw some conversation about that in the chat — and don't worry, we'll get into integration runtimes in a moment. Across the board, if you want to mimic my environment, the only thing I selected was Configure Git later, and then I literally hit Create. That's what gives me this resource group with all of these items in it: I've already shown off the data lake, the SQL server is in there with my firewall rule added, and there's really nothing else going on yet.
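Provisioning the factory itself can also be scripted. Below is a minimal sketch in the style of the azure-mgmt-datafactory quickstart; the names and region are placeholders, and it mirrors the "configure Git later" choice by simply not supplying any repository configuration.

# Sketch: create an empty Azure Data Factory instance (no Git integration).
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group
df_name = "adflearnwiththenerds"                      # placeholder factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
factory = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(factory.name, factory.provisioning_state)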
There's a lot to Azure Data Factory, and that's what we'll focus on now that we've covered the lay of the land and what I'll be using. Inside the tool — and funny enough, if you attended our Synapse Analytics Learn with the Nerds, this will feel pretty deja-vu-ish, because the UI that Synapse Analytics adopted came from Data Factory — the areas you navigate between are officially referred to as hubs. When we launch Data Factory for the first time, which we'll do as soon as we finish these slides, we start in the Home hub. On the left-hand side there are icons to move between hubs. Here's my Data Factory; I'll launch it, and it opens in its own browser tab, just like launching the Synapse workspace. These areas on the left are the hubs — that's what the screenshot in this PowerPoint is showing.

The Home hub is an overview hub where you can actually do quite a lot. You'll see your recent resources and some wizards: the Ingest option is a quick-access button that launches the Copy Data activity wizard, which is the first pipeline we're going to create; Orchestrate brings you to the create-pipeline screen from scratch; Transform Data opens a brand-new data flow; and Configure SSIS Integration Runtime takes you to the Manage hub so you can configure a runtime — we haven't really discussed runtimes yet, but we'll get there momentarily. There's other stuff I didn't include in the screenshot: if you scroll down there's a Discover More section with links to various templates, usually pointing at Microsoft documentation, and the feature showcase and resources links take you to blogs and other Microsoft documentation.

The next hub down the list is the Author hub, and quite often this is where you do a fair amount of your work once certain things are set up — there's almost a hierarchy here that needs to be understood, which we'll discuss. In this design area you can create pipelines, datasets, and data flows, and, as you can see at the bottom, what used to be another form of data flow called a wrangling data flow has since been formalized under the name Power Query. So if you remember reading about wrangling data flows, it's been renamed; it's the Power Query editor online, something you can leverage alongside pipelines. That's interesting if you're coming from, say, an Azure Analysis Services or Power BI background and you're familiar with the Power Query editor and the power of M. You get a limited version of it here — not every transform you'd have in Power BI is available within Data Factory — but it's there, and it gives you a different way to transform data that may be more familiar if you're just getting into Data Factory.

Going down the list (there are only four), the Monitoring hub has some really nice built-in monitoring. You can literally watch your pipelines as they're triggered by schedules or as you debug them. Maybe you ran a pipeline to check it out and accidentally closed the tab, so you can no longer see its output: the results of that debug run are right here in the Monitoring hub. You can monitor your pipelines, your data flows, your data flow debug sessions — all the assets that can be recorded are available to you. There are ways to extend the logging even further, but it's nice knowing that out of the box you've got some great tooling at your fingertips. As we go through the class we'll wander over to the Monitoring hub.

And then, quite often, where your projects are going to start is the Manage hub, which holds a couple of critical pieces. We'll talk about integration runtimes, because when it comes to resources and their hierarchy, the integration runtime is at the top.
The integration runtime is effectively the item that manages the compute — the resources used to process the execution of your pipelines and data flows. It's what meters everything; how you're charged flows through your integration runtime, and you can scale it up or down, essentially increasing how high it's allowed to scale. By default, when you create your Azure Data Factory you get a default Azure integration runtime — more on that in a moment — so technically you're ready to go out of the box. That default runtime is step number one, but you can also create your own explicit integration runtimes for other needs, quite often workload balancing.

The Manage hub is also where you create your linked services — more on those in a moment, but effectively these are the connection strings for your datasets, and we'll talk more about that relationship. When you create a linked service, you choose which technology — more formally, which data store — you're connecting to, then specify which integration runtime will manage that connection, along with how you're going to authenticate. We're going to create a couple of linked services: one for the data lake I discussed and one for that SQL server. These will end up being either source or destination depending on how we define our datasets, and linked services and integration runtimes are the biggest things you'll use in this hub. There are other items here too: at the very end of the class we should talk about triggers — somewhere in the fast-scrolling chat (Mitch is doing his best) I think I saw something about scheduling, and there are definitely different scheduling options, which are known as triggers. Technically there are also ways to kick off or execute your pipelines from outside Azure Data Factory; that's a little more advanced, so we won't go deep into it, but I'll mention it so you're aware — if you're familiar with Logic Apps or Power Automate, that can be a pretty cool way to go about it. So that's the basic design and layout of Azure Data Factory: the different hubs and how you'll interact with them.

We've already briefly touched on integration runtimes: the integration runtime is the compute infrastructure. You hear the term "compute" a lot when working with Azure — it's the virtual resources you're leveraging, the CPU, the memory, whatever is required to execute or process a project, a command, a query. The integration runtime manages that, for either data movement or SSIS package execution, and there are three types of integration runtime you can potentially create. The first, and the one built for you by default, is the Azure integration runtime. This is what you use to natively connect to the various data stores that live in public networks and clouds — AWS, Cassandra, anything available online.
If these things are available online in the cloud, you can simply use an Azure integration runtime to connect to them, and it's obviously what you use for anything specifically Azure, like the data lake or Azure SQL Database. But as soon as either your source or your destination moves toward on-premises systems — say I want to pull data from my on-premises SQL Server and move it into files inside my data lake — the answer is yes, you can do that, but you need to create your linked service using a self-hosted integration runtime. We'll discuss it, but we won't walk through setting one up, because it requires you to download and install the actual self-hosted integration runtime service. You install that service locally within the private network and link it to your instance of Azure Data Factory, which creates a bridge — a tunnel, if you will — between the private network and the public network. That's what the self-hosted integration runtime gives you: the ability to run activities between cloud data stores and data stores in your private network. Really powerful, really cool, and especially important if you're in a hybrid scenario. Quite often you'll need to get IT or administration involved, because that service is generally installed on some sort of application server that's always on and always running.

On pricing: there is an increased cost for the self-hosted integration runtime. You can find the costs for all of this in the documentation — around the 12:25 break I'll pull up the general pricing page — but self-hosted generally has a higher cost per DIU per hour. The cost of running a pipeline is effectively the number of DIUs (an interesting term we'll get into) multiplied by the hours used, and the per-hour rate for self-hosted is higher. It's not astronomical, but it is more expensive.

Lastly, there's SSIS package execution, the third option — it was right there in the Home hub ready to go. For hybrid scenarios where you want to take your existing SSIS packages and deploy them into an Azure compute environment, that's what the Azure-SSIS integration runtime is for, and that's really its only purpose: you store your SSIS packages in an SSIS catalog within an Azure compute environment — effectively within an Azure SQL database. Very niche, very specific, more for hybrid scenarios, and we won't be deploying or creating one.

So integration runtimes are critical. We'll look at them briefly once we're back in the Data Factory itself, but that's where everything starts. We do have one ready out of the box, but it's not uncommon to explicitly create your own for workload balancing, and if you need to go on-premises you have to create that self-hosted runtime — that's often where the story begins when you start with Azure Data Factory.
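To show where the self-hosted runtime fits programmatically, here is a hedged sketch that registers a self-hosted integration runtime in the factory and retrieves the authentication key you would paste into the runtime installer on the on-premises machine. The names are placeholders, and the local installation step still has to be done by hand.

# Sketch: register a self-hosted integration runtime and fetch its auth key.
# The runtime service itself still has to be installed on a machine in the private network.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource, SelfHostedIntegrationRuntime)

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group
df_name = "adflearnwiththenerds"                      # placeholder factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
adf_client.integration_runtimes.create_or_update(
    rg_name, df_name, "OnPremIR",
    IntegrationRuntimeResource(properties=SelfHostedIntegrationRuntime()))

keys = adf_client.integration_runtimes.list_auth_keys(rg_name, df_name, "OnPremIR")
print(keys.auth_key1)   # paste this into the self-hosted IR installer on-premises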
Now, linked services. Once you have an integration runtime, we can create a linked service, which defines the connection to whatever data store you want to reach. If we want to connect to Azure SQL Database, I create a linked service, choose that technology — that data store — and then configure it: which server, which database, and how are we going to authenticate? Those questions naturally change depending on the data store you choose. Those are the options for SQL DB; if I chose Azure Data Lake instead, it would ask for the storage account name and how we want to authenticate to it; if it's Oracle, there are its own connection details. Everything changes based on what we want to connect to.

Linked services are reusable assets. When I point one at an Azure SQL server and its database, that linked service lets me reach whatever tables, views, or stored procedures exist in that database. In more advanced setups you could parameterize it — pass through different database names, so one connection to a server serves multiple databases — but that's an advanced topic for an advanced day; we'll probably cover it in the future.

So I create my linked service pointing to, say, my Azure SQL database — it's set, it's in place, good to go. Then I create my dataset. I know I have a connection to the SQL server and this database; the dataset is effectively the table, view, or stored procedure I'm interested in. That's the relationship between dataset and linked service. If the linked service points to my data lake, the dataset is the file or folder I'm interested in. That's how these three pieces come together — integration runtime, then linked service, then dataset; that's the order of operations, if you will. A small side note as we go through this, which I won't really take advantage of: quite often an Azure Data Factory instance has multiple users, and you can create folders for grouping your datasets, pipelines, and data flows however you see fit. I won't use it, but it's worth knowing it exists.

So let's go create a couple of linked services: one pointing to the Azure SQL database I created and one to the data lake account I also created. Once again, this is generally where things start, and I think some of you will get answers here to the question of what you can connect to — there are some fun answers coming. Let me bring this up: we're inside Data Factory, we head over to the Manage hub, and we're already in Linked Services. A small reminder that there's already an integration runtime here — you can see the default one is present.
You do have the capability of creating additional integration runtimes, and when we create a linked service it will ask which integration runtime you want to connect with. If you go through the process of configuring a runtime, you can change exactly what you want: is it general purpose or memory optimized, and how many cores — there are driver (control) cores and then workers, so you might see 16 driver cores and 64 workers. Increasing those raises the level this runtime can scale to, which of course means a higher potential cost if you let it scale higher. And you have to remember how the billing works: if a job uses four DIUs and takes an hour, versus eight DIUs and only thirty minutes, the overall price is effectively the same. So there's always a careful balance in deciding how much to scale — there's a wall of diminishing returns where you keep scaling it up and up but it isn't getting any quicker, and you're effectively just putting a little more money into that execution without a big impact.
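A quick back-of-the-envelope sketch of that DIU-hour math, using a made-up hourly rate purely for illustration, shows why 4 DIUs for an hour and 8 DIUs for half an hour cost about the same:

# Illustrative only: Data Factory data movement is billed per DIU-hour,
# so cost ≈ DIUs × hours × rate. The rate below is a placeholder, not a real price.
rate_per_diu_hour = 0.25   # assumed rate for illustration

option_a = 4 * 1.0 * rate_per_diu_hour   # 4 DIUs for 60 minutes
option_b = 8 * 0.5 * rate_per_diu_hour   # 8 DIUs for 30 minutes
print(option_a, option_b)                # same spend, half the wall-clock time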
For today we'll take advantage of the default integration runtime and create some linked services, starting with one for our SQL server. As soon as I hit New, there's a lot of stuff in here. I've seen people asking whether we can connect to Amazon Web Services — you can see Amazon RDS for SQL Server, RDS for Oracle, Amazon S3; it's compatible. If you're in Azure Data Factory, or re-watching this later, just scroll through the list: it's a pretty large list of data stores and it continues to grow. A few connectors are in preview and will eventually become GA (QuickBooks is in preview, for example), and they're adding connections all the time — so it's worth checking back, because today there may not be a connector that works for you, but tomorrow there might be.

For us it's about what we want to connect to, and in my case that's Azure SQL Database. I can just start typing "sql" — that's usually sufficient, and there's a whole section for Azure — then select Azure SQL Database and hit Continue. We're then asked a few things, and this screen is specific to our choice. I use a simple naming convention: it's an Azure resource (as we saw, it could just as easily be an Amazon or Cassandra connection), specifically a SQL database, and then usually the name of the SQL server or of the database itself. If I revisit my resources, my database name is adflwtn database, or I could use the server name — it's really your choice; you just need to be able to identify at a glance what the linked service points to. So something like adf_learn_with_the_nerds_server. Naming conventions can change and evolve over time — and you do have to follow the allowed characters, so keep those underscores going — but you'll settle on a convention that makes things easy to recognize. You can add a description if you want, and, as mentioned, you choose which integration runtime will manage this; we have our one default option, so that's what we'll stick to.

Then it's a matter of choosing what we want to connect to and how. I'll enter the details manually: I go find my SQL server, copy the server name — that's all it's asking for — paste it in, and do the same for the database name. There are ways to parameterize this (you can see it's even offering the option, and we'll get into the expression language in a moment), but this connection points specifically to this server and this database. Then we specify how to authenticate, and there are many options. From a production perspective, the managed identity option is very popular: the data factory has its own application identity with its own application ID, and we would grant that managed identity access to the SQL server and/or data lake, so the entire data factory can authenticate to it — that has its own process to take care of. For this class, as part of my SQL server creation I created a general SQL admin account — you can see it's just "admin user" — so I'll stick with basic SQL authentication and provide the respective credentials. Notice it's also promoting the use of Azure Key Vault here, which is another option we could use.

When I test the connection it fails at first — funny enough, one of the things that changed when I hardwired in was my IP — so I hop over to the server's Networking blade, add my current IP to the firewall rules just in case, double-check the admin user (typing on a new keyboard can be a little funky), and try again. As I said, SQL Server is locked down by default; you have to go through the effort to make it accessible. But now I have a connection, managed by this integration runtime, that I can use with datasets — a way to point at this database and connect to whatever tables, views, or stored
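For the scripting-inclined, the same linked service can be defined through the SDK. This is a sketch only: the server, database, and credentials below are placeholders, and in a real project you'd pull the password from Key Vault or use the factory's managed identity rather than embedding it.

# Sketch: define an Azure SQL Database linked service using SQL authentication.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureSqlDatabaseLinkedService, SecureString)

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group
df_name = "adflearnwiththenerds"                      # placeholder factory name

conn_str = (
    "Server=tcp:adflearnwiththenerdsserver.database.windows.net,1433;"  # placeholder server
    "Database=adflwtndatabase;"                                          # placeholder database
    "User ID=adminuser;Password=<password>;Encrypt=True;"
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "AzureResource_SQLDatabase_adflwtn",
    LinkedServiceResource(
        properties=AzureSqlDatabaseLinkedService(
            connection_string=SecureString(value=conn_str))))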
procedures are in it. You have to have the linked service, or you'll get stopped as soon as you reach datasets: the first thing a dataset asks is what kind of dataset you want to create, and when you choose Azure SQL DB it says choose one of your compatible linked services, or create one now. We have this one, so anything we do with that server is covered.

But we also want to connect to our data lake, so I'll create another linked service — these two are what we'll use for the rest of the day. I search for "data lake"; we know we created Azure Data Lake Storage, so that's what I pick. I give it a name with a similar convention: it's an Azure resource, it's ADLS, and then possibly the name of the storage account itself, so there's some visible indication of which storage account I'm connecting to. Once again, choose which integration runtime to use — we'll stick with the default — and then choose which storage account to point to: I pick my subscription and the very same data lake we set up, adflearnwiththenerdssa. Let's test that connection — it's using the account key — and now we have a connection to this data lake. Whatever containers I end up creating, whatever folders I have, I'll be able to navigate those directories and pull back whatever folders and files I want, and I can write into the directory, just as we can create new tables in the SQL database. These are what we're going to use for our pipelines. You've got to start with the linked services — that's critical, it's essential, and everyone ends up here regardless of what you're trying to connect to.

If you're more interested in the data flow side of things, you'll see when we get to data flows that there aren't as many connection options there. There's far more connectivity with pipelines; the data flow connectors are mostly Azure-native — you can't connect to Amazon S3 or Dataverse there, and although they've been adding more and more, it's roughly 20 percent of what's available to pipelines. Which is why, don't forget, pipelines are your orchestration and data movement tools: they're usually your go-to for connecting to external sources and landing that data in some Azure storage element, whether that's a data lake, a SQL database, a SQL pool, whatever it is. Once you've made that movement — this is generally referred to as ELT rather than the ETL we talked about — you've used the pipeline to take the data from Oracle, let's say, and moved it into files inside your data lake; now you connect to those files with your data flows, do whatever transforms you're interested in, and write the results to the final destination table, maybe in your SQL database. Hopefully that's starting to paint the picture of how these pieces fit together — that's how I related the SSIS control flow to pipelines and the data flow task to data flows.
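The data lake linked service and the two datasets we'll need can be sketched the same way. Model names like AzureBlobFSLinkedService and DelimitedTextDataset are the SDK's terms for ADLS Gen2 and delimited text; the account URL, key, and dataset names below are placeholders I've chosen for illustration.

# Sketch: ADLS Gen2 linked service, plus one SQL-table dataset and one CSV dataset.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobFSLinkedService, SecureString,
    DatasetResource, AzureSqlTableDataset, DelimitedTextDataset,
    AzureBlobFSLocation, LinkedServiceReference)

subscription_id = "<your-subscription-id>"            # placeholder
rg_name = "azure-data-factory-learn-with-the-nerds"   # assumed resource group
df_name = "adflearnwiththenerds"                      # placeholder factory name
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Linked service for the storage account, authenticating with the account key.
adf_client.linked_services.create_or_update(
    rg_name, df_name, "AzureResource_ADLS_adflearnwiththenerdssa",
    LinkedServiceResource(properties=AzureBlobFSLinkedService(
        url="https://adflearnwiththenerdssa.dfs.core.windows.net",   # placeholder
        account_key=SecureString(value="<storage-account-key>"))))

sql_ls = LinkedServiceReference(type="LinkedServiceReference",
                                reference_name="AzureResource_SQLDatabase_adflwtn")
lake_ls = LinkedServiceReference(type="LinkedServiceReference",
                                 reference_name="AzureResource_ADLS_adflearnwiththenerdssa")

# Source dataset: the dbo.emp table in the Azure SQL database.
adf_client.datasets.create_or_update(
    rg_name, df_name, "ds_sql_emp",
    DatasetResource(properties=AzureSqlTableDataset(
        linked_service_name=sql_ls,
        schema_type_properties_schema="dbo", table="emp")))

# Sink dataset: emp.csv inside the data container's destination folder.
adf_client.datasets.create_or_update(
    rg_name, df_name, "ds_lake_emp_csv",
    DatasetResource(properties=DelimitedTextDataset(
        linked_service_name=lake_ls,
        location=AzureBlobFSLocation(file_system="data",
                                     folder_path="destination",
                                     file_name="emp.csv"),
        column_delimiter=",", first_row_as_header=True)))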
They work seamlessly together, and they're really meant to — you'll see that even more once we get to questions like "how do I run my data flow?", because the answer is: through pipelines. So we've got our linked services and we've talked about the integration runtime. I'm going to head over to the Author area, because in a moment we'll walk through our very first pipeline, using the wizard as our first experience. The wizard is nice — technically it could create a linked service and a dataset for us along the way, but we created our linked services ahead of time. I always like to create those objects ahead of time and then leverage them within pipelines and data flows, and you'll see that when we go through the process.

A couple of notes on what we're starting with. The Copy Data wizard is built around the copy activity, a very commonly used activity within pipelines, and it serves a very simple purpose: a source and a destination, that's it. And since I keep using the term "destination" — I think the wizard does show "destination," but the term you'll actually find in these activities and in data flows is sink, s-i-n-k. When you hear me say destination, that's just habitual; it's synonymous with sink, and you can see it even in this slide. (There is an option for SharePoint, John — as I said, that data store list is a pretty massive scrolling list; there are a lot of options.)

So we're going to walk through this wizard, which starts with what you want to call the pipeline, what your source is, and what your destination is — and then it creates everything for you. If we hadn't created linked services, it would create them (two, or one, depending on what you're pointing at), plus your datasets, the new pipeline, and the copy activity already configured. It's a wizard, so it walks you through steps one, two, three, four, which is pretty awesome. Along the way you make a few choices: do you want to run this once when the wizard finishes, or create a schedule? There's also a reference to a tumbling window, which is something you can schedule within a single day at minute or hour intervals; if something fails, you can revert and pick up from that piece, which is interesting. As in any wizard, you move through various choices and have to make those decisions.

The best way to get through this is to actually showcase the Copy Data wizard. What we're going to do is create a table with information in it in our SQL database, then take that information from the SQL database and write it into a file inside the data lake. That's the direction we'll go for this example.
For this, in the resources folder I have a very basic SQL script. There are a couple of ways to approach it: you could open something like SQL Server Management Studio and connect, but I'm going to take advantage of a capability built right into the Azure portal, which is fun (SSMS would work perfectly well for this too). I find my database and open it in its own tab so I can jump to any other resource, and what I want to do is connect to this database so I can run scripts against it. One way is the Query Editor right here in the portal — you just provide an account that has access. As of right now there's nothing in the database: no tables, no views, no stored procedures. But you can write SQL directly in here — SELECT statements, CREATE statements, ALTER statements, all of it.

So I hit Open Query, find the file we were just talking about in demo one, and it loads a CREATE TABLE statement: create dbo.emp with three columns. The first one is an identity column, so we don't need to supply values — it starts at 1 and increments by 1 — and then we insert two records, John and Jane Doe, first name and last name. Three columns, two rows, super simple stuff; it's all about the process, not the data (we'll work with other data elements a little later). I run it, it succeeds, and if I refresh I now have a table with two records in it. Nothing crazy, nothing fancy, but we now have something inside our Azure SQL DB that we can query, and that's what lets us demonstrate the Copy Data wizard.

The idea is to take the contents of this table and put it into a delimited text file somewhere in my data lake. If you remember the structure I created, we have our data container and its destination folder, and that's where I'll drop it. I'll leave that blade open, because when it's all said and done I should just be able to hit refresh and, voilà, there will be a text file with the contents of that table.

Let me return to Data Factory. You can technically access the wizard from the Author hub as well — if you hit the plus sign, the Copy Data tool is right at the bottom — and don't forget the Home hub also had that Ingest button; both options take you to the same screen. Right off the bat it asks whether you want to create a pipeline using the built-in copy task. It's simple and straightforward, with lots of connections — we're talking 90-plus data sources, the same scrolling list we saw when we hit new linked service.
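If you'd rather run that setup from code than from the portal's query editor, here is a rough equivalent of what the demo script is described as doing (an identity column plus two rows), wrapped in pyodbc. The connection details are placeholders and the column names are my own guess at the script's contents.

# Sketch: create dbo.emp and insert the two demo rows, matching the script described above.
# pip install pyodbc  (requires the Microsoft ODBC Driver for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:adflearnwiththenerdsserver.database.windows.net,1433;"  # placeholder server
    "Database=adflwtndatabase;Uid=adminuser;Pwd=<password>;Encrypt=yes;")

setup_sql = """
CREATE TABLE dbo.emp (
    ID        INT IDENTITY(1,1),   -- starts at 1, increments by 1
    FirstName VARCHAR(50),         -- column names assumed for illustration
    LastName  VARCHAR(50)
);
INSERT INTO dbo.emp (FirstName, LastName) VALUES ('John', 'Doe'), ('Jane', 'Doe');
"""

cursor = conn.cursor()
cursor.execute(setup_sql)
conn.commit()
print(cursor.execute("SELECT COUNT(*) FROM dbo.emp").fetchone()[0])  # expect 2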
There's another direction you can go here, where you create a parameterized pipeline, but you would need an actual control table that contains, basically, the table names you want to work with. We're actually going to talk about a control table later; it's one of the examples we'll go through.

I saw a quick question in there: can you read ADLS Gen2 in SSMS? There is an Azure Storage Explorer-style experience inside SQL Server Management Studio, and, depending on what you're connecting to, you can use what's called PolyBase to query items in the data lake. That's a little off topic and we aren't going to cover it here, but the short answer, Robert, is yes, you can query files in the data lake from something like SSMS.

So I'm going to use this built-in copy task. How do you want to run it? We'll just run it once right here; we don't need to put this one on a schedule, and we'll talk about scheduling later. Next is the source. If you choose one of the types up top, it just filters the dropdown list; if you don't choose anything, you see all of your linked services. You can see (oops, the zoom didn't work too well there, one second) there are icons here, and my naming convention really helps out: the top one is my SQL database and the bottom one is my data lake. In this case we decided my source is the database, so that's the connection, our linked service.

Now it says: all right, we need to create a dataset. What it's actually done is give me a little UI of all the available tables in the database. If you have views you want to look at, just check the box and the list will be populated with views as well. We only have the one table, which makes life pretty easy: I can just pick dbo.emp. If you want, you can preview it real quick to see what's in there. That's the dataset I want to derive using this linked service, so that's our source. You can also preview here if you want to do some isolation, but we're not doing anything in that realm.

Next: what's my destination? We already decided we're going to use the data lake. Once again you can filter down if you want, but it's very nicely labeled, so we can just choose it. Then it says: where exactly do you want to write this? The nice part is there's a built-in browse, so I'm literally navigating through my data lake via that linked service: there's the data folder, and as we talked about, I want to put it in my destination folder, so I hit OK. That's the folder path, or, to be more specific, the container and then the folder path after it. What do we want to name the file itself? I'm going to call it emp.csv, a simple one, nothing too crazy. I hit next, and it asks: what's the file format?
Am I going to create a Parquet file? An Avro file? In this case I'm going to do delimited text, I'm going to use a comma delimiter, and I'll say, you know what, let's add the header into the file. You have some choices to make here, column delimiter, header, pretty standard stuff. We hit next, and it asks what the name of the pipeline is going to be. I'll do something like "demo one copy wizard", something basic, nothing crazy, and you can add a description if you want.

There are some additional, more advanced elements you can do here: you can point to a reference table for data consistency rules, and you can add additional logging, so extra details get stored in a file for you. You can also get into staging, which is really more for when you're going into Synapse Analytics. If you're going to be writing data into a SQL pool, staging can be beneficial there. Technically we're just going from a database to the data lake, but the idea is that when you go from the data lake over into a SQL pool, you can get some cool performance gains if you use Parquet, and if you're not using Parquet, staging lets you take advantage of the MPP architecture that exists there. So a lot of these little checkboxes have a lot of underlying information behind them.

Then it says: hey, you're using this linked service, which is tied to a specific integration runtime. Do you want to let it automatically choose how many DIUs to leverage? Those are data integration units. It looks at your integration runtime, which basically has a degree of scale; here it's going to go from 4 to 20, and technically, if you scale your integration runtime up to the highest, I think you can do 256 DIUs. As you can see, the cost is the number of DIUs times the copy duration times $0.25 per DIU-hour. It was mentioned earlier, "isn't the self-hosted one more expensive?" The answer is yes; I want to say it's around $0.27 or $0.28, we could check the documentation, but it is a little more expensive. Quite often, when you're starting off, you can just leave this on Auto; it usually picks the most efficient option, and for all of ours it's basically going to use the lowest amount, four DIUs, automatically. If you know you're hitting that wall of diminishing returns, where it's using more DIUs but the speed isn't increasing, you can manually specify how many DIUs to leverage here. So there are some options to tinker with; I'll put a quick worked example of that pricing math below.
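Here's that quick back-of-the-napkin example, using the $0.25 per DIU-hour rate quoted above; the half-hour duration is a made-up number just for illustration:

    4 DIUs x 0.5 hours of copy duration x $0.25 per DIU-hour = $0.50 for that copy run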
For us, we're just going to hit next, which gives us a breakdown, a review-and-finish screen. There is something we definitely want to amend here, though, and they actually didn't have an option for this before, but they do now, which is awesome. The connection names are fine, we already created those, but this is what I mean about setting up the datasets ahead of time: look at the dataset names here, "source data set eht" and "destination data set eht", and originally the pipeline would have been called something like "pipeline something eht", which is apparently how it's supposed to relate back. They're not great. You used to not be able to rename datasets, but for a while now you can, so I'm definitely going to hit Edit on the source, and it actually lets me update and modify the name of this dataset, which will be significantly better for us, because we can re-reference these datasets and use them for other items if we understand and know what they point to.

This source is pointing to our SQL Server, specifically to the table dbo.emp, so I could call it something like AZ_SqlDatabase... and then, which table within that SQL database is it? This is where, if you wanted to, you could include the schema, something like dbo_emp, so it's very obvious this dataset points to this table within this server. It's going to be much easier to recognize than a random generated dataset name. I'll do the same thing for my destination and switch it over to something like AZ_ADLS, and then the file that's being created we'll call "employee". If I wanted to, I could include the format type, or even the file path, which was data, destination, and then employee. So do take your time and think about the naming conventions here; we want to make it evident and obvious what we're using each dataset for, because, as I said, we can reuse these, and we actually will.

I've updated those names and I think they look a lot better. This is the review-and-finish step, and as soon as I hit next it basically creates and runs everything, because we said run once, run now. It didn't create a schedule, but you can see it creating the datasets, creating the pipeline, and then actually running the pipeline, and it reports that it's completed. If I head over to my data lake and do a quick refresh, we can see there's a new item in here, this employee file, and if I look at it, we've got the contents, comma-delimited, like I told it to do. So yes, it completed, it placed the file inside the data lake, and it's now available to us. With a couple of clicks, following through a wizard, we've created our very first pipeline and it has run.

If we go over to the Author hub, there's now a pipeline and two datasets, the result of our copy wizard, and this is what the pipeline UI normally looks like if you were to start things from scratch, which from this point on we're actually going to do. The wizard gave the activity a generic name like "copy eht"; if you wanted to rename it, you could. And there's a bit more to see once you get beyond the wizard itself. Sure, we pointed to our source dataset, you can see it right here, and yes, we'd like to use the table referenced in it. "Use query: table" basically means use whatever table is chosen, dbo.emp in our case. But here's the interesting thing, and it gives you a little more versatility: we have a dataset pointing to a linked service that's just pointing to that SQL Server and its database. Even though this dataset has a table specified, you could click the Query option instead and literally write your own SELECT statements, do joins, execute pretty much whatever you want, so although we pointed specifically to that employee table, we could query anything we want from that database right here. It's an interesting idea and approach, and sometimes I've seen that done with datasets, where you leave it open-ended and require the developer to come up with the query. So you can hard-code things like we've done, you can leave it open for querying, or, as you can see, you can run stored procedures. Just to illustrate, I've sketched an example of that query option below.
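This query is mine, not something from the demo, but it shows the idea: with the source switched to the Query option, you could paste any valid T-SQL against that database into the box, for example:

    -- Hypothetical ad-hoc query typed into the dataset's Query box
    SELECT e.FirstName, e.LastName
    FROM dbo.emp AS e
    WHERE e.LastName = 'Doe';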
Now, the options you see present here will change depending on what type of dataset you choose. What if we had chosen our data lake as the source? In that case we wouldn't have options like table, query, and stored procedure; those aren't things that exist in a data lake. Instead we'd see choices like whether to do a wildcard search within a folder. But this is the user interface. I made some changes to it, so I'm going to close out without saving those changes to keep it in its original state. That's what the results of the copy activity wizard look like; it's kind of nice how it gets you started with a quick little example. So that's our first pipeline, and we have two datasets that could technically be used for any other pipelines or data flows, because they're specific to what they point to: one for our SQL database table, and one for that employee file in the data lake, which is right there waiting for us to take advantage of.

So that's our first item. Now that we've seen what the copy activity can do, let's start talking about some of the other elements around activities, because this canvas area over here, check it out, there's a lot. If you start expanding this, it's effectively the toolbox of all of the different activities you have at your disposal. For those of you who are new and wondering what else you can do with a pipeline: can I run a Databricks notebook? Yes, you can. There's Azure Data Explorer; under General there are some pretty cool options, like making web calls, running stored procedures, and running scripts, and we're going to get into a couple of those. There's the whole lift-and-shift story with Execute SSIS Package, and there's some stuff with Data Lake Analytics, a little old-school U-SQL for anyone who used that. So there are a lot of cool capabilities and options to be found and explored in here; definitely check that out.

The first activity we're going to get into is one called the Get Metadata activity. There's been a lot of popularity and growth with storage accounts and data lakes; you hear these conversations now around the data lakehouse, so we're seeing a lot more usage and investment in data lakes from a storage perspective. The Get Metadata activity is going to be a great option when you're working with data lakes as a storage system.
You can see it literally allows us to point to a file and/or a folder and retrieve specific metadata about it, and these are just a couple of the options: item name, type, size, created date, last modified date. We're going to use this to work through some almost end-to-end examples: we'll start small with the Get Metadata activity and then keep adding to it, adding to it, to show a logical end-to-end process. Of course, when it comes to design choices and options for pipelines and data flows, the possibilities are massive; there are so many things out there. Like I said, this is just the introduction, so hopefully it gets you excited, or wets your beak, so to speak, on what's going on here.

The first thing we're introduced to, and that's the whole idea behind the Get Metadata activity, is this: the information it collects, how do we leverage it further down the pipeline? We're going to see that done in many different ways through what are called outputs, and there's a lot of parameterization that can go with it: output parameters, general pipeline-level parameters, system-level parameters. There's a lot available here, but this will be the example: hey, Get Metadata activity, point to this file and give me this piece of information, this one, and this one, and then we take that output (pardon me) and use it as input for other activities. That's the idea. In our first case we'll connect to the data lake as the source for the Get Metadata activity, get that information, and pass the details into a stored procedure, which is then going to update a table. That's the idea behind this one, and that's what we'll end up doing as we go through it.

But just because of timing, I don't want to go too far past that break point, so we're going to take that one small break now and continue afterwards. It's just a quick 15-minute break, and it also gives me the opportunity to scroll through the plethora of questions and get with Mitchell to see what's out there to look into; we're definitely going to dig into those. So let me put a small timer on the screen, guys, and then we'll return. Just in case you haven't noticed, there's also a little QR code here, and I think we'll provide the link in the chat as well, that gives you access, if you're interested, to our on-demand learning platform. There are tons of classes out there, including an intro and an advanced ADF class, and we have a limited-time discount; if you're watching this as a recording, you can see it expires on June 14th, so keep an eye on it.

All right guys, that was our small break, and now we're going to get back into the swing of things. We left ourselves off here:
we have one pipeline in the bag, and now we're going to start one pipeline that's going to get built upon, built upon, built upon. First things first, let's create a brand new pipeline. Make sure you're in the Author tab. I'm going to shrink this to give myself a little more real estate, go under Pipelines, and hit New pipeline. I'll keep the same naming convention and call this "demo two get metadata activity". Once that's set, if you want a little more real estate, there's a Properties pane here you can close out.

Also, when we were checking out the chat, I noticed quite a few people asking, "hey, can I connect to this, can I connect to that, what are my options?" Real quick before I continue, and Mitch and I were talking about this: don't forget that the way we do this is through linked services. Everything starts with linked services; that's where you choose what you want to make a connection to. As a reminder, if you go to the Manage hub, go under Linked services, and hit New, this scrolling list, which has its own little categories and breakdowns, shows all the things you can currently connect to. Some of them are a little more open-ended, like the one here that just says "File system": if you're looking to create a connection to a file inside your local private network, like a network directory or your personal machine, that's what you'd choose, but don't forget that if that's the goal, you also need an integration runtime that will allow it. You can scroll all over the place here and see all these different choices. We obviously can't go through every one of them today, but this is for you to know and understand what you have available at your fingertips to connect to and work with. So once again: Manage hub, Linked services, scroll through that list; there's a ton waiting for us. We're sticking with SQL database and data lake, but there are so many more options.

So we're here, we've got our pipeline. Let's find that Get Metadata activity, which I believe is under the General section. There's a lot in this little toolbox, but you can always type what you're looking for and it'll filter down, and once you've found it, you simply drag it into the canvas area. I'm going to close the properties pane to give myself some more design space. This is how we work with a pipeline: you have the toolbox on the left with your various activities, and the canvas area in the middle that we drag activities into. You can have multiple activities in here, and obviously the more activities you have, the longer the pipeline takes to complete, and, as we've talked about, that affects billing. Here is my Get Metadata activity, and when I select it, the options below appear. Notice that if I deselect it, those are pipeline-level properties, and when I click it, those are activity-level properties; the bottom area updates based on what we're pointing to. So these are the configurations for the Get Metadata activity.
Step one, let's give it a name: "Get Last Modified Date", because that's what we're going to do. We're going to point to a file in our data lake. Specifically, if we visit that data lake, you'll remember that earlier on I brought a simple txt file into the source location. If we look at it, it's very simple and straightforward; it's effectively the same data we had, it just uses a different delimiter, and we'll explore what that looks like. That file in the data lake is what I'm going to connect to for this Get Metadata activity.

In the settings for the activity you can see, in red, what still has to be configured: the dataset. If we think about this, I have an AZ_ADLS dataset pointing to the employee file we just created, which is not what we want, and one pointing to the employee table in the SQL DB, so technically I don't have a dataset I want to work with here. There are two ways to approach it: you can see there's a New dataset option right here on the activity, or you can go create a brand new dataset from the Datasets section, which is actually what I'm going to do. I'll click Datasets and hit New dataset, and this looks almost identical to the linked service experience; the list of data stores here matches what we have for linked services. We're basically saying, what do you want to connect to, and when you make a choice here you need to supply a linked service that has a connection to that data store. For us we know it's the data lake, so as I start to type that in, it filters down and I can make my choice.

Then it says: all right, you're going to point to a file on the data lake, what is the format? There are lots of options, and probably the most popular, fun one in here is Parquet. We're not going to get into it in this class, but if you check out any of our stuff on Data Factory, we have a lot of conversation around Parquet. It seems to be a format that's really being pushed, and there are a lot of performance benefits from using it, so put that in the back of your mind as something to check out; we have content on that as well. Here, though, we're going to use delimited text, because that's what the file is, and I'll hit continue.

Now we configure the dataset, like we did before: what do you want the name to be? Similar naming convention, ADLS underscore, then possibly the name of your storage account, but I'm going to use the name of the file, that input emp text file that's right over here, so I'll give it that name. Then: what's your linked service, which data lake do you want to point to? Well, I've got one that points to the very one I'm interested in. And then: which file are you interested in? It should be noted, once again, there are ways to make this dynamic. There was a question about dynamic content, and we are going to get into dynamic activities and parameterization; there are so many scopes and levels of detail around parameters, and you could technically make this dynamic too, though that gets a little more advanced. We will get into some usage of dynamic expressions here in a little bit. For now I just need to point to my file: I go in under source, there's my input txt file, and I say okay, that's what I want to point to.
It populates the necessary information, and I can say, hey, I know the first row contains my headers, because with delimited text that's generally how it's going to be. Fun thing: Parquet captures metadata as part of the file, so it actually has the column headers and data types stored within it, another cool bonus of that format. So this is going to be my dataset, and we can preview it; there's a nice little Preview data option right here. When I preview it, we can see it actually doesn't look good, and there's something we need to switch: the delimiter. It's not a comma-delimited file, which is how it's currently set; just from looking at it we can see it's pipe-delimited. After making that adjustment to the delimiter, based on what's actually contained in the file, now it's a readable file and I can consume it.

So my dataset is all situated. Let me go back and use the dataset I just created, AZ_ADLS input emp, and now that I've chosen it, it gives me the list of options for the activity. You can see there's a section called the field list; that's the key piece of this activity, and there are a bunch of choices. The interesting part is that the options you see present here will change depending on the dataset: if the dataset is set to a very specific file, you'll see the options right in front of you; if you point to just a folder, there are some additional choices that come up. We're pointing to a file, and as we specified, we're going with Last modified; that's the metadata property we want to collect. But just to have fun, I'm also going to grab something like the size of the file, and let's also grab the item name. There are other choices we could add to the mix, but for the most part this is what we're going to set.

I'm going to save it; you don't necessarily have to do this, but I will. We haven't really talked about saving inside of Data Factory: the term here is publish, and the reason it uses that term is that you can tie this to something like GitHub or Azure DevOps and choose which branch you want to publish from. For more on that, check out our Azure Data Services class. In this context, by hitting publish I'm saving these assets to my data factory. What's that saying: save early, save often?

Once this has been published, I'm going to run it, and the beautiful part is I can run it right here, inline, by hitting Debug. When I debug, it literally runs inside this designer area and I get to see the outputs; you can see it automatically switched to the Output tab. It went pretty quickly, took about two seconds, and it's done. So how do we see the results? All we did was connect to a file and ask for some metadata properties. Right here in the output window there's an Input section, which has some details but nothing too crazy, and then there's the Output.
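For reference, the debug output of a Get Metadata activity is just a small JSON document. With the three fields we selected, it comes back shaped roughly like this; the file name and timestamp here are placeholders, not the exact values from the demo:

    {
        "lastModified": "2022-06-13T15:26:00Z",
        "size": 52,
        "itemName": "input_emp.txt"
    }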
I think I might have accidentally clicked off of it, but here's the output: we have the last modified date and you can see the timeframe, we have the size, 52 bytes, and the item name is our input txt file. A small thing of note: if you look at the time, it says today's date, because I uploaded this earlier, but notice it's 15:26. You might have already looked at that and realized, hold on a second, that's jumping forward to 3 p.m., four hours ahead. Whenever we work with times in ADF in this regard, it reports them in UTC, so keep that in mind. So we have it in here; that's just a couple of the metadata options, and there's much more to choose from.

From here we're going to see how we can take advantage of these items. I've got a last modified date and an item name, which will probably be the most useful; size could be something, but we're not really going to take advantage of that one. Now we're going to say: okay, I'm getting details from a file, so let me log this information, put it into a table, and make note of it. We're going to take these details and store them into a table: give me the file name and give me the date. I'm also going to take advantage of this to do a little bit of the expression language, because that whole UTC thing is going to cause a bit of mischief, so we'll definitely have to get involved with that. But the Get Metadata activity is working: it's connecting, and it's bringing back whatever I choose from those options so I can use it with activities that show up later down the line. Make sure it's saved and good to go.

We're going to create a third pipeline here, let me bring this up, which is going to use the Stored Procedure activity. Very popular, very common: we can kick off a stored procedure. There is a newer Script activity now, so if you just want to write out a script manually you absolutely can, but usually, for performance reasons, if you bind this logic into a stored procedure you'll get better performance. In this case we're going to take the output from the Get Metadata activity, feed it to parameters of a stored procedure, and that procedure is then going to insert it into a table. The stored procedure could really be doing anything; the point is that inside a pipeline we can invoke a stored procedure, access its parameters, and pass in whatever we want, so it's going to be pretty cool how we can leverage this.

Of course, with our fresh database, we have to make sure the necessary objects are in there. Once again, inside our resources folder, under demo three, we have a stored procedure and a table to create. Just like before, I'm going to add these objects real quick; I still have my database connection, so I'll open up both resources from the demo three folder. We can see the creation of a table, the "azure metadata test" table, with a file name column (which we're going to capture from our output), a last modified date (also captured from our output), and then a record insert date, and we'll see there are some built-in system functions we can leverage for that one. I've sketched both of these demo-three objects below. So let's run the table script; that's done, and the table is created successfully.
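Here's approximately what those two demo-three objects look like. I'm guessing at the exact object and column names, but the shape (a three-column logging table and an insert procedure driven by three parameters) matches what's described here and in the next step:

    -- Sketch of the demo-three logging table (names assumed)
    CREATE TABLE dbo.AzureMetadataTest
    (
        FileName         VARCHAR(255),
        LastModifiedDate DATETIME,
        RecordInsertDate DATETIME
    );
    GO

    -- Sketch of the insert stored procedure the pipeline will call
    CREATE PROCEDURE dbo.usp_InsertLastModified
        @FileName         VARCHAR(255),
        @LastModifiedDate DATETIME,
        @RecordInsertDate DATETIME
    AS
    BEGIN
        INSERT INTO dbo.AzureMetadataTest (FileName, LastModifiedDate, RecordInsertDate)
        VALUES (@FileName, @LastModifiedDate, @RecordInsertDate);
    END;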
I'm also going to run the stored procedure, which, as you can see, is just saying: I want to insert into that table we created the file name, the last modified date, and the record insert date, all powered by those three parameters, which is going to be cool to see inside the actual pipeline. So let's run it; it's created. I've got these two objects, and if we refresh, "azure metadata test" is now here, and you can see there's nothing in it right now, just so we can validate that, but when this is all said and done, hopefully there will be.

So let's take a look, and we're going to get introduced for the first time to the actual expression language inside Azure Data Factory; this will be our first foray into that little realm. What I'm going to do is build upon this, but keep my original demo two by itself: I'll hit the ellipsis next to demo two and clone it, basically just making a copy, and relabel this one "demo 3 stored procedure activity", since we're starting to add stuff. I'll hide the properties to give myself some more real estate. We know the Get Metadata activity is working, doing its thing, bringing back that information, so we'll leave it configured as-is. Technically there's some extra stuff in there, like size, that we're not going to use, but I'll just leave it.

Now we bring in our Stored Procedure activity. And here's an important thing: we talked about this being an orchestration tool, and this is how we can set things up to execute either in parallel or sequentially. For those coming from an SSIS background, you might remember precedence constraints. We don't have exactly the same thing here, but we do have these little tails, as I like to call them, at the end of these activities, and you can set them to fire on success, failure, completion, or skipped. So you can change what that dependency is: if I want an activity to run only if another activity fails, I can do that; I can create a failure branch and connect it, and then this stored procedure would only run if the Get Metadata activity failed. As you can imagine, when you think about the implications, you can build a degree of a notification framework out of this. We don't cover that in this class, but I do believe there are some videos on our YouTube channel that talk about it, and we absolutely talk about using Logic Apps or Power Automate to accomplish it, so that's something to look forward to if you take advantage of the ODL discount we have right now.

In this case, though, I don't want this to execute in parallel; I want the output, what we're discovering from the Get Metadata activity, to flow into the stored procedure. So I do need to connect them, which I'll do right here, on success. This also gives me an opportunity I'll showcase in a moment: there's a section for the expression engine where you can take advantage of output parameters, and if you don't have the activities connected, if they're just in parallel, those outputs will not appear. So let's give the activity a name first; I'll go over to the Name property.
The stored procedure we created was "insert last modified", so I'm just going to name the activity after what it's running, usp insert last modified, literally just the name of the stored procedure. Then we go to the Settings tab; you can see the little badge showing there are two things I need to configure. It's asking for the linked service, and interestingly enough it's not asking for a dataset. That makes sense: we just need a linked service that points to the SQL Server and database containing the stored procedure we're interested in. So we choose the linked service to our database, and we can see our "insert last modified date" procedure right there, which is beautiful.

Let me scroll this up a little so we have room. There's really not much else to do here; this would literally run it. But remember, this stored procedure has three actual input parameters, and if I hit Import, it reaches out to the stored procedure and brings back the three parameters that are part of its definition: a file name, a modified date, and a record insert date, all with their respective data types. So now we're left to populate the values that are going to get written to this table, and two out of the three we know are coming from the Get Metadata activity.

Here's where we can make things interesting. If you just type in a value, that's static; I could literally type "input emp" or whatever right here, and sure, that would get written, but we want to make this more dynamic. We're discovering something with the Get Metadata activity, so give me whatever that name is. I go over to Add dynamic content, and this introduces us to the expression language that exists here. You can see we have quite a few options: system variables, functions, and then of course the activity outputs, which is open by default. What's happening is that it's pointing to the outputs of the preceding activity; if I had a third activity connected, I'd actually see outputs from both, and it's only there because I connected this one. As you can imagine, if I hadn't connected the stored procedure, I wouldn't have any prior outputs to choose from. We're on the file name property, so I would like the item name from the "Get Last Modified Date" activity; that's the name of the property that contains it, and we saw it in the previous example. All I need to do is point to it, it populates the code in question, I hit OK, and it gets put right there. So: take whatever comes from Get Metadata (remember, item name was right there in the output) and give me that value within this execution. That's what it's achieving, but it's doing it dynamically.

The next one seems pretty straightforward: we need to supply the modified date, which, if you recall, was called "last modified" in the output; that's what we want. But we do remember that Get Metadata is effectively bringing this back in UTC, so we can do a little bit of logic here.
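For the file name parameter, the dynamic content it generates is just a reference to that activity output. Assuming the activity is named the way we named it above, it looks something like this:

    @activity('Get Last Modified Date').output.itemName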
By default we would think: okay, click on the parameter, go to dynamic content, and of course we see "Get Last Modified Date" and we want lastModified, so it's right there. Technically we pop that in and it will bring the value back as-is, but remember, that was 15-something, four hours ahead. Knowing that, I could add some additional logic here, and there are actually multiple ways we could address it: under Functions you can see there are date-related functions like add to time, add hours, convert to UTC, and so on, and in our case what we'd want is basically an addHours on this. But actually, this is getting a bit more involved, and since this is our first interaction with the formula language, let's stick to what we know so we can see the results I'm talking about: I'm just going to use the last modified date as-is. Once again, we know this is going to write in that 15:26, in military time, when realistically it should be 11:26; the file right here says 11:26, but it's giving us 3:26. So we'll leave that for now and see it written to the table in a second.

Then there's the record insert date. We don't really have this coming from an output, but I'll show you a little of what we can do here, because in the dynamic content, under Functions, under Date, there is a utcnow() function; if you've ever worked with the TODAY or NOW functions in Excel, it's that idea. We can use it, but obviously it returns the time in UTC. The fun part is that there's also an option called convertFromUtc. So I'll say convertFromUtc. Okay, what do you want to convert from UTC? The utcnow() value. And the only other argument is to state: this is UTC, you want to convert it to what? There are quite a few reserved time zone keywords, and there are custom formats, but since I'm on the East Coast I'm going with Eastern Standard Time. So I'll set that up as convertFromUtc of utcnow() to Eastern Standard Time; that's the logic we're going to use here.

We now have the three parameters for the stored procedure populated using dynamic content: two-thirds of it comes from Get Metadata, and the last one uses a built-in system function, utcnow(), with a conversion applied. So technically I can save this; we didn't create any new datasets or anything like that, it's just the pipeline. The idea is we should be able to run it. We already know the Get Metadata part works; it's all about whether we're receiving that information correctly in the stored procedure and then being able to write it to the actual table. So let's hit Debug and let it do its thing. We'll give it a second to queue up and start running; sometimes it takes a moment to provision those resources, but we saw the last one went pretty quickly.
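So the record insert date parameter ends up with an expression along these lines; this is the pattern, though the exact formatting in the demo may differ slightly:

    @convertFromUtc(utcnow(), 'Eastern Standard Time')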
You can see it switched over; it's now queued. I've noticed this little refresh button hasn't been behaving as nicely for me. It switched from Queued to In Progress; let me check real quick... oh, you know, you've got to spell this right. You see "Eastern Stand"? That's not going to work; you have to put a valid value, "Eastern Standard Time". That's actually a good example, though, of something we hadn't seen yet: what does error information or error logging look like inside pipelines? It's in the same place where we were extracting the output, which is absolutely very helpful. If there's an error, you naturally don't get your standard output, but you can see here it gives me the error: the convertFromUtc function didn't have a valid value passed to it, so we got to see what the problem was. After making that adjustment, I'll rerun the debug; it should queue and process relatively quickly... there we go. Get Metadata was successful, the stored procedure processed, and it took about three seconds.

One thing to note: when dealing with a stored procedure activity, you don't really get any output you can use afterwards. If we had more activities, there's nothing too useful here; you can see some billing information, it tells you the duration of the execution, the runtime, some small things, but you don't get a lot you can leverage. That's one of the commonly known things about stored procedures in pipelines: they don't really produce outputs. So if you're using something like a stored procedure and you want to look at or acquire the results of what it did, you'd have to use other activities, such as a Lookup activity, to go to the table you just wrote to and return some results, if that's what you're looking to do. Just keep in mind that stored procedures don't really have an output we can leverage.

But let's go to our table. If we go over here and run the query: we have our file name, which we know is correct; our last modified date, which took the UTC value as-is (to adjust that we'd have to do a bit more, and moving forward we will expand what we write in those arguments); and you can see our inserted date did take the time it is right now, that whole convertFromUtc. If we hadn't done it, that would have been the 17:01 type of deal, but this one did convert, and that's exactly what we're looking for. So you're seeing now that we're starting to use outputs, we're getting through the process, we're getting this situated, and we're going to continue to build it out.

The end goal, and we'll have a small bit of imagery for it, is: let's look at the last modified date of a file, then look at a value coming from a table (we're going to see it's called a control table), and compare those dates to see which one is newer. If the file in the data lake has a newer modified date, that means something has changed since the last time we loaded it, so let's go ahead and process that copy data activity. That's the overall idea. We've got our last modified date, but we obviously don't yet have something to compare it to from a control table, which is what we're going to do next. So far this is doing what it needs to do, so I'm going to go ahead and publish this and save it.
Now we're going to make some adjustments and bring in another activity; there are so many we could tinker with. Right now we're seeing the results of our Get Metadata activity and writing them in; next we're going to replace, for the moment, that stored procedure activity with a Lookup activity. I was just describing how the stored proc doesn't really have an output, but a Lookup activity is something we can connect to and build on, and this is the general design we're going to leverage: a Lookup activity gets the last load date, say, the last time we ran a copy activity or something like that, the Get Metadata activity gets the last modified date, and the idea is we compare them, and depending on whether one is greater than the other, we take some sort of action. You can see this is going to lead to an If Condition, which will probably be the last piece we get into before we swap over to data flows. So that's the design pattern we're going to leverage.

The Lookup activity allows us to pull a data set of results, and there are definitely a couple of choices on how: you can either have it return a single value (literally a checkbox that says just return me the first row, which, depending on the nature of your query, may or may not make sense), or you can return an entire array of objects. That changes the actual code you have to write to consume the results, because remember, the Lookup activity brings back results and then we need to leverage them. That's the idea behind it.

So we're going to further augment this pipeline: we'll clone it and add more. Let me clone it, call this one demo 4, and switch the name over to "lookup activity". I can knock off the "3" there, hide the properties so I have a little more real estate, and, like I said, I'm actually going to get rid of the stored procedure activity. But once again I do need to add some assets to my SQL Server. Under demo 4 we're going to create a control table that holds a last load, or execution, date, so I'm going to write all of these into my SQL database really quickly, just like we did before: copy it over and run it. I'll go over here, open a new query, and under demo 4 you can see right here we're creating a table called control table. The Source column is just going to contain the name of a file; in our case, we're literally going to put in the file we're looking at in the data lake. And the last time it was executed, you can see, I've just set to a date pretty far in the past, in 2021. So this script creates the table with that one entry already in it; a sketch of it is below.
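Here's a rough sketch of that demo-four control table script. The exact names and the seed date are assumptions on my part, based on what's shown on screen (a single row, dated back in 2021):

    -- Sketch of the demo-four control table with its single seed row (names/date assumed)
    CREATE TABLE dbo.ControlTable
    (
        Source        VARCHAR(255),  -- name of the file we're tracking in the data lake
        ExecutionDate DATETIME       -- last time we processed/loaded that file
    );
    GO

    INSERT INTO dbo.ControlTable (Source, ExecutionDate)
    VALUES ('input_emp.txt', '2021-06-13 12:00:00');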
Effectively, we're going to bring back that stored value and compare it to the last modified date from our file, and very clearly the last modified date is going to be the greater, the newer of the two dates, so when we get to the If Condition it will say yes, it's greater. So that's one object; let's make sure it's in there. There's our control table, so I can close out of that.

We also have a stored procedure here which says: grab the execution date from the control table where the source equals... and you can see we're passing a parameter in there, so we'll have to parameterize that, but this is what we're going to end up using for our lookup. That's written, that's good. And then lastly we have this item here, because once we get this set up, we eventually need to update that control table. If that execution date never gets updated, then that logical condition, is the last modified date greater than the execution date, is never going to change. So we have a stored procedure that helps us update it, so we can make sure this logic holds together. I'll sketch both of those procedures below. Three objects in the script, three objects created.

Now that they're there, we can adjust the pipeline. Get Metadata is staying the same, no changes in that realm, but we are going to bring in a Lookup, and the Lookup itself is going to end up running a stored procedure, so we'll start on that front. I'll find my Lookup in the toolbox, bring it in, and connect the dots. Technically, since I wrote the stored procedure to take a parameter, we need to leverage the output from the Get Metadata activity to feed into this lookup. If I wanted to, I could have just hard-coded it, since we only have one record in that table, and I could have run this in parallel as well. But we have our lookup, so we might as well give it a name: "Lookup Last Execution Date", the last execution date from our control table. Then we're going to leverage the stored procedure we created.

As always, it asks for the source dataset. Technically we have this one here, and you might think, wait, that's pointing to the employee table, but don't forget, when we leverage a dataset in the activities we can choose: do I want to use the table, or do I just want to run a stored procedure? Using the stored procedure option makes life a little easier: I'm effectively ignoring the table the dataset points to, and in our case our stored procedure is the "last load date" one, right there. If we import the parameters, we know we have one item called Name, and we just need to give it a value. As before, we'll take advantage of an output coming from the Get Metadata activity; hopefully you're starting to see that using these outputs as inputs is what it's all about. So I say: I need the item name from the Get Metadata activity, and that's what's going to populate a value for this parameter. The idea is that, at the end of the day, we're going to have two values returned here.
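For reference, the two demo-four stored procedures I mentioned look roughly like this. The names, the @Name parameter, and the exact shape of the UPDATE are my best guess at what's in the resource files, so treat this as a sketch:

    -- Sketch: proc the Lookup activity runs to fetch the last load/execution date for a file
    CREATE PROCEDURE dbo.usp_GetLastLoadDate
        @Name VARCHAR(255)
    AS
    BEGIN
        SELECT ExecutionDate
        FROM dbo.ControlTable
        WHERE Source = @Name;
    END;
    GO

    -- Sketch: proc that updates the control table's execution date after a successful run
    CREATE PROCEDURE dbo.usp_UpdateLastExecutionDate
        @LastExecutionDate DATETIME
    AS
    BEGIN
        -- only one row in the control table in this demo, so no WHERE clause is shown here
        UPDATE dbo.ControlTable
        SET ExecutionDate = @LastExecutionDate;
    END;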
There's still a little bit of a question mark, though. We know the Get Metadata activity brings back UTC, so what is the lookup going to return? If we look carefully at the table itself, the value is June 13th at basically 12 o'clock; I made it nice and even. So is that going to come back as 16:00? Is the whole UTC thing impacting it in the same way? Have some guesses if you like. I'm going to publish this and run it. We know the Get Metadata activity is going to execute and be successful; it's run multiple times. The question here is what's going on with the lookup. And once again, remind ourselves: even though it should say 11:26, we know we get 15:26 from Get Metadata, so we're going to have to deal with that.

So what is the value from my lookup? Twelve o'clock. It didn't adjust this at all; it didn't go forward four hours or anything like that. And that's a bit of an issue: we're comparing something that's stored in a SQL table, which I stored as Eastern, against the last modified date, which shows as Eastern in the data lake UI but is coming through here as UTC. There's a four-hour difference, which is going to cause some oddities if we compare them; it's just not the right type of comparison that we're needing. We need to take advantage of the details of this, and that's going to have to happen in the next activity.
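Just to make that concrete, the lookup's debug output (with the first-row-only option checked) comes back shaped roughly like this; note the date is exactly what's stored in the table, with no UTC shift applied. The column name is my assumption:

    {
        "firstRow": {
            "ExecutionDate": "2021-06-13T12:00:00"
        }
    }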
Remember, we technically now have two dates and two times, and we have to compare them, and this is where you can get some standard logic known as the If Condition. I think this one's pretty fun and very widely utilized, so that's what we're going to add now. The If Condition is going to be the middle piece: I've got two times, let me take advantage of those and do a comparison, and if the last modified date is greater than the last execution date, then I want to do something. It basically has a true section and a false section: if it evaluates to true, run these activities; if it's false, run those activities. And the idea would be: if the last modified date is greater than the execution date, that means there's new data, so let's process a copy activity and actually move the data from the data lake to, say, a table in the SQL database. If it evaluates to false, meaning the last modified date is not greater than the last execution date, then nothing's changed in that file, so we don't need to run any sort of copy activity; that would be redundant. So that's the idea: the If Condition gives us the capability of performing specific activities depending on whether something evaluates to true or false.

So I'll clone again, same as we've been doing: this one is demo 5, and we'll call it our "if condition activity". We're going to keep everything as-is, but we're bringing a new activity into the mix, and in this case we'll have to do a little bit of extra work, because we have to make sure we're doing that time conversion.

Let's bring in our If Condition, drop it in, and connect it. You can already see there are multiple parts to this: there's the actual if condition itself, then what do we want to do if it's true, and what do we want to do if it's false. Let's go into the if condition. If we want, we can relabel it; I'll call it "If Condition test dates". Then we're left with the Expression: this has to be set to something that will evaluate to either true or false, that's the idea behind it. So we open the dynamic content, and we're going to leverage a function here called greaterOrEquals. You can see it returns true if the first argument is greater than or equal to the second, and the note says values can only be of type integer, float, or string, and it gives a little example. That's what we're going to end up using, but there's a bit more that goes into the mix, so I'm going to move to the next line of code, because what we want to test is: is the last modified date greater than or equal to the execution date?

Before I get into writing this, let's also take a gander at the last run's output again: it is firstRow.ExecutionDate; we've got to remember what the output is. Now, the interesting part is this could potentially be done differently: you can see this is only because I used the first-row-only option; if you unchecked it and were bringing back multiple rows, it actually comes back as an array. We'll keep it as-is, and I'll keep this output open as a reference so we can come back and remind ourselves how we end up writing it: firstRow.ExecutionDate.

Okay, with that in mind: we know we want greaterOrEquals, so I find that function and move to the next line. The first argument is my last modified date, but technically I have to adjust it so that four hours are removed; I want to take four hours away from what's coming in. To write that I'm going to leverage addHours, and you can pass it a negative number; you can see there's also addToTime, addDays, and so on, but I'm going to use addHours. And what do we want to add hours to? That, of course, is going to be our last modified date, right down here. But there is a small nuance: this needs to be formatted as a proper date-time, and the way it's actually coming in is a string, so I'm going to go find the formatDateTime option. You can see that as you start to elaborate and get this set up, there are quite a few things you might have to configure. So I add in a formatDateTime. Okay, what do we want to format? This is where I choose the activity output in question, and I'll bring it onto the next line; I think it'll be easier to read that way.
so, format date time: what do we want to format, and how do we want to format it? you can see it's sometimes tricky lining this up — i need to make sure that piece actually sits inside my format date time, the items have to match up. for the format itself there are multiple options; this one is just a standard date time format, but there's long date, short date, a ton of them — i highly recommend checking out the format date time documentation, because there are a lot of predefined options in there. that's the one we're going to utilize. then, back in the add hours, i'm going to say minus four, so we get four hours less. and then, since it's getting a little crowded over here, let me go to the next line — we need to make sure we close out the actual greater or equals so the parentheses are matched up. perfect. now we just need our second argument: we have our last modified date, and we're comparing it to the item coming from our lookup, the last execution date. we can see that right here — value dot... and then, what was it? first... i've got to remember. i'm going to put something in here for now and hit okay, because i'm going to revisit it; you need to be very specific. what was that one again? firstRow.executionDate — that's what i need. and when you start dealing with arrays, unfortunately it doesn't give you the whole path: with our standard json outputs we get those extra options suggested, but with an array it doesn't bring that back for us, so we just need to be very specific. so we're doing a test to compare our last modified date — which we've now formatted and pushed back four hours, which in our case almost doesn't matter; even if we went back 100 hours it's still going to be greater than what's in our control table, since i think our execution date was back in 2021 — but we still need that comparison to happen correctly. we're doing a condition, and in this case it should evaluate to true. and just in case, like i said, a big thing deals with that first row setting: if you were to uncheck it, you can actually reference specific items in the array, but we'll leave it as is. now we have to configure what to do if it's true and what to do if it's false. for a quick test i'm going to drop a wait activity on the true side — all it literally does is wait one second by default — and name it true, and then on the false condition i'll put another one named false. so as long as all that code is written correctly, this should compare the times, the true activity should run and the false one should not. let's save this and run it. you can see there's a lot of extra code that ended up inside our if condition — when you start to really nest things in and work with conversions like this, it can get tricky.
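and just so it's easier to read than watching me type, here's roughly the shape that whole expression ends up taking. this is a sketch, assuming the get metadata activity kept a default name like Get Metadata1, the lookup is named Lookup1, and the control table column comes through as ExecutionDate — swap in your own activity and column names:

    @greaterOrEquals(
        addHours(formatDateTime(activity('Get Metadata1').output.lastModified), -4),
        activity('Lookup1').output.firstRow.ExecutionDate
    )

the format date time is there because lastModified comes through as a string (you can also pass it one of those predefined format strings as a second argument), and the add hours with minus four is just my manual eastern offset.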
and sure enough, you can see there's probably something wrong with it — i think i typed it in incorrectly, and yep, that's what it was. there was a little extra hanging out in the second argument: i had pointed at the value property, which is technically the entire object, when it should just be output.firstRow.executionDate — i needed to be more specific about what i'm looking for. i looked at it and didn't think anything was out of place, but the error message let me know: hey, you're pointing at something that's not going to work for the sake of comparing — i was basically trying to compare an object, an array, against an actual single value. you can see it's working now that i've fixed it, and only the true condition passed. we were comparing 11:26 of today against whenever the heck that execution date was from last year, so obviously true is going to be the case. naturally, it's at this juncture that we'd want to elaborate and add to this logic — the waits are just placeholders we had there to test it out. this is where you'd put some real work inside of it: we could, if we wanted to, use a copy activity to point at that very same file that's been modified and write it out to a table, something of that nature. but we've already used a copy activity, so we don't need to do that again — and yes, let me check real quick, we do have that employee table already. the main thing we definitely want to do is make sure we update our control table, so we're going to use that stored procedure and make sure the value in the control table gets updated with whatever timestamp this run happens at, so that our execution date is current and the next time we run it, it should return a false condition. so i'm going to bring in the stored procedure activity — we've done this one before — and run the stored procedure we added earlier, the usp_update execution date one, if it lets me type it. we configure it: i need a linked service pointing to the respective database, the same one we've been using, we find the update last execution date procedure, and there's just one parameter — what are we going to update the execution date to? we only have the one record in there, so it's going to update that for us. now, there are really two options here. i know i've talked about and showcased utcnow — technically we could use utcnow, we've already seen it under the date functions — but something to consider is that you may have quite a few activities occurring. right now we're only three activities deep, but you can imagine there could be many more; you could technically be calling a second pipeline and have it set to wait until that pipeline completes before continuing down whatever logic you have here. so between when this pipeline begins and when this stored procedure activity actually runs,
you could be talking about minutes — thirty minutes, an hour — there could be a long gap in time. so if we were to use utcnow, that would capture the moment the stored procedure activity ran; but if there's a bunch of logic happening before it, there's a chance of this odd gap where we stamp the time as right now even though the file was modified during the execution of this pipeline. we'd want that modification to be captured so we pick it up the next time we check, and if we wait until this point to use utcnow, that could be a problem. the reason i mention it is that there's another option under system variables, a category that's simply called pipeline. there's a pipeline trigger time — when the actual pipeline starts running — and that's what we're going to leverage inside this parameter. let me make sure... yes, i only have the one parameter here, perfect, so i'm going to point to that system variable, pipeline trigger time. nothing crazy. now granted, let's not forget that is also going to be in utc, and we dealt with utc already, so i want to do the same thing: find the convert from utc function. what do we want to convert from utc? the pipeline trigger time. and what do we want to convert it to? this time around, hopefully spelled correctly, eastern standard time — you can literally put pacific, all the time zones are there; like i said, if you check out convert from utc it lists the available custom and standard options, there's a ton going on in that realm. so technically speaking, this is now set. if we go back over and execute this once, we know what's going to happen: it should evaluate to true, since we haven't changed or updated anything yet, run through successfully, and this time around actually execute the stored procedure. give it a second... you can see it is running the stored procedure, it is not using anything in the false category... and it's successful. to make sure, i can run over to the table and see 6/9 3:29 — right as it is at this moment. so technically this is now going to be greater than the last modified, and if we run it again we should see it go down the false path. if it goes down the true path, that means we have some issue with that conversion — the add hours that was bringing it back four hours — and we'd have to look at it again... pretty sure i put minus four, though now i'm wondering if i just put four... no, it went down the false condition, so i did write it correctly. so this is a quick look at something that's actually pretty impactful when you're doing things like audit logs or control tables, or what are also known as task queue tables. this is how we can create a degree of control over how often we want to run this: i have this pipeline that will move data from a to b, but i only want to do it when a has actually been updated.
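and for reference, the parameter value that ends up getting written back to the control table is roughly this — a sketch using the system variable and function from a moment ago; eastern standard time is just my time zone choice:

    @convertFromUtc(pipeline().TriggerTime, 'Eastern Standard Time')

if your control table stored utc instead of eastern, you could arguably skip the conversion and store pipeline().TriggerTime as-is.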
so this idea of leveraging control tables and comparing those time frames can help with the overall picture — cost, really. and just in case: utc right now is four hours ahead of us, which is why i went four hours back. that's just the difference for my time zone at the moment; you'd adjust it for others, and you've got to consider that daylight savings comes into play and things like that, but right now utc is plus four for us on the east coast. so you can start to think about how you could put these things into play — control tables, it's all about using these outputs — and you can go even further into this realm of parameterization. that little expressions window, there is quite a bit we can do in there; there are many, many different categories and sections we could dive into and explore — just look at all the categories, so there's definitely a ton of room to grow here. that was five quick activities, moving through and hopefully letting you see that this is a design pattern you could implement yourself, and that's what i had slated for our pipeline experience. once again, let's not forget: pipelines are orchestration and data movement. we focused here on data movement, but looking through that toolbox you could also run databricks, run other pipelines, run data flows. and speaking of data flows, we're going to go through one data flow beginning to end — and how do we run a data flow? through a pipeline. it's a bit of a different ui; we still use data sets in this process, but this time we actually have choices of transformations we can apply, so you can actually make some transforms. it's a similar looking display, but there is no toolbox, there are no activities — you basically work with a little plus symbol to choose the next object: am i going to go to a sink, am i going to do another transform? you need at least one source and at least one sink — that's what makes a data flow valid, and you can't save one until you have it that way. oh, and i included this next slide last minute because i know we briefly talked about parquet earlier. we're not going to use it for this example, but to reiterate, there's been such a growth in popularity with this file format since we've been working with and talking about the data lake so much. we've been using delimited text for our examples, but when you start getting into pipelines and data flows you're bound to run into the conversation of, hey, if you're moving data from the data lake to sql pools you very well want to consider parquet. it's a column-oriented data storage format, it's kept in a compressed state, and it captures metadata — it actually stores column headers as well as data types — so it's a very efficient storage method and you get some pretty huge performance gains when using parquet. definitely something to check into. now, here's something interesting: remember earlier when i reminded you how
you could go over to the manage hub, check out linked services, and you had this nice long sprawling list of data stores — 90 plus connectors, it was beautiful? well, when it comes to direct connections, the valid or compatible data sets for your data flow, that actually changes. you can kind of see in the bottom right corner of the slide that most of it is grayed out — we'll see it for ourselves in the data flow — but you don't have the ability to directly connect to all of those data sources. it's a fraction of them: literally what you see in front of you are the valid data stores to connect to your data flows, that's it. once again, this is why we talked about pipelines being the data movement tool. if i want to connect to amazon s3, which is not in this list — making sure i'm not making that up, it's not in this list — i would need to use a pipeline first, connect and get the data i'm interested in, move it into one of the options that is available to me, and then i can leverage a data flow. so it's a small consideration: you have a smaller selection of data stores that are compatible with data flows, and we're going to see that for ourselves. you do have quite a few transformations — some of these terms are going to feel familiar if you're coming from an ssis background — and we have our own expression engine, our own expression builder here. it's different from what we have in pipelines; this one is driven more from a transformation and aggregation perspective, so it's a bit more robust, and quite often you're going to be taking advantage of it. also, as the screenshot shows, when we choose our destination we're limited to that same smaller sub-selection of choices, so it's a smaller set of data stores to write to as well — we do have to keep that in mind. so here's the last example we're going to run through, which will bring us to the end of today: we're going to create a source from a couple of files we're going to upload, combine that with a second source — we're going to do a lookup and see what that's about — then use a derived column to add a new column that adds 10 to the weight so we can account for a padded weight amount (i'll sketch what that expression looks like in a moment), filter out the quite-a-few products in our data set that don't have a list price, sort it, and of course write it out to a file in the data lake. that's the grand scheme of this scenario. since we're going to be putting our heads down and focusing on getting through this data flow demonstration, i did want to put in a small reminder, because i know we're going to be cutting it close to the wire here, guys: this is going to be recorded and put up on the channel, so you can take your time, pause it, and go through all these examples for yourself, because it's included inside the download; and don't forget, within the description there's also a certificate of completion you can go through for spending your time with us this morning, afternoon, whatever it might be for you. so with that being said, let's go ahead and knock out the remainder of what we have here for our data flow.
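and since i mentioned it, here's the padded weight step in expression terms: it will be a derived column transformation with a new column — i'm calling it PaddedWeight, but that name is mine, not something in the demo files — whose expression is written in the data flow expression language (a separate dialect from the pipeline expressions we used earlier), something along these lines:

    toInteger(Weight) + 10

the toInteger is just a guard in case the weight column is still typed as a string at that point; if it's already numeric, Weight + 10 is all you need.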
so first and foremost, there's a pretty fun and cool feature i want to showcase, and it kind of illustrates how data flows actually execute. i'm going to start by creating a brand new data flow — you can see there's a name here, we can call it something like data flow demo — and i'll close the properties so i have more surface area. then it just has a little button that says source, but right above it there's this interesting option. remember in the pipelines we had a debug option? this one has the word debug in it too, but it's data flow debug. what does this do? interestingly enough, when i click on it a little blade appears asking if i want to turn on data flow debug and which integration runtime i'd like it to use to manage this resource. well, what resource do you speak of? what happens with data flows is that it actually provisions one — you can see it says getting the cluster ready — because turning this on creates a spark cluster. the way data flows execute is that, when you run one in an unattended fashion, whatever we define in this user interface, this design area, actually gets translated into something called scala and has to execute on a spark cluster. so it's built differently than pipelines. but by turning on data flow debug we get a really fun, interesting user interface experience, and you're going to see that as soon as it's started. first things first: if i click add source and hit new, it's the data store list, but it looks a little different — check this out, it loads, and all of a sudden all these things start getting grayed out. those are not supported for directly connecting with data flows; this is what i was mentioning. now, this does change — i've seen new items being added all the time — so it's definitely something to keep checking on, because changes are being made continuously. for us, the sources we're going to use are right here: this product model lookup file and this products all file. i need to make sure i have them in my data lake, so like i've done before i'm going to go over to my source folder — which i think is right here, actually i think i was just in it — and upload those two items from the resources for demo six, which puts them both inside this location so i can create some data sets against them. so let's create the data sets. first, let's connect to products all — it's a list of all of our products, you can see it's going to have 23 columns in there, and the one thing it's missing is the product model description, which is what we'll handle second. so, new data set: we're going to point to our data lake, it's going to be delimited text again, and let's give it that naming convention treatment we've been doing, azure adls products all. which data lake do you want to connect to — as you can imagine, you can have multiple in there — and which file do you want to point to? i'm going to
navigate right to that location where i just placed products all, and set first row as header. now remember, whenever we bring one of these in i always like to preview it — and how is it looking? not great. this is another scenario where our column delimiter is not the comma it uses by default; it's actually a pipe. so i want to make sure i specify the pipe, and once i do, if i preview again it's looking much, much better. the other thing you'll want to do — and there's a deeper conversation to be had here for a different day — is the schema, because that's going to have an impact on us. now that i've defined the delimiter correctly, i'm going to import the schema from the connection, and the preview looks good. boom, data set one. data set two: this is going to be our product models. open this up, it's another azure data lake connection, delimited text, and this time it's az adls product model. it's in the same data lake, and i point to that file, product model lookup. i think this one is comma delimited — we'll find out in a second with a quick preview — and it is. you can see it's a smaller file; it really just contains the product model id and the actual model name. that's going to give us a little more transparency, because if we look over at products you can see we get the product id and the product name, but the model name we don't have, so this adds a little extra detail — not uncommon when you think about dimensions and things like that. so, two data sets ready to go, and now i can take advantage of them inside the actual data flow. let's rename our first source — call it products all — and of course it asks which data set, so i point it to products all. we've already tested that, and you can see we have a schema here, 23 columns. now remember, with data flow debug turned on there's effectively a spark session in the background. we get features like projection — without going too deep into that, it's extremely helpful when you don't have a schema, because it can query the data and bring a schema back — and also things like data preview, which is only available if you have this turned on. there's not much going on yet, we're literally previewing the data like we did in the data set, but the cool part is that data preview is now going to be available for every single transform we go through, so we'll be able to see the effect of each of our transformations. i'll let that come back, and let's bring in our second source, product models — we have a data set for that too, it's having a tough time... there we go, boom. we could preview it if we wanted, but it's fine. all we need to do now is find a way to combine these: i've got 23 columns here and two columns there, and i can define a relationship between them on product model id — wherever that one's hiding, it's a column in here somewhere, pretty sure this is it right here, product model id — and you can see some of
them have a product model and some do not. don't forget, though, that in our scenario we wanted to maintain all the products — we didn't want to lose any, even if they don't have a model description — so that's going to play a big role in what we decide next; the choice here is critical. there are a couple of ways to do this: you could do a join here, and depending on the join type that could give us what we're looking for, or you could do a lookup, which keeps everything coming from the primary stream and just finds the matches for us. and you know what, let's actually go and show the lookup — we're going to do a lookup here. this is going to combine the data sets and basically bring the models in for me. the setup is: what is the primary stream? products all. what is the lookup stream? product models — and notice how it connects the two now. then, what do we want to match on? you can match on multiple columns and there are a lot of different options, but basically we say, from the left table, from products all, which column are we looking for matches on — product model id — and that should equal product model id on the other table, so you can see both of those matching now. there is something worth questioning, though: it tells me product model id is a string data type on both sides, which is interesting — you would have figured this is supposed to be numeric. so here's a cool thing we can do. these sources are coming from delimited text, and one of the small setbacks there is data types; the nice part is you can have it automatically detect data types for you — it samples the data, which again is possible because of data flow debug — and it brings back what it considers the right types, and you can also make adjustments yourself. notice everything that was string has now been adjusted. one thing we're definitely going to adjust, because we're going to mess with it in a moment, is weight: it's currently a string value, so we're going to switch it to an integer. there are some date-like columns down here too — the sell start date, sell end date, discontinued date — and i'm going to turn those into timestamps, if it lets me. so you can make these adjustments; i'm modifying this schema in line. and let's check product model id — technically those would be integer values, but i'll leave it as is; we just have to make sure they're the same on both sides, so if it's string here it definitely needs to be string over there — let's make sure... it is. and here's the cool part: i can preview right now, and notice i went from 23 columns to 25 — which makes sense, 23 plus 2.
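as a side note, if you'd rather not override the types in the source projection like that, the same conversions can be pushed into a derived column later in the stream — a sketch using the column names from this file; the exact format string depends on how the dates look in the text:

    toInteger(Weight)
    toDate(SellStartDate, 'yyyy-MM-dd')

either way works; fixing it at the source just means every downstream transform already sees the right types.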
if you're used to doing lookups in, let's say, sql, you might be thinking: wait a second, product model id was just something we used for the lookup — i don't need it from both tables, i just need the one; that was the whole point. and you'll notice if we look right here there's a product model id and then there's also a second product model id, so there's a bit of duplication, and we also see there's a column just called name on both sides, which is going to be a little bit of an issue. but that's okay, because we can fix it right now. first off, we don't need the second product model id coming from the lookup table, so i'm going to use the select transform — choose columns to flow to the next stream. you can literally just pick the columns, and notice the count went straight back down to 23. why is that? if i scroll down you'll see it's automatically set to skip duplicate input columns and skip duplicate output column names — remove duplicates, basically. now for the names: products all gives me a column called name, and if we scroll down, the products all product model id stays, and down here is the product models name column — the second name. i'm going to update that one, because name and name is going to be super weird, and call it model name, and you can see the duplicate product model id has dropped out, so now we're down to 24 columns. i can also move model name up the line if i want it a little closer to product number. so now we have name and model name, and if we preview after getting rid of that extra column we'll see product name and model name side by side any moment now. granted, some of these are missing model names, but that's okay — we're about to shorten up the list anyway, because we're going to remove the items that have no list price; that was one of the criteria. so, whereas before we were modifying the schema, this time we're technically modifying the rows, which means i need to find the filter transform. we'll label it accordingly, something like remove no list price — get rid of anything that has no list price. and the configuration for this one is basically just: open it up, you've got to write an expression. this is where we see that expression builder, because if we go back to the data, we have a bunch of items where the list price is literally zero, so we're going to write an expression to test for that. and there are tons of expressions in here, guys — go over to functions alone and just start scrolling, there is so much going on. if you want to get into the aggregate realm you can check out sum, and you'll see there's sumDistinct, sumIf, sumDistinctIf — any type of aggregation: sum, average, min, max, they're all going to be available.
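just to make a couple of those names concrete — this isn't something our filter needs, but inside an aggregate transformation, grouped by some column (say a hypothetical Color), those functions get used like this, with ListPrice being the column from our file:

    sum(ListPrice)
    avg(ListPrice)
    sumIf(ListPrice > 0, ListPrice)

the -If variants take a condition first and the value to aggregate second, at least as i'm sketching them here.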
and there's some great conditional logic baked in there as well. for us, though, we don't need an aggregate in this case — we literally want to test a column, and only the rows that pass that test go further on down the stream. so we're going to use the one that's simply called not equals — the description just says comparison not equals operator; the intellisense and the sample text here are not the best. so, not equals, and what are we checking? i want you to look in the list price column — we can grab that over here from the input schema — and only where it does not equal zero do the rows pass through. when i save and finish and preview this, we see a significantly smaller set of records. this preview isn't showing the total like it normally does, which is a little weird, but if i look through it and find list price — where are you hiding... there it is — there's nothing left in there that's zero. it also shortened things up: we have a total of 304 records now, and i think it was over 500 before. and you can see we have a lot more model names showing now that we got rid of those; there are still some blanks there, but that's okay — we didn't want to get rid of any products as long as they had a list price, so bring them back regardless of whether they have a model name or not.
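for reference, that filter condition boils down to a one-liner — a sketch, assuming the column came through named ListPrice:

    notEquals(ListPrice, 0)

you could equivalently write ListPrice != 0, and depending on how the type detection landed on that column you might need to compare against 0.00 or wrap it in a conversion like toDouble() first.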
so really the only thing left to do is write this out to a destination. once you've set up all the logic you want, you choose the bottom option, which is your sink, and say, hey, we're going to write this to a file called products final. we didn't actually create a data set for this, but that's okay, we can do it on the fly: i'll create a new data set, write it to my data lake, keep it as delimited text like we've been doing, and call it the same kind of thing, azure adls product final. where do we want to write it? that location, with first row as header, and i'm actually going to write it into the destination folder — why not, makes sense — and hit ok. i always like to open up the data set as well and look at how the file will be named; over in the settings for the sink there's a file name option with a bunch of choices — defaults, headers, a ton of different options — and what i'm actually going to do is, in the data set, give it the name products final dot csv. we could preview this, but the preview is the same as the remove no list price step. the real question now is: i've created the data flow, i've added the three data sets to it — the two sources and the destination — so how do we actually run it? and this will be the last main lesson of the day, since we're getting close to that mark. data flow debug obviously lets us preview a ton of very helpful things while we're developing a data flow, but there's no option within this ui to just run it. so i'm done with this for now — let me turn off my data flow debug; technically there is a cost for having a debug session running, and it times out by default after one hour of no activity to protect you there. the way we execute this is through a pipeline. i can create a new pipeline — let's call it demo six execute data flow — and under move and transform there's literally a data flow activity. so imagine all the previous demonstrations we've done, with the get metadata, the lookups, the if conditions, the stored procedures: as part of that process you can just run a data flow, early on in the process, later on, however you see fit. all you need to do to configure it — i'll call this one run data flow demo — is choose which data flow to execute (i only have the one) and which integration runtime you'd like to leverage. within that ir choice you get a list of whatever options you have available, so scaling this up depends on what integration runtimes you have set up; you might have to create a new one if you want to scale up. but this is how you would literally run it: if i wanted to at this moment i could hit debug, it would execute, and i'd end up with a file. since i had a debug session going, i'm going to let that run in the background. but how would i do this in an unattended manner? obviously i'm here, able to run those pipelines and now this data flow, but what about scheduling? i was asked about that early on, and there are quite a few choices, but the scheduling option that's organic to us, built into the ui, has been in front of us the whole time: it's called a trigger. you can create triggers from the pipeline user interface, like you see here, or you can go to the manage hub. when you hit add trigger you can see options for new and edit — if i had triggers they would show up — and i can create a new one. triggers have varying choices. this one is the very typical schedule: i want this to start at this time — notice it's based off of utc, but i can adjust that and say i want it to start at eastern time — and then you choose your cadence. let's say i wanted this pipeline, which runs this data flow, to happen every tuesday and thursday: i'd do a cadence of a week, run it every one week on tuesdays and thursdays, let's say at 7 p.m., so 1900. by setting this, i literally create a schedule that this pipeline is tied to, and it will run unattended in that time frame. but that's not the only option — you technically have three other choices. there are tumbling windows, which only give you a cadence of minutes and hours, so you're basically working within multiple windows inside a day, but that can be great for restartability. there are storage events, which relate specifically to blob storage accounts — technically that's what we've been using — where you can say, if a folder or file is created and/or deleted, that's my condition, trigger this pipeline, which is kind of cool. and then lastly, the newest kid on the block is custom events.
custom events are something where you actually have to leave azure data factory: you go into what's called azure event grid, you create what are called topics, and once those topics are created you can effectively build event-based triggers around almost anything in azure. it's actually super cool, but it's its own learning curve — you've got to learn how to create those topics — and once you've done that you can tie it in and leverage it right here. a pretty interesting option; it's the newest one available and one i'm trying to get more into. but, like i said, if i hit okay, i'd technically now have a trigger set and it's going to run this. also, kind of a side note, let me go over to my destination — actually, it might still be publishing, so it hasn't run yet; i thought i'd kicked that off, maybe it's still prepping, so i'll let that go in the background. but basically you can set these triggers as you see fit, and just for peace of mind, guys, there are also external ways of kicking this off: on our youtube channel you can find recordings mitchell did showing how to use logic apps to kick off an azure data factory pipeline, and you can actually do it with power automate as well. so let me bring this back in — hopefully it doesn't make the session choppy like it did before. and we're right at the mark, right at two o'clock. sorry for that delayed beginning, guys — that loses us about 20 minutes of questions and conversation — but you've been chatting it up and keeping mitchell busy, busy, busy, so hopefully that's been extremely beneficial and helpful. what we do after a session like this is gather up the questions we weren't able to answer due to their depth, look at what other conversational rabbit holes we can go down, and generally post a video on youtube responding to those questions, so you definitely want to keep your eye out for that. it usually takes a little bit of time to gather all the info and prepare and record the video, but as a follow-up to this there will be a q and a where we present the questions and talk through the answers together. don't forget, though, that this session itself is being recorded, so you can rewind and watch it as you see fit, and don't forget the description — that's where you find the resources and the certificate of completion. my goodness, three hours goes by super quickly. this is just the very beginning, and if you go into that youtube channel you're going to see quite a few videos we've already done around various aspects we talked about here, as well as more advanced ones in the realm of azure data factory. and if you do take advantage of that sale we have today for on-demand learning, you've got — i want to say it's like seven hours and eight hours respectively — roughly around 15 hours of content for the intro and advanced azure data factory courses. we have content in the realm of synapse, i saw questions around spark,
questions around the data lake — it's all on there, guys, definitely check it out. thank you guys for joining me today, i appreciate it.
Info
Channel: Pragmatic Works
Views: 28,236
Keywords: pragmatic works, microsoft azure, azure data factory, full azure course, learn azure, azure demo, what is azure, what is azure data factory, ADF, Azure dataflows, azure synapse, synapse pipelines, synapse dataflows, azure data factory tutorial, azure data factory tutorial for beginners, azure data factory pipeline tutorial, azure data factory data flow, microsoft azure data factory, data factory azure, data factory azure ml, new to azure, beginner to azure, azure training
Id: DLmlFlQGQWo
Length: 170min 25sec (10225 seconds)
Published: Thu Jun 09 2022