Azure Data Factory Parametrization Tutorial

Captions
Hey there, this is Adam again, and I'm back with another Data Factory video. In this episode I'll be talking about parameterization. It saves a lot of time, it makes your solution much more flexible, and it allows you to control the flow in ways you couldn't otherwise. In this video I'm going to show you what parameters, variables, and expressions are, and how you can use them in your own projects, so stay tuned.

Okay, so let's talk about Data Factory and parameterization. In Data Factory you perform parameterization through parameters, variables, and expressions. Parameters are, simply put, inputs for your actions, and it is also possible to define your own. So if you need to pass some input, or control some of the actions, you can hand that information over as a parameter. You also have variables. Variables are temporary values that you use within a pipeline during the execution of your workflow, so you can hold intermediate values and drive the execution based on them. And lastly you have expressions. Expressions are, basically, JSON-based formulas (JSON being JavaScript Object Notation) that reference parameters, call functions, and apply conditional logic, so that you can control those parameters and variables; a few concrete expression examples follow this passage.

Typical scenarios you would use this for: taking a dynamic input file, for instance passing the file name in from an external service; changing the output table name through parameters to create dynamic flows; appending dates to outputs when you're exporting data; changing connection parameters when you need to; doing conditional parameters; and many, many more.

If you remember the scenario from the previous video on Data Factory, we had a cars CSV file, and we were loading that cars CSV file into a cars table in SQL. For that we needed to create a cars CSV dataset and a cars table dataset to match our source and destination. The problem occurs when you want to load another file, for instance a planes CSV: you would need to create additional datasets, a planes CSV dataset and a planes table dataset, and then add one more activity to copy the data. So if you had 20 tables and 20 files to load, that's 60 additional objects in Data Factory. As you can imagine, that's not the best way to set this up. The idea is that you can parameterize the datasets instead. To do so, you define a parameter on a dataset and pass dynamic values through the pipeline, the cars CSV file name to the parameterized dataset on the source and the cars table name on the destination; then you simply pass the planes file and the planes table the same way, and that's it. These two simple changes save you the 20 to 40 objects that you no longer have to create.
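To make the expression idea concrete before the demo, here are a few examples of ADF expression syntax collected into one illustrative JSON fragment. The functions (concat, utcnow) and accessors (dataset(), pipeline().parameters, variables()) are standard ADF expression language; the parameter and variable names are hypothetical placeholders, and the surrounding object exists only to group the examples:

```json
{
  "datasetParameter": "@dataset().fileName",
  "pipelineParameter": "@pipeline().parameters.tableName",
  "variableLookup": "@variables('tempName')",
  "dynamicFileName": "@concat('planes-', utcnow(), '.csv')"
}
```

Anywhere the authoring UI offers "add dynamic content", a string starting with @ is evaluated as one of these expressions at run time.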
You can parameterize pipelines as well, so you can supply those values externally at execution time, which also means your copy activity only needs to be created once, because it accepts parameters too. With that, I think we can go into the portal and start building the demo.

In this demo I'm going to assume that you've either watched my introduction video on Data Factory or already know Data Factory a bit. For the purpose of this demo I created a resource group, and within that resource group I created four resources: a data factory, a SQL server, a SQL database, and a storage account. The storage account is where we're going to place our input files, so let's do that right now. We go to that storage account, go to the blob service, create a container called input, and hit OK. Into the input container we upload two files, a cars CSV and a planes CSV, so let's select those files, hit open, and upload. They uploaded successfully, so we can close this blade. I'll quickly show you what those files look like: the cars CSV has some basic information about car models, the planes CSV has some information about planes, and that's all we need to know for this part.

Now we're going to go to the SQL database and create our output, meaning the tables we'll load those files into. I'll do it quickly from the portal itself, so I go to the query editor, and of course I'm prompted to log in. I'd say SQL Server authentication is fine for demos, but you should not use it in production scenarios. So let's log in as the administrator. What happens right now is that it tells me my IP is not allowed to log in, which is a common sight when logging in for the first time, so I add my client IP to the server firewall, click Save, and once that's done I hit OK and log in to my database. Once we're in the query editor, we create the tables we'll be loading files into: first the cars table, hit Run, then the planes table, hit Run, and it's done. We have two tables to load data into. If you want to follow this demo yourself, all the code is available in the description of this video. We open the table list just to confirm; it takes a couple of seconds to load, but we can see the cars and planes tables are there.

That's enough for us to go to Data Factory and start the demo, so I hit Author & Monitor to open the Data Factory editor, zoom in a little, and go to the Author section to create a new pipeline. As you remember from the last video, the first thing you always need is a connection, so let's hit create connection and create one for blob storage. I call it input blob, select my storage account from my subscription, and hit OK. Then I create a new one for the SQL database, call it output database, select the server name and database name from my subscription, and provide a username. For simplicity in this demo I created an admin username that matches the name of my database; of course you should not do that, but it's fine for a demo. I enter the password, and as always I hit test connection. Hit finish, and now we have the two linked services that give us connectivity to our sources.
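For reference, here is a rough, hedged sketch of what two such linked services can look like when exported as JSON, shown as two separate definitions. The names match the ones used in the demo; the AzureBlobStorage and AzureSqlDatabase types are standard connector types, and the connection strings are placeholders you would fill in from your own resources rather than values from the video:

```json
{
  "name": "InputBlob",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<key>"
    }
  }
}
```

```json
{
  "name": "OutputDatabase",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>"
    }
  }
}
```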
What we need to do right now: I usually like to start by creating datasets that are static and replace the static parts with parameters later on. So let's create an input dataset. It's going to be a CSV file, and I'm going to call it input CSV. I select the linked service for the blob storage; my files always have headers, so I tick that, browse to the input container, pick for instance the cars CSV file, and hit continue. In the connection tab I always like to click preview data to see what's there. Seems fine, so I'll leave it. I need a second dataset for my SQL database, so I scroll down, pick the SQL database, and call it output SQL; actually, output table better describes what I'm going to be doing. The table is the cars table; hit continue. Since we have those two datasets, we just need a pipeline. It's called pipeline1 by default, which I guess is fine, but let's call it something more meaningful. Within that pipeline we need one copy activity, copy data, so we select our input CSV as the source and our SQL output table as the destination, and at this point we can just hit debug. It runs, and if everything works it imports the CSV into SQL. It succeeded; to confirm, we can always go back to the resource group, open the SQL database, open the query editor, log back in, and select from the cars table. Looks fine.

Once this works, this is the moment I like to parameterize, because I know my copy activity, my datasets, and my connectivity all work. So now we click on the input CSV dataset and go to the parameters section; this is where you define your custom parameters. I want to define a parameter called fileName. I don't give it any default value, because I don't want anyone to make a mistake and load the wrong file into the wrong table; fileName is a parameter that will have to be supplied to this input CSV dataset. We also need to use this parameter now that we've defined it, so we go to the connection tab, remove the hard-coded cars CSV file name, and click on add dynamic content. There's a new section here called parameters containing the fileName we just defined; we click on it, the dataset fileName expression appears, and we click finish. That means whatever we pass into this dataset as a parameter is used in this file path to pick the file name. We do the same for the output table: define a parameter called tableName, go to the connection tab, hit edit on the table name, remove it, add dynamic content, click on tableName, and click finish.

So now we have two datasets that each take a parameter, one a file name and the other a table name. Next we go back to our copy data activity and notice something interesting: because we created parameters on those datasets, they are visible here in the copy activity. We need to supply a file name in the source and a table name in the sink, so let's set the source to the cars CSV and the sink to the cars table. We hit debug again and see what we get. It takes about 10 seconds to run; actually, it was 5 seconds this time. We can select a count this time to check whether the number of rows is growing. If we now change the sink to the planes table and the source to the planes CSV and debug again, we load the planes table without doing any additional work, just by executing the same copy activity with different parameters.
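Since this parameterized dataset is the heart of the technique, here is a hedged sketch of what its JSON definition might look like. The structure follows the ADF delimited-text dataset schema, and @dataset().fileName is the same expression added through the dynamic content dialog; names such as InputCSV and InputBlob are simply the ones chosen in this demo:

```json
{
  "name": "InputCSV",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "InputBlob",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "fileName": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": {
          "value": "@dataset().fileName",
          "type": "Expression"
        }
      },
      "firstRowAsHeader": true
    }
  }
}
```

Because the file name is resolved per run, this one dataset can stand in for every CSV file in the container.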
Let's go back and select a count from planes; it seems to be working just fine. We can of course select everything just to see what we have here, and it looks perfect. This is how easy it is to create parameters on datasets: it's just one tab, pass the parameters through, and you're done.

Of course, maybe you want to back up this data, so let's see how you would do that. Before copying the data you would add an additional copy data block; you drag it in and reverse the scenario: the source becomes the output table and the sink becomes the input CSV dataset. Say you want to back up the planes table into a file called planes export CSV. Let's hit debug, and in just a couple of seconds we should see that the first backup has run through. To confirm, I go back into the portal, open the resource group, go to the storage account, open the blobs, open input, and find my planes export. So this is also a great way to reverse the flow, and since everything is parameterized, you can specify all of it per run.

Since we know parameters are very powerful, maybe you want a dynamic output, for example adding a date to the output name. To do that, you go to the file name and click add dynamic content. Besides your parameters, you also have a bunch of functions available. In this case I'm going to use the string function called concat to concatenate a couple of strings: first the word planes, then maybe a dash; next I also want to concatenate a date, so I go to the date functions and add utcnow; and of course let's not forget about the CSV extension. Hit finish, and now our file name should be planes, a date, and the CSV extension. Let's hit debug again, and in just a couple of seconds we should see the dated export on our blob storage. It finished just fine; I go back to the storage and hit refresh, and as you see we have a planes file with a date appended. This is amazing, because now you're reusing datasets and using expressions to control the flow and do additional work that would otherwise take a lot of time.

The very last thing I want to show you is variables and pipeline parameters. To define parameters at the pipeline level, you need to click on the white background; if you have anything selected, click away from it, and you'll see parameters and variables tabs. Parameters can be defined at the pipeline level as well, so I'm going to create two, called fileName and tableName. Why am I defining the same parameters again? Because I want to define them once for the entire pipeline and pass them down to each step. Now that I have fileName and tableName, I want to use them. I go to the first activity, and instead of typing planes I remove that part, scroll down, and find the parameter called fileName; notice this is the pipeline parameter fileName. Hit finish. We go back to the source, and instead of the static name of the table we scroll down and pick the pipeline parameter tableName, and hit finish. Then we do the same for the second step: instead of the planes CSV, remove that part, add dynamic content, scroll down, pick the pipeline parameter fileName, and hit finish; in the sink, remove the static part, add dynamic content, scroll down, pick tableName, and hit finish.
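Here is a hedged sketch of how the copy activity's dataset references end up wired to pipeline parameters in the pipeline JSON. The referenceName values match the datasets from this demo; the activity name is a hypothetical label, and the value/type wrapper objects are how dynamic content is serialized:

```json
{
  "name": "CopyFileToTable",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "InputCSV",
      "type": "DatasetReference",
      "parameters": {
        "fileName": {
          "value": "@pipeline().parameters.fileName",
          "type": "Expression"
        }
      }
    }
  ],
  "outputs": [
    {
      "referenceName": "OutputTable",
      "type": "DatasetReference",
      "parameters": {
        "tableName": {
          "value": "@pipeline().parameters.tableName",
          "type": "Expression"
        }
      }
    }
  ]
}
```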
So what we did just now is define the parameters at the pipeline level and bubble them down to our datasets and copy activities. When I hit debug now, it actually asks me for those parameters, because they live at the pipeline level and have no default values, so I supply cars CSV as the file name and the cars table as the table name and hit finish. If nothing went wrong, we should now see our cars CSV file backed up on our blob storage and our cars table loaded with data. The first step finished successfully, and the second did as well, so if we go back to the blob storage and hit refresh, we see the cars CSV file exported. This is amazing. Of course, you probably noticed that there's an extension in the parameter value; you should really do this a tiny bit differently and not pass the CSV extension as a parameter, but append it dynamically instead. You would go to the source file name and do a concatenation there, and that's it: now you can type just cars, and you no longer need to pass the .csv extension along with the file name.

Lastly, you have variables. As I said, variables are very similar to parameters: you define a variable, and once you do, you can use it the same way you use parameters; they show up in a variables section of the dynamic content. What is different about variables is that in the general tab of the activities list there is a step called set variable, so between your steps you can set a variable and change its value, for instance change a temporary value to a different name and use it in a later step. This way you can control the flow between the actions (a sketch of what such a step looks like in JSON follows after the transcript).

And I think that's it; that's how you parameterize your workflows. As you see, I would say the sky is the limit, because you can do a lot with parameters. You can export or import entire databases through a simple two-dataset flow if you don't have any very specific custom requirements; this is a very good way to do it. So now that you've seen what parameters, variables, and expressions are, and how to parameterize your workflows, it's up to you to design them to take full advantage of this fantastic feature. That's it for today. If you liked this video, hit thumbs up, leave a comment, and subscribe if you want to see more. See you next time!
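As a footnote to the variables discussion above, here is a minimal sketch of what a Set Variable step could look like in pipeline JSON, assuming a string variable named tempName has already been declared in the pipeline's variables section; the activity name and the concat expression are purely illustrative:

```json
{
  "name": "SetTempName",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "tempName",
    "value": {
      "value": "@concat('backup-', utcnow())",
      "type": "Expression"
    }
  }
}
```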
Info
Channel: Adam Marczak - Azure for Everyone
Views: 81,053
Keywords: Azure, ADF, Data Factory, Parametrization, Blob to SQL, ETL, tutorial
Id: pISBgwrdxPM
Length: 22min 7sec (1327 seconds)
Published: Mon Aug 26 2019