Azure Data Factory | Copy multiple tables in Bulk with Lookup & ForEach

Video Statistics and Information

Captions
Hey guys, this is Adam, and today I want to show you how to build dynamic pipelines in Azure Data Factory by determining the list of objects and files you want to process at runtime. This is an introduction to the Azure Data Factory Lookup and ForEach activities. Stay tuned. [Music]

Let me say a few words about the Lookup and ForEach activities before we move to the Azure portal. First, the key concepts. The Lookup activity allows you to retrieve a data set, either from a file or from a table in your database, and use that data set to determine the list of objects you want to work on for the rest of your pipeline. It supports most of the data sources currently supported in Data Factory, so you can use pretty much any kind of database or flat-file solution. You usually pair the Lookup activity with the ForEach activity, because Lookup returns a list of objects, while ForEach lets you execute a set of activities for each object on that list. This is the scenario we're going to work on today.

The demo goes like this: first, we'll list the tables in a SQL database using a Lookup activity. Next, we'll iterate over the results returned from SQL Server, and for each table in the iteration we'll do a simple copy extraction, so we'll extract data from the SQL database into a CSV file on a storage account. The demo is done in a step-by-step fashion, so I'll keep this slide and jump between the presentation and the Azure portal as we set things up.

First, let's go to the Azure portal, where I'll explain our initial setup. For the initial setup I created three services in Azure: a data factory, a storage account, and a SQL server with one database on it. Those are the three services we're going to use today, so let me explain each of them. The data factory is where the main work happens, but we'll move there in just a second. First, let's look at the storage account. It's empty; nothing has been done here yet except for one container called exporteddata. This container is currently empty, but this is where we'll upload all our exported files. We can close that for now.

Additionally, we have the SQL database. I'll quickly use the portal's Query editor to log in to that database, because it's currently empty, and set up some initial data for us to work on. Once inside, you can see we currently have no tables, so I prepared a script that creates three very simple tables (a sketch of such a script is shown below). As you can see, it executed: we have three tables with 428 rows, 17 rows, and 112 rows. If we refresh the view, we see a cars table; we can select its top 1000 rows, and it's some basic car information. You can do the same with the countries table and the movies table. We're going to use a dynamic approach to extract all three tables without hard-coding their names anywhere; we'll just grab all the tables from this SQL database dynamically. That's it for the initial setup.
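The actual setup script isn't shown in the captions, so this is only a minimal sketch of what such a script could look like; the column definitions and sample rows are assumptions, not the video's script.

```sql
-- Hypothetical setup script: creates the three demo tables used in the video.
-- Column names and sample data are illustrative assumptions.
CREATE TABLE dbo.cars      (Id INT IDENTITY PRIMARY KEY, Make NVARCHAR(50),  Model NVARCHAR(50));
CREATE TABLE dbo.countries (Id INT IDENTITY PRIMARY KEY, Name NVARCHAR(100), IsoCode CHAR(2));
CREATE TABLE dbo.movies    (Id INT IDENTITY PRIMARY KEY, Title NVARCHAR(200), ReleaseYear INT);

INSERT INTO dbo.countries (Name, IsoCode) VALUES ('Poland', 'PL'), ('Germany', 'DE');
```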
Now we can go to the data factory, which I already have open in the next tab, and in here we're going to do a couple of things, so let's see what steps we need to perform. First of all, we need to list the tables with a Lookup activity, and for that we need to set up a couple of things. To start working we need to connect to the SQL server, which means setting up a SQL linked service. Let's go to the Azure portal and do that: go to Connections, create a new connection, select the Azure tab, and either type or scroll down to find Azure SQL Database and select it. From my Azure subscription I select my adf-lookup-demo server, select the database name adf-lookup-demo, use an admin username and password, test the connectivity to ensure everything is working fine, and hit Create. The first step is now done, so let's see what's next.

Next we need to set up a data set, the data set that we'll use to list tables. Before we set it up, we need to be sure we know what kind of data this data set will return, so let's go back to the Query editor, where I'll use a simple query: you take your database name and query INFORMATION_SCHEMA.TABLES to list all the tables within that database. Just remember to replace the database name with the name of your own database. This returns something like what you see here: three tables, cars, countries, and movies. I'm going to use this query to grab the data from Data Factory.

Inside Data Factory I can now create the data set: hit New dataset, select the Azure tab, scroll down to Azure SQL Database, select the linked service I just created for the database, and call it table list data set. I'll leave the table name empty, because I want to use a dynamic query for it, so I just hit OK. And that's all I need to do here, because I'll be able to query it with a dynamic query later on, inside the Lookup activity.

Let's go back to our steps and see what else we need to perform. The next thing to do is add the Lookup activity, and the pipeline itself. To do that we first create a pipeline; this is currently an empty pipeline. Now find the Lookup activity; as you see, it's not under Move & transform, it's under General. Just drag and drop it, and rename it to list tables so it's very clear what we're trying to do here. Inside the Settings tab, select the data source, in this case the table list data set, and instead of leaving the Use query option on Table, change it to Query. Now you can input the query we just prepared, which will return the list of tables dynamically; the query is shown below.
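As a sketch, assuming the database is named adf-lookup-demo, the table-listing query looks roughly like this; the WHERE clause is an optional addition that keeps views out of the result.

```sql
-- List every table in the database for the Lookup activity.
-- Replace [adf-lookup-demo] with your own database name.
SELECT *
FROM [adf-lookup-demo].INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
```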
To confirm this works, you can hit Preview data, and as you see, something interesting just happened: it returned only one row. This is because of the "First row only" checkbox here, so let me very briefly explain how it works. Lookup can return either a list or only the first row of the data, and depending on what you choose, you get a different response, so it's quite critical to understand the difference. If you select true for that value, that is, you check the checkbox, you get an object with one property called firstRow, and your item, your row of data, sits within that property. You can then access it with an expression: call the activity function, grab the output property, and then the firstRow property; this returns your item. On the other hand, if you uncheck the box, you get an object with two properties: a count, containing the number of rows returned from your data source, and a value, containing an array with all the rows from your data source. Accessing it is fairly simple: you use output.count or output.value to get the count and the value respectively. There are two limitations I want you to know about here: there's a maximum of two megabytes that you can return from your data source, and a maximum of 5000 rows. So you cannot drive enormous pipelines from this metadata; if you need to, you'll have to break the work into multiple pipelines, but that's a story for another day.

Let's go back to our example and set up this Lookup activity. Inside the Lookup activity, if we uncheck the box, we can again press Preview data to see the list of all the tables within the database; as expected, we get three rows of data.

What do we need to do next? Next we need to add an iteration, which will allow us to go through each table in this result set. To do that, simply scroll down to the activity group called Iteration & conditionals and drag and drop a ForEach. To ensure that the ForEach runs after the Lookup, drag the connector line between them, so the pipeline is ordered properly. And that's pretty much it.

Before we move to the next steps, let's talk for a minute about how the ForEach actually works. As I mentioned, with ForEach you can iterate over the list of objects from previous steps. In our case we grab the Lookup activity's value, meaning that from its output object we take only the array with the three items. Let's do that: in the Azure portal, inside the ForEach Settings, you need to set the Items property, because it is the only required property of the ForEach. You can parameterize it here, so use Add dynamic content, and if you scroll down you'll find list tables under activity outputs. Notice that it gives you most of the expression we were just talking about: the activity function pointing at the previous step, list tables, grabbing its output. The only part we need to add is .value, since our array, our list, sits within that value property. Hit Finish.

We could probably test it right now, but let's spend one more minute on how the ForEach works. Now that the ForEach is set up to iterate over the list, you can use an expression called item. It's a function that returns the current iteration's item: for iteration one it returns item one, for two, item two, for three, item three. We'll use this in a dynamic data set to perform our copy activity. One thing to note is that you cannot nest a ForEach inside a ForEach, so you cannot do nested looping. If you need that kind of behavior, you have to nest a pipeline and put the nested ForEach inside that pipeline; it's a bit cumbersome, but that's how it is. It's also worth noting that the item function lets you access properties. If our data looked like the example on the slide, with an id and a table name for each element of the array, you could use item().table_name to access the table name property of those objects; for iteration 2 that would return countries. This way you can iterate over each table, prepare a custom expression and custom behavior for each one, and run the copy activities.
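To summarize the two output shapes and the expressions used to read them, here is a rough sketch; the row values are illustrative, and 'List Tables' stands for whatever you named your Lookup activity.

```
First row only = true   ->  @activity('List Tables').output.firstRow
{
  "firstRow": { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "cars" }
}

First row only = false  ->  @activity('List Tables').output.count and .output.value
{
  "count": 3,
  "value": [
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "cars" },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "countries" },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "movies" }
  ]
}

ForEach Settings -> Items:   @activity('List Tables').output.value
Inside the loop, per item:   @item().TABLE_NAME, @item().TABLE_SCHEMA
```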
All right, so the next thing we need to do, going back to the diagram, is copy each table from SQL to blob storage. Since we already have the SQL linked service, the next thing we need is a blob linked service. Let's go back to the Azure portal: go to Connections, hit New connection, select Azure Blob Storage, hit Continue, select it from the Azure subscription, and pick the storage account we're using, in this case adflookupdemostorage. Hit Test connection, to always make sure you're connecting correctly, and hit Create.

All right, with that done, what's next? Next we need to set up two data sets: one for the input, our SQL table, and one for the blob file. I know you could potentially reuse the table list data set that we created, but I'm not a big fan of that; it makes a mess and sooner or later you'll break something. Keep the generic approach and create a separate data set for the SQL table and a separate one for the file.

Let's create those two data sets. First, the data set for SQL: again go to the Azure tab, select Azure SQL Database, select the linked service we just created, and I'll call it generic table, or maybe just sql table. It will use my Azure SQL Database 1 connection, and I'll leave the table name as None, because I want to program it generically, so I hit OK. Since this is generic and the table name will be different for each iteration, we need to introduce a parameter; it will be called table name, and we need to use it in the Table field on the Connection tab: instead of using the drop-down, hit Edit, add dynamic content, and select the table name parameter. To be honest, SQL also has schemas, so to be fully compliant, fully well programmed, you should also add a schema name parameter and use it in the connection as well, again via dynamic content. You can have tables with the same name across multiple schemas, and if that happened your pipeline would break, so let's make sure it doesn't.

Now of course we need a similar data set that represents our table as a file on blob storage. Hit New dataset, select Azure Blob Storage, select the CSV (DelimitedText) format, because that's what we're exporting to, and I'll call it csv table. I select my Azure Blob Storage connection and need to pick a path, so I hit Browse, which lets me select a container, in this case exporteddata, and hit OK. Then we need to provide the file name, but first I select "First row as header", because I always want headers in my CSV files, and set Import schema to None, because this is the output data set and output data sets don't need to import a schema. For now I leave the file name empty, because I want to parameterize it as well. So in here I add a file name parameter, and this parameter is again used on the Connection tab, in the file name property: hit dynamic content, select the file name parameter, and hit Finish. Now we have fully parameterized data sets.
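For orientation, a parameterized dataset like the SQL one above ends up looking roughly like the JSON below; this is only a sketch with assumed names (SqlTable, AzureSqlDatabase1, SchemaName, TableName), not the exact JSON the portal generates.

```json
{
  "name": "SqlTable",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": { "referenceName": "AzureSqlDatabase1", "type": "LinkedServiceReference" },
    "parameters": {
      "SchemaName": { "type": "String" },
      "TableName":  { "type": "String" }
    },
    "typeProperties": {
      "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
      "table":  { "value": "@dataset().TableName",  "type": "Expression" }
    }
  }
}
```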
The next thing to perform is the very last step: we need to add a Copy activity within this ForEach and use those data sets in there. Let's go back to the pipeline in the Azure portal; since we have the ForEach here, just hit its edit button and, inside, drag and drop the Copy Data activity onto the panel. Notice that the breadcrumb at the top shows the pipeline name and then the ForEach name; for brevity, let's rename the ForEach, which in our pipeline, going back to the top, should be called for each table, and the activity inside it export table, because that's what we want it to do for each table.

Here you need to select a data source; ours is the sql table data set, and notice that it recognized there is a table name and a schema name that you need to provide. To supply them we use dynamic content, and as I said, you use an expression with the function called item. If you scroll down, you'll find the ForEach iterator listed, because the editor recognized that you're using expressions inside a loop; hit it and it inserts item(). Now just type a dot and then the property name, which matches what the SQL query returns, so you have TABLE_SCHEMA and TABLE_NAME. For the table name we use item().TABLE_NAME and hit Finish, and for the schema name it's much the same: scroll down, select the ForEach iterator, add .TABLE_SCHEMA, and hit Finish. That's it for the source.

Now for the output: go to the Sink tab and select the data set, in this case csv table. Again you need to provide the file name, so I use dynamic content, and again the current iteration's item, but I want to be sure my files won't be overwritten if I have tables with the same name in different schemas. So I'm going to concatenate: I cut the expression out for a second, type concat, and concatenate item().TABLE_SCHEMA, an underscore, item().TABLE_NAME, and finally the .csv extension. In our case the cars table should produce a file called dbo_cars.csv. Hit Finish, and that's pretty much it.

If we didn't make any mistakes, we can go back to the pipeline and hit Debug, before we actually publish, to ensure everything is working correctly. Let's wait for a minute... the pipeline succeeded, and we can notice a couple of things. First, there's the list tables activity; if you click it and look at the bottom, you'll see the Input and Output buttons, so you can review what it returned. In the case of list tables, notice the count of three, because we have three tables, and the value: TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, and TABLE_TYPE, which corresponds to what our SQL returns. We have all three tables plus some extra information, and this is used as the input for for each table. The ForEach doesn't show an input, but its output shows that the count it iterated over was three, and you can see three executions of the export table activity.

Let's go back to the Azure portal, to the storage account, and open the exporteddata container. As you see, we have our cars table export, and we can review its contents: here are the cars, here are the countries; everything worked perfectly, and in just a couple of minutes we were able to parameterize all of it.
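To recap the expressions used inside the ForEach (activity and dataset names match the ones chosen above; adjust them to your own):

```
Copy activity source (sql table dataset parameters):
  schema name:  @item().TABLE_SCHEMA
  table name:   @item().TABLE_NAME

Copy activity sink (csv table dataset parameter):
  file name:    @concat(item().TABLE_SCHEMA, '_', item().TABLE_NAME, '.csv')
```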
One additional thing I want to show you is how to change this pipeline to export only a few selected tables instead of all of them, without hard-coding anything. One approach I like is to use metadata within the database. Let's delete those exported files for a second so we have a fresh start, go back to SQL, and log in quickly in the Query editor. In our SQL database I'm going to create a simple table called exported tables, with two main columns, table schema and table name, representing pretty much the same thing we were just using. Into this table I insert two rows, dbo/cars and dbo/movies, which form the list of tables I want to export. I run it, and under Tables you can find the new table and return its first thousand rows. Our expectation is that, using a query over this table, we should export only two tables. If you keep TABLE_SCHEMA and TABLE_NAME as the column names, you can simply take that query and swap it in place of the previous one, because our nested activities still use the same TABLE_SCHEMA and TABLE_NAME properties, so nothing else has to change at all.

Let's run Debug to make sure everything works... and the run just finished. As you see, list tables returned only two rows, as expected, and there are only two export table executions within the ForEach. You can also go back to your blob storage and review the contents to confirm everything was exported correctly; as you see, there are only two files here. This is how easily you can use a metadata-driven approach, with metadata tables, to export your tables dynamically, or import them dynamically; it's up to you how you design your workflow within this ForEach loop. There are plenty more scenarios you can achieve by leveraging the functionality of Lookup and ForEach, but it's up to you to design those generic workflows yourself; both exporting and importing data can be done this way.

That's it for today. If you liked the video, hit thumbs up, leave a comment, and subscribe if you want to see more. Definitely see you next time.
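For reference, a minimal sketch of the metadata table described above; the exact table and column names are assumptions chosen to match the Lookup query's output, so the downstream expressions keep working unchanged.

```sql
-- Metadata table listing only the tables that should be exported.
CREATE TABLE dbo.ExportedTables (
    TABLE_SCHEMA NVARCHAR(128),
    TABLE_NAME   NVARCHAR(128)
);

INSERT INTO dbo.ExportedTables (TABLE_SCHEMA, TABLE_NAME)
VALUES ('dbo', 'cars'), ('dbo', 'movies');

-- Use this query in the Lookup instead of the INFORMATION_SCHEMA one;
-- the column names stay the same, so nothing else in the pipeline changes.
SELECT TABLE_SCHEMA, TABLE_NAME FROM dbo.ExportedTables;
```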
Info
Channel: Adam Marczak - Azure for Everyone
Views: 165,476
Keywords: Azure, Data Factory, Lookup, ForEach, For Each, For-Each, For, Each, Look, up, Bulk, Copy, Dynamic, Loop, Until, Generic
Id: KsO2FHQdILs
Length: 23min 16sec (1396 seconds)
Published: Tue Apr 21 2020