Azure Data Factory Tutorial | Introduction to ETL in Azure

Video Statistics and Information

Reddit Comments

This was really informative and clear - thank you so much! Definitely subscribing.

πŸ‘οΈŽ︎ 2 πŸ‘€οΈŽ︎ u/atalders πŸ“…οΈŽ︎ Jul 23 2019 πŸ—«︎ replies

I'm sorry, but in that thumbnail, you look like Charlie Sheen haha

πŸ‘οΈŽ︎ 2 πŸ‘€οΈŽ︎ u/dasunsrule32 πŸ“…οΈŽ︎ Jul 23 2019 πŸ—«︎ replies

Hey everyone, I tried to get all the feedback from the last episode in place.

  • Sound fixed, no more echo or surrounding noise
  • Changed the PPT template to gray so it doesn't burn your eyes
  • Changed drawings to icons so I can show topics more clearly

Hope you will enjoy this episode.

πŸ‘οΈŽ︎ 2 πŸ‘€οΈŽ︎ u/AdamMarczakIO πŸ“…οΈŽ︎ Jul 22 2019 πŸ—«︎ replies

Cool, subscribed!

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/DRdefective πŸ“…οΈŽ︎ Jul 22 2019 πŸ—«︎ replies

Subscribed!

πŸ‘οΈŽ︎ 1 πŸ‘€οΈŽ︎ u/DouglasTS πŸ“…οΈŽ︎ Jul 22 2019 πŸ—«︎ replies
Captions
Hi there, this is Adam, and let's talk about moving data, because every single project needs to move data. Whether you're in the cloud or on-premises, you still need to move data, and in Azure there's a great service that helps us with that, called Data Factory. So today we're going to get a brief introduction to how Data Factory works and how you can use it in your own projects. Stay and find out how.

So let's talk about Data Factory. Data Factory, by definition, is a service delivered on Azure that allows you to create data integrations between multiple sources of data. Simply said, it's a service that allows you to move data from one service to another. That definition is fine, but it's easier to visualize once we know how Data Factory works, so let's jump straight to it.

First of all, how does it work? Imagine a scenario where we want to move a cabinet from a shop to a home. That's fairly simple to understand. First, we need a key and an address: we need to know where the shop is located and how to enter it to grab the cabinet. We'll also need some information about the cabinet itself, maybe disassembly info, maybe information about the package. We then need delivery details, so we know what to do once we get the package. After that we need assembly info, so that once we deliver it we can put it back together in the state we wanted. And lastly, we need the key and address of the home we're delivering to. As you can imagine, this process needs to be overseen, and that's what a deliveryman does. And because a normal company has multiple deliverymen, there's someone called a delivery manager overseeing all the deliverymen out there.

That's fairly simple to understand, but how does it map to Data Factory? If we replace the shop with a blob storage and our cabinet with a customer CSV file, that's essentially it: we want to move this kind of data to some other place. What we need in Data Factory is that key and address; this object is called a linked service. It's the definition of the connectivity to the blob storage, including the authorization, such as a key. Additionally, we need to create an object called a dataset. This is the definition of the customer CSV file, so a definition of what kind of data we're pulling from the blob storage once we successfully authorize there. Next, we need something called a copy activity. Basically, any kind of job in Data Factory is an activity; the most popular one is the copy activity, which moves data from one place to another. And of course we need to know where we're moving the data to: in this case, a SQL server, into a customer table. The same as with the blob storage, we need a linked service that tells us how to connect to the SQL server and where it's located, and lastly another dataset, which tells us how the customer table is structured, so that once we get the data from the CSV file we can actually put it there.

In this example, the deliveryman, the one handling the entire job, is called the integration runtime. This is the workforce of Data Factory that actually handles all the jobs. And the delivery manager is Data Factory itself, overseeing all the integration runtimes and all the jobs that are happening. You run activities in something called a pipeline. A pipeline can be a single copy activity, although usually it's more than one; usually you copy data multiple times. When you do that, just remember that underneath each copy activity there are always two datasets: a source dataset and a sink dataset. Each of those datasets needs access to a linked service that tells us how to connect to the source of the data, or to the target where we put the data. Of course, you don't need to create that many linked services: if you're pulling all the input data from a single source, you can share the linked service between datasets, and the same goes for the target. This is the typical scenario you use Data Factory for; a sketch of how these objects fit together follows below.
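To make the object model concrete, here is a minimal, illustrative sketch of a pipeline with one copy activity in Data Factory v2 JSON. The names (CopyCustomerPipeline, CustomerCsv, CustomerTable) are placeholders, not from the video; the datasets referenced in inputs and outputs are where the linked services get referenced in turn.

    {
        "name": "CopyCustomerPipeline",
        "properties": {
            "activities": [
                {
                    "name": "CopyCustomerCsvToSql",
                    "type": "Copy",
                    "inputs": [ { "referenceName": "CustomerCsv", "type": "DatasetReference" } ],
                    "outputs": [ { "referenceName": "CustomerTable", "type": "DatasetReference" } ],
                    "typeProperties": {
                        "source": { "type": "DelimitedTextSource" },
                        "sink": { "type": "AzureSqlSink" }
                    }
                }
            ]
        }
    }

One pipeline, one activity, two dataset references: exactly the source-to-sink flow the analogy describes.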
So let's go and try a demo, shall we? We're going to import a cars CSV file from blob storage into Azure SQL. In order to do that, let's go to the Azure portal. In here we're going to need three things.

First of all, we need a data factory, so let's create it: click the New button, type "data factory", and when it comes up, click Create. We need to give it a name; I'm going to call it a4e, as in Azure for Everyone, ADF demo. I'm going to pick a resource group, an existing one that I created for this purpose. I'm going to leave the version as version 2, because version 1 is a completely different thing, good, but not the topic for today. We're going to pick North Europe, which is the data center closest to me, and for the purposes of this demo I'm going to untick Enable Git, but it's a great feature for you to explore in the future. So I hit Create.

Additionally, since our demo will be moving blob data to SQL, we'll need that SQL, so let's create it as well. Type "SQL", maybe "SQL database"; we found the SQL database, so we can hit Create and fill in the most common information. That's going to be our resource group, the same one, and we need to name the database, a4e ADF database. We also need a server, so let's create a new one; it seems this server name is free, which is good for us. Let's create the admin user that we'll use to log in to the server: it's going to be a4eadmin, and we need to give it a password. Remember that password, but in case you lose it you can always reset it later. I'm also going to pick North Europe, because I want my database to be close to my integration runtimes in my data factory. And I'm going to tick the "Allow Azure services to access server" checkbox, because Data Factory is a public service, so we need that access to the server itself. So I hit OK; I think we have everything. Of course, we can change the compute power, because we don't need such a big server, so I'm going to change it to Standard, actually Standard S1: 20 DTUs, 250 gigs of storage. I review it: estimated 25 euro, that's what I wanted, so I hit Create, and it will be provisioning for a minute or two.

What we need now is blob storage as well, so let's create that. We need a storage account; hit Create, and again I create it in the same resource group. I give it a name, a4eadfdemo, leave North Europe, leave Standard, and I think everything else can stay default, so hit Create. Let's review: everything is fine, looks good, so my services are deploying right now. I think we already have the data factory, which is very good, but let's go to the storage account once it's provisioned; a storage account provisions within half a minute to a minute, so we can wait for that.

The provisioning just finished, so I click "Go to resource". On the storage account I'll be using blob storage, since I want to store files. I create a new container called "input", and I leave it as private, because I don't want unauthorized access to my blob storage. I go into the input container and upload a file; I already prepared one called cars.csv, so I open it and hit Upload. The upload went fine, so we can close this, click on cars.csv, and click Edit blob so I can show you how it looks inside. I see our SQL server has also provisioned in the meantime. So this is how the file looks inside: a classic CSV file, where the first row has the headers and the data underneath is separated by commas.
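The captions don't reproduce the file itself, but the 406-row count later in the demo suggests the classic cars dataset. Assuming those columns, the first few lines would look something like this (values illustrative):

    Car,MPG,Cylinders,Displacement,Horsepower,Weight,Acceleration,Model,Origin
    Chevrolet Chevelle Malibu,18,8,307,130,3504,12,70,US
    Buick Skylark 320,15,8,350,165,3693,11.5,70,US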
Since we have our data in the input container, let's go and prepare our SQL for that data. I open the dashboard and go to my resource group; I'm going to pin it, because it will make navigation easier later on. I go to the SQL database. In the SQL database there's a nice feature called the Query editor, which allows you to query the database without needing any external tools, so I'll use it to log in to our database. Now that I'm in the database, I need a SQL script that creates the table we'll import our cars into. I already prepared it for this purpose, so let me jump back and copy/paste the query; it creates a cars table with these columns. Let's run it: successful. Just to validate, we expand the Tables node, see dbo.cars, and expand it: seems OK. This means our preparations for the demo are done; the table-creation step amounts to a script like the sketch below.
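The captions don't show the actual script, but assuming the cars dataset columns from the sample above, and string types to match how the CSV schema imports, the prepared query would be along these lines:

    -- sketch of the table-creation script; column names and types are assumptions
    CREATE TABLE dbo.cars (
        [Car]          NVARCHAR(255),
        [MPG]          NVARCHAR(50),
        [Cylinders]    NVARCHAR(50),
        [Displacement] NVARCHAR(50),
        [Horsepower]   NVARCHAR(50),
        [Weight]       NVARCHAR(50),
        [Acceleration] NVARCHAR(50),
        [Model]        NVARCHAR(50),
        [Origin]       NVARCHAR(50)
    );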
This preparation was essential so we don't have to jump back and forth from the data factory later; we can assume those things were already there. So let's go to the data factory and click Author & Monitor. This brings us to our data factory and, of course, to our demo. There's a lot of good stuff in here, like create pipeline, create pipeline from template, the copy data wizard and many others, and of course good videos from Azure Fridays and some tutorials. But I always advise everyone to go directly to the Author tab, because this is where you'll spend most of your time, so it's better to learn it this way.

So back to our demo; here's what we need to do. First of all, we need a pipeline, because everything is contained within a pipeline, so let's create one. You do it by clicking on Pipelines and "Add pipeline". You can change the name so you actually know what it does; let's call it "blob to SQL", which should suffice in terms of naming convention. The second thing we'd need is an integration runtime, but we'll get one by default as we go through the steps. So let's do what we need to do right now, which is start integrating.

First, we need to connect the blob storage, and as we already said, to connect to the blob storage we need a linked service. To create it, go to Connections at the bottom, open the tab called Linked services, hit New, find Blob Storage on the list, hit Continue, and give it a name; I'll call it "input blob". Next you choose which integration runtime will run it; I'll just leave AutoResolveIntegrationRuntime, which means my integration runtime will be provided by Azure, so I don't have to create one on my own. Next, we need to authenticate to that blob storage, and there are several ways to do it: an account key, SAS, a service principal, or managed identity. The most sophisticated and most secure way is managed identity, but for simplicity we'll stay with the account key today. Next, Data Factory lets us pick a blob storage from our subscription, so let's select our subscription and the storage account we created, a4eadfdemo. Let's test the connection to see if it works: connection successful, which means we're fine, so hit Finish, and that's done.

The next thing we need to do is define a dataset, so that when we pull the data from this blob storage we know what the data looks like. Go to Datasets and hit "Add dataset". We again need to pick what kind of dataset this is; since we're using a blob storage linked service, this also needs to be blob storage. Then, what's the format of the file we'll be picking up? Six formats are supported, but today we're using CSV, so click on it and hit Continue. Let's give it a name; I'll call it "cars CSV". Next you pick where it's coming from: our "input blob" linked service. It now asks where the file is located, and the easiest way is to click Browse, select our container, and within it select the cars.csv file. As you remember, our first row had the headers, so we need to tick that checkbox to reflect it, and leave "Import schema" at its default. Hit Continue, and notice what happened: a new dataset was created, and a few tabs are now available. There's a Connection tab, so if you set something up wrong you can change it here, and there are additional options too. For instance, there's a column delimiter: if your CSV is delimited by semicolons, pipes, or tabs, you can change that here. Maybe a different system generated the CSV, so you can change the row delimiter, or you may want to change the encoding of the file; all those options are here for you. What's also good about the Connection tab is the "Preview data" button: if everything is set up correctly, you can hit it, wait a couple of seconds, and see our data. Since we created a well-formed CSV, we see our data, and because we selected "first row has headers", the headers were mapped as the columns, which is very good. In the Schema tab you see what was imported; since this is a CSV file, all the types are string by default. Expressed as JSON, the linked service and dataset we just created correspond roughly to the sketch below.
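As a rough sketch, with illustrative values, the account key elided, and names adapted from the demo's "input blob" and "cars CSV", the two definitions would look like this:

    {
        "name": "InputBlob",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                "connectionString": "DefaultEndpointsProtocol=https;AccountName=a4eadfdemo;AccountKey=<key>"
            }
        }
    }

    {
        "name": "CarsCsv",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": { "referenceName": "InputBlob", "type": "LinkedServiceReference" },
            "typeProperties": {
                "location": { "type": "AzureBlobStorageLocation", "container": "input", "fileName": "cars.csv" },
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        }
    }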
What we need now is to connect to SQL, and to connect to SQL we need another linked service. So let's go back to Connections and create a new linked service for SQL. You can either use the search bar or browse the tabs; if you scroll through Databases you actually won't find it there, because Azure SQL Database sits under the Azure tab, which is somewhat confusing. Click on Azure SQL Database, and again let's give it a name: "output SQL". We'll use the same integration runtime, because we want this to run on the same server. Next, we connect to it: we select connection string, and the connection string will be pulled automatically once we select our subscription, server name, and database; using this wizard, the connection string for the database is grabbed for us. We also need an authentication type. Again, service principal and managed identity are the more secure options, but for today we'll just use a simple username and password. Our username was a4eadmin, and we provide the password. If this works, the test connection should be fine; if you have issues here, always remember to check your username, check your password, and check that you ticked the firewall setting that allows connectivity from Azure services. If you did, everything should be fine. So now we have a second linked service.

Going back to our diagram, we need another dataset. This dataset says: now that we've pulled the data and have it in memory, how do we shape it to put it into SQL? So let's create a dataset: hit "Add dataset" and this time select Azure SQL. We give it a name, "cars table"; we select the linked service used for connectivity, "output SQL"; and we select the table the data will go into. Since we created it earlier, the table is already available. "Import schema" we leave at its default, so let's hit Create.

We're almost there: we have two datasets and two linked services, so we just need the very last step, the activity that actually performs the copy. To do that, go back to the pipeline and open Activities. As you can see there are a lot of activities, but today we're picking the most common one, Copy Data. Drag and drop it onto the canvas and change the name, maybe to "cars CSV to SQL", so we know what the step does. You'll see small red marks indicating something is missing. We go to Source and select the source dataset, "cars CSV", and then select the sink dataset, in this case "cars table". This defines the flow you see here: we pull from this source dataset into this destination dataset; these are called the source and sink datasets. You can always check Mapping if you want, but it's not necessary; you can even hit "Import schemas" to see whether the columns match correctly, and change things here if they don't. Our example is very simple, so all the columns match. Now you can use Debug, or you can publish everything. I advise everyone to always use Debug first to see that everything works, because then you publish a working version instead of publishing something that might not work. Within a couple of seconds, five, actually, the run finished. But we can't just trust it; let's validate that it really worked. Without leaving Data Factory, go back to the dashboard, to our resource group, to the SQL database and its Query editor. Log back in to the server, hit OK, and select from the cars table: it worked, everything is in our table. We can also check how many rows we imported with a SELECT COUNT: 406. That was easy, wasn't it? The validation queries are as simple as the ones sketched below.
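The exact statements aren't spelled out in the captions; a minimal version, assuming the table is dbo.cars:

    -- spot-check the imported rows
    SELECT TOP 10 * FROM dbo.cars;

    -- count the rows loaded by the copy activity (406 after the first run)
    SELECT COUNT(*) FROM dbo.cars;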
We can actually publish this, since we know it works, and once you do, you'll be able to trigger the workflow and monitor it. There are triggers that can run this pipeline on a schedule or call it on an event, but for today we'll just trigger it manually, since we're only testing. There are no parameters for this pipeline, so we just click Finish, and that submits a new pipeline run. This run is different in that it's no longer a debug run: you can monitor it in the Monitor tab. It worked fine, took 8 seconds this time, and was successful. But if we go back and query our table, we have twice as many rows. As you can imagine, that's not the perfect outcome, because we just doubled the data, which means we have duplicates in our database. If you want to handle this situation, there's something you can use here called the pre-copy script, which you can simply type in: a DELETE FROM statement that removes all data from the cars table before the copy, which is otherwise just a standard insert. Of course, you need to design the logic you want for your own data, so it's up to you; a sketch of where this setting lives follows at the end.

Going back to our slides: we just completed this entire scenario. We were able to pull data from blob storage, through a linked service and a dataset, and copy that data into a SQL server, and it took us less than 10 minutes. That wasn't hard, was it? You didn't even have to know what kind of API the blob storage has, or what API the SQL server has; you just used Data Factory's ready out-of-the-box connectors and integrated everything in a couple of minutes.

So that's it for today; that was the introduction to Data Factory. If you liked the video, hit the thumbs up, maybe leave a comment, and if you want to see more, hit subscribe, and see you next time. [Music]
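To make the pre-copy script concrete: in the copy activity's sink settings it is a single property. A minimal sketch of the sink block in ADF v2 JSON, assuming the table is dbo.cars (the video only describes typing the statement into the Sink tab):

    "sink": {
        "type": "AzureSqlSink",
        "preCopyScript": "DELETE FROM dbo.cars"
    }

With this in place, each run clears the table first, so re-running the pipeline doesn't duplicate rows.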
Info
Channel: Adam Marczak - Azure for Everyone
Views: 393,757
Keywords: Azure, ADF, Data Factory, Blob to SQL, ETL, tutorial, azure data factory, Gen2, Gen 2, Extract, Transform, Load, Data
Id: EpDkxTHAhOs
Length: 24min 59sec (1499 seconds)
Published: Mon Jul 22 2019