Complete Azure Data Factory CI/CD Process (DEV/UAT/PROD) with Azure Pipelines

Video Statistics and Information

Captions
Hey everyone, in this video I'm going to show you how to build an Azure Pipeline that can complete your Data Factory CI/CD process. What that means is I'll show you how to write a pipeline that builds your dev Data Factory code, packages all of it up, and deploys it to a UAT Data Factory and a production Data Factory, and also how to update your template parameters, so your linked services and global parameters: if you want to point to a different data lake in UAT and production, how to update those parameters in the upper environments as well. So let's get started.

The process I'll be showing you follows this automated publishing document, which, like anything else I show in this video as far as documents or code, I'll link in the description below. It covers two suggested methods to promote a Data Factory to another environment like UAT and prod, and I'll be showing you this one: how to write code that automates deployment using Azure Pipelines.

So let me show you what I'm talking about. In my dev resource group I have a dev Data Factory, a data lake, and a Key Vault. In my dev Data Factory there are a couple of pipelines, just a copy activity, a Get Secret from the Key Vault, and a Wait activity. Then in the Manage tab I have my dev data lake linked service, a Key Vault linked service, again pointing to dev, and a global parameter that is just my dev Key Vault name.

The first thing you want to do as a prerequisite to this code (and just to mention, in this video I assume you already know what Data Factory is and how to use it, and you just want to know how to package up your code and deploy it to upper environments) is to click on that Manage tab, click on ARM template, and make sure this checkbox is checked. It includes your global parameters in your ARM template, and this is really, really important: it makes it much easier to update those global parameters, like that dev Key Vault global parameter, to the UAT and prod values when you deploy to those upper environments. If you don't check this box, you have to use PowerShell scripts to update them, which is much more tedious. So check this box; it will make life a lot easier for you.

Now, what's the actual code? Here's the pipeline, this Data Factory CI/CD pipeline, and the actual code is in my GitHub repository, which again I'll link in the description below. I downloaded it, and we'll walk through it step by step. The first piece is the cicd folder; in this folder I have the pipeline itself, the Data Factory CI/CD pipeline, the code we're going to use to run it. I give it a name; this $(Date) is one of the predefined variables in Azure Pipelines that just returns the date, so it will return something like October 4th, 2023, and this other predefined variable is just a counter. For example, when I run this pipeline for the first time the run will be named like "October 4 .1"; if I run it again on October 4th it will say ".2", and so on, and the counter resets every day, so on October 5th it goes back to ".1". It just makes the run name easy to read and useful, I think.
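To make that concrete, the run-name piece might look something like the snippet below at the top of the pipeline YAML. The pipeline name itself is a placeholder; $(Date:yyyyMMdd) and $(Rev:.r) are the Azure Pipelines format specifiers for the date and the per-day revision counter he describes:

```yaml
# Hypothetical pipeline header: runs are named e.g. adf-cicd-20231004.1,
# then adf-cicd-20231004.2 later the same day, resetting to .1 the next day.
name: adf-cicd-$(Date:yyyyMMdd)$(Rev:.r)
```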
Then I have a trigger. That means whenever any commits land on main (in Data Factory's case you obviously get there via a pull request to main), the pipeline is triggered automatically for you, so you don't have to run it manually each time. The VM image I'm using is windows-latest, and then I have a couple of stages. The first stage builds the dev Data Factory: it packages up that code so it can be deployed later on.

I use a variables template here, from this variables folder. In there I have a dev variables file with a few variables: the Data Factory name (that dev Data Factory I showed you), the dev resource group name, and the ADF artifact name, which will be the name of the artifact we publish to hold all of our ARM templates and scripts (I'll show you that a little later). Then the working directory uses another predefined variable, Build.Repository.LocalPath, which points at the root of the repository once we check out our code, so I can reference any folder or file I need from the root. It just makes life a lot easier, I think.

OK, the first job validates and builds the Data Factory template, so it packages up the code. I use a steps template for this, and I like steps templates because they help modularize and separate the code; you don't have to write all of your steps in one giant pipeline, you can break it up a bit. It lives in this adf-cicd folder: if you go in there you'll see the build YAML file we're using. It has a few parameters: the Data Factory name (in our dev environment that's the dev Data Factory), the dev resource group, that ADF artifact name we pass in, and the root directory path. Then it checks out the source code (this repository), and a PowerShell script just prints that we're going to try to build from this particular Data Factory in this resource group.

Next we use the NodeTool@0 task, which installs Node.js, and then the Npm@1 task, which installs the npm package. These matter because the automated deployment approach relies on an npm package: if I go back to the doc, it uses this Azure Data Factory utilities npm package, so that's what we have to install. You can run a couple of different commands with it (again, I'll link it in the description; don't think too hard about it, just know this is the package that gets downloaded when we execute our code), and then we validate and generate the ADF ARM template and scripts.

Another important thing for this npm piece to work: you have to create a package.json file in your root directory. You can copy this one; it just declares that we're using that Data Factory npm package I showed you to build the code. It's very, very important that you do that.
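Pulling those pieces together (the trigger, the Windows agent, the dev variables template, and the build stage that calls the steps template), here is a rough sketch of how the top of such a pipeline might be laid out; the file paths, stage name, and variable names are placeholders rather than the exact ones from the video:

```yaml
trigger:
  branches:
    include:
      - main                                  # merging a pull request to main kicks the pipeline off

pool:
  vmImage: windows-latest

variables:
  - template: variables/dev-variables.yml     # placeholder path for the dev variables template

stages:
  - stage: Build_Dev_ADF
    displayName: Build dev Data Factory
    jobs:
      - job: Build_ADF_ARM_Template
        steps:
          - template: adf-cicd/build.yml      # the build steps template described above
            parameters:
              dataFactoryName: $(DevDataFactoryName)
              resourceGroupName: $(DevResourceGroupName)
              adfArtifactName: $(AdfArtifactName)
              workingDirectory: $(Build.Repository.LocalPath)

# variables/dev-variables.yml (a separate file) might contain nothing more than:
# variables:
#   DevDataFactoryName: my-dev-adf
#   DevResourceGroupName: my-dev-rg
#   AdfArtifactName: adf-artifact
```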
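And the package.json he tells you to create at the repository root follows Microsoft's guidance for the utilities package; a minimal sketch, assuming the @microsoft/azure-data-factory-utilities package name from the docs (pin whatever version works for you):

```json
{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^1.0.0"
  }
}
```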
OK, back to the build template. This runs that command, one of the commands the package supports, and it actually builds the Data Factory: it packages up all the code and all the scripts for us, which I'll show you in a second when we go back to the pipeline. The first piece you see here is a secret variable, an encrypted secret variable I created called the dev subscription ID; to reference it in the YAML with Azure Pipelines you write $(DevSubscriptionId), and because it's a secret it won't show up in the logs. That command builds our code into the ARM template, which I'll show you in a little bit.

Then I publish the artifact, so I can go back in time and see exactly which code and scripts I used for a deployment. The target path is again that root directory we pass in; under the hood, on the Azure agent we're using, an ARM template folder gets created to hold all of that code and those scripts, and the artifact name is that ADF artifact name. It gets published to the pipeline, and that's everything that happens in the dev stage.

Let me show you what that looks like from the Azure pipeline. If we go back and look at this dev Data Factory run, you can see one artifact was published. If I open it, that's the ADF artifact name we chose from the parameters we passed in (if it had been something like "adf-artifact-2", it would show up here as "adf-artifact-2"). Everything inside is generated automatically by that npm utilities package. The first file, ARMTemplateForFactory.json, is all of our code: our pipelines, linked services, everything we'll be pushing to the UAT and prod data factories. ARMTemplateParametersForFactory.json holds the actual template parameters we can update: for UAT we want the UAT factory name here, for our data lake linked service we want UAT instead of dev, and for prod we put the prod values instead. And this entry is our global parameter: if you hadn't checked that box, it would not be here. That's why this is so easy to use, you just update the value right here, and that's why checking that box, like I showed you at the beginning, is so important.

The next piece, the global parameters update script: again, if you hadn't checked that box in the dev Data Factory, you'd have to use this PowerShell script to update global parameters, but because we did, we don't have to worry about it. The pre- and post-deployment script is another PowerShell script, used later on when we push our code in the UAT and prod stages, for stopping and starting triggers and deleting resources that are no longer in the code; I'll show it to you in a little bit. Finally, the linkedTemplates folder is only used if your ARMTemplateForFactory.json is over 4 megabytes, since ARM templates in Azure in general cannot be over 4 megabytes in size; ours is significantly smaller. If yours were over that limit you'd have to use linked templates to deploy your code (if anyone's interested in how to do that, let me know in the comments and maybe I'll make a video on it), but because ours is well under 4 megabytes we don't have to worry about this folder at all.
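Putting the whole dev build job together, here is a rough sketch of what that build steps template might look like. The task names (NodeTool@0, Npm@1, PublishPipelineArtifact@1) and the `run build validate` / `run build export` commands follow Microsoft's automated-publish guidance; the parameter names, the ArmTemplate output folder, and the artifact layout are placeholders rather than the exact ones from the video:

```yaml
# adf-cicd/build.yml (hypothetical steps template)
parameters:
  - name: dataFactoryName
    type: string
  - name: resourceGroupName
    type: string
  - name: adfArtifactName
    type: string
  - name: workingDirectory
    type: string

steps:
  - checkout: self

  - powershell: |
      Write-Host "Building ARM template and scripts from ${{ parameters.dataFactoryName }} in ${{ parameters.resourceGroupName }}"
    displayName: Announce build

  - task: NodeTool@0
    displayName: Install Node.js
    inputs:
      versionSpec: '18.x'                              # any currently supported Node version

  - task: Npm@1
    displayName: Install the ADF utilities npm package
    inputs:
      command: install
      workingDir: ${{ parameters.workingDirectory }}   # the folder containing package.json
      verbose: true

  # Validate every Data Factory resource in the repository against the dev factory.
  - task: Npm@1
    displayName: Validate Data Factory resources
    inputs:
      command: custom
      workingDir: ${{ parameters.workingDirectory }}
      customCommand: 'run build validate ${{ parameters.workingDirectory }} /subscriptions/$(DevSubscriptionId)/resourceGroups/${{ parameters.resourceGroupName }}/providers/Microsoft.DataFactory/factories/${{ parameters.dataFactoryName }}'

  # Generate ARMTemplateForFactory.json, the parameters file, and the helper scripts into "ArmTemplate".
  - task: Npm@1
    displayName: Generate ARM template and scripts
    inputs:
      command: custom
      workingDir: ${{ parameters.workingDirectory }}
      customCommand: 'run build export ${{ parameters.workingDirectory }} /subscriptions/$(DevSubscriptionId)/resourceGroups/${{ parameters.resourceGroupName }}/providers/Microsoft.DataFactory/factories/${{ parameters.dataFactoryName }} "ArmTemplate"'

  - task: PublishPipelineArtifact@1
    displayName: Publish the ARM template artifact
    inputs:
      targetPath: '${{ parameters.workingDirectory }}/ArmTemplate'
      artifact: ${{ parameters.adfArtifactName }}
      publishLocation: pipeline
```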
OK, so back to our pipeline run. If we go to the dev stage, it's exactly what I showed you in the code: it checks out the source code repository, prints that it's going to try to build the ARM template and scripts from this dev Data Factory in this resource group (the one I showed you earlier in the video), installs Node.js and the npm utilities package, runs one of those utility commands to create the scripts I just showed you, and then publishes the artifact containing all of those scripts to the pipeline. Then dev is done; that built and packaged all the code. So now, how do we actually push this to UAT and prod? That's the next step, so let me go back to the code.

One quick tip in Visual Studio Code: if you don't want to see all of this and want to collapse a section for the moment, hover next to it in the gutter and click the down arrow, and it will fold that block; it makes the code a little easier to read.

Now to the deploy-to-UAT stage. You can see I have a dependsOn block and a condition block that say this stage only runs once everything under the build-dev-Data-Factory stage has succeeded. Then I have my UAT variables, again in the variables folder. There's the Azure Resource Manager service connection, a service principal I've created that allows me to deploy code to the UAT resource group (and, in the prod stage, to the prod resource group). I'll show you what it looks like when we go back to Azure Pipelines, but that's what this is, and its name is this Data Factory Azure resource manager connection. Then we have the UAT Data Factory name.

This next one is important: the template parameters file path. If you remember, one of the artifacts I showed you was the template parameters file where we'd have to update the dev Data Factory name to UAT and so forth. Instead of editing that generated file, I create a template parameters file for each environment and make the updates there. The value here again uses that predefined root-path variable, and if I go into the adf-cicd folder and open the UAT template parameters file, it's the same template parameters file I showed you, but with the updates we want: UAT here for the factory, UAT for the linked services, and the UAT value for the global parameter. Same thing for prod, where I have the prod values listed. This is what updates those values in the deployed code.

Back in the UAT variables I also have the Bicep template file path. That Bicep template deploys a blank Data Factory if one doesn't exist; if it already exists it doesn't do anything, it just makes sure the factory is there, and I'll get to that in a minute. And then there's the UAT resource group name.
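A sketch of how that UAT stage header and its variables template might be laid out; the stage name, variable names, and file paths below are invented for illustration:

```yaml
# Hypothetical UAT stage header (sits under the same top-level stages: list)
- stage: Deploy_UAT
  dependsOn: Build_Dev_ADF
  condition: succeeded('Build_Dev_ADF')       # only run once the dev build stage has succeeded
  variables:
    - template: variables/uat-variables.yml
  jobs:
    # approval check, Azure resource deployment, and ADF code deployment jobs go here (sketched below)

# variables/uat-variables.yml (a separate file) might contain:
# variables:
#   AzureResourceManagerConnection: data-factory-azure-resource-manager   # service connection name
#   DataFactoryName: my-uat-adf
#   ResourceGroupName: my-uat-rg
#   ResourceGroupLocation: eastus
#   Environment: uat
#   TemplateParametersFilePath: $(Build.Repository.LocalPath)/adf-cicd/uat-template-parameters.json
#   BicepTemplateFilePath: $(Build.Repository.LocalPath)/adf-cicd/adf.bicep
```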
There's also some configuration here to deploy a Key Vault. You don't have to do this in your code; I just include it as an extra to show that you can deploy something like a Key Vault too, as an example of how easily you can extend this beyond Data Factory. Then there's the resource group location, East US, and the environment, which here is UAT.

Back in the code, the first job I have is an approval check. You can create environments in Azure Pipelines, and I've created this UAT environment that says I want certain people to approve before anything deploys to UAT. So it builds dev, and then I can say I want, for example, myself and a teammate to approve before deploying; if nobody approves, it will fail. What that looks like in Azure Pipelines is the Environments tab: I have a UAT environment there and can choose who has to approve, whether multiple people must all approve first or just one person, and you can configure other checks there too. I also have a prod environment. It just means you can put checks in place for who should sign off before the process is fully deployed. Once those people approve, it starts deploying to UAT.

The next job deploys the Azure resources to UAT; that's the deploy-Azure-resources-via-Bicep job, and it deploys the blank Data Factory and the blank Key Vault. To push code to a Data Factory in UAT, the UAT Data Factory has to exist already, and again, if it already exists this step doesn't do anything and doesn't override anything; it's fine. We have another steps template here, the deploy-Azure-resources YAML file, and I pass in the resource manager connection (the service principal that lets us deploy the Data Factory and Key Vault instances into the UAT resource group, and later into prod), the Data Factory name (the UAT Data Factory name), and the template path for the Azure resources, which is the ADF Bicep file. Those are just the parameters I pass in; I won't get too deep into Bicep because it isn't really the point here. It just deploys a blank Data Factory with a system-assigned managed identity into the UAT resource group in East US, and the same idea for the Key Vault, just as an extra bonus if you're interested, along with the parameters for the Key Vault. The template then checks out the source repository, and I use Azure CLI tasks to deploy that blank Data Factory with the CLI and Bicep; the path is that ADF Bicep template file path, and the Data Factory name is the one passed in from above. The Key Vault gets deployed with its Bicep file the same way. So that's this job, and all of its parameters come from the variables template, which makes it really easy, like a single pane of glass, to configure each environment in one place and then deploy it.
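One common way to wire up the approval check he describes is to bind a deployment job to the environment, so the approvers configured on that environment gate the stage. A minimal sketch; the environment and job names are placeholders:

```yaml
# Hypothetical approval-check job. Because it targets the "uat" environment,
# Azure Pipelines pauses here until the approvers configured on that environment
# (Pipelines > Environments > uat > Approvals and checks) sign off; if they
# reject or the approval times out, the stage fails.
- deployment: Approval_Check_UAT
  displayName: Wait for UAT approval
  environment: uat
  strategy:
    runOnce:
      deploy:
        steps:
          - script: echo "UAT deployment approved"
```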
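And the Bicep deployment of the blank factory could look roughly like the step below, assuming an AzureCLI@2 task wrapping `az deployment group create`; the template parameter names (dataFactoryName, location) are whatever your Bicep file actually declares, so treat them as placeholders:

```yaml
# Hypothetical step: create the blank Data Factory from a Bicep file
# (a no-op if the factory already exists with the same settings).
- task: AzureCLI@2
  displayName: Deploy blank Data Factory via Bicep
  inputs:
    azureSubscription: ${{ parameters.azureResourceManagerConnection }}   # service connection name
    scriptType: pscore
    scriptLocation: inlineScript
    inlineScript: |
      az deployment group create `
        --resource-group "${{ parameters.resourceGroupName }}" `
        --template-file "${{ parameters.bicepTemplateFilePath }}" `
        --parameters dataFactoryName="${{ parameters.dataFactoryName }}" location="${{ parameters.resourceGroupLocation }}"
```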
The next step is to deploy the Data Factory code. Now that the Data Factory instance has been created (or was already there), we push the code and apply the correct parameter file. Again, this depends on the resources job succeeding, and then we use this ADF deploy template. In that template we have the parameters: the Data Factory name, the UAT Data Factory in this case, and the environment, the UAT value we're passing in. It checks out the source code again and prints in the Azure pipeline that it's attempting to deploy the Data Factory code to that UAT Data Factory in that resource group. Then we download the artifact I showed you earlier with all those scripts, and I list the contents of the workspace so you can see all the scripts in the log, in case you're curious.

The next piece uses the PowerShell task to stop the ADF triggers. You need to do this: if you have active ADF triggers, they have to be stopped before you can deploy your packaged code to the UAT or prod Data Factory. It uses that pre- and post-deployment script I showed you, which gets generated automatically when you run the build. Then I pass in my arguments: the ARM template is the one from the artifact; the ARM template parameters file path is that UAT template parameters file, because you want the correct updated UAT values (the UAT Data Factory, the UAT linked services); then the UAT resource group name; predeployment is true because we are not in the post-deployment phase yet, this runs before we actually push the code and is only stopping triggers; and deleteDeployment is false because I'm not deleting anything yet.
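A sketch of what that trigger-stop step might look like, assuming the AzurePowerShell@5 flavor of the task and the PrePostDeploymentScript.ps1 that the utilities package drops into the artifact. The artifact name, paths, and parameter names here are placeholders; the video also passes the environment's template parameters file to the script, but whether your copy of the script accepts that argument depends on the script version, so it's left out of this sketch:

```yaml
# Hypothetical pre-deployment step: stop active triggers before the ARM template is pushed.
- task: AzurePowerShell@5
  displayName: Stop ADF triggers (pre-deployment)
  inputs:
    azureSubscription: ${{ parameters.azureResourceManagerConnection }}
    pwsh: true
    azurePowerShellVersion: LatestVersion
    ScriptType: FilePath
    ScriptPath: '$(Pipeline.Workspace)/adf-artifact/PrePostDeploymentScript.ps1'
    ScriptArguments: >-
      -armTemplate "$(Pipeline.Workspace)/adf-artifact/ARMTemplateForFactory.json"
      -ResourceGroupName "${{ parameters.resourceGroupName }}"
      -DataFactoryName "${{ parameters.dataFactoryName }}"
      -predeployment $true
      -deleteDeployment $false
```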
Then we actually push our code, using the Azure resource manager template deployment task. We use our Azure Resource Manager connection, the service principal that has permission to deploy code to the UAT and prod resource groups. And I have an if condition here: if the environment parameter we pass in is UAT, grab the UAT subscription ID (again, that's an encrypted secret variable, which I'll show you once we're back in Azure Pipelines; I have a UAT subscription ID variable that gets encrypted and can be referenced here); otherwise I know it's the production subscription ID, so that one is used instead. So if you use different subscriptions for your UAT and prod environments, you can easily handle that. Then we say the template location is a linked artifact, because it's linked from the pipeline, and we pass in that predefined variable, $(Pipeline.Workspace), which is where the downloaded artifact lives. Remember, that ADF artifact folder was created in the build steps template, and that's where we get the ARM template to deploy. The parameters file is that UAT template parameters file, and the deployment mode is Incremental, meaning we just update our code rather than deleting everything first and redoing it; whatever already exists in the Data Factory simply gets updated.

After that we do our cleanup: the PowerShell script runs again to clean up resources and restart the ADF triggers. It's exactly the same call as before, except predeployment is false, because this is our post-deployment and the last step, and deleteDeployment is true. That means any resources in the Data Factory that no longer exist in our new code get deleted, so it won't keep pipelines or triggers that shouldn't be there anymore, which is pretty nice. Once that's done, the UAT deployment is finished.

Now, back in the Azure pipeline, if we look at the UAT deployment: the check that passed is the approval, where I had to approve that environment before the deploy could go through. Since it was approved, the stage ran. It downloads the artifact with all those scripts, prints that it's starting the UAT deployment, deploys the Azure resources (the blank Data Factory via Bicep and the blank Key Vault), and then deploys the actual code. Again, it checks out the code repository, prints that it's attempting to deploy the Data Factory code to the UAT Data Factory in the UAT resource group, and downloads the ADF artifact with all those scripts: the ARM template, the pre- and post-deployment PowerShell script, the linkedTemplates folder. Then it lists the contents of the pipeline workspace folder after the download, so you can see everything that's there (we don't need all of these scripts, I just list them in case you do) and I can use them in the script. Then it stops the ADF triggers, and notice the tenant and subscription IDs are masked in the logs because they're secret variables. Then it deploys, using that UAT template parameters file, to UAT; it restarts the triggers; and you can see it deleted anything that no longer exists in the code.

If you want to know what that Azure Resource Manager connection was: it's this service connection right here, which is a service principal. I won't get too deep into what service principals are, but to deploy your code to the UAT Data Factory in the UAT resource group, the service principal that gets created needs at least an RBAC Contributor role there; otherwise you'll get an error. Same thing in prod: you want your prod service connection's service principal to have the Contributor RBAC role to deploy successfully.
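A sketch of what that deployment task and the post-deployment cleanup might look like, assuming the AzureResourceManagerTemplateDeployment@3 task and template-expression conditionals for the per-environment subscription IDs. The artifact name, parameter names, and the UatSubscriptionId / ProdSubscriptionId variable names are placeholders:

```yaml
# Hypothetical deployment of the packaged Data Factory ARM template.
- task: AzureResourceManagerTemplateDeployment@3
  displayName: Deploy Data Factory ARM template
  inputs:
    deploymentScope: Resource Group
    azureResourceManagerConnection: ${{ parameters.azureResourceManagerConnection }}
    ${{ if eq(parameters.environment, 'uat') }}:
      subscriptionId: $(UatSubscriptionId)          # encrypted secret variable
    ${{ else }}:
      subscriptionId: $(ProdSubscriptionId)
    action: Create Or Update Resource Group
    resourceGroupName: ${{ parameters.resourceGroupName }}
    location: ${{ parameters.resourceGroupLocation }}
    templateLocation: Linked artifact
    csmFile: '$(Pipeline.Workspace)/adf-artifact/ARMTemplateForFactory.json'
    csmParametersFile: '$(Build.Repository.LocalPath)/adf-cicd/${{ parameters.environment }}-template-parameters.json'
    deploymentMode: Incremental                     # update in place rather than wipe and redeploy

# Post-deployment cleanup: same script as before, but with predeployment false and
# deleteDeployment true, so triggers restart and resources removed from the code
# are deleted from the factory.
- task: AzurePowerShell@5
  displayName: Restart triggers and clean up (post-deployment)
  inputs:
    azureSubscription: ${{ parameters.azureResourceManagerConnection }}
    pwsh: true
    azurePowerShellVersion: LatestVersion
    ScriptType: FilePath
    ScriptPath: '$(Pipeline.Workspace)/adf-artifact/PrePostDeploymentScript.ps1'
    ScriptArguments: >-
      -armTemplate "$(Pipeline.Workspace)/adf-artifact/ARMTemplateForFactory.json"
      -ResourceGroupName "${{ parameters.resourceGroupName }}"
      -DataFactoryName "${{ parameters.dataFactoryName }}"
      -predeployment $false
      -deleteDeployment $true
```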
OK, so back to the pipeline. With UAT deployed, we go to the actual prod stage. Same idea: I have a prod environment, and I had to approve it before the stage could start. In prod it does exactly the same things it did in UAT, except it's now set to use the prod template parameters file.

One other note on this ADF deploy step: sometimes you'll see code where, instead of a parameters file, there's an override parameters argument listing the ADF name and so on; you can override the values right there in the task and not use a file at all. I like using files because it's more explicit, but just know that if you see that in other code, that's what it is: you don't always have to create a file, you can pass the overrides through that argument instead. Just a note on that.

If we go back to prod, again it does exactly the same thing: it downloaded the artifact, deployed the Bicep files, and deployed the code to prod, and that finished it. And if you want to see what that looks like: in the UAT resource group it deployed that UAT Data Factory and that Key Vault, and if I go into the factory it has those three pipelines, but if you look at the linked services, instead of dev they show the correct UAT references, same with the Key Vault, and the global parameter has the UAT value. Same with our prod Data Factory: instead of the dev linked services it has the prod linked services, and it's prod throughout.

So yeah, I hope this has been helpful. If you have any questions or anything, leave a comment below; otherwise, thank you for your time, and I hope this makes your life a little bit easier. Thank you.
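On the overrideParameters alternative mentioned in the transcript, a hedged sketch of what that might look like on the deployment task; every parameter name in the override string is invented for illustration and would have to match whatever your generated ARMTemplateForFactory.json actually exposes:

```yaml
# Hypothetical inline override instead of a csmParametersFile.
- task: AzureResourceManagerTemplateDeployment@3
  displayName: Deploy Data Factory ARM template (inline overrides)
  inputs:
    deploymentScope: Resource Group
    azureResourceManagerConnection: ${{ parameters.azureResourceManagerConnection }}
    subscriptionId: $(UatSubscriptionId)
    action: Create Or Update Resource Group
    resourceGroupName: ${{ parameters.resourceGroupName }}
    location: ${{ parameters.resourceGroupLocation }}
    templateLocation: Linked artifact
    csmFile: '$(Pipeline.Workspace)/adf-artifact/ARMTemplateForFactory.json'
    deploymentMode: Incremental
    # Each "-name value" pair must match a parameter defined in the generated template.
    overrideParameters: >-
      -factoryName "my-uat-adf"
      -LS_DataLake_properties_typeProperties_url "https://myuatlake.dfs.core.windows.net/"
      -default_properties_KeyVaultName_value "my-uat-keyvault"
```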
Info
Channel: Data Engineering With Nick
Views: 4,079
Keywords: Azure Data Factory, Data Factory, Azure, Azure Pipelines
Id: l-bBMelqifw
Length: 26min 32sec (1592 seconds)
Published: Fri Oct 06 2023