Azure Databricks Continuous Integration and Continuous Deployment Demo

Captions
Hello everybody, my name is Kumar. Welcome to my channel, Nomad on Cloud. This is my first video, so please share your constructive feedback on the content. In this video I will be covering the CI/CD implementation for Azure Databricks. The agenda: a brief introduction to Azure Databricks CI/CD, the repository structure (how the artifacts built by the development team are laid out in the repository, which is GitHub), a walkthrough of the build pipeline, a walkthrough of the release pipeline, and finally a working demo. Without any further delay, let's start.

Let's start with the introduction to CI/CD for Azure Databricks. In traditional software development, CI/CD is defined as the process of developing and delivering software frequently by using automation pipelines. For Azure Databricks, CI/CD is the same process applied to notebooks and libraries: "software" is simply replaced with notebooks and libraries. This is a very short intro, but it tells you which artifacts are involved in Azure Databricks CI/CD.

Next topic: repository structure. For seamless CI/CD, the basic requirement is to define and adhere to standards at every step, and one of those standards is the layout of the source code repository. This is the structure we recommend. Our repository has three folders. The first, cicd-scripts, is dedicated to the DevOps team; developers should refrain from committing any of their code inside it. The second, notebooks: every notebook file a developer wants to deploy to the Azure Databricks workspace goes in this folder. The third, package, is again for developers, who place their libraries here; any wheel files or JAR files they build belong in this folder.

Next topic: build pipeline walkthrough. Let's get into the Azure DevOps portal and go to the build pipelines. We have a databricks-demo build pipeline; click on it and get into edit mode. As you can see, azure-pipelines.yaml is the file that defines the build pipeline, and it is stored in the root directory of the Git repository. So what have we defined in azure-pipelines.yaml? The first three lines are comments, so let's ignore those. Lines 5 and 6 define the trigger: any commit on the main branch of the databricks-demo repository will trigger the build pipeline. Lines 8 and 9 define the pool, Hosted Ubuntu 1604: we are going to use a Microsoft-hosted Ubuntu 16.04 agent to execute the steps defined in the build pipeline. The steps themselves are defined in the steps section below. The first step is to install Python, and this version must match the version on the Databricks cluster. How can I validate that? Go to the Databricks workspace, click on Clusters, select the cluster, and open Advanced Options; as you can see in the environment variables, it is using Python 3.
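The captions describe the top of azure-pipelines.yaml without reproducing it, so here is a minimal sketch of what those opening lines might look like. The pool name and the exact Python version are reconstructions from the narration, not the video's actual file, and the version must match whatever the cluster runs.

```yaml
# azure-pipelines.yaml (sketch) -- reconstructed from the narration;
# values marked "assumed" are illustrative guesses.
trigger:
  branches:
    include:
      - main                     # any commit to main triggers the build

pool:
  name: 'Hosted Ubuntu 1604'     # Microsoft-hosted Ubuntu 16.04 agent

steps:
  - task: UsePythonVersion@0     # step 1: install Python
    displayName: 'Install Python'
    inputs:
      versionSpec: '3.7'         # assumed; must match the cluster's Python
```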
Once Python is installed, the next step installs the Python modules that the build needs in order to succeed: pytest, requests, setuptools, wheel, and databricks-connect. databricks-connect is a Spark client library that the build agent uses to connect to the Databricks cluster and execute Spark code. The fifth step connects to Databricks. How are we connecting? Through environment variables. Where are they configured? In the variables section, as you can see here. The first, workspace-region-url, contains the Databricks workspace URL. The next one is the personal access token (PAT). Where can I get that? Within User Settings, under the Access Tokens tab, we can generate a token, and databricks-connect uses that token to connect to the Databricks workspace and cluster. A PAT should be safeguarded in a key vault; since this is a demo, I have kept it in plain text. The next variable, existing-cluster-id: to find this value, again go to the Clusters section, click on the cluster, and open the Advanced Options tab, where the cluster ID is shown; that is what is configured here. The next, workspace-org-id: this value can be found in the workspace URL, in the query parameter, as you can see here. All of these environment variables are passed as input to databricks-connect so that we can connect to our Databricks workspace and cluster.

The sixth step downloads the latest code from the designated branch, main in our case: all the latest code in the repository is checked out and downloaded. The seventh step packages the library code, that is, whatever is placed within the package folder of the repository; the setup.py file is leveraged in this step to package the library. In the eighth step we stage the artifacts (library, notebooks, cicd-scripts, and so on) into the binaries directory, and in the ninth step we compress the binaries directory into a zip named after the build ID and publish it as an artifact under the "databricks build" section. To demonstrate this visually: here is a build that succeeded, and as you can see, one artifact was published. Under "databricks build" we have <build id>.zip. And what is inside that zip? Within 64.zip, the cicd-scripts, notebooks, and package folders were created, plus a libs folder; the libs folder is where the wheel library we want to install on our Databricks cluster is placed. These are the artifacts published as the output of the build pipeline.
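Continuing the sketch, steps 2 through 9 of the build might look like the following. The pipeline variable names ($(WORKSPACE_REGION_URL), $(DATABRICKS_PAT), $(EXISTING_CLUSTER_ID), $(WORKSPACE_ORG_ID)) are stand-ins for whatever the video actually uses, and writing ~/.databricks-connect by hand is just one way to feed databricks-connect its settings.

```yaml
  - script: pip install pytest requests setuptools wheel databricks-connect
    displayName: 'Install required Python modules'

  # step 5: connect to Databricks by writing the databricks-connect config.
  # $(WORKSPACE_REGION_URL) is assumed to hold the full workspace URL,
  # e.g. https://southcentralus.azuredatabricks.net
  - script: |
      cat > ~/.databricks-connect <<EOF
      {
        "host": "$(WORKSPACE_REGION_URL)",
        "token": "$(DATABRICKS_PAT)",
        "cluster_id": "$(EXISTING_CLUSTER_ID)",
        "org_id": "$(WORKSPACE_ORG_ID)",
        "port": "15001"
      }
      EOF
    displayName: 'Configure databricks-connect'

  # step 6 (checkout of the main branch) is implicit for the repository
  # that hosts the YAML file.

  # step 7: package the library code under package/ with setup.py
  - script: python setup.py bdist_wheel
    workingDirectory: package
    displayName: 'Build the wheel'

  # steps 8-9: zip the staged artifacts under a build-id name and publish
  - task: ArchiveFiles@2
    inputs:
      rootFolderOrFile: '$(Build.BinariesDirectory)'
      archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
    displayName: 'Zip the binaries directory'

  - task: PublishBuildArtifacts@1
    inputs:
      pathToPublish: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
      artifactName: 'databricks-build'
    displayName: 'Publish build artifact'
```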
Next topic: release pipeline walkthrough. Let's get back into the Azure DevOps portal, this time into the release pipelines. We already have one which, as you can see, I have named rather badly: "New release pipeline". Let's get into edit mode. In the Artifacts section, I take the artifacts published by the databricks-demo build pipeline as the input for my release pipeline. Using that input, the release performs six tasks. Which agent am I going to use here? A Microsoft-hosted vs2017-win2016 agent (Visual Studio 2017 on Windows Server 2016) executes the release, consuming the databricks-demo build artifacts as its input.

The first task is, again, to install Python, the same version as on the Databricks cluster. The next task, Extract Files, unzips the artifact that was generated, zipped, in the build pipeline. Once it is extracted, the next task installs the Python modules that are mandatory for the release pipeline to succeed; here we only need requests. Once that module is installed, the next task deploys the notebooks to our Databricks workspace. As you can see, my Databricks workspace is configured in the South Central US region, and whatever notebooks are in the artifact's notebooks folder are placed in the Shared workspace. What is the Shared workspace? It is the workspace folder where these files land after a successful run of the release pipeline. We again pass the personal access token: the same token configured in the build pipeline is configured in the release pipeline's variables section. Again, this is very confidential information that should be safeguarded in a key vault; since this is a demo, I am keeping it in plain text. With these inputs, the task performs the job of deploying the notebooks to the Shared workspace.

The next task is the Databricks DBFS file deployment. Again, the region where my workspace and cluster are installed is South Central US. The contents of the artifact's package libs folder, the wheel file and the tar file, are both taken; the file pattern matches any extension, so everything is picked up and placed in the /libs folder on DBFS. How can I validate that? If I open a test notebook and run the cell, it attaches to the cluster so it can execute the command; let's hold on for a minute... yes, it is executing now. As you can see, the files I showed you, the tar and the wheel file, are there: whatever was in the libs folder, both files, whatever the pattern, I pick them all and put them into the /libs folder. So I have also shown you how to validate that these files were successfully copied to the Databricks DBFS path. The input for this task is, again, the personal access token, which I have already explained.
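The video drives these two deployment steps through dedicated marketplace release tasks; as a hedged, scripted equivalent, the same work can be done with the legacy databricks-cli (after a pip install databricks-cli). DATABRICKS_HOST and DATABRICKS_TOKEN are the CLI's standard authentication variables, the pipeline variable names are the same stand-ins as before, and /Shared and dbfs:/libs mirror the narration.

```yaml
  # Equivalent of the "deploy notebooks" release task: push the artifact's
  # notebooks folder into the /Shared area of the workspace.
  - script: databricks workspace import_dir --overwrite notebooks /Shared
    displayName: 'Deploy notebooks to /Shared'
    env:
      DATABRICKS_HOST: $(WORKSPACE_REGION_URL)   # assumed variable name
      DATABRICKS_TOKEN: $(DATABRICKS_PAT)        # assumed variable name

  # Equivalent of the "Databricks DBFS file deployment" task: copy every
  # file in the libs folder (wheel and tar alike) to dbfs:/libs.
  - script: databricks fs cp --overwrite --recursive package/libs dbfs:/libs
    displayName: 'Copy libraries to dbfs:/libs'
    env:
      DATABRICKS_HOST: $(WORKSPACE_REGION_URL)
      DATABRICKS_TOKEN: $(DATABRICKS_PAT)
```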
The final task installs the library on the cluster. When we went through the repository section, I explained that the cicd-scripts folder is dedicated to the DevOps team; there we have placed a Python file called install-wheel-library. This script takes arguments such as the workspace URL, the PAT token, the cluster ID, and the DBFS libs path, which is simply /libs. These values are configured in the variables section, as you can see here: /libs, the cluster ID, the workspace URL, and the PAT token are passed as arguments to the install-wheel-library script, and it performs the job of installing the wheel on the cluster. Where will it be installed? Go to the cluster's Libraries tab and you should see the wheel file being installed; if the status is Installed, our release pipeline ran successfully without any issues. This completes our release pipeline implementation; the sketch below shows roughly what that final step amounts to.
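install-wheel-library itself is not shown in the video, but its job maps onto the Databricks Libraries REST API. A minimal sketch, using the same stand-in variable names as above and a hypothetical wheel filename (demo-0.1-py3-none-any.whl):

```yaml
  - script: |
      # Ask the cluster to install the wheel previously copied to dbfs:/libs
      # (the wheel filename here is a hypothetical placeholder)
      curl -s -X POST "$(WORKSPACE_REGION_URL)/api/2.0/libraries/install" \
        -H "Authorization: Bearer $(DATABRICKS_PAT)" \
        -d '{"cluster_id": "$(EXISTING_CLUSTER_ID)",
             "libraries": [{"whl": "dbfs:/libs/demo-0.1-py3-none-any.whl"}]}'
      # Check progress; the release is healthy once this reports INSTALLED
      curl -s -H "Authorization: Bearer $(DATABRICKS_PAT)" \
        "$(WORKSPACE_REGION_URL)/api/2.0/libraries/cluster-status?cluster_id=$(EXISTING_CLUSTER_ID)"
    displayName: 'Install wheel on cluster and report status'
```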
Now let's get into our demo. As you can see, I am using Jupyter notebooks. I have created a simple notebook with a simple print command, "one more hello world"; when I execute it, it works fine. I download the file (Download as > Notebook .ipynb), locate it on disk, and place it within my local clone of the repository. Since I have added a new file, the notebook (call it onemore.ipynb), I am going to check it in. Let me show you the status: one file has been added. I add it, check the status again (it is now staged), commit it with the message "added onemore.ipynb file", and push it to the remote main branch. Once the push completes, you can see the commit in the repository.

Now, when we go to Pipelines and open the build pipeline, the continuous build is triggered immediately, because any commit on the main branch of the databricks-demo repository triggers the build pipeline. The job is running; zero artifacts generated so far. Let's get into the job: Python got installed; now the dependencies are loading, as you can see, databricks-connect, which then connects to the cluster; all files are checked out from the latest main branch; once the checkout is complete, the Python library is built with all the changes; and the artifacts are published. Going back, you can see one artifact got generated: 65.zip. Since I have configured continuous deployment, the release pipeline should be triggered immediately. Let's see whether it was: yes, as you can see, release 30 has been initiated. 65.zip is used as the input, and all six tasks we discussed are performed. It waits for an Azure Pipelines agent, sits in the queued state, then moves to in progress once the Microsoft-hosted agent is allocated to deploy the notebooks and libraries to the Databricks cluster. As you can see: the artifact download is done, Python got installed, the zip file was extracted, pip install requests was executed, and the Databricks notebooks deployment runs, connecting via bearer token, that is, the PAT. The notebook deployment succeeded. Next, the Databricks DBFS deployment places the wheel file in the /libs folder, and that succeeded too. Now the script we placed in the cicd-scripts folder is executing and installing the wheel file on our cluster. While it is still running, let's log in to the workspace in the meantime... and the Python wheel file got installed successfully: as you can see, the Libraries tab shows the status as Installed, which reflects that the install worked. Then the release pipeline finalizes and closes.

This completes our demo of how to implement CI/CD for Azure Databricks. I hope this content was informative. Once again, thank you for watching, and please share your constructive feedback if you have any. Thanks a lot!
Info
Channel: Nomad on Cloud
Views: 3,171
Keywords: Azure Databricks, CI, CD, Continuous Integration, Continuous Deployment, Azure DevOps, Databricks, Demo
Id: qFvcddrrZnA
Length: 30min 46sec (1846 seconds)
Published: Fri Oct 23 2020