Airflow tutorial - DAGs, Operators, Tasks, Providers & airflow.cfg

Captions
Hello data pros, welcome back to another episode of our Apache Airflow series! In our last video, we demonstrated the step-by-step installation of Apache Airflow on a Windows PC and successfully executed our very first Airflow DAG. Now it's time to dive deeper. In this video, we'll learn about the Airflow configuration file, explore each section inside a DAG, understand the various operator types, and experience the power of provider packages. Let's begin right away.

As we already know, Airflow DAGs are coded in Python. Every Airflow setup has a dags folder, and you can set this folder's path in the Airflow configuration file, named airflow.cfg. In addition to the dags folder, this configuration file has many other settings that you can customize to meet your needs. For example, to enable my Airflow instance to send email notifications, I added another container to my Docker Compose file; this new container locally hosts a simple SMTP server. I then updated the Airflow configuration file to use the corresponding Docker service name as the SMTP host (a sample fragment is sketched at the end of this walkthrough).

Let's now take a closer look at each section inside a DAG. In general, we begin with import statements. We should always import the DAG class, and in addition, import the Airflow operators that you plan to use in your tasks. We may also need to import any functions we use, whether built-in or user-defined. For example, in my case I developed a custom Python function called clean_data, which is placed in the plugins folder; because I intend to use the clean_data function in this DAG, I've included it in the imports section.

Next, create a DAG object and configure the DAG-level parameters such as start date, end date, schedule interval, catchup (whether to backfill missed runs), and more. default_args is a dictionary that allows you to set default parameters for all tasks created within a DAG; optionally, within a task, you can override these values using task-specific parameters. For a complete list of DAG parameters and their purpose, please refer to the official Airflow documentation link in the video description.

Next, we create tasks. A task is created by instantiating a specific operator and providing the necessary task-level parameters. The parameters differ depending on the operator used, so always refer to the relevant documentation for the list of task-level parameters you can use.

At the end, we define the task dependencies. There are several methods for establishing dependencies between tasks: using the bitshift operators (>> and <<), using the set_upstream and set_downstream functions, or using the chain function. In addition, when using the TaskFlow API, dependencies are automatically inferred based on the sequence of task function calls; we'll cover the TaskFlow API in our later videos.

Let's try to execute this DAG in the Airflow UI. It has completed successfully, and I can validate the respective logs as well. The sketches below pull all of these pieces together.
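For reference, here is a minimal sketch of the two configuration settings mentioned above as they might appear in airflow.cfg. The smtp-relay host name and the from-address are hypothetical stand-ins for whatever your Docker Compose file actually defines.

```ini
# airflow.cfg (fragment)

[core]
# Folder the scheduler scans for DAG files
dags_folder = /opt/airflow/dags

[smtp]
# Hypothetical Docker service name of the local SMTP container
smtp_host = smtp-relay
smtp_port = 25
smtp_mail_from = airflow@example.com
```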
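Putting the sections together, here is a minimal sketch of a complete DAG, assuming Airflow 2.4 or later. The dag_id, dates, and the clean_data body are illustrative; in the video, clean_data lives in the plugins folder rather than in the DAG file itself.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def clean_data():
    # Placeholder for the custom cleaning logic kept in the plugins folder
    print("cleaning data...")


# default_args apply to every task in this DAG unless a task overrides them
default_args = {
    "owner": "airflow",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 10, 1),
    schedule="@daily",   # the schedule interval
    catchup=False,       # don't backfill runs before today
    default_args=default_args,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting...'",
    )

    clean = PythonOperator(
        task_id="clean",
        python_callable=clean_data,
        retries=0,  # task-specific override of the default_args value
    )

    # Bitshift operator: extract must finish before clean starts
    extract >> clean
```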
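The dependency styles mentioned above are interchangeable, and a real DAG would normally pick just one. Assuming three task objects named extract, clean, and load already exist (as in the sketch above, plus a third task), they could be wired up in any of these ways:

```python
from airflow.models.baseoperator import chain

# 1. Bitshift operators
extract >> clean >> load

# 2. Explicit set_downstream / set_upstream calls
extract.set_downstream(clean)
load.set_upstream(clean)

# 3. The chain() helper, convenient for long sequences
chain(extract, clean, load)
```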
All right, let's understand the difference between an operator and a task. In simpler terms, think of an operator as a blueprint or design template, while tasks are implementations of that blueprint. In Python, or object-oriented programming, terms: an operator is a class, and tasks are objects created from that class.

At a high level, we can categorize operators into three main groups.

1. Action operators. These operators execute a specific function or task: for instance, the BashOperator, which is used for running bash commands; the PythonOperator, which lets you run Python code; or the AzureDataFactoryRunPipelineOperator, which is used to execute an Azure Data Factory pipeline.

2. Transfer operators. These operators are used for moving data from one place to another. An excellent example is the S3ToRedshiftOperator: it does exactly what it sounds like, moving data from Amazon S3 to Amazon Redshift.

3. Sensor operators. Sensors wait for a specific condition to be met before triggering the subsequent tasks in the workflow (a sample sensor is sketched below). One example is the S3KeySensor, which waits for one or more files to be created in an S3 bucket; another is the RedshiftClusterSensor, which waits for a Redshift cluster to reach a specific status.

It's worth mentioning that Airflow offers a vast number of operators; this list highlights just a few of the most common ones. Not all packages are included in the default Airflow installation. For instance, if we attempt to use the GitHub operator in our code, we may encounter an issue, so let's go ahead and try. When you return to the UI, you should observe a DAG import error similar to the one displayed here. This error is caused by a missing provider package. While some provider packages are included with Airflow, you may encounter situations where you need to install additional ones; please refer to the link in the video description for the complete list of Airflow provider packages. In our setup, you can simply add the missing provider package name in the Dockerfile; alternatively, we can create a requirements file and include all the packages one after another (see the sketch at the end of this section). For this change to take effect, we should rebuild the Docker image and restart the Docker containers. Now I no longer see an error, and my DAG executes successfully. This plug-and-play extensibility and the availability of a wide range of provider packages make Airflow an exceptionally powerful and versatile platform.
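To make the sensor pattern concrete, here is a hedged sketch of an S3KeySensor task. It assumes the apache-airflow-providers-amazon package is installed; the bucket, key, and connection id are hypothetical, and the task would sit inside a DAG like any other.

```python
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

wait_for_file = S3KeySensor(
    task_id="wait_for_input_file",
    bucket_name="my-input-bucket",   # hypothetical bucket
    bucket_key="daily/input.csv",    # object key to wait for
    aws_conn_id="aws_default",       # Airflow connection to AWS
    poke_interval=60,                # re-check every 60 seconds
    timeout=60 * 60,                 # give up after one hour
)
```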
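One common way to add the missing package, sketched under the assumption that you are extending the official image as in the standard docker-compose setup (the Airflow version tag is illustrative):

```
# requirements.txt
apache-airflow-providers-github
```

```dockerfile
# Dockerfile
FROM apache/airflow:2.7.1
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
```

Rebuilding the image and restarting (for example, docker compose build, then docker compose up -d) picks up the new provider.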
That's all for today. Please stay tuned for our next video, where we'll explore Airflow variables and connections. Please do like the video and subscribe to our channel, and if you have any questions or thoughts, feel free to leave them in the comments section below. Thanks for watching!

Info
Channel: Sleek Data
Views: 14,783
Keywords: airflow, airflow.cfg, dag, apache, operator, task, operators, tutorial, provider, packages, airflow-providers, airflow dag, apache airflow, airflow tutorial, provider packages, airflow operator, dag example, operator types, airflow provider, dag parameters, airflow task, airflow dag example, task vs operator, apache airflow tutorial, airflow provider packages, airflow dag parameters, operator provider packages, apache airflow dag, apache airflow basics, apache airflow fundamentals
Id: OuRiX1XQgyY
Length: 7min 53sec (473 seconds)
Published: Mon Oct 02 2023