Run Airflow 2.0 via Docker | Minimal Setup | Apache airflow for beginners

Captions
What is going on, everybody? Welcome back to the series Getting Easy with Airflow. This is the third video of the series, in which we are going to set up your own minimal Airflow installation on your local machine using Docker. The reason I said minimal is that we are going to use the simplest executor Airflow offers, the SequentialExecutor. The reason for that is just to help you get started with Airflow, so that you can play around with it, create your DAGs, experiment, and learn how Airflow works.

To get started, you first have to make sure you have Docker installed on your machine. It is available for Mac, Windows, and Linux, so download it for your own operating system. The reason we are using Docker is that it is a containerized solution: you don't have to install all of the dependencies on your local machine; instead we'll create a compact environment which is very easy to spin up, is cost effective, and can be deployed quickly to any of your environments. I already have it installed, so after downloading and installing you can test it by typing the command docker version; mine is 20.10.7. We will also be using the tool provided by Docker called Docker Compose, a small orchestration system for Docker that manages multiple containers for us, so make sure you have that installed as well; it already comes with Docker when you install it. We'll be using Docker Compose version 1.29.2 with the following build.

The official Airflow project already provides a Docker Compose YAML definition for Airflow; you can head over to the documentation and grab it from there. By default it uses the CeleryExecutor, but for simplicity we will first go with the sequential one, so we will amend this file. Let us begin.

So here is the modified docker-compose file; let us go through what we are doing here, starting with the services. The services define which sets of containers we need to run as part of this Docker Compose setup, and we have the three major components required by Airflow: a MySQL database used for storing Airflow's metadata, the Airflow scheduler, and the Airflow webserver. All of these components have one thing in common, and I should add it to this one as well: airflow-common, a definition which specifies the image we are using here, apache/airflow 2.1.1 with Python 3.8. In this definition we also have some environment variables defined, as airflow-common-env, which will be used by our scheduler and the webserver. So the airflow-scheduler service uses airflow-common, the command it runs is scheduler, and it uses the environment variables defined over here, telling Airflow the address of the metadata database, the SQL Alchemy connection, which in this case points at our mysql service defined here, with the root user and the password all set to airflow. We also allow Airflow to load all of the example DAGs for us. You can view all of the available configuration options of Airflow on their official website, and you can see the default configuration file on their GitHub page as well. As an example, we are telling Airflow to use the SequentialExecutor, which falls under the core section of the config file. We can either provide our own config file to Airflow, or we can override these options with environment variables using the format AIRFLOW (which stays constant), double underscore, the name of the section, double underscore, then the name of the variable.
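As a rough illustration, the shared block described above might look something like this. This is a hedged sketch based on the video's description, not a copy of the author's file; the image tag, driver, and credentials in the connection string are assumptions.

```yaml
# Sketch of the shared definition; values are illustrative, not the exact file.
x-airflow-common: &airflow-common
  image: apache/airflow:2.1.1-python3.8
  environment: &airflow-common-env
    # AIRFLOW__<SECTION>__<KEY> overrides the matching option in airflow.cfg,
    # e.g. the "executor" option in the [core] section:
    AIRFLOW__CORE__EXECUTOR: SequentialExecutor
    # Point the metadata database at the mysql service (assumed credentials):
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflow@mysql:3306/airflow
    # Load the bundled example DAGs so there is something to play with:
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
```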
Then we have volumes defined that will be used by Airflow to store the logs, and if we have the dags folder present on our local machine, Airflow is going to pick up those DAGs as well; you will see how that works in our next video, where we will be writing our own DAG. We are setting the user here just to make sure that the Docker user and your local machine user have the same permissions; this is only required if you are on Linux or Mac. Make sure you have a .env file in the same folder as this docker-compose file, containing the same user setting we referenced in the docker-compose. And finally, we are saying that all of the services that use this airflow-common definition have to wait for MySQL to be healthy first. That healthy status is checked through the healthcheck defined over here: it runs this command every two seconds to make sure that the service is running fine. Then we have our Docker volume defined as mydb, which the MySQL container uses internally; this volume is shared across all of the containers in this setup, so we define it over here at the end.

Moving on to the airflow-init service: it uses airflow-common, so it gets the same image and the environment variables we discussed earlier. What it does is overwrite the entrypoint of the image, because the default entrypoint in that image is airflow. That's why in our scheduler we only tell the container to run the command scheduler; since the first word in the entrypoint is airflow, it effectively runs airflow scheduler and the scheduler starts. In this case, though, we override the entrypoint, telling the container to use the bash shell and run whatever we put in the command section. First we run airflow db init, which initializes all of the database components; then airflow db upgrade, which makes sure the database schema is up to date; and finally we create an Airflow admin user with the airflow users create command, giving the user the role Admin, the username admin, setting the email address, and setting the password to airflow. The newer versions of the Airflow image do have environment variables to do this job for us: if you set _AIRFLOW_DB_UPGRADE to true, it runs the init and upgrade behind the scenes, and if you set _AIRFLOW_WWW_USER_CREATE to true, it creates the admin user with the given username and password. But I don't prefer that approach, because it hides away from developers the things that are actually happening behind the scenes in Airflow. If you are learning and need to understand what Airflow does and how it works, you should get familiar with which commands to use, and of course you get more flexibility this way: not only can you create the admin user, you can create more users with additional commands, add custom connections, set Airflow Variables, and whatnot. And of course we are using the common environment variables defined over here for the SQL Alchemy connection and for loading the examples.

The scheduler container is as simple as running the command airflow scheduler, and it has a restart policy of always, so if the container dies, Docker Compose will restart it. You also see volumes defined here; in fact, we don't need that, because those are already defined in the common volumes. My bad. It also has a depends_on, which we don't need either, because we already have that defined in the common block.
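Putting those pieces together, the database, init, and scheduler services could look roughly like the sketch below. It assumes the x-airflow-common block from the earlier sketch; the MySQL healthcheck command, retry count, and the extra first/last name fields on the user are assumptions, not details confirmed in the video.

```yaml
# Illustrative sketch of the database, init, and scheduler services.
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: airflow
      MYSQL_DATABASE: airflow
    volumes:
      - mydb:/var/lib/mysql
    healthcheck:
      # the airflow services wait for this check to pass
      # (depends_on with condition: service_healthy in the shared block)
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 2s
      retries: 30

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash        # override the image's default "airflow" entrypoint
    command:
      - -c
      - |
        airflow db init
        airflow db upgrade
        airflow users create \
          --role Admin \
          --username admin \
          --password airflow \
          --email admin@example.com \
          --firstname admin \
          --lastname admin

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler           # entrypoint is "airflow", so this runs "airflow scheduler"
    restart: always

volumes:
  mydb:
```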
Then the last one is the airflow-webserver service, which runs the airflow webserver command to spin up the web server. I'm using 8081 as the localhost port, because my Docker Desktop usually occupies port 8080; on the right side you have the port used inside the Docker container, and on the left side you have the port that will be used on your local machine. Then we have a healthcheck, a restart policy of always, and the environment variables. So now you know what this docker-compose file is doing.
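For completeness, a hedged sketch of what that webserver service might look like with the 8081-to-8080 port mapping described above; the healthcheck values mirror the stock Airflow compose file and are assumptions about this particular setup.

```yaml
# Sketch of the webserver service; this sits under the services: section
# alongside the others above.
airflow-webserver:
  <<: *airflow-common
  command: webserver
  ports:
    - "8081:8080"    # left: port on your local machine, right: port inside the container
  healthcheck:
    test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
    interval: 10s
    timeout: 10s
    retries: 5
  restart: always
```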
So let us spin up our Airflow. First of all we are going to run the airflow-init container, which will set up the database; for that, let us run docker-compose up airflow-init. Wait until the MySQL container is healthy, because airflow-init depends on the mysql service; you can also view the logs of the Docker containers from the Docker Desktop app. As you can see, airflow-init is now being executed: it initializes the database, updates the schema, and at the end it should create the admin user. Cool, the Airflow admin user has been created after the migrations completed, so we are now ready to spin up the Airflow scheduler and the webserver.

Now let us bring up all of the components of the Docker Compose setup; for that we simply run docker-compose up. As you can see, the airflow-init container reports that the admin user already exists in the database. Okay, it looks like everything is up and running now, so let us go to the browser and view our Airflow webserver, which should be running on localhost:8081. On localhost:8081 we land directly on the login page; the username is admin and the password was airflow, and we jump right into the Airflow main page. These are all of the example DAGs loaded into Airflow by default, and of course you can hide them by setting that load-examples option to false. From here you can view all of the DAGs, trigger them, and check the logs and the status of each DAG and its tasks; you can control almost everything from this page. As an example, if you look at the example_bash_operator DAG, simply flip its toggle to enabled and the DAG will be triggered; don't forget to turn on auto-refresh to view the updates in real time. It will be relatively slow because, as I told you before, this is the SequentialExecutor, which executes one task at a time, which is fine for learning purposes. Fast-forwarding a bit, you can see we have DAG runs in progress, and you can also open the graph view, where green indicates a task has succeeded and pink represents tasks that were skipped, which is the expected behavior of this particular DAG. Because we are logged in as an admin user, we can see all of the admin features, like the Security and Admin panels. Airflow version 2 also has a full-fledged REST API; you can read its documentation via Swagger UI or Redoc, so you can control your Airflow through the API, because it has endpoints for nearly everything: getting connections, triggering a DAG, checking DAG status, event logs, and whatnot.

As you can see, Docker is already writing all of the logs into our logs directory: you have the scheduler logs and the logs for each task. And of course, if you want to add your own custom DAGs, you can simply drop the DAG file into this dags directory; we'll be looking at that in our next video, where we will learn how to write your first DAG. So till then, stay tuned. I hope this video was helpful to you; if it was, please don't forget to like and comment, and definitely share it with others. If this is your first time on this channel, please don't forget to subscribe; that will keep us motivated to bring this kind of useful content to you in the future. Till then, I will see you in the next video. Thank you so much.
Info
Channel: MaxcoTec Learning
Views: 25,432
Keywords: apache airflow for begginers, airflow docker, airflow 2.0, apache airflow, airflow 2, sequential executor, run airflow local, apache airflow tutorial for beginners, apache airflow python tutorial, data science, apache airflow hands on, apache airflow 2.0 tutorial, apache airflow tutorial point, airflow tutorial for beginners, airflow 2.0 tutorial, airflow 2.0 docker, airflow 2.1, airflow docker compose, apache airflow docker tutorial, Airflow docker container
Id: TkvX1L__g3s
Length: 12min 22sec (742 seconds)
Published: Sun Aug 01 2021