Airflow tutorial 2: Set up airflow environment with docker

Video Statistics and Information

Captions
Hi everyone, and welcome to another Airflow tutorial. In this video we will learn how to set up an Airflow environment using Docker. But first, let's talk about the problems with Airflow, to motivate why we want to use Docker.

The first problem is that Airflow grows at an overwhelming pace. As you may know, Apache Airflow is an open source project. Taking a look at the Airflow GitHub repo, we can see that Airflow has 632 contributors, 98 releases, and more than 5,000 commits, with the latest commit just four hours ago. That means Airflow gets new commits every day and has constant releases, so managing and maintaining different versions of Airflow is already a challenge.

Second, Airflow is built to integrate with all kinds of database systems and cloud environments. Looking at the official Airflow page, we can see that besides the core apache-airflow package there are extra subpackages you can pick and choose from to integrate with different cloud environments and databases: gcp_api to integrate with Google Cloud Platform services, s3 for Amazon Web Services, postgres to integrate with a PostgreSQL database, plus Microsoft SQL Server, MySQL, Hive, HDFS, and much more. You can also build your own custom subpackages. The problem here is that managing and maintaining all of these dependency changes is very difficult: today you only need one package, tomorrow you need another, and then another. It takes a lot of time to set up and configure an Airflow environment, and if you mess up one installation step you may have to clear everything and start over. And after you have spent all that time setting up and configuring the environment, how do you share your custom development environment with other developers?

All of those challenges give us the motivation to use Docker. Of course, Docker is more powerful and has a lot more functionality than what I can show you in this tutorial; a full explanation of Docker is out of the scope of this tutorial, so I can only give you a brief overview and our specific use case here: using Docker to set up and share our Airflow environment. In brief, Docker is an open platform for developing, shipping, and running applications. Docker provides the ability to package and run an application in a loosely isolated environment called a container, and that isolation and security allow you to run many containers simultaneously on a given host regardless of its operating system: Mac, Windows, Linux, cloud, or data center. The concept of a container is very similar to a virtual machine, in that it creates an isolated environment for you to run an application instantly, without messing up your global environment. There are some differences between a container and a virtual machine, though: containers are lightweight because they don't need the extra load of a hypervisor, but run directly within the host machine's kernel. This means you can run more containers on a given hardware combination than if you were using virtual machines.

The benefit of using Docker is that it frees us from the task of managing and maintaining all the Airflow dependencies and deployment, and it makes it easy to share an Airflow environment and avoid version drift. Regardless of your current operating system, whether Mac, Windows, or Linux, as long as you have Docker you can run all my workflows in the same Docker environment I'll be sharing with you, and we can all follow the same tutorial. I can also keep track of the environment through GitHub tags and releases: in this tutorial I'll be using Airflow version 1.10, which is the latest version, but later on when there is a newer version, such as 2.0, you can pick and choose whichever version you want to use. Finally, Docker gives us ease of deployment from the testing environment to the production environment: the same development environment we're working in right here can be deployed to production easily.

I have already created a GitHub repo for us to use. All you need to do is go to the link, fork and clone the repo, and follow the instructions to set it up. The only prerequisites are Docker and Docker Compose; if you don't have them, you have to install them first. Then clone the repo, follow the instructions to run the services, and go to the link to see the services.

Let's see a quick demo; I'll go through it step by step. I'm going to put all the links in the description below, so you can just take a look at the description and follow the links there. This is the GitHub repo I created for all of you; you can download it or clone it. In this case I'll clone it: in my terminal, in my current GitHub directory, I do a git clone with the link I just copied, and then cd into the repo I just cloned. The next step is to install the prerequisites; I already have Docker installed and currently running here. To run the services, all you need to do is run docker-compose up -d. The -d flag basically hides the logs (you can still check them using docker-compose logs), but since I want to show you the logs of how everything starts up, I'll be using docker-compose up without it. So all I need to do is type docker-compose up and hit enter, and immediately it starts all the containers, or, if you haven't built the images yet, it builds everything for you, and we'll see all the services running in a minute. You can see that it started the web server, the database, and all the other services. The next step is to run the services and check localhost at port 8080, and immediately you have the web server with all the DAGs, or workflows. Like I said, regardless of your operating system, if you have Docker you can run everything in the same environment I'm currently working in. And as I showed in the previous video, you can immediately turn a DAG on to see how it runs: you can see Airflow sending all the tasks to the queue, check the graph view to see which tasks are queued and which are currently running, and go to each task to view its logs.

Now let's take a look at our docker-compose file to understand a little bit more about what's under the hood. Let me go over my GitHub repo, the one I'm sharing with you. You can see that the dags folder stores our DAGs, and the only file you need to focus on is docker-compose.yml. This YAML file is the configuration that creates our Airflow environment: using only this configuration file I'm sharing with you, if you have Docker you can have the same working environment as I do. Under the hood, when we type docker-compose up, it immediately starts pulling all the images for these services from Docker Hub and builds them for you. In here we have two services: a postgres database that uses Postgres 9.6, and a webserver that uses the docker-airflow image developed by Puckel; credit goes to Puckel for his amazing work on the docker-airflow image. By leveraging his image and adding some extra Airflow packages, like gcp_api to interact with Google Cloud and s3 to interact with Amazon Web Services, I get the environment I need, and later if I need more I just add more packages here and rebuild my Docker image.

All of these Docker images (postgres:9.6 and docker-airflow) are contributed by the Docker community, so we can just leverage them instead of having to install everything ourselves. For example, if later on you need to develop a full-stack application that requires a database, a web server, or a lot more services, all you need to do is go to Docker Hub, which is at hub.docker.com, search for the image for any service you want to use, and put it into a config file similar to the one I have here. Say you want PostgreSQL as your database: you can type in postgres, search for it, and you have the official Postgres image with different tags, which are mostly different versions. If you want Airflow, you can type in airflow, and you have puckel/docker-airflow, which is the one we're using, in different versions as well. It's amazing: it's like a GitHub repo, but for Docker images, which means people can share different environments without everyone having to install and configure them themselves. All you need to share is a very lightweight text file, and anyone can run a single command, docker-compose up, to have the services.

Let's go back to our services, which are all up and running. Let me turn them off: remember, after you turn the services on, you have to turn them off, and you shut down the containers by typing docker-compose down. All of these instructions are in the repo. So let's start over one more time: you clone the repo, you install the prerequisites, which are Docker and Docker Compose, and then you run the services by typing docker-compose up -d. It immediately starts building the images if you don't have them yet, and then it starts all the containers. You can list the containers by typing docker container ls, and you'll see the two services currently running, which are the webserver and the postgres database. Then you go to the link, which is localhost:8080, and you have the services; you can play around with them, run them, schedule them, whatever you want to do with them. You can run docker-compose logs to see the logs while they're running, and after you're done playing with them, you type docker-compose down to turn them off. That's all you need to do; it's very simple.

So this is the end of our video. In the next video I'll be talking about Google Cloud Composer, another service from Google that makes it much easier to set up and scale your production Airflow environment without worrying about using Docker or managing any infrastructure. Thanks for watching this video, and I hope it was helpful to you. Don't forget to like, subscribe, and hit the bell button for more videos. Thank you, and see you in the next video.
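As a reference, here is a minimal sketch of what a docker-compose.yml like the one described in the video might look like. The two-service layout (a postgres database plus a webserver running puckel/docker-airflow) follows the transcript, but the exact image tag, environment variables, and volume paths are assumptions, not necessarily the repo's actual file:

```yaml
version: "2.1"
services:
  postgres:
    # Metadata database for Airflow (the video uses Postgres 9.6)
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow

  webserver:
    # Community Airflow image by Puckel; tag is an assumed 1.10-era version
    image: puckel/docker-airflow:1.10.1
    depends_on:
      - postgres
    volumes:
      # Mount the local dags folder into the container
      - ./dags:/usr/local/airflow/dags
    ports:
      # Airflow web UI at http://localhost:8080
      - "8080:8080"
    command: webserver
```

Because the whole environment lives in this one text file, sharing it (for example through the GitHub repo) is all another developer needs to reproduce the same setup.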
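The end-to-end workflow from the demo can be collected into one command sequence. The repository URL below is a placeholder, since the actual link lives in the video description; the rest are standard Docker Compose commands:

```shell
# Prerequisites: Docker and Docker Compose must already be installed.

# 1. Fork/clone the tutorial repo (placeholder URL; use the link from the description)
git clone https://github.com/<user>/<airflow-tutorial-repo>.git
cd <airflow-tutorial-repo>

# 2. Start the services in the background (-d detaches and hides the logs)
docker-compose up -d

# 3. Inspect the startup logs and the running containers
docker-compose logs
docker container ls    # should show two containers: the webserver and postgres

# 4. Open the Airflow UI in a browser at http://localhost:8080

# 5. When you are done, stop and remove the containers
docker-compose down
```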
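The transcript also mentions extending Puckel's image with extra Airflow subpackages and rebuilding. A hypothetical Dockerfile for that could look like the sketch below; the base-image tag and the extras names are assumptions based on the 1.10-era package naming (gcp_api, s3):

```dockerfile
# Hypothetical extension of the community image with extra Airflow subpackages
FROM puckel/docker-airflow:1.10.1

# Add the GCP and S3 integrations on top of the preinstalled Airflow
RUN pip install "apache-airflow[gcp_api,s3]==1.10.1"
```

After editing, `docker-compose build` (or rerunning `docker-compose up`) would rebuild the webserver image with the new packages included.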
Info
Channel: Tuan Vu
Views: 138,226
Keywords: python, docker, datascience, apacheairflow, etl, datapipeline, dataengineer, airflow tutorial, airflow tutorial for beginners, apache airflow, docker tutorial, airflow tutorial python, airflow for beginners, airflow docker, airflow explained, airflow example, apache airflow tutorial, apache airflow tutorial for beginners, data science, data engineer, data engineer tutorials, data engineering projects, kubernetes, docker compose, docker container
Id: vvr_WNzEXBE
Length: 14min 48sec (888 seconds)
Published: Thu Nov 22 2018