Setting up Apache Airflow with a PostgreSQL database

Captions
Hello Internet, welcome back to Dali Codes. Yesterday we set up the data engineering project that will help us scrape data from two websites and load that data into a Postgres database. Today we are going to set up Airflow so that it can run and be ready to receive the DAGs and the tasks inside those DAGs. By the end of this video we should be able to run the Airflow scheduler and the Airflow webserver, and to add an admin user to the Airflow instance we create; then we will look at the result.

This is a bigger project that I worked on and have broken down into several videos, and each video should stand on its own. If you don't want to watch any other video and just want to see Airflow being set up, this one covers everything you need to know to set up Airflow to work with a Postgres database. If you want to follow the whole project, this is roughly the second coding video: there is a previous video from yesterday where we set up all the folders we'll need, and one before that where I walked through the whole project and what to expect from it.

So today, let's do Airflow. The first thing to decide is where Airflow's metadata will live: either in SQLite, a portable database you can keep in your project folder, or in a bigger, more robust database management system like Postgres. The usual advice is to have Airflow save its metadata in a robust DBMS such as Postgres, MySQL, or Oracle, so we are going to use Postgres. Make sure you have Postgres installed on your system; installing Postgres and pgAdmin is out of scope for this video, so look up how to install Postgres on your computer, whether you are on Mac, Linux, or Windows.
For the UI I am using pgAdmin; you can look up how to install that on your system as well, and then you can follow along. Let me increase the size so you can see. I'm going to create a database: right-click in pgAdmin and select Create > Database. I'll call this database web_data, suffixed with _youtube so that I know it's the database I created for this YouTube tutorial. Then I click Save, and it uses all the default settings of the local Postgres installation.

Now that we have created this database, we create a user that Airflow will use to access it. We could use the postgres superuser, but it's advisable to have a dedicated Airflow user. I open the Query Tool (right-click the database and select Query Tool) and write a query: CREATE USER airflow_youtube_user WITH PASSWORD '...'. For this video I'm using the same string as the username for the password, but if you're doing this in production you should use a randomized password with as many characters as you want, to frustrate anyone trying to guess it. Then I write another query to give this user privileges on the database we created: GRANT ALL PRIVILEGES ON DATABASE web_data_youtube TO airflow_youtube_user. Let me confirm the exact name of the database by resizing the panel and scrolling.
Scrolling down, I can see it is web_data_youtube, separated by underscores. Let me run both queries. The first one fails: privileges does not have a double l, so I fix the spelling. Running the second one, there's a syntax error again; let me retype PRIVILEGES carefully, p-r-i-v-i-l-e-g-e-s. Okay, I had typed it wrongly. Now it says airflow_youtube_user does not exist, because the CREATE USER statement never actually ran, so let's run that first: the role is created. Running the GRANT again, it succeeds. I'll link this code in the description below; it is part of the scripts in the repo, in case you want to copy and paste, but it is the same thing: CREATE USER with the username and password, then GRANT ALL PRIVILEGES on the database name to the user you created.

Now we need to connect Airflow to this database through this user. As you can see, the public schema of this database has no tables yet; once we initialize Airflow against it, you will see the metadata tables created here. So let's go back to the terminal. Yesterday we created a virtual environment for this project; you can work on your system Python if you prefer, but we're going to use the virtual environment with the Airflow version we installed yesterday, so activate it by sourcing its activate script (if you're using Windows, this is a totally different command). Before initializing, we first need to set up Airflow's config so that it connects to this database when we initialize it.
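For reference, here is a cleaned-up version of the two statements we ran in the Query Tool. The names match the ones used in the video; in production, replace the password with a long random string:

```sql
-- Dedicated role for Airflow, rather than reusing the postgres superuser
CREATE USER airflow_youtube_user WITH PASSWORD 'airflow_youtube_user';

-- Let that role use the metadata database created in pgAdmin
GRANT ALL PRIVILEGES ON DATABASE web_data_youtube TO airflow_youtube_user;
```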
Now, remember the airflow folder we created yesterday: we ran airflow version and it generated everything in here. Let me make it bigger so you can see. Yesterday we created the virtual environment, exported the two variables AIRFLOW_HOME and PYTHONPATH to point at this folder, and then ran airflow version, which created all these files for us. Today we work with the file called airflow.cfg (cfg is short for config, I assume). Inside it we look for a setting called sql_alchemy_conn, so I press Ctrl+F and search for it. This sql_alchemy_conn setting is the one we change to use Postgres: by default, Airflow's metadata is stored in SQLite, and the path to that database file is inside our AIRFLOW_HOME. We are not going to use that value, so either delete it or comment it out; I prefer commenting it out. Below it I copy the line, paste it, and change the scheme to postgresql, adding psycopg2 as the driver, which was installed from the requirements when we created the virtual environment yesterday. (Every time I say "yesterday" in this series, I mean the previous video, because I recorded it yesterday; the link is in the description and in the card above, and I recommend watching it to see how we created the environment and installed everything.) The connection value is just like a URL: first comes the user.
The user is the airflow_youtube_user we created, so I copy that and paste it in. Next comes the password, separated from the user by a colon: user:password. Here the password is the same as the username; please don't repeat this mistake when pushing your code to production, make sure your password is something very random. After user:password comes an @, so it looks a bit like an email address, and then the host of the database. Mine is localhost because it runs on my computer, but if your Postgres database is hosted in the cloud, you put the host's IP address or URL there. I'll just type localhost (I could type 127.0.0.1, which is the same thing), then a slash and the name of the database, web_data_youtube, which we just created.

That is the only change you have to make to the airflow.cfg file inside the AIRFLOW_HOME folder. There are other settings you could change; for example, if you want to use a different schema, there is a sql_alchemy_schema setting, since SQLAlchemy supports databases with the concept of multiple schemas. By default Airflow uses the public schema, and that is what we want here: if we create extra schemas, they should be for the data from the APIs, not the metadata. For anything else, read the documentation and make those changes. So we make this one change and save.

After that, back in the terminal, we check whether we still have the two variables we set yesterday, using echo (this is just bash).
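The edit to airflow.cfg ends up looking roughly like this (depending on your Airflow version the setting sits under [core] or [database]; the commented-out SQLite line is illustrative, yours will show your own AIRFLOW_HOME path):

```ini
[database]
# Default: SQLite file inside AIRFLOW_HOME (commented out, not deleted)
# sql_alchemy_conn = sqlite:////home/you/project/airflow/airflow.db

# Postgres instead: scheme+driver://user:password@host/database
sql_alchemy_conn = postgresql+psycopg2://airflow_youtube_user:airflow_youtube_user@localhost/web_data_youtube
```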
The variable is AIRFLOW_HOME, and echo prints nothing. That is because when we export a variable, it is only kept in that terminal session; if you close that terminal, the variable disappears. I don't want to put it in .bashrc, because this is a small project that I'll stop working on after a month or so, and I don't want it permanent on my system; I'm okay with exporting it every time I need it. So I run export AIRFLOW_HOME with the value of pwd, the directory we are in; but we want AIRFLOW_HOME to be the airflow directory inside this one, so we append /airflow after the pwd output. Press Enter, and echoing the variable again now prints the path to AIRFLOW_HOME. We also need to export PYTHONPATH, because we have a local Python: if I do ls -a, there is a local Python package here, Python 3.10.0 (I have several Python versions installed on my system; please look at the previous video to see how I set up Python in a local folder). We export PYTHONPATH to the directory we are working in, so pwd again, and echo $PYTHONPATH confirms it. Please note the difference between the two names: PYTHONPATH has no underscore separating PYTHON and PATH, but AIRFLOW_HOME has an underscore separating AIRFLOW and HOME.

Now we are ready to connect Airflow with the database and initialize it, and this is done with one command: airflow db init. We are using Airflow 2 (version 2.4), which uses this syntax; Airflow 1 used airflow initdb, with db attached to init. Running this will create all the metadata tables.
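As a sketch, the two exports from the project root look like this (run them in every new terminal session; the airflow subfolder name matches this project's layout):

```shell
# Point Airflow at the config folder inside this project (not ~/airflow)
export AIRFLOW_HOME="$(pwd)/airflow"

# Make the project root importable by the local Python
export PYTHONPATH="$(pwd)"

# Sanity check: both should print paths under the current directory
echo "$AIRFLOW_HOME"
echo "$PYTHONPATH"
```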
Before you run airflow db init, make sure AIRFLOW_HOME points to where your config file is located, because Airflow is going to look for the config there and use the database set in it. If you don't do that, it will use SQLite (not SQLAlchemy, I keep confusing the two) and create the database file in your home directory, so you need to be careful here. Let me run it. It prints some warnings, one of them because I am using Python 3.10, which Airflow at this juncture has not yet been fully tested on; but since it works for me with Airflow 2.4, I'm fine with the warnings. Warnings don't hurt us until they become errors. The initialization is done, with lots and lots of warnings.

Now let's look at the database: refresh the public schema (right-click it and click Refresh), then open Tables. You can see several tables have been created: the dag table, the dag_run table, the job table, the log table, and so forth. Airflow is initialized.

The next thing is to create a user, so I type airflow create users: the first flag is --username (I'll use dalmas), then --firstname and --lastname, then --role, which is very important and must be Admin with a capital A, then --email, and --password (please select something random; for this video I'll use dalmas_password). I press Enter, and something I typed is wrong: invalid choice 'create'. It's supposed
to be airflow users create, not airflow create users. Let me delete that and fix it; at least I'm correcting it without looking at Stack Overflow for the error, so you should subscribe for that (though I used Stack Overflow a lot while researching this; it's very important for a programmer). Run it again: yes, it's running, and the user dalmas is created with role Admin.

Now we are free to run the Airflow scheduler. The scheduler is the service that handles scheduling tasks: it queues jobs, remembers when to run tasks and their dependencies, and it is also the one that loads the DAGs into the system so that Airflow can see them. We'll run airflow scheduler in this terminal, and I'll open a new terminal for the webserver. One problem with my system is that it launches Anaconda in every new terminal, so I run conda deactivate and clear the screen. Then, in my projects folder, I activate the virtual environment and export the two variables again; remember, every time you launch a new terminal the old variables are gone, which I'm okay with redoing every time. So: export AIRFLOW_HOME equal to the working directory plus /airflow. Please don't forget the backticks: they mean that whatever is inside is a bash command to be run, and its output is what gets assigned. Then export PYTHONPATH to this directory. In the first terminal I run airflow scheduler; it's running, giving us a lot of warnings, but we only care about errors for now. Then we'll run airflow webserver in the next terminal.
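Put together, here is the whole sequence for the two terminals once airflow db init has run. The names, email, and password are placeholders echoing the video's throwaway values; adjust them, and the activation path, to your own setup:

```shell
# One-time: create an admin user for the web UI
# (Airflow 2.x order is "users create", not "create users")
airflow users create \
    --username dalmas \
    --firstname Dalmas \
    --lastname Codes \
    --role Admin \
    --email dalmas@example.com \
    --password dalmas_password

# Terminal 1: the scheduler (loads DAGs, queues and triggers tasks)
airflow scheduler

# Terminal 2: a fresh shell loses the exports, so set them again first
source venv/bin/activate          # adjust to your environment's path
export AIRFLOW_HOME="$(pwd)/airflow"
export PYTHONPATH="$(pwd)"
airflow webserver                 # UI at http://localhost:8080 by default
```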
Remember, I have not closed the first terminal; I left it running the Airflow scheduler. I run airflow webserver here; it gives me all these millions of warnings and then tells me it is running (it already told me somewhere up here). If you don't have warnings, you'll just see that it's running on this link: the gunicorn server is running there. I copy the link; I don't have a browser open, so let me launch one (sorry for the fumbling) and paste it in. We can see (let me make this bigger) that Airflow is running and asking us to log in. I type in dalmas and the password we set when creating the admin user on the terminal, click Sign In, and we are in.

It warns us: "Do not use SequentialExecutor in production." We are currently using the SequentialExecutor because we are not in production, so it's just giving us a warning. In this interface we already have example DAGs, and I can click one and look at its graph. So now we know that Airflow is running and we have it set up.

When you actually work with Airflow, you don't want to look at these example DAGs, because they are just for tutorials while you're learning Airflow. So we go back to the Airflow config file in AIRFLOW_HOME, where there is one setting used to control this, called load_examples. First I kill the processes running in the terminals: Ctrl+C to stop the Airflow webserver, and Ctrl+C to stop the Airflow scheduler. Then, in the config file, press Ctrl+F (or Cmd+F) and search for load_examples. Here it says load_examples = True, and that is what we are going to change.
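Hiding the tutorial DAGs is a one-line flip in airflow.cfg, applied after restarting the scheduler and webserver:

```ini
# Was: load_examples = True
load_examples = False
```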
Set it to load_examples = False, then save. After saving, we run the Airflow scheduler again; the reason I stopped it is that if I just change the config while the scheduler is running, it takes a while before it notices, so I stop and restart it. I've also restarted the Airflow webserver, so I'll just refresh here, and we should see that there are no DAGs. Let's wait, as this one is taking long... yes, we can see there are no DAGs now. The reason there are none is that we have not created any DAG yet: inside AIRFLOW_HOME we are supposed to have a folder for them, so I create that folder, called dags. If you look near the top of the config file, the dags_folder setting is here, and it points inside the airflow folder; that's why I created dags there. If I wanted my DAGs somewhere else, I would change this path and make sure Airflow has the rights to read and write in it. The same goes for logs: base_log_folder is the path where the logs go, and you can see one of the logs here, under airflow/logs.

So far we have set up Airflow and added a user to it. Please try to do this on your own and see if you can launch Airflow on your system; you can pause the video and rewatch it until you get it right. I am very sorry for people who are using Windows: I don't have a Windows machine, otherwise I would have shown you how to do it there, but the process is the same for Mac and Linux, and I hope it is similar on Windows. Tomorrow we'll look at fetching data from websites using Python, and hopefully we can also scrape the data that we fetch and put it into Airflow in a short period of time; if not, we can do that over two videos. So until next time, make sure you share what you've learned today
with other people. You can share this video, or just teach someone what you've learned: if you're at work, show your colleagues; if you're at school, show your friends that you've learned this thing called Airflow and how to set it up. Through teaching, you master your skill. So until tomorrow, bye!
Info
Channel: Dali Codes
Views: 9,861
Keywords: data science, data engineering, software engineering, machine learning, deep learning, natural language processing, game theory, css, javascript, python, nodejs, vuejs, reactjs, django, hadoop, django rest framework, flask
Id: auyfJq5b9Io
Length: 35min 47sec (2147 seconds)
Published: Wed Oct 26 2022