Airflow tutorial 7: Airflow variables

Hi everyone, welcome to another episode of the Airflow tutorial. In the previous tutorial we built a data pipeline in Airflow using Google BigQuery from scratch. Starting with this tutorial, I will talk about some advanced features in Airflow; specifically, in this tutorial we will explore how to use Airflow variables. So let's get started.

What are Airflow variables? Variables are a key-value store in the Airflow metadata database. They are used to store and retrieve arbitrary content or settings from the metadata database.

When do you use variables? Variables are mostly used to store static values such as config variables, a configuration file, or a list of tables or IDs used to dynamically generate tasks. The benefit of using Airflow variables is that you can separate the variables from the pipeline code. A workflow, or DAG, is defined in Python code, and we usually store and keep track of our DAGs in source control such as GitHub. If we turn some of the config values into Airflow variables, you can access and modify them through the UI.

If you remember the BigQuery GitHub trends DAG we built in the previous tutorial, it has three config values. If I want to change any of them, I have to go directly into the code and edit them. What if I could access them through the Airflow UI instead? The code is only one version, and I don't want to edit it every time a config value changes, especially since these values may change many times. If I separate these three variables from the code, I can just go into the Airflow UI and change them one by one, which is a lot better.

Before we dive in, let's first learn how to work with variables. For this tutorial, clone the GitHub repo associated with the Airflow tutorial, cd into the repo, and run docker-compose up -d. This starts the Airflow environment.

Variables can be listed, created, updated, and deleted through the UI. Once the Airflow environment is up, go to Admin > Variables. This is where you create, edit, or delete your variables. If you have a JSON settings file, you can even bulk-upload it here: choose the file, upload it, and it will create the variables for you.

For this tutorial I have created an example DAG for you to follow. If you go to the examples folder, there is an example variables DAG, which is the code we will walk through. Commonly, we define our config values directly in the code, but this is a bad practice, because every time we want to change or delete one of them we have to go into the code and edit it manually.

To turn them into Airflow variables, go to Admin > Variables and click Create. For the first variable, the key is var1 and the value is value1; click Save. That creates a key-value pair; as I said, Airflow variables are a key-value store in the Airflow metadata database. For variable 2 we do the same thing: enter the key and the value, and save. Same for variable 3. Now we have three variables, and you can access them with Variable.get("var1"), Variable.get("var2"), and Variable.get("var3"). We access each one by its key, and it returns the value we stored.
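To make the per-call cost concrete, here is a minimal sketch of the three-variable setup. ToyMetadataDB is a hypothetical in-memory stand-in for the metadata database, not part of Airflow; in a real DAG each lookup would be Variable.get("var1") and so on, imported from airflow.models. The counter shows that three variables mean three round trips.

```python
class ToyMetadataDB:
    """Hypothetical stand-in for Airflow's metadata database (NOT Airflow's API)."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}  # key -> value, like the variable table
        self.queries = 0                 # how many lookups hit the "database"

    def set(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> str:
        # Each lookup counts as one round trip to the database.
        self.queries += 1
        if key not in self._rows:
            raise KeyError(f"Variable {key} does not exist")
        return self._rows[key]


db = ToyMetadataDB()
# The three variables created one by one in Admin > Variables.
for i in (1, 2, 3):
    db.set(f"var{i}", f"value{i}")

# Top-level code of the DAG file: one lookup per variable.
var1 = db.get("var1")
var2 = db.get("var2")
var3 = db.get("var3")

print(var1, var2, var3, "queries:", db.queries)
```

Because top-level code in a DAG file runs every time the scheduler parses the file, lookups written this way repeat on every parse.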
This setup works, but it is not the recommended approach. Why? The recommendation is to restrict the number of Airflow variables in your DAG. Since Airflow variables are stored in the metadata database, any call to a variable means a connection to the metadata database. If you have a large number of variables in your DAG, you may end up saturating the number of allowed connections to your database. Instead, it is recommended that you store all of your DAG configuration inside a single Airflow variable with a JSON value: rather than three separate variables, you put them together as key-value pairs in one JSON object.

The best way to create this single variable is the JSON upload I mentioned earlier. In the config folder I created a JSON file, example_variables.json, whose key is example_variables_config; that is the config for our example variables DAG, and its value holds the key-value pairs for all the variables we need. To put this file into the Airflow UI, go to Admin > Variables, select the file, and click Import. Voilà: we now have the key example_variables_config, which stores all the config variables for our DAG.

We access it with Variable.get, passing the key exactly as it appears in the UI together with deserialize_json=True. That deserializes the whole JSON value, and we can then read var1, var2, and var3 from the result. We use only one connection to fetch the variable, and then we access the values like a regular Python dictionary. It's as simple as that.
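A minimal sketch of the recommended pattern, under the assumptions stated in the comments: it writes a config file shaped like the one described here (one top-level key, example_variables_config, holding var1 to var3), then reads it back through get_variable, a hypothetical stand-in for Variable.get(key, deserialize_json=True). Storing the nested object as a single JSON string is a simplification of what the import does.

```python
import json
import os
import tempfile

# Hypothetical file contents: one top-level key holding all of the DAG's config.
import_payload = {
    "example_variables_config": {"var1": "value1", "var2": "value2", "var3": "value3"}
}

# Write the file you would upload via Admin > Variables > Import.
path = os.path.join(tempfile.mkdtemp(), "example_variables.json")
with open(path, "w") as f:
    json.dump(import_payload, f, indent=2)

# Simplified model of what gets stored: each top-level key becomes one row,
# with the nested object saved as a JSON string.
with open(path) as f:
    stored = {key: json.dumps(value) for key, value in json.load(f).items()}

queries = 0


def get_variable(key, deserialize_json=False):
    """Toy stand-in for Variable.get(key, deserialize_json=...)."""
    global queries
    queries += 1  # one database round trip per call
    raw = stored[key]
    return json.loads(raw) if deserialize_json else raw


# One lookup, then plain dictionary access for every field.
dag_config = get_variable("example_variables_config", deserialize_json=True)
var1, var2, var3 = dag_config["var1"], dag_config["var2"], dag_config["var3"]
print(var3, "queries:", queries)
```

However many fields the config grows to, the DAG still makes a single lookup.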
So this is the recommended way: create only one variable to power your workflow, or DAG. Another feature of Airflow is that you can access a variable directly using a Jinja template. For example, if you have a key called var3, like the one we created, with the value value3, you can access it with the template {{ var.value.var3 }}. This reads the value of that key from the Airflow variables. If you have a JSON variable instead, you can use {{ var.json.example_variables_config.var3 }}: the variable's key followed by the key inside the JSON value.

Now let's run the tasks in this DAG to see what they output. To run only one specific task, use the test command provided in the repo; it runs inside the Docker environment and executes only that task. The first task, get_dag_config, simply echoes in bash the DAG config we loaded. Running it, you can see the echo, and the DAG config is correctly set: the value of variable 1, the value of variable 2, and the value of variable 3. The output is just the DAG config we got from the Airflow variable.

If you don't want to load the DAG config in code, you can use the Jinja template instead: this task reads the variable and gets the value of var3. Let's test task number 2: it does the same thing, an echo, and the output is the value of variable 3.
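To show how the two template forms resolve, here is a simplified sketch. Airflow renders these templates with Jinja; the regex-based render function below is a hypothetical stand-in that only understands {{ var.value.<key> }} and {{ var.json.<key>.<field> }}, and VARIABLES is a toy table, not the metadata database.

```python
import json
import re

# Toy variable table: one plain variable and one JSON variable (stored as text).
VARIABLES = {
    "var3": "value3",
    "example_variables_config": json.dumps(
        {"var1": "value1", "var2": "value2", "var3": "value3"}
    ),
}

# Matches {{ var.value.<key> }} or {{ var.json.<key>.<field>... }} (simplified).
PATTERN = re.compile(r"\{\{\s*var\.(value|json)\.([\w.]+)\s*\}\}")


def render(template: str) -> str:
    """Simplified stand-in for Airflow's Jinja rendering of var.value / var.json."""

    def resolve(match):
        kind, dotted = match.group(1), match.group(2)
        if kind == "value":
            # var.value.<key>: return the raw stored string.
            return VARIABLES[dotted]
        # var.json.<key>.<field>: deserialize, then walk into the JSON object.
        key, *path = dotted.split(".")
        obj = json.loads(VARIABLES[key])
        for part in path:
            obj = obj[part]
        return str(obj)

    return PATTERN.sub(resolve, template)


print(render("echo {{ var.value.var3 }}"))
print(render("echo {{ var.json.example_variables_config.var3 }}"))
```

Both calls resolve to the same string, which matches the identical output of the two template-based tasks in the video.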
What if you want to do the same thing, but get the value of var3 from the JSON variable instead of the plain one? Then you use the template {{ var.json.example_variables_config.var3 }}: the key of the variable, example_variables_config, followed by the key inside that JSON dictionary. Running this task gives the same output, as we expect. So we've just learned how to access Airflow variables inside a DAG the recommended way, rather than the common way.

We can edit, delete, or access variables in the UI, but another benefit of Airflow variables is that you can also manage them from the command line. You can run CRUD operations on variables there instead of clicking Edit in the UI, and I've put a couple of commands in the repo for you to try. For example, to get the value of var1, which is value1, without logging into the Airflow UI, run the airflow variables command with the get option and the key, and it returns the value. Another command sets a value: to create or edit a variable, use the set option with a key and a value. If the variable doesn't exist yet, it is created; so after setting var4 to value4, refresh the UI and you will see var4 with value4. You can even import a JSON file of variables, which is why I keep the config file locally in the repo.
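The get and set commands behave like a lookup and an upsert. The sketch below is a hypothetical mini-CLI built with argparse that mimics that behavior against an in-memory dict; it is not Airflow's CLI, and its subcommand names only echo the idea. Check the CLI reference for your Airflow version for the real syntax.

```python
import argparse

# Toy in-memory table standing in for the metadata database.
STORE = {"var1": "value1"}


def main(argv):
    """Hypothetical mini-CLI mirroring the get/set idea; NOT Airflow's CLI."""
    parser = argparse.ArgumentParser(prog="toy-variables")
    sub = parser.add_subparsers(dest="command", required=True)

    get_parser = sub.add_parser("get", help="print the value stored under KEY")
    get_parser.add_argument("key")

    set_parser = sub.add_parser("set", help="create or overwrite KEY")
    set_parser.add_argument("key")
    set_parser.add_argument("value")

    args = parser.parse_args(argv)
    if args.command == "get":
        return STORE[args.key]  # raises KeyError for an unknown key
    # "set" is an upsert: create the key if missing, otherwise overwrite it.
    STORE[args.key] = args.value
    return None


print(main(["get", "var1"]))
main(["set", "var4", "value4"])  # var4 did not exist, so it is created
print(main(["get", "var4"]))
```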
The benefit of this shows when you have a large config file with many variables powering your workflow. I've never seen a DAG that needs hundreds of thousands of variables, but I have seen cases with tens of them. You can keep them in a JSON file under source control, and every time you want to add, edit, or delete a variable, you make the change in the JSON file and import it, and the change shows up in the UI. For example, instead of value3 for var3, change it to test in the file and save. Then run the import command through docker-compose; it imports the new file, and the output reports that 1 of 1 variables was successfully updated. Go back to the Airflow UI and refresh, and you'll see the value has changed to test.

That's how you work with variables, and it makes everything much easier, because all the config values are stored directly in the Airflow metadata database. These key-value pairs are globally accessible: not just this one DAG, but any DAG, including new ones you create, can read a variable with Variable.get and its key. If you keep config values inside a DAG file, they are local to that DAG; other DAGs cannot access them, and you end up copying and pasting. If you store them as Airflow variables, they live in the database and are globally accessible, so a config shared among several DAGs only needs to change in one place, and every DAG picks up the change. That is a lot better.
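A minimal sketch of the bulk-import workflow, assuming a simplified model of the import step: every top-level key in the file is upserted, and non-string values are stored as JSON text. import_variables and the store dict are hypothetical, and the summary line imitates the message shown in the video rather than quoting Airflow's source.

```python
import json


def import_variables(store: dict, payload: str) -> str:
    """Hypothetical bulk import: upsert every top-level key of a JSON document.

    Non-string values are serialized to JSON text before storing (simplified).
    """
    data = json.loads(payload)
    updated = 0
    for key, value in data.items():
        store[key] = value if isinstance(value, str) else json.dumps(value)
        updated += 1
    return f"{updated} of {len(data)} variables successfully updated."


# Current state: one JSON variable shared by every DAG.
store = {
    "example_variables_config": json.dumps(
        {"var1": "value1", "var2": "value2", "var3": "value3"}
    )
}

# Edit the file under source control: var3 changes from "value3" to "test".
edited_file = json.dumps(
    {"example_variables_config": {"var1": "value1", "var2": "value2", "var3": "test"}}
)
print(import_variables(store, edited_file))  # 1 of 1 variables successfully updated.

# Any DAG reading the shared key now sees the new value.
print(json.loads(store["example_variables_config"])["var3"])  # test
```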
With the lessons learned in this example, how can we make our BigQuery GitHub trends DAG better? As I mentioned, we have three config values hardcoded at the top of the DAG. We don't need them there anymore; we turn them into a JSON file. This is the BigQuery GitHub trends variable, with the three config values turned into key-value pairs. Upload the JSON file through the Airflow UI, and then access the variable the correct way: instead of what we did before, we load one config variable with Variable.get, passing the variable's key (its name), and because the value is JSON we pass deserialize_json=True so it gets deserialized. Now we can access all the key-value pairs in this Airflow variable like a Python dictionary.

That's the end of the tutorial. In this video we learned about Airflow variables and when to use them to make our DAGs better. I hope you've enjoyed the tutorial so far. Don't forget to like the video and subscribe to the YouTube channel for more great content. Thanks for watching, and I'll see you in the next video.
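Here is a sketch of the refactored config loading for the BigQuery GitHub trends DAG. The variable key, the field names, and the placeholder values are hypothetical, and load_dag_config stands in for Variable.get(key, deserialize_json=True). The required-field check is an extra design choice, not something from the tutorial: it makes a missing or misspelled field fail when the DAG file is parsed rather than when a task runs.

```python
import json

# Before (hardcoded at the top of the DAG file; placeholder names):
# BQ_CONN_ID = "my_gcp_conn"
# BQ_PROJECT = "my-bq-project"
# BQ_DATASET = "my-bq-dataset"

# After: one JSON variable. Key and field names below are hypothetical.
VARIABLE_KEY = "bigquery_github_trends_variables"
FAKE_METADATA_DB = {
    VARIABLE_KEY: json.dumps(
        {
            "bq_conn_id": "my_gcp_conn",
            "bq_project": "my-bq-project",
            "bq_dataset": "my-bq-dataset",
        }
    )
}
REQUIRED = ("bq_conn_id", "bq_project", "bq_dataset")


def load_dag_config(key):
    """Stand-in for Variable.get(key, deserialize_json=True) plus a field check."""
    config = json.loads(FAKE_METADATA_DB[key])  # one lookup, then deserialize
    missing = [name for name in REQUIRED if name not in config]
    if missing:
        raise KeyError(f"{key} is missing fields: {missing}")
    return config


dag_config = load_dag_config(VARIABLE_KEY)
BQ_CONN_ID = dag_config["bq_conn_id"]
BQ_PROJECT = dag_config["bq_project"]
BQ_DATASET = dag_config["bq_dataset"]
print(BQ_CONN_ID, BQ_PROJECT, BQ_DATASET)
```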
Info
Channel: Tuan Vu
Views: 34,256
Keywords: python, datascience, apacheairflow, etl, datapipeline
Id: bHQ7nzn0j6k
Length: 16min 42sec (1002 seconds)
Published: Sat Feb 16 2019