Airflow XCom for Beginners - All you have to know in 10 mins

Captions
What's up, you guys? It's Marc here. If you want to master Airflow, you've come to the right place. My name is Marc Lamberti. I'm the head of customer training at Astronomer and the best-selling author on Udemy about Airflow. I've trained over 20,000 people on how to get started with Airflow, how to create complex data pipelines and, more importantly, how to make their lives easier by automating manual tasks.

In today's video we are going to discover a very important concept that you will use in your data pipelines: XComs. Let's say that you have created a beautiful data pipeline with multiple tasks, and you want to share data between those tasks. How can you do that? By using XComs. If you already know XComs, don't worry, watch the entire video; I'm pretty sure that you will learn something new about them. And if you are a total beginner, that's perfectly fine, everything will be well explained. Oh, and by the way, I've put a lot of work into this video in order to give you the best possible value in only 10 minutes, so the only favor I ask is that you smash the like button. That will help me a lot, and it is proven that by liking this video you will become a better engineer. So, without further waiting, let's dive into the world of XComs.

To better explain XComs, let me give you a very simple use case. Let's imagine that you have the following data pipeline, with downloading_data, and model_a, model_b, and model_c in charge of training machine learning models based on the data. Each model will have an accuracy, and then in choose_model you want to choose the best machine learning model based on the accuracy. Now the question is: in model A we will have accuracy 1, for example, then model B accuracy 2 and model C accuracy 3. How can we share those accuracies with the task choose_model? How can we exchange data between choose_model and model C, model B, and model A in order to choose the best-performing machine learning model? Well, to do this we need
XComs.

But before exchanging data, let's define what an XCom is. XCom stands for cross-communication and allows you to exchange messages or small amounts of data between tasks. You can think of an XCom as an object holding the value that you want to exchange, and that XCom is stored inside the metadata database of Airflow. Keep in mind that each time you interact with XComs, you are interacting with the database of Airflow.

Now let me show you the composition of an XCom. First you have the key. The key is the identifier of your XCom, and that key will be used in order to fetch the XCom from the database. Then you have the value, the value that you want to exchange. You have to make sure that the value is serializable with JSON or pickle, otherwise it won't work. Then you have the timestamp, defining when the XCom was created. We have the execution date, corresponding to the execution date of the DAG run where the XCom was created. That's why, if you have multiple DAG runs for the same DAG and multiple XComs with the same key for each DAG run, Airflow will always choose the right XCom. The last two columns are the task id, identifying the task that generated the XCom, and, last but not least, the DAG id, identifying the DAG in which the XCom was created.

All right, now we know what composes an XCom, so let's start using it. At this point, if you want to follow what I'm going to show you on your computer, you can do it just by clicking on the link in the description below, and you will land on this beautiful page. From there you have to install Docker and then use the Docker Compose file in order to run Airflow on your machine. Once Airflow is running, you just need to copy the following code, corresponding to the data pipeline that we are going to use throughout this video, into a file xcom_dag.py under the folder dags. Once you have done this, you can open your code editor and open xcom_dag.py.

Let me give you a quick explanation of that data pipeline. First we have the task downloading_data, using the Bash
operator and waiting for three seconds. Then we have the three tasks training_model_a, _b, and _c, which are dynamically generated, as you can see from the loop, using the Python operator and calling the function _training_model. That Python function generates an accuracy randomly. Last but not least, we have the task choose_model, using the Python operator as well and executing the function _choose_best_model. Basically, this is where we will print out the best accuracy. So how can we access the accuracies from the task choose_model? Well, first we need to push those accuracies as XComs.

Let's push our first XComs. This is super simple: as we use the Python operator and execute the function _training_model, where the accuracy is generated, in order to create an XCom with that accuracy we just have to return the accuracy, the value that we want to share. Let's save the file. From the UI, turn on the toggle of the DAG xcom_dag, click on it, go to graph view, and wait for the tasks to be completed. Now they are done. Go to Admin, XComs, and as you can see we have pushed some XComs, more specifically from the tasks training_model_a, _b, and _c, as we did by returning the accuracy from the function _training_model. By the way, since the XCom has been automatically pushed by Airflow by returning a value, the key by default is return_value.

So the question is: how can you push your XCom with a specific key? Let's discover this with the method xcom_push. Back to the data pipeline. In order to access the method xcom_push, you need to access a task instance object. As we use the Python operator, you just need to pass the argument ti to the Python function _training_model; ti corresponds to the task instance object. From that task instance object you can call the method xcom_push, and this
method expects two arguments: the first one is a key, so let's call it model_accuracy, and the second is a value, which in our case is accuracy. Save the file and let's trigger the data pipeline. Clean the XComs first: select all of them, click on Actions, Delete. Then go back to DAGs, click on xcom_dag, and trigger it manually. Let's wait a little bit in the graph view. Now it's done. Go back to Admin, XComs, and as you can see, this time the key of our XComs is model_accuracy.

At this point we know how to push XComs using the return keyword and xcom_push. Now there are two things that I would like to talk about. First, as you can see, we can have multiple XComs with the same key as long as the task id is different. If you use the same key with the same task id, the previous XCom will be overwritten. And last but not least, we have this XCom that has been created from the task downloading_data, but if you go back to the code, we didn't create any XCom from the Bash operator. So what's going on here? Well, you have to know that, by default, all operators that can push XComs will push XComs. With the Bash operator, by default, the last line printed on the standard output by the executed command will be pushed as an XCom. In order to change that behavior, you need to modify one parameter that is common to all operators, which is do_xcom_push. So here, if you type do_xcom_push=False, then save the file and trigger the DAG again, as you can see, this time downloading_data didn't push any XCom.

So now we have the XComs stored in the database of Airflow. How can we pull them from it? In order to do that, we need to use xcom_pull. As with xcom_push, in order to access xcom_pull we need to use a task instance object again. In the parameters of _choose_best_model, put ti in order to access the task instance object, and then you can pull the XComs by typing ti.xcom_pull, where you need to specify two arguments: the first one is the key of the XCom that you want to pull, in that case model_accuracy, and the second is task_ids, corresponding to the tasks where the XComs have been generated. In that case we want to pull the XComs from the tasks training_model_a, training_model_b, and training_model_c. Then store the result in the variable accuracies, and finally we can print those accuracies with print(accuracies). Save the file and let's find out if it works. From the tree view, trigger the DAG manually. Go back to graph view and let's wait a little bit. Now it's done. Click on choose_model, go to Log, and as you can see right there, we have the three accuracies as expected. Congratulations, at this point you know how to share data between tasks in Airflow by using xcom_push and xcom_pull as well as the return keyword.

Now, before using XComs everywhere in your data pipelines, let's talk about the XCom limitations. You have to remember that Airflow is not a data processing framework. It is not Spark, nor Flink, so you should not share gigabytes or terabytes of data between your tasks, otherwise you will end up with a memory overflow error. Airflow is an orchestrator, it is the best orchestrator, so don't use it as a data processing framework. You have to know that XComs are limited in size depending on the database you use. If you are based on SQLite, your XComs are limited to 2 gigabytes. If you are based on Postgres, your XComs are limited to 1 gigabyte. And if you use MySQL, your XComs are limited to 64 kilobytes. So be careful when you use XComs: only share a small amount of data. And last but not least, XComs create implicit dependencies between your tasks. Indeed, as soon as a task needs data from another task by using XComs, there is a dependency between those two tasks, and you won't
be able to see that dependency from the Airflow UI. XComs are super powerful, but use them carefully.

That's it about XComs. Now you know how to share data between your tasks in Airflow. If you enjoyed this video, please smash the like button, that will help me a lot. Have a good day, take care, and see you in the next video.
Info
Channel: Marc Lamberti
Views: 12,005
Keywords: marc, lamberti, marc lamberti, airflow, xcom, apache airflow, airflow tutorial, airflow xcom, airflow data, airflow tasks, xcoms
Id: 8veO7-SN5ZY
Length: 11min 36sec (696 seconds)
Published: Tue Jan 05 2021