Yigit Guler - Understanding Celery & CeleryBeat

Captions
Hi everyone. Today I'm going to talk about Celery. This will be an introductory presentation, so if you use it in your daily work or are familiar with it, this presentation won't add much for you. But if you don't know anything about Celery, or if you find task queues complicated in general, this presentation might be useful for you.

My name is Yigit. I am from Istanbul, Turkey, and this will be my first EuroPython talk, so wish me luck. I am a member of the Python Istanbul community. We hold weekly and monthly meetings, and we are planning to organize a PyCon Turkey, hopefully next year, so be sure to see us there. I work at Hipo, a company where we build products: companies and startups come to us with their ideas and we transform them into applications. We use Celery heavily in our back-end team.

Before starting, I want to talk a little bit about the beauty of Python. In my opinion, the beauty of Python comes from its diverse use. People from many different backgrounds use Python every day; from web development to astronomical calculations, Python is being used everywhere. So it's a great opportunity for web developers to take advantage of the huge knowledge of the academic side in their projects. For example, if you are making a website, you can easily integrate an image recognition library just by adding some lines of code. The same applies to people from other domains as well. Imagine that you are an academic working on an algorithm: you can easily transform your project into a web project by importing a small micro framework and make it web ready.

However, the web itself has a problem: it is impatient. Have you ever seen these kinds of errors? This mostly happens when a Python function gets very slow and cannot provide the response in the given time. We will get into the details later. So I created an imaginary web page where people arrive somehow, and we get their DNA files and analyze their DNA
and generate a PDF file and send them that PDF file as an email and then show a thank-you response.

When you look at the runtimes of each function in our imaginary scenario, getting the user takes only 50 milliseconds, which is plausible because it's probably just a database connection. Analyzing the DNA, let's imagine, takes 5 minutes, which I think is OK these days. Sending the email takes about two seconds, and then we display the thank-you message. So let's imagine that you are a visitor of this page, and for 5 minutes and 2 seconds the only thing you see is a white page. This is definitely not the best user experience in 2017. And most probably you would never see this message, because most browsers just lose hope in web pages that are slower than 2 minutes. So most likely your users will see a white page and then an error page, and then they will try again and again, and after 15 minutes they will receive 3 emails with the same PDF file.

So what if these two heavy functions could be taken out of our function? What if there were a way to give these two functions to someone else, to another process, to run sometime later, and continue with our routine and display the thank-you message? In that case we would get rid of these two heavy functions and give the thank-you response to the user in 50 milliseconds. This is where Celery arrives. This is the exact use case of Celery: when you have a function that you want to outsource, Celery comes to your help. Using Celery, you can assign a task to some workers and continue with your routine. We will get into the details later about what a worker is and what a task is, but generally you can hand over everything that can be taken out of the request-response cycle.

So what can this be? Anything heavy. For example, sending emails: sending an email can take up to three seconds or maybe more. Or sending push notifications: imagine that you need to send 100 push notifications to different people just as a
result of an action. Resizing and editing images is always a pain, especially when you deal with high-resolution images and third-party data storage solutions like S3, and they are the most common reason for 502 errors, in my opinion. There are also other tasks that take time, like taking backups, denormalization, and data sync issues with third-party integrations.

This is an example Celery architecture. Here we see that we have an application, we have a message queue, and we have one or more workers. We will see this slide many times during this presentation, because this is the basic Celery architecture. We don't care about the result database or anything else; this is the simplest possible implementation. So let's look at each element and analyze what it does.

The application is our main application; in this case it's our view (controller) function. This is the part that we want to make faster, the part that wants to outsource some of its tasks to someone else. That said, it can be a website or any kind of project. Celery has great support for applications, because an application just needs to give the name of the function that it wants Celery to execute and provide the necessary arguments. That's all. The application gives the function name and the arguments, and Celery transforms them into a pickle or JSON string and records it to the database. Celery has built-in support for Django, because it was once a Django project; now it's open to all Python projects. It has support for Flask, Bottle, Pyramid, Tornado and many other frameworks. It even has support for PHP, so if you are writing a PHP program you can also use Celery, because all you have to know is the name of the function that you want to run and the necessary arguments. So you can even serialize with JSON from a PHP project.

And the task queue: this is the big thing you see in the middle, and it is very important, because Celery needs to record these tasks to a database, to a
specialized database, and then process them one by one. There are many available task queues. RabbitMQ, I don't know if you have heard of it, is a very popular message queue database, and it's the one best supported by Celery. But Redis is also heavily used with Celery, because you can use Redis for other purposes as well; for example, you can use the same Redis instance for caching your website and for task management. There is also support for other databases such as CouchDB and MongoDB, and for Amazon SQS if you don't want to deal with any database.

And workers. "Worker" is a specific term in Celery. You can imagine it as another application, written by the Celery developers, that just runs and polls the database to see whether there is any task. It will always ask: do you have a new task, do you have a new task, do you have a new task, day and night. When there is a new task, it marks that task as assigned and processes it.

So when we look back, you see that the application prepares the task data, which is the name of the function and the arguments, and sends it to the message queue, which is a specialized database. As all databases do when they record something in their data storage, it gives back an ID, so that the application can track the status of that task: is it processing, is it processed, is there an error, or what. There might be one or multiple workers, but you should have at least one worker to process the tasks in the queue. And they can be anywhere: they can be in the same instance, they can be on different physical machines, in different locations.

So let's move our heavy tasks to Celery. You remember our example: what we want is to get the user's DNA information, prepare a PDF file, and email that PDF file to the user. Before doing that, the first thing you should do is set up a broker database. This is the biggest dependency of Celery, and for this example I decided to use Redis, because it's very commonly used. And also we should install
Celery through pip; it's the easiest part. Then let's look again at what we are going to transfer: we are going to take these two functions and give them to Celery. So we create a new file named tasks.py, and we create a function inside. What we do here is basically have a function that calls these two functions, so when I call this function with a DNA file and an email, it will first analyze it and then send the email with the attachment.

In order to make this a Celery task, you should import Celery and define a Celery application. While defining the Celery application we can of course give many details, but as this is an introductory talk, we just give a name and the broker URL, so that Celery can connect to this message database while creating tasks and consuming tasks. But this is not enough. We should declare that this function belongs to Celery, so we add this decorator on top of it. Thanks to this decorator, Celery will know that it has such a task. So when the worker wakes up, it will look for the tasks, prepare a list of the tasks that it knows, and execute them when the right time comes.

We go back to our view function, and we get the user, and we pass the DNA file and the user. When we execute this function in this form, what we see should not surprise us: we will see five minutes of white screen and then maybe a message, because we just called that function. We didn't hand it to Celery; we didn't transfer it to a Celery task. So what we have to do is add a .delay at the end of the function. This is a shortcut function; there are other ways to do that with more information. But when we add this .delay at the end of the function, Python will not run this function. Instead, the Celery package will create the task from the function's name and the arguments we sent, and record it to the database. As soon as it gets recorded to the database and gets the ID, it will continue with its routine and display the thank-you message. So
as Redis and other databases are quite fast, it will pass that part in maybe a couple of milliseconds and display a nice message to the user. So the user will not wait for that task. That's nice: the user is happy, he saw the message. But he will not receive the email. If we wait 5 minutes, 10 minutes, a day, there will be no email in the inbox. Why? Because there is no worker yet. We recorded the task into the database, we know that we have to execute this task, but there is no one to execute it. So we should wake up a worker.

This is the easiest way to wake up workers. I set this log level for better visualization. If we type this command and we have Celery installed, this will be what we see. By the way, in Celery, if you don't see any red lines, this means that you are very lucky: everything is OK, it connected to the database, everything is good. And here you can see the tasks. You see there is a list of tasks, so this item is known by Celery; if Celery sees such a task, it will just go and execute it. As soon as we run the Celery worker, you will see this line, because we already recorded a new task in the task database, and it will receive the task and start working on it. After about 5 minutes you will see the completion message of this task. It is exactly 5 minutes because I just put some sleep functions inside.

Another nice part about Celery is this. As you know, there are some functions that can create problems, especially third-party integrations. When you make an integration with an email service or a push notification service, there is a great chance that they can have some technical problems. If you do this in your view (controller) functions, this will mean that your website will crash because of the third parties. And if you put everything inside a try block, it will not give errors to the user, but it will not do the job either. So one of the biggest advantages of using Celery is that you can just retry tasks. For example, you
can set a task retry limit. For example, you can say: please retry this task three times, waiting one minute before each trial. This way you can improve the chance of executing your required tasks.

Also, Celery tasks can create new tasks. So if you have to iterate over thousands of users and send them emails, you don't need to do this all at once. You can just have a function that generates other Celery tasks. This way, when you iterate over a list and generate thousands of new tasks, the workers will go over these tasks one by one and execute them, and if one of them fails, only that little one will be executed again. This is a huge advantage, especially if you are dealing with emails, because the worst thing that can happen to a developer is not shutting down the server but sending wrong emails. If you shut down the server, only the people who are on the site will see that you did something bad, but if you send wrong emails, everybody will see those emails. Push notifications are the same. So Celery is quite useful for handling these kinds of error situations.

In Celery there is also a great tool that we frequently use: periodic tasks. In my opinion, any website or any project will sooner or later require periodic tasks. In order to run periodic tasks we have a tool named Celery beat. It comes inside the Celery package and it's quite easy to set up. However, many people get confused by Celery beat, because they think that if they have workers alive and they define the periodic tasks in Celery, everything will work smoothly. In fact that's not true, because workers just execute tasks; they cannot invoke new tasks by themselves. So you can think of Celery beat as a separate application that sends tasks to the message queue. You should imagine it just like your application. So you have to keep your application alive, using supervisor or something like that, you have to keep your workers alive, and you also have to keep the Celery beat process alive.

There are three types of
schedules. Timedelta schedules are the easiest ones: you just give a time interval and the task just runs. This depends on the start time of Celery beat, so if you start beat, 30 seconds later the task will be initiated, 30 seconds after that the task will be created again, and so on and so forth.

If you need more control over when the task will be fired, you can use crontab schedules, and you can say: do this thing at this time. For example, in this case we are sending email digests every Friday at 5:30 p.m., and it's that easy to implement. The only thing you should pay attention to is the time zone, of course, because this will be fired according to the time zone of the servers, so please be aware.

And there are solar schedules as well, if you want to fire tasks by the position of the Earth and the Sun. For example, if you want to fire a new task when it's sunset in Istanbul, you can just give solar, sunset, and the coordinates of Istanbul, and it will send the new task just at the right moment, when it's sunset in Istanbul. There are different types: there are civil dusk, nautical dusk and astronomical dusk, so if you have time you can check all of these and find the best one.

There are great add-ons. The one that I want to talk about is Flower, because I don't have too much time. Flower is the tool to monitor your Celery workers: what they are doing, whether they are processing the tasks. So I strongly suggest you use Flower if you use Celery.

And some last words. It's a great and simple tool for time-consuming tasks. I know that there are many people who hate Celery; I think they have a club of Celery haters. But I think this is mostly because of old versions. The new version, especially version 4, is quite stable, and if you had bad experiences with Celery, I suggest you give it a try again; you will be surprised. You should never forget that arguments will be serialized and then saved, so never pass Django models as an argument
because they will be serialized as a string, and if you do that you will have problems, corrupt data problems and everything. And Celery sometimes makes it too easy to cover up architectural problems. Imagine that you have a faulty architecture, or you made too many denormalizations. Then, instead of fixing this and removing the denormalizations, you can easily fall into the trap of using Celery to prepare those denormalizations. It would be a bad decision, so please avoid that. So I think I'm in time, and thank you.

Thank you, thank you very much. Very interesting. I hope there are many questions. Yeah, so I think here is the first one.

Okay, thank you for the talk. I have several questions, but probably the most interesting one is: how would you compare Celery to more workflow-oriented tools like Luigi or Airflow, where tasks have dependencies on each other? You said that tasks can produce other tasks; that is the most interesting one. And I have three minor questions. Should I ask them right away?

First of all, I didn't use those other ones, so I don't know.

Okay. The next question is about the queue. You said that several back ends are supported. How do you try to normalize guarantees across queues? You mentioned SQS, and SQS doesn't guarantee ordering, and it also doesn't guarantee exactly-once delivery for a task queue. So if you use SQS, there is a chance that two workers will receive one message.

Yes, we experienced many problems with SQS. For six months we used SQS in many projects, but we had many problems with Celery. I don't know why, but it just didn't work, or the problems that you mentioned happened: the same task being received by two workers. So we went back to RabbitMQ. RabbitMQ is quite easy to set up, and you feel that it's the original one that Celery was written for. So if you have big queues and complex queue operations, I
strongly suggest using RabbitMQ; not Redis, not anything else, but RabbitMQ or a similar dedicated message queue system.

Okay. Another question is about the crontab scheduler. Your example was very simple, but in normal cron, at least in standard cron, there are issues with schedules like "last Sunday of the month". [unclear] So is it based on cron, or is it a separate implementation of cron?

As far as I know, it's a separate implementation. But the biggest problem with crontabs is that when you change one of the schedules, as it records the last run time to a text file, it creates problems. So it might happen that the task is fired before you want it to be. What I generally do, at the beginning of each Celery task that will be fired by a crontab schedule, is add a control key: add something to Redis with some expiry time and make sure that the task will never ever run before its time. This is especially crucial in mailings, because, for example, if you want to send a mailing on Friday and you change the time on Monday, it just gets fired. So we add control blocks at the beginning of the tasks.

Okay. And the final question is about the Celery application object. It is instantiated at import time. Does it have side effects, like creating a connection, just when you import it and create an instance?

I didn't experience such a problem.

Because when you start testing it, you have import side effects, and import side effects are generally bad practice.

Yeah, but I didn't experience anything wrong with it.

Thank you. Next question.

Hi, thank you for the talk, first of all. The question is: what is the advantage of Celery, if I'm using it only for background workers, in comparison to uWSGI [unclear], for instance?

In my opinion, the biggest advantage is that you can use Celery from anywhere. You don't need
to operate it from only a single thing, and you can send tasks from various locations. I don't know if that supports, for example, periodic tasks. So if you have Celery installed in your architecture, you can just use it for as many tasks as possible. And also, I don't know if they have retry or not.

They have.

I didn't know.

Okay, so I think one more question. And I think he will still be around, so you can ask him all the other questions; I'm pretty sure he will answer.

So yeah, thank you for the talk. You're saying you're using this heavily in production. Do you have any tips on how to operate it efficiently? What kind of message queue do you use? You gave me the impression earlier that you switched from [unclear] to RabbitMQ. Is that what you are using in production?

Yeah, we tried both Redis and RabbitMQ. Both work very well, but RabbitMQ is of course better. The only problem that I see is when you have too many tasks, I mean millions of tasks. I have some friends here who are dealing with an e-commerce business, and there might be problems when you have too many tasks. You should pay attention to the overall performance of the system, because there might be object dependencies: the same objects might be reached by different tasks, and you should avoid that as much as possible, because it makes the code complicated and it makes debugging complicated. So you shouldn't rely too much on tasks, especially on periodic tasks.

Okay. And what happens if you have a task that fails midway? Does it just raise an exception? For example, you have analyze DNA and then send email. If send email fails, does that mean that you retry the entire thing? That seems kind of inefficient, to use five hours of compute time for the second step.

Yes, so that's why you can divide it into subtasks.

Okay, and recursively. Great. So from analyze DNA you then schedule send email?

Yes.

Okay, that's more efficient. Thank you.

Thank you. Okay,
so let's thank him again. [Applause]
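
The application, message queue and worker roles described in the talk can be sketched as a toy model using only the Python standard library. This is not Celery's API or implementation: the names `broker`, `registry`, `results`, `task`, `run_worker` and `analyze_and_send` are hypothetical, the in-memory queue stands in for Redis or RabbitMQ, and the retry policy is an assumption made for illustration. It only shows the flow from the talk: the caller records a function name plus JSON-serialized arguments, gets an ID back immediately, and a separate worker loop later looks the function up by name and runs it.

```python
import json
import queue
import uuid

# Hypothetical in-memory stand-ins for the broker database and task state.
broker = queue.Queue()
registry = {}   # task name -> function, the list of tasks a worker "knows"
results = {}    # task id -> status or return value


def task(func):
    """Register a function by name and attach a delay() shortcut to it."""
    registry[func.__name__] = func

    def delay(*args):
        # Only the task name and its arguments are serialized and queued.
        task_id = str(uuid.uuid4())
        message = {"id": task_id, "name": func.__name__,
                   "args": args, "retries": 0}
        broker.put(json.dumps(message))
        results[task_id] = "PENDING"
        return task_id  # the caller continues right away with this ID

    func.delay = delay
    return func


@task
def analyze_and_send(dna_file, email):
    # Placeholder for the slow analysis, PDF generation and email steps.
    return f"report for {dna_file} sent to {email}"


def run_worker(max_retries=3):
    """Drain the queue once; a real worker would keep polling forever."""
    while not broker.empty():
        message = json.loads(broker.get())
        func = registry[message["name"]]
        try:
            results[message["id"]] = func(*message["args"])
        except Exception:
            if message["retries"] < max_retries:
                # Put the message back so it is tried again later.
                message["retries"] += 1
                broker.put(json.dumps(message))
            else:
                results[message["id"]] = "FAILURE"


# The "view" queues the task and returns at once; nothing has run yet.
task_id = analyze_and_send.delay("sample.dna", "user@example.com")
status_before = results[task_id]

# Only once a worker runs does the task actually execute.
run_worker()
status_after = results[task_id]
```

As in the talk, calling `analyze_and_send(...)` directly would run the function synchronously, while `analyze_and_send.delay(...)` only records the task; without a call to `run_worker()` the status would stay pending indefinitely. Because arguments pass through JSON here, only plain values such as strings and numbers survive the trip, which mirrors the advice not to pass model objects as task arguments.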
Info
Channel: EuroPython Conference
Views: 28,725
Id: kDoHrFLkahA
Length: 30min 29sec (1829 seconds)
Published: Thu Sep 28 2017