Demystifying Celery - Sam Clarke

Captions
Let's get started — can you hear me at the back there? Can you see the slides okay? I know there's a bit of light, but hopefully you can see everything I'm going to be talking about today. OK, so welcome: this is Demystifying Celery, by myself, Sam Clarke. Originally this talk was entitled Celery in Production, but I've since widened the scope a little, so anybody who is new to Celery and has never used it before shouldn't get lost as we go through the talk.

If you came here expecting a talk on vegetables, you're in the wrong place — this is your chance to leave and go to a different talk. But if you did come here expecting a talk on a distributed task queue written in Python, then stay; this is the right place.

A quick word about who I am. I work for a creative agency based in Vancouver, Canada called Rival Schools, where we build a lot of web applications using Python. I'm also lead developer on a product called Scriptspeaker, which takes movie scripts and turns them into casted audio files; that's written in Python and relies heavily on Celery.

The reason I wanted to give this talk is that, based on my own experience getting started with Celery, the experience of colleagues, and what I've witnessed online, getting started with Celery — especially for new developers — isn't always a smooth process. There are pain points, pitfalls, gotchas, and areas that don't make a lot of sense the first time around. The idea of this talk is to take some of that mystery out of getting started with Celery and getting up and running. For new developers it might also be the first time they've been introduced to a distributed architecture, and that can be intimidating; and although Celery is written in Python, some of the underlying technologies and protocols aren't Python, which can be another source of intimidation.

So I'm going to give you a quick overview of what Celery is and how it works, we're going to dive into adding it to a Python application, we'll look at how you move from a development environment to a production environment with Celery, and finally we'll have a look at how to log and monitor your application when it's running in production.

So what is Celery? Celery is written in Python, obviously, and Celery is a task queue. So what is a task queue? A task queue is a queue of work to be done that is executed outside of the user request cycle, which means work can be executed in the background when the resources become available to do that work. I should mention that django-celery is also a useful package for Django specifically, but in this talk I'm going to focus on the pure Celery implementation.

If you're thinking about using Celery, the first question you should be asking yourself is: do you need to use Celery? If you have a very small application where the work is executed synchronously, you may not benefit from having a task queue at all. However, if you have any work in your application that can be executed asynchronously, you can possibly see a performance benefit from using Celery — or any task queue. Also, if you're configuring your application to be performant and you don't have much in the way of available resources, again you can get a good performance boost from offloading some of that work to a task queue. Bear in mind that when you introduce any third-party tool to your application, you're introducing code that you have to take responsibility for, and that can add some complexity — so there's always a trade-off between getting a performance boost and adding complexity to your stack.
So how does Celery work? The first core concept when working with Celery is that of brokers and backends. A broker is the mediator between your application and your Celery workers, and that communication is done via messages; the underlying protocol for how those messages work depends on which broker you pick. There are several options here, and RabbitMQ and Redis are by far the best-practice options. Celery was written with RabbitMQ in mind, so RabbitMQ really is the best fit for your broker, at least initially.

Celery also optionally takes a backend. If you don't think you need to get results from your tasks or query your task state, maybe you don't need a backend at all; but if you do want task state or results, you will need some storage mechanism, and there's a variety of options Celery will accept here: memcached, any database, even RabbitMQ — although RabbitMQ is not a good fit here, because it's a task queue, not persistent storage, so don't use it for this. My recommendation is to use Redis.

A quick word on workers and concurrency — these are the next two concepts to wrap your head around when getting used to Celery. A worker is basically your Celery process, and one Celery worker can spawn multiple child processes; it's these child processes that actually go to the task queue and do your work. You can specify the number of processes a worker can kick off with the concurrency argument. If you don't do this explicitly, Celery will default to the number of CPU cores on the machine. There is an autoscale feature in Celery, but I've had mixed results with it and witnessed mixed results online — CPU spikes and so on — so the best practice we've found is to specify concurrency directly when we start the worker.

So that's a really brief overview of what Celery is and some of its core concepts. Let's have a look at how you would add Celery to your Python application. You're going to need a RabbitMQ server for sure, and optionally Redis if you're using that as a storage backend (or whatever other storage backend you've decided to go with). In your virtual environment you install celery, and then you're ready to import Celery and instantiate it.

Very briefly, in the most basic configuration you instantiate Celery where the first argument is an arbitrary string; the convention is to use the name of the module Celery is instantiated in. In this case our simple file is app.py, so we're going to go with "app". We also specify the broker and backend: here we've got RabbitMQ running as the broker and the backend is Redis. Optionally you can also pass in a configuration object, which allows for more explicit configuration, and there we use environment variables to configure our broker and backend, because obviously these could be different in development, staging and production. One thing to note is that since Celery 4.0, I believe, all of this configuration is lowercase, whereas older Celery configuration files are all uppercase.
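The slides aren't in the captions, so as a rough sketch of the kind of instantiation just described (the module name, environment variable names, and extra settings here are illustrative, not taken from the talk):

```python
# app.py -- a minimal Celery instance, assuming RabbitMQ as the broker
# and Redis as the result backend, both running locally with defaults.
import os

from celery import Celery

app = Celery(
    'app',  # conventionally the name of the module Celery lives in
    broker=os.environ.get('CELERY_BROKER_URL', 'amqp://localhost//'),
    backend=os.environ.get('CELERY_RESULT_BACKEND', 'redis://localhost:6379/0'),
)

# Further configuration can come from a config object or dict; note that
# since Celery 4.0 the setting names are lowercase (broker_url,
# result_backend, task_serializer, ...), unlike the older uppercase style.
app.conf.update(
    task_serializer='json',
    result_expires=3600,
)
```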
So we've got Celery installed and a virtual environment up and running with Celery inside it, and now we want to kick off a Celery worker. Typically when I'm developing with Celery I'll have two terminal windows open: in one window I'll have my Python interpreter or my web server running, and in the other terminal window I'll have my Celery worker.

I kick off the worker using the celery worker command, and you pass it the -A (app) flag. This is usually one of the first pitfalls or mysteries people encounter with Celery: if you don't get it right, Celery will complain about relative imports and you'll get some funny-looking output. What we're actually doing is specifying the path to the Celery instance, relative to where we start the worker. For instance, my Celery instance is instantiated in app.py, so from my project root I say -A app — that's the path to my Celery instance, and the worker starts from there. It's easy once you know how it works, but it leads to some confusion the first few times. We're also using the -c flag here to specify the number of child processes.

When the worker starts there's a lot of output in the terminal. The things I'm interested in as a developer are that my configuration got picked up correctly — we've got RabbitMQ as the transport and Redis as the results backend, and we've got an app name — and, under tasks, that my Celery tasks have been registered correctly. Here we have two tasks associated with the app module, the add and send_email tasks, and we're going to use those tasks in the next few slides to explore how Celery works.

OK, so registering tasks. This is a really contrived and simple Python function of the kind that would typically be used to send an email to a user, for example when a user signs up to your application. We pass a user ID, that user ID is used to look up a user profile, and we call the function in the typical way, passing the user ID as a keyword argument. Converting this to a Celery task is very straightforward: we import our Celery instance and use its .task decorator. Notice that we don't actually modify the function in any way, we just wrap it with the task decorator. When we want to call it in our code, we call it using the .delay() method instead of calling it directly, and all that does is put the task onto our task queue.

A quick word on state in Celery: don't pass anything stateful into a Celery task. We don't know whether the task is going to be executed in a few seconds, a few minutes, or a few hours, so if we pass in, for example, the user object, we can't be sure that in the time between the object being passed in and the task running, the user hasn't gone and updated their email address or deleted their account, and we'd be dealing with stale data. So only pass in values; don't pass in objects.

So that send_email function can quite happily go off and send an email to a user, and we don't necessarily want any return value from it. Here's another very contrived function, something very similar and very simple, where we return the sum of two integers — but in this instance we do care about the result. When we call .delay() and assign the output to a variable, that result is not the return value of the function, and again this can trip new users of Celery up: we're assigning the output to a variable, but we're not getting the return value. What we're actually getting is an AsyncResult object, and it's this object that we can store a reference to and use to look up the result later. Here's a very simple example: we call the function using .delay(), we use the AsyncResult's id to look up that specific instance of the task, and when we have that result we can ask whether it is ready; if it is, we can finally get the return value from that task.
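The actual slide code isn't in the captions, but a minimal sketch of the send_email example described above might look roughly like this (the lookup helper and the mail-sending line are hypothetical stand-ins, not code from the talk):

```python
# app.py (continued) -- sketch of the contrived send_email example.

def lookup_user(user_id):
    """Placeholder: fetch the user's profile from your own data store."""
    return {'email': f'user{user_id}@example.com'}


@app.task
def send_email(user_id):
    # Pass only the id in; look the profile up when the task actually runs,
    # so we never act on stale data.
    user = lookup_user(user_id)
    print(f"sending welcome email to {user['email']}")


# Called directly, send_email(user_id=42) runs synchronously in this process.
# Called as send_email.delay(user_id=42), it just puts a task on the queue.
```

With a worker started from the project root (for example, celery worker -A app -c 4), one of the worker's child processes will pick the queued task up and execute it.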
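And a sketch of the add example and the AsyncResult it gives back, assuming the same app instance as above:

```python
# app.py (continued) -- a task whose return value we do care about.
@app.task
def add(x, y):
    return x + y


# Somewhere in calling code:
async_result = add.delay(5, 7)   # an AsyncResult, *not* the integer 12
print(async_result.id)           # the task id we can store and look up later

if async_result.ready():         # has the task finished?
    print(async_result.get())    # only now do we fetch the return value: 12
```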
The AsyncResult object has a variety of useful methods. Some of the first ones you're likely to encounter are .get(), as we've just seen, and things like .successful() and .ready(), which can be used in our code to decide what to do at certain points once we know where the task is. .revoke() can be used to explicitly take the task off the task queue, and if we pass the terminate=True argument we can actually kill execution of the task if it's being executed right now.

Celery tasks also have default internal states: PENDING, where the task hasn't been picked up yet; STARTED, where the task has started but hasn't finished; SUCCESS or FAILURE, where the task has finished in one of those two states; and REVOKED, where the task has been revoked. This is similar to how we were getting the result before, but now we're explicitly asking whether the state is SUCCESS and only then trying to get the return value; if the state is something else, we might want to do something else with that result — typically you would poll for the result.

Sometimes those default states aren't enough. This is an example from some of our source code in Scriptspeaker, where we have an internal method that can determine the progress of a function. What we're actually doing is rendering audio files to disk and calculating the percentage of that work that's done, so we update the task state with a custom state of PROGRESS — we know that's our custom state — and we put the percentage done, as a float, into the state metadata. We then return that to the client, typically a web browser, and we can use it to render loading bars, for example.

There's also an argument, in some cases, for not relying on Celery to report when tasks have finished. In this instance we modify our send_email task slightly: our user has an email_sent = False property, and instead of relying on Celery to tell us the task is done, we send the email and then change that flag to True inside the task itself. In the case that Celery fails for some reason, or is unreliable in reporting task status, we can do an audit of user profiles and know whether that email was sent successfully or not. This is a summary of a much more in-depth and great blog post by Dan Poirier of Caktus Group, who goes into more detail on why you'd want to do this and the scenarios around keeping state outside of Celery and in your database instead.

Just a quick word on the acks_late task option. The way Celery works by default is that it acknowledges the task before it actually executes it. There are scenarios where you want to acknowledge the task only after execution, so that if Celery fails halfway through, it will retry that task and only acknowledge it once it has actually finished correctly. This is good if your tasks are idempotent — you know you can keep running the task and it's not going to have unwanted side effects — and it's one approach to getting some increased reliability around Celery; you may or may not benefit from using it.
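The Scriptspeaker code itself isn't shown in the captions, but a hedged sketch of the general custom-state pattern just described — a bound task updating its own state with a PROGRESS state — might look like this (the task name, meta key, and loop body are illustrative):

```python
# app.py (continued) -- sketch of a custom PROGRESS state, not the actual
# Scriptspeaker code.
import time


@app.task(bind=True)
def render_audio(self, n_chunks):
    for i in range(n_chunks):
        time.sleep(0.1)  # stand-in for rendering one chunk of audio to disk
        self.update_state(
            state='PROGRESS',
            meta={'percent': 100.0 * (i + 1) / n_chunks},
        )
    return 'done'


# A client (e.g. a view polled by the browser) can then read the state:
#     result = render_audio.AsyncResult(task_id)
#     if result.state == 'PROGRESS':
#         percent = result.info['percent']   # drive a loading bar with this
```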
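In the same hedged spirit, the keep-completion-state-in-your-own-database idea might look roughly like this (the User model, ORM calls, and mailer are hypothetical placeholders, not the talk's code):

```python
# Sketch: record completion in your own database rather than trusting the
# task queue's status reporting.
@app.task
def send_email(user_id):
    user = User.objects.get(pk=user_id)   # hypothetical ORM lookup
    deliver_welcome_email(user.email)     # hypothetical mailer
    user.email_sent = True                # flip the flag only after the send
    user.save()
```

An audit can then simply be "find every user still marked email_sent=False after some grace period" and re-queue those, regardless of what Celery itself reports.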
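And acks_late is just an option on the task, or a global setting — a minimal sketch (task name and body are illustrative):

```python
# Sketch: acknowledge the message only *after* the task has run, so a worker
# that dies mid-task leaves the message on the queue to be redelivered.
# Only safe if the task is idempotent.
@app.task(acks_late=True)
def rebuild_search_index(document_id):
    ...  # hypothetical idempotent work: re-running it gives the same end state


# The same behaviour can be enabled globally (Celery 4 lowercase setting):
# app.conf.task_acks_late = True
```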
OK, so we've seen some very simple functions as Celery tasks that run in isolation. Sometimes you want to compose more complex workflows and have tasks that depend on and work with each other, and Celery Canvas is the Celery API that provides for this. Canvas uses the concept of signatures, and a signature is really just a serializable representation of a task; this allows us to pass tasks across the wire and use tasks as arguments to other tasks. There's a variety of primitives that Canvas ships with — group, chain, chord, map — and I don't have time to go into all of these right now, but I will highlight chain, because it's something we've used extensively. Chaining tasks means linking tasks together in a composable fashion.

We call our functions using the .s() method, which is short for signature. What we're doing here is taking the add function from before and calling two of them in series using .s(). The first one is called with two integer arguments, and the return value of that first call becomes the initial argument of the second. So, for example, five plus seven is twelve, which means the arguments for the second add call are actually twelve and ten. Again, we call the whole thing using .delay(). A quick word about getting results out of this: if we try to get the result as before (here I'm using .result rather than .get(), but it's the same approach), we get the final return value — typically the return value of the final task to execute. If for some reason we want the return value of tasks further up the chain, we can traverse the .parent hierarchy to go up the task chain and look at results internal to the chain.

So you have Celery up and running, you've registered some tasks, and you're using them in your code — a quick word on testing Celery tasks. Obviously, when you're testing your tasks you don't really want to boot up Celery, wait for tasks to be added to the task queue, and then assert things about those tasks. Good approaches here: you can use Celery's always-eager mode, which tells Celery to execute everything synchronously; or, if you just have a couple of tasks in your code base, you can call the task function directly, or use its apply() method instead of .delay() — either way it runs synchronously in the current process. Django has its own test-suite runner which makes some of this a little easier, but that's Django-specific.
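Going back to the chain example described above, a sketch of it (reusing the add task sketched earlier) might look like this:

```python
from celery import chain

# Chain two add calls: the return value of the first becomes the first
# argument of the second, so this computes (5 + 7) + 10 = 22.
workflow = chain(add.s(5, 7), add.s(10))
async_result = workflow.delay()

print(async_result.result)         # 22, once the chain has finished
print(async_result.parent.result)  # 12 -- the intermediate result, via .parent
```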
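And for the always-eager testing approach just mentioned, the relevant setting (lowercase task_always_eager in Celery 4, uppercase CELERY_ALWAYS_EAGER in older versions) can be flipped in a test configuration — a sketch, again assuming the app and add from earlier:

```python
# Sketch of an eager test setup: with task_always_eager on, .delay() runs the
# task synchronously in-process, so no broker or worker is needed in tests.
app.conf.task_always_eager = True
app.conf.task_eager_propagates = True  # re-raise exceptions from eager tasks


def test_add():
    assert add.delay(5, 7).get() == 12
```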
OK, so we have Celery up and running and we can test our functions; now we want to move from a development environment to a production environment, and this shouldn't be as intimidating as it might first seem — we're really using the same core concepts we've explored so far. The real change is that we need to run Celery in daemon mode rather than starting it manually: we need to run Celery in the background, and we need to be able to restart it if something goes wrong. Depending on what OS you're running, you've got various daemonizing tools at your disposal; Upstart works very well for us. A very contrived and simple Upstart configuration might look like a stanza that changes into our virtual environment, executes our worker as we've seen before, and just respawns it when it goes down.

So we have Celery running on our production machine and we know how to test, but we also need to monitor Celery and see if things are starting to go wrong. When it comes to logging, a pitfall — a common gotcha — is to try to pass the root logger from your code into a Celery task and then wait for some logs to appear; that typically doesn't happen, and you don't get any output. Celery actually ships with its own logger, which is the preferred way to get logs out of your tasks. There is a way to hijack the root logger, but it's not really recommended; it's much better to have a dedicated Celery logger. You can do it per task, where you instantiate a logger for each task, but we've always found the best approach is to declare one Celery logger at the beginning of your module and use that to log any useful information within your tasks. Again, if you just pass the root logger into your Celery function, you probably won't see any output and you'll be wondering where your logs are.

Another really useful tool is Flower, which is a really nice web interface for monitoring your Celery processes. It's a really quick way to see what's going on with your Celery workers: how many tasks have executed successfully, and if you've had any failures it will give you nicely formatted stack traces, so you can quickly see what's going wrong. You can also restart and pause tasks from the web interface. We find it invaluable as your first port of call to know what's going on with Celery. The default port is 5555, that can be changed, and you can lock it down if security is an issue. Installing Flower is much the same as installing Celery, and you start one instance of it — there's no concurrency.

A quick mention of task priority. Celery with RabbitMQ does natively support prioritization of tasks, but the best practice here, if you need some tasks to be prioritized over others, is really to dedicate certain workers with the amount of resources you need. For example, a high-priority worker might get a lot more CPU cores so it can execute its work much faster, while low-priority tasks that can happen over the next few days go to a worker with just a couple of cores, and that's fine.
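The Upstart job itself isn't shown in the captions, but a minimal hypothetical sketch of the kind of stanza described — change into the project, run the worker from the virtual environment, respawn on failure — might look like this (all paths and the job name are placeholders):

```
# /etc/init/celery-worker.conf -- hypothetical Upstart job, not the talk's config
description "celery worker"

start on runlevel [2345]
stop on runlevel [016]

respawn

script
    cd /srv/myapp
    exec /srv/myapp/venv/bin/celery worker -A app -c 4 --loglevel=INFO
end script
```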
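For the logging advice above, Celery's dedicated task logger is declared once at the top of the module and used inside tasks — a sketch (the task here is a hypothetical example, not from the talk):

```python
# app.py (continued) -- use Celery's task logger, not the root logger.
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)  # declared once at the top of the module


@app.task
def process_upload(upload_id):
    logger.info('processing upload %s', upload_id)
    ...
```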
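The captions don't show how the dedicated high- and low-priority workers are wired up, but one common way to do it is to route tasks to named queues and run separately sized workers for each — a sketch with hypothetical queue and task names:

```python
# Sketch: dedicated queues instead of per-message priorities.
app.conf.task_routes = {
    'app.render_audio': {'queue': 'high'},  # latency-sensitive work
    'app.send_email':   {'queue': 'low'},   # can wait
}

# Then run one worker per queue, sized to taste, e.g.:
#   celery worker -A app -Q high -c 8
#   celery worker -A app -Q low  -c 2
```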
I've obviously talked about Celery here, but I'd be remiss if I didn't mention that there are a couple of alternatives: Huey, which is pure Python, and RQ, which I believe is Python and Redis. I don't have much experience with either of them — I know they work very well — but they're nowhere near as configurable as Celery.

OK, so if you don't remember anything else from this talk, pay attention to the last slide. If you're designing code that blocks — you have tasks that block each other — then you're losing all the benefit of having an asynchronous task queue, so avoid that. My recommendation, and you can argue with me about this, is that RabbitMQ is the best fit as a broker and Redis is a great backend. Don't send objects or any kind of state to your tasks — keep it to values only, and if you need to look something up in the database, do that when the task is being executed. Use the Celery logger; don't rely on your application's logger.

One final thought: do you need Celery? Yes, it does add some complexity to your application, but my experience is that even for the small web apps I make these days, one of the first third-party tools I install is Celery, because as soon as I start doing any meaningful work in my web application, I'm going to want to farm some of that work off to a task queue. So I think you'll find a benefit even for small to moderately sized web apps. Thank you — and one note: on my GitHub there's a small demo project that demonstrates some of these processes in code. Thank you.
Info
Channel: PyCaribbean
Views: 3,139
Keywords: Demystifying celery, Sam Clarke, Celery on Production, Celery, PyCaribbean, PyCaribbean 2017
Id: 7ZkZr7apcJs
Length: 26min 42sec (1602 seconds)
Published: Wed Mar 08 2017