Task Queues: A Celery Story

Captions
First talk of this afternoon is Tom Manderson; he'll be speaking to us about task queues, particularly Celery, and where and when you may need to use them. Please make him feel welcome.

Hey everybody. As Mitch said, this talk is titled "Task Queues: A Celery Story". We're going to start with just a little bit about who I am. I'm a professional Python developer at a company called Polymathian; I actually work with Mitch, I sit opposite him, so thanks for the intro, Mitch. At Polymathian we call ourselves a bespoke hosted mathematical optimization company. Basically what that means is we build tools that help big industry, usually mining, make better decisions and save them or make them money. I've been there for two and three-quarter years as of now. I'm also a fifth-year software engineering student at the University of Queensland, and I love Haskell, but unfortunately no Haskell today; Python is pretty great too. All the details are on the slide, along with the Polymathian logo.

All right, let's get into it. This talk is going to be mainly in two parts: first, an overview of task queues themselves, and second, some specific task queue libraries for Python, as well as how we've used Celery and other libraries at Polymathian, plus a few terrible jokes. So let's dive right in, starting with task queues more generally.

What is a task queue? Task queues are queues of tasks. Crazy. What that actually means is there's a queue of messages, where each message represents a unit of work, or task. These messages are distributed by a message queue, which we call the broker, to different processes, and those processes, called workers, actually do the work the task describes. Tasks are processed asynchronously, although when you start a task you can block waiting for it to finish. The little diagram there just steals from AWS infrastructure diagrams: we've got web servers, which are EC2 instances, then a message queue, which is Amazon SQS in this context, and that sends messages on to our workers, which do the processing. It's generally pretty simple stuff, relatively easy to understand.

So when might I want to use a task queue? There are two fundamental requirements: first, the work you're doing can be run asynchronously, and second, you're in a context where distributed systems make sense, because you've got to go out to your broker and back in, so you've got separate interacting systems. What you get in return is a degree of architectural flexibility, and the main benefit I find is that it lets you make your architecture asymmetric: your workers can be totally different from whatever is sending the requests for tasks to be run. Because we're in Python we get extra benefit, which is actually an artifact of us having less to begin with: task queues are really useful in Python for a thing called process-based parallelism, which I'll talk more on shortly. So basically, with asynchronous work and a distributed system you get architectural flexibility, and that flexibility can also be asymmetric.

On to process-based parallelism. We're in Python land, and we've got a thing called the global interpreter lock. I call it the GIL; some people call it the "JIL" (it's like gif and jif). In CPython, which is the normal Python interpreter, memory management isn't thread safe. That means that if memory is released while another thread is using it, there are no real guarantees. We get those guarantees by adding locking, the global interpreter lock, which says that only one thread can execute Python code at once. So if you have high-intensity, high-processing work written in Python, only one part of it can run at a time, and using threading for heavy computation in Python is pretty rubbish; it all effectively runs one thread at a time anyway.
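As an aside, here is a minimal sketch (not from the talk) of the process-based alternative using only the standard library: each process in the pool has its own interpreter and its own GIL, which is the same property that task-queue workers exploit.

```python
import math
from multiprocessing import Pool

def cpu_heavy(n: int) -> float:
    # Deliberately CPU-bound: threads can't run this in parallel under
    # the GIL, but separate processes each have their own GIL.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Four processes crunch four inputs genuinely in parallel.
        results = pool.map(cpu_heavy, [2_000_000] * 4)
    print(sum(results))
```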
When you use task queues, your workers run in separate processes, which means each process has its own GIL, which means each process has one thread of execution running in parallel with the others. So if you're able to distribute your work across these workers, you still get all the parallelism benefits; it's just running in separate processes, and task queues happen to be a pretty nice way to coordinate that. Just for some lingo: task queues provide message-passing concurrency, because you're passing messages, which is an alternative to the shared-memory concurrency you get when using threads in languages like C and C++.

Some common use cases for task queues. A user signs up to your system and you want to send them a registration or validation email. It doesn't really make sense for them to wait on the web request while you send the email, so instead you spin up a task and send it to a worker; it doesn't have to complete immediately, it just has to be done eventually. Another is generating image thumbnails: you don't have to generate the thumbnail right away, since people can download the full image until you've got the thumbnail ready. If you're updating some complex, difficult-to-compute cached value, you can update the cache off to the side and keep using the old value, which is fine in most contexts. One use case we have is generating reports, Excel spreadsheets for people to download. And then there are data-parallel workloads, where you do the same bit of work on a huge amount of data: you just chunk up the work and send the chunks to different workers. That's the process-based parallelism I was mentioning; a sketch of the pattern follows below.
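The talk doesn't show code for this pattern, but the fan-out looks roughly like the sketch below, written against the Celery-style calls (`task.delay(...)` to enqueue, `result.get()` to wait) that appear later in the talk; `process_chunk` and `records` are hypothetical names.

```python
def chunked(items, size):
    # Split a big dataset into fixed-size chunks, one task per chunk.
    for i in range(0, len(items), size):
        yield items[i : i + size]

# Hypothetical fan-out over a task queue:
#
#   handles = [process_chunk.delay(c) for c in chunked(records, 1000)]
#   results = [h.get() for h in handles]
#
# Each chunk is picked up by whichever worker process is free, giving
# process-based parallelism across however many workers you run.
```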
There are some drawbacks, though. Starting a task is a lot harder than a normal function call: you've got to serialize your message, send it over the network, pick it up on the worker, deserialize it, do the work, and then report back that you're done. There's a whole lot of plumbing that adds overhead, so if you've got tiny tasks that you could just do inline in a web request, please don't do them with a task queue. Second is state tracking: if you've got a lot of incremental progress going on, task queues aren't always the best approach. Usually you can tell when a task has started or finished, and that's about it, so you'd have to implement your own state-tracking layer on top. Realistically, what you want in that situation is to write your own worker system and use a database to track the state: task queues are for sending a quick message, picking up the message, and sending a message back, whereas with a database you can write your state and any extra information, because you want it to stick around. And in passing, it's hard to serialize and deserialize some objects, functions for example, so there are some use cases where you'd want to use a task queue but can't. In short, task queues are great, but they're not a magical secret scaling sauce. They give you architectural flexibility, but you still have to remember the drawbacks, and you need to know what you're doing.

All right, enter Celery. Celery, like I say on the slide, is the standard in task queues. It's really simple for a simple use case: you literally just write `celery worker` and you suddenly have a worker, and on the other end you call `task.delay()` and you're done; I'll show you the code sample in a second. It has a lot of features, and as soon as you start using those features it really, really blows up in complexity. It's insanely powerful; there's so much you can do: complex message routing, task chaining, basically everything you could ever imagine, plus support for a bunch of different message brokers and for storing results on different backends. There are a lot of features, it's really cool and really powerful, which is part of why it is the standard, and all of these features are really robustly tested, which is pretty great.

Here's a code example; I'll be doing a few throughout this presentation, and this is what all of them will look like. On the left-hand side we have the shared code: this is how you say "this is a task that I want to be able to run with my task queue". On the right-hand side we've got a client function and a worker function. In the client you call `delay(1, 2)`, which says "I want my task queue to compute one plus two", and you get the result back. In Celery, `result.get()` is blocking, so it sits there waiting for your task to complete and then gives you the result. Adding two numbers is a horrible way to use a task queue; don't do this, it's a trivial example for demonstration purposes. But all of my code examples are going to look like this.
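The slide itself isn't captured in the transcript, but a Celery example of that shape looks roughly like the following sketch; the module name `tasks.py` and the Redis broker URL are assumptions (the talk's production broker was Amazon SQS).

```python
# tasks.py -- the shared code that both client and worker import.
from celery import Celery

app = Celery(
    "tasks",
    broker="redis://localhost:6379/0",   # assumed for the sketch
    backend="redis://localhost:6379/0",  # where results are stored
)

@app.task
def add(x, y):
    return x + y

# Client side:
#
#   from tasks import add
#   result = add.delay(1, 2)  # enqueue the task for a worker
#   print(result.get())       # .get() blocks until the worker returns 3
#
# Worker side (command line):
#
#   celery -A tasks worker
```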
So, you've seen a tiny bit of Celery; now I'm going to talk about Polymathian and how we used it. Like I said, we call ourselves an industrial mathematics company: we build optimization tools, and we deliver them as web apps. These web apps basically all share a very common workflow: input your data, configure some settings, click the run button, it solves a really complex optimization, and then you get data visualization and some reports to export. That isn't how all of our software works, but pretty much 90% of it does. All of our software is deployed on AWS, and we have an implicit goal of using AWS services wherever possible, although we do have some on-premise deployments, so it still has to be flexible. What actually makes us the big bucks is our really powerful optimization engines that solve hard problems for people, and because they're hard problems they take a lot of memory and a lot of CPU, which basically means that task queues are awesome. We've got high memory and high CPU requirements, which means different resource constraints between the web server and the workers: perfect for asymmetric architecture. The optimizations take a while to run, definitely too long for a web request, and we can happily wait for them to complete asynchronously: perfect for task queues. The import-optimize-export workflow is inherently asynchronous: again, perfect for task queues. And as a nifty little extra, we have a fixed-cost support model for all of our clients, which means we can just have the service sitting there, and if they try to do too much at once it's perfectly fine for tasks to sit in the queue waiting to be processed, which saves us money. Basically, task queues are awesome for us.

As I said earlier, Celery is the standard, which is why we initially chose it for our task-queue implementation. It supports the fixed-pool-of-worker-machines use case really well: we would run a fixed pool of workers on a fixed pool of machines, report logging messages back as we went, and save those in the database. In terms of actually coordinating the tasks, Celery gave us enough, and we didn't need any particularly complex state tracking on top; we did want progress messages, so we used the database to write log messages.

But unfortunately, Celery broke our hearts. Like I said, Celery has a lot of really complex and rich options that are incredibly powerful, but that means there's an incredibly large set of configuration options, and to make it worse, there are four different ways to configure your Celery setup. It's nearly impossible to diagnose misconfiguration, because those four different mechanisms share a lot of the same settings, and when settings interact there's no way to tell how, so you get these weird dependencies between different Celery settings. If you google "celery horror stories", everyone on the internet talks about how they misconfigured Celery. I missed Brianna's talk yesterday, but apparently they had issues with Celery configuration too; it's a pretty common problem. What made it worse was that Celery's defaults were not at all suited to our use case. Celery is designed for processing short tasks, a bit longer than a web request but still quick, and because of that it has prefetching defaults: a worker grabs a bunch of tasks, processes them, and once it's done with all of those it grabs some more. That reduces network communication and makes great sense when you have lots of short-running tasks, but given that we have solves that run anywhere between one minute and one day, it really didn't work in our favour. A worker would prefetch tasks while running a day-long solve, and those prefetched tasks would never get run even though we had spare capacity elsewhere. It took a surprisingly long time for these issues to become clear; as we got bigger, people used the workers more frequently, so if two things were queued at the same time, one worker could pick up someone else's task and it would just sit there, never executed. Once we reached that point we were kind of screwed: we had to do something, and Celery wasn't going to cut it.

Just for your benefit: to make our setup work, we had to set three separate configuration options in two places; that's what it takes to disable all prefetching in Celery. I'll post my slides on Slack, but basically the top two can be set with three of the four configuration mechanisms, while the bottom one has to be set with the fourth and not the first three. Weird dependencies; and on top of that, if you have the first one enabled, it changes how the second one behaves. Weird dependencies, like I said.
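The exact slide isn't in the transcript, so the following is a reconstruction rather than the talk's verbatim settings, but the combination usually cited for disabling Celery's prefetching matches the description: two settings in the configuration, plus one that could only be passed on the worker command line.

```python
# celeryconfig.py -- a commonly cited recipe for stopping workers from
# hoarding long-running tasks (reconstructed; treat as an assumption).

# Reserve only one message per worker process instead of a batch.
worker_prefetch_multiplier = 1

# Acknowledge a message only after the task finishes, not on receipt,
# so an unstarted task can still be delivered to an idle worker.
task_acks_late = True

# The third piece couldn't go in the config file; it had to be passed
# on the worker command line:
#
#   celery -A tasks worker -O fair
```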
Anyway, we looked at a bunch of other task-queue implementations. Crazily enough, rolling your own is actually pretty easy: if you've got a good message queue library you can write a simple task queue in about a hundred lines, and there's a Flask snippet that does it in sixty. Task queues aren't that complex until you have to add a bunch of extra features, which everyone does. The ones we looked at are listed there, and I'll go through them one by one.

First up: RQ, which stands for Redis Queue. It's really, really simple. Dead simple. It's super easy to understand, the code is really easy to read, and it's really well documented. The execution model is super simple too: you can actually look at Redis and understand what's going on internally, because the structure is that simple, which makes it really easy to get high-quality monitoring. Over on the slide you can see the task definition: we don't need the task decorator that Celery had (`@celery.task`); you create an RQ queue and you add your task to it. Actually, I think that code's wrong. That's bad. What the first line should read is `q.enqueue(add, 1, 2)`, and everything else is exactly the same; the corrected snippet is sketched below. I've used the `os.system` shorthand here to show that you just run this stuff on the command line: all you do is run `rq worker`, and whenever it receives a message with a task, it imports the task from the correct location, which is why you don't need the decorator; it just uses Python's import system.
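Reconstructed from that description (the local Redis connection is an assumption), the corrected RQ version of the running example looks roughly like:

```python
# tasks.py -- shared code: a plain function, no decorator, because the
# RQ worker imports it by its dotted path when the message arrives.
def add(x, y):
    return x + y

# Client side (with the speaker's corrected first line):
#
#   from redis import Redis
#   from rq import Queue
#   from tasks import add
#
#   q = Queue(connection=Redis())  # a local Redis is assumed here
#   job = q.enqueue(add, 1, 2)
#   print(job.result)              # None until a worker has run it
#
# Worker side (command line):
#
#   rq worker
```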
Next, Huey. Huey bills itself as a little task queue. It's got a lot less code than Celery and it's a lot simpler, but it actually has pretty much all of the same features. The one thing you do lose is that it only supports Redis and SQLite, unlike Celery, which supports Amazon SQS, Redis, AMQP (the RabbitMQ queuing protocol), and some other things. It also has a rather nicer task-chaining API than Celery; if we have enough time I'll show you the chaining stuff, it's really cool. Otherwise it's all very similar: you have a decorator, and in Huey the default call is asynchronous, whereas in other libraries you have to ask for the asynchronous call explicitly, and it's got the same `get` API.

Dramatiq: again, basically the same, except you've got `send` instead of `delay`, and you can't wait; there's no blocking call, which is less than ideal. Again you run it from the command line, except this one you point at a module path, and that module path is used to configure your Dramatiq actors. It has RabbitMQ and Redis support but terrible docs; basically, if you want to do anything beyond just sending a task, it's not fun.

TaskTiger: TaskTiger is actually really cool. It was the first library I found that did distributed locking (I found it before I found Huey's). Distributed locking is where, across all of your different workers, only one can enter a section at a time. It also provides a bunch of extra queueing primitives, which are really good, and architecturally it just supports a lot of things that make failure less painful. Unfortunately, all you can find out is that your task is done: you can't get any information about it, whether it failed, or what the result was. But again, the API is exactly the same as everything else. They're all really simple, which shows that Celery did a pretty good job with the simple stuff, because they all pretty much copy the Celery API for defining a task.

And then Dask. Dask is a bit different: it's a project by the PyData foundation, the same people behind SciPy and NumPy. It's not a task queue, but it solves the data-parallelism-in-Python problem really well. Instead of a queue it's got a scheduler that you run over your entire cluster, and it's designed to distribute pandas DataFrames and NumPy arrays over the cluster to let you do your processing. It actually implements the scikit-learn APIs in a distributed fashion, which is absolutely awesome. It does also have a task-queue-like API, but it uses the scheduler to distribute those tasks. So if all you want to do is run your work somewhere else, and you happen to be using pandas with super big datasets on a cluster, check it out, because it's really cool.

So that's the list of task-queue APIs. This is where we are now: we're running RQ in place of Celery for most applications; the ones still on Celery are the ones I haven't gotten around to porting yet. RQ is so simple that we were able to implement Windows support for our limited use case. It doesn't have Windows support by default, but now the half of our developers who run Windows are running RQ, and it took less than sixty lines to add, which is pretty awesome. And our tasks never get stuck in the queue while we've still got capacity available, which is a big improvement over Celery. So we love RQ right now; it's great. We were already using Redis, so it made a lot of sense to use something that uses Redis as a queue as well, and we haven't really had any issues. Did I miss anything? Nope: it's simple, it's great, it's lovely, easy to understand.

So, conclusions. Task queues are really great: they give you architectural flexibility, and the only cost is stuff you should be doing anyway if you're building web systems at scale, and they're super relevant at the moment; everyone uses task queues. My takeaway: Celery is really powerful but complex. If you don't use any of the complexity, you won't have much to worry about, but as soon as you do anything beyond the default use case, you're going to have to do some research. It's important to know what you need from your task queue, because then you can realize, as we did, that Celery is too much and too complex a burden, and switch to something tiny and simple like RQ. It's probably a good general takeaway to know what you need from any piece of infrastructure, but there's a lot of complexity in the standard solution in this particular corner of Python land, so it's especially relevant here. I guess: questions?

[Applause] Thank you, Tom. We have some time for questions. Does anyone have any questions? You were going to show us that task chaining?

Oh yeah, bonus content. Lucky you. So, task chaining in Celery: it gives you two different APIs for it. When you write `add.s(...)`, that represents a signature of `add` partially applied with those arguments; `add` takes two arguments, so the later links don't provide any extras. Each link generates its result and passes it on to the next one, and the next one, and to actually trigger the chain you just call it, and you get back a normal task result, while internally it does all the chaining for you, which is pretty cool.
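A minimal sketch of that, using Celery's chain API against the hypothetical `add` task from the earlier sketch:

```python
from celery import chain

from tasks import add  # the add(x, y) task sketched earlier

# add.s(...) builds a signature: the task partially applied with some
# arguments. Each link's result becomes the first argument of the next
# link, so later links supply only the one remaining argument.
workflow = chain(add.s(1, 2), add.s(4), add.s(8))

result = workflow()  # calling the chain enqueues the whole pipeline
print(result.get())  # ((1 + 2) + 4) + 8 == 15
```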
It's also got the binary-or (pipe) operator you can use as well, so `add.s(1, 2) | add.s(4)` builds the same kind of chain. So that's Celery land. Huey's version is up here as well: I like the `.then` API, because I'm a Haskell guy and it makes me feel like I'm using monads, even though I'm not. Again, pretty simple: you take the signature API and go `.then`, `.then`, `.then`.

And then distributed locking. This one is stolen directly from the Huey documentation: on the bottom two lines you see `with huey.lock_task('db-backup')`, which means only one task can enter that block at a time. Distributed locking is actually surprisingly easy to implement if you have something like Redis, because you can just have a key there and use Redis primitives to do locking on that key.
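The Huey docs example referenced there looks roughly like this sketch; the Redis-backed Huey instance and the surrounding task are assumptions filled in around the two quoted lines:

```python
from huey import RedisHuey

huey = RedisHuey()  # a local Redis instance is assumed

def run_backup():
    # Hypothetical stand-in for the real work.
    print("backing up...")

@huey.task()
def nightly_backup():
    # Only one worker anywhere in the cluster can hold this lock at a
    # time; a second task hitting the same lock name fails fast rather
    # than running the block concurrently.
    with huey.lock_task('db-backup'):
        run_backup()
```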
That's the bonus content. Thank you. [Applause]

Quick question: did you guys try to engage with the Celery dev team, to let them know about your issues?

They probably have better things to do, and it's very clear they've designed it for the most common use case, which is not us, so changing the defaults wouldn't make sense for them. Just finding those configuration options was enough for us. What actually killed us was that it turned out there was a bug in the SQS implementation that meant one of those options was ignored, so our stuff was still getting stuck. That bug has since been fixed in a more recent Celery version, but by the time it was fixed we'd already moved to RQ.

Great talk, thanks. One quick question: in the past I've used Celery, and when Celery distributes tasks it has to pickle them, I think. Are there any restrictions on what can be sent as a task with RQ?

We didn't have to deal with that. Because of the restrictions Celery imposes, we ran into a lot of issues, and I ended up implementing our own serialization layer on top, because there's no easy way to flexibly extend the existing serializers without defining your own. We just wanted to use bog-standard JSON, so we converted things into JSON-serializable values and passed those in, and we reused that exact same logic for RQ. What RQ actually does, I'm pretty sure, is also just pickle, but we didn't need to deal with it.

We're looking at using Celery at the moment, and some of the features we like are that it's got a scheduler add-on, Celery Beat, and Flower looks like a really nice monitoring tool. Are there any similar scheduling and monitoring tools for RQ?

Pretty much all of the task systems I listed have some sort of scheduler-type thing. That distributed locking example from Huey actually uses Huey's periodic tasks, so it effectively has scheduling out of the box; RQ has one, TaskTiger has one, Dask has one, everything has that. Celery has a pretty nice monitoring dashboard, and with RQ it's a lot easier to build your own monitoring tools, because you can actually understand what's going on in the internal structure. We actually had some issues with Celery Flower at some stage; that was before my time, and Mitch dealt with it. What were the issues we ran into, Mitch? "Didn't work." Didn't work, okay. This was two years ago: we tried it out, basically gave up on it, and did a tiny bit of state tracking ourselves, and it was enough that we didn't hate our lives, but it wasn't great. RQ's is just nicer so far, because it's simple. I think that answers your question.

We're using SQS as a backend for Celery, and doing something like retry is a nightmare, so I'd like to know: since you switched from Celery to RQ, do you have any experience with retries there?

We actually mostly avoided retry. It's not too hard to write your own retry for any of these task-queue systems, because they generally provide enough that you can catch your exceptions and retry manually, but who wants to write manual retry with exponential backoff? Basically none of our tasks suited retry, so we never had to use it. We manually reported the failure and relied on the user deciding whether to retry, because in all of our situations, if it failed, there was an issue with your data.

What kind of tools are you using to monitor tasks, to be able to tell how many failures, how many successes, and so on?

In RQ, when your task fails it moves it into a failed queue, and they provide a dashboard plus a one-liner script that moves all failed tasks back into their original queues, which is cool. In terms of monitoring, we directly report these values to the user, because, like I said, if something goes wrong it's almost certainly your data that's wrong, for our use case at least. But the dashboard's really nice; you get reporting straight away, because it sends messages when it puts things in the queue. There's something similar to Flower, and, like I said, we tried Flower before my time and it didn't work, but Celery does have an event system, and it will give you messages every time the state of your task changes, which is actually really useful. RQ is a bit different in that it keeps the up-to-date state of the system in Redis, rather than giving you an event stream. Celery's event system is actually quite cool, and Huey provides basically the same one; Huey is basically Celery with less complexity and simpler code.

We probably have time for one more question, if there are any further questions. The question is: of the message-queue systems you're using, which of them scales to millions and millions of tasks?

All of them. Redis has a lot of complexity in scaling to huge numbers, because it's designed primarily as a key-value store, but RabbitMQ and Amazon SQS are both designed for massive-scale message passing, which is the use case Celery absolutely excels at. If you're doing millions of tasks in flight constantly, you probably want Celery, and it's worth the overhead of spending those hours working out what configuration options you need.

Thank you very much for your time, Tom. Can everyone give him another round of applause, and, as a token of appreciation, enjoy this.
Info
Channel: PyCon AU
Views: 24,733
Keywords: pyconau, pyconau_2018, Python, PyCon, PyConAU, TomManderson
Id: ceJ-vy7fvus
Length: 29min 39sec (1779 seconds)
Published: Sun Aug 26 2018