Celery Production Notes
I’ve been using Celery a lot on several projects at work lately, and I think I’ve finally gotten to a point where I am more or less comfortable working with it. Getting started with Celery was painful. The official docs and guides were useful, of course, but when it came time to deploy to production, I found them quite lacking.
Celery gets a lot of flak for being too big, or too difficult to work with. Despite the complaints, its mindshare, track record, and library support with Django make the alternatives1 a bit less tempting. You’ll need a better reason for choosing an alternative than just the sake of using something else. At the end of the day, Celery works and is good enough for most projects.
This post will focus on using Celery (3.1) with Django (1.9).
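Throughout this post I’ll be referring to a celery_config module that holds the Celery app. For reference, here’s a minimal sketch of what that looks like, following the usual Django/Celery 3.1 boilerplate (the app name "project" is a placeholder):

```python
# celery_config.py -- a minimal sketch of the usual Celery 3.1 + Django setup.
# The app name 'project' is a placeholder.
from __future__ import absolute_import

import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings')

from django.conf import settings  # must come after DJANGO_SETTINGS_MODULE is set

app = Celery('project')

# Read all CELERY_* options from Django's settings module.
app.config_from_object('django.conf:settings')

# Discover tasks.py modules in every installed app.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
```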
Choose the right broker
The official guides already cover choosing the right broker, but I’ll quickly mention it here for completeness.
Fortunately the choice is not difficult, as there are only two: RabbitMQ and Redis.
There are other supported brokers, of course, but those aren’t considered “stable” yet.
Never use the Django ORM as your broker. That would be like using manage.py runserver in production. (In development, using the ORM is a nice and easy way to get started, and avoids the overhead of setting up Redis/RabbitMQ on your machine.)
From what I’ve read, Redis is good enough for most cases, but you should probably only use it if you have a good reason to not use RabbitMQ and understand the caveats that come with using Redis2.
RabbitMQ is the officially recommended broker and is a good default choice. The official guide has a nice walkthrough on how to configure RabbitMQ.
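Once the broker itself is set up, pointing Celery at it is a one-line setting. A sketch, with placeholder credentials, host, and vhost:

```python
# settings.py -- point Celery at the broker (credentials/host are placeholders).
BROKER_URL = 'amqp://myuser:mypassword@localhost:5672/myvhost'  # RabbitMQ

# Or, if you went with Redis instead:
# BROKER_URL = 'redis://localhost:6379/0'
```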
We also enable RabbitMQ’s management web UI. It becomes especially useful when your job count starts to get high and/or when something goes wrong with the queues. Just make sure you’ve set a strong password for the vhost user before turning it on.
It is a good idea to have a dedicated instance for RabbitMQ, as the message database could get quite big, and you’ll soon find yourself running out of disk space if it’s running on the same machine as the webserver or data storage.
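Related to disk usage: if your tasks don’t need long-lived results (an assumption on my part; adjust to your own needs), expiring or skipping result storage keeps things from piling up:

```python
# settings.py -- keep stored task results from piling up (values are examples).
CELERY_TASK_RESULT_EXPIRES = 3600  # delete task results after one hour

# Or, if you never read task results at all:
# CELERY_IGNORE_RESULT = True
```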
By default, Celery routes all your tasks to the default queue, named celery. This is good enough for small apps that have fewer than 10 different task functions.
Personally, I like to give each Django “app” and its tasks a dedicated queue. This makes it easy to adjust the workers for each app without affecting others.
For example, let’s say we have an app called customers with the following project structure:

```
project/
|- core/
|  |- models.py
|  |- tasks.py
|  `- views.py
|- customers/
|  |- models.py
|  |- views.py
|  `- tasks.py
|- settings.py
|- celery_config.py
`- manage.py
```
And we want all customers tasks to be processed by a dedicated worker (which could be running on a separate machine), and all other tasks (e.g. core tasks) to be processed by the default worker.
We’ll need a router for the customers app. Create a new file customers/routers.py:

```python
# customers/routers.py

class CeleryTaskRouter(object):
    def route_for_task(self, task, args=None, kwargs=None):
        if task.startswith('customers.'):
            return 'customers'
        return None
```
And specify our routers in settings.py:

```python
CELERY_ROUTES = (
    'customers.routers.CeleryTaskRouter',
)
```
You can have as many routers as you want, or one main router with all your routing logic in one place.
Now, all tasks defined in the "customers.tasks" module will be routed to a queue named "customers". This is because Celery task names are derived from their module and function names (if you don’t specify the name parameter in the task decorator).
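To make the naming concrete, here’s a hypothetical task (the function is made up for illustration) whose auto-generated name, customers.tasks.send_welcome_email, matches the router’s startswith('customers.') check:

```python
# customers/tasks.py
from celery import shared_task


@shared_task
def send_welcome_email(customer_id):
    # Auto-generated task name: 'customers.tasks.send_welcome_email',
    # so CeleryTaskRouter sends it to the 'customers' queue.
    print('Sending welcome email to customer %s' % customer_id)
```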
If we start a worker like so:

```
$ ./manage.py celery worker --app=celery_config:app --queue=customers -n "customers.%h"
```
The worker will have the name "customers.hostname" and will only process tasks sent to the customers queue.
Any other tasks (e.g. those defined in core.tasks) will be ignored, unless we start a separate worker process to process tasks going to the “default” queue (named "celery" by default):
```
$ ./manage.py celery worker --app=celery_config:app --queue=celery
```
This decouples our apps’ workers and makes monitoring and managing them easier.
Notice that all we needed to set in our settings.py is the CELERY_ROUTES variable. Celery automatically creates the necessary queues when needed, so only define CELERY_QUEUES3 when you absolutely need to fine-tune your routes and queues.
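For the rare case where you do need that control, here’s a sketch of explicit queue definitions (the exchange and routing-key names are illustrative):

```python
# settings.py -- explicit queue definitions; usually unnecessary.
from kombu import Exchange, Queue

CELERY_DEFAULT_QUEUE = 'celery'
CELERY_QUEUES = (
    Queue('celery', Exchange('celery'), routing_key='celery'),
    Queue('customers', Exchange('customers'), routing_key='customers'),
)
```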
We use Supervisor at work to manage our app processes (e.g. gunicorn, flower, celery).
Here’s what a typical entry in supervisor.conf for the two workers above would look like:
```ini
[program:celery_customers]
command=/path/to/project/venv/bin/celery_supervisor --app=celery_config:app worker --loglevel=INFO --concurrency=8 --queue=customers -n "customers.%%h"
directory=/path/to/project
stdout_logfile=/path/to/project/logs/celery_customers.out.log
stderr_logfile=/path/to/project/logs/celery_customers.err.log
autostart=true
autorestart=true
user=user
startsecs=10
stopwaitsecs=60
stopasgroup=true
killasgroup=true

[program:celery_default]
command=/path/to/project/venv/bin/celery_supervisor --app=celery_config:app worker --loglevel=INFO --concurrency=8 --queue=celery
directory=/path/to/project
stdout_logfile=/path/to/project/logs/celery_default.out.log
stderr_logfile=/path/to/project/logs/celery_default.err.log
autostart=true
autorestart=true
user=user
startsecs=10
stopwaitsecs=60
stopasgroup=true
killasgroup=true
```
The celery_supervisor script is a custom script that wraps manage.py celery. It activates the virtualenv and sets the project environment variables (since doing that with Supervisor never seemed to work). It looks like this:

```sh
#!/bin/sh
# Wrapper script for running celery with virtualenv and env variables set

DJANGODIR=/path/to/project

echo "Starting celery with args $@"

# Activate the virtual environment.
cd $DJANGODIR
. venv/bin/activate

# Set additional environment variables.
. venv/bin/postactivate

# Programs meant to be run under supervisor should not daemonize themselves
# (do not use --daemon).
exec celery "$@"
```
Now, when a worker goes down for whatever reason, Supervisor will automatically restart it.
Currently we have our workers’ concurrency level set at 8. By default, Celery uses the prefork pool and will spawn 8 child processes for each worker (a total of 16 processes for our example). For low concurrency levels this is fine, but when you start needing to process a few hundred or more tasks concurrently, you will soon find that your machine has a hard time keeping up.
Celery has an alternative worker pool implementation that uses Eventlet. This allows for significantly higher concurrency levels than is possible with the default prefork pool, but with the requirement that your tasks must be non-blocking.
I had a hard time determining what makes a task non-blocking, as the official guide only mentions it in passing. The Eventlet docs state that you need to monkeypatch the libraries you are using for it to work, so I was left wondering where I should insert this monkeypatching code, or whether I even needed to. The official examples seem to suggest that no explicit monkeypatching is needed, but I couldn’t be sure.
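For what it’s worth, here’s a hypothetical I/O-bound task of the sort the eventlet pool is made for. As far as I can tell, Celery applies the eventlet monkeypatch itself at startup when --pool=eventlet is given, so plain blocking calls in pure-Python libraries become cooperative:

```python
# customers/tasks.py -- hypothetical I/O-bound task suited to eventlet.
import urllib2

from celery import shared_task


@shared_task
def fetch_customer_profile(url):
    # While this blocks on network I/O, eventlet switches to other
    # green threads, so hundreds of fetches can be in flight at once.
    return urllib2.urlopen(url, timeout=10).read()
```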
To use the eventlet pool implementation, you will need to install the eventlet library:
```
pip install eventlet
```
And specify the --pool option when starting the worker:

```
$ ./manage.py celery worker --app=celery_config:app --queue=customers -n "customers.%h" --pool=eventlet --concurrency=256
```
Monitoring and managing workers
Now that we have our workers up and running, we want to monitor them, check the tasks currently in the queues, and restart them when we push updates.
The recommended tool for this is Flower, a web UI for Celery. Install it as usual:
```
pip install flower
```
Then start the server (make sure your workers are already running, or Flower will have trouble picking them up):

```
$ flower -A celery_config:app --basic_auth=flower:secretpassword --port=5555
```
Visit the web UI at http://yourdomain.com:5555 and you can see a list of workers with their status. The UI should be pretty self-explanatory.
One way to restart the workers is directly via supervisorctl, but this could get cumbersome if you have lots of worker processes, and if they are distributed across multiple servers.
My preferred method is to do it via the Flower UI. You can select the workers you want (note though that on the latest version of Flower, the UX for selecting workers has regressed6), and then click on the “Shut Down” button. This will cause the workers to gracefully perform a warm shutdown, which is what we want. And since the workers are supervised, they will automatically start up again, now running the latest version of the code.
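If you’d rather script the restart, the same warm shutdown can be triggered through Celery’s remote control API (the worker name below is hypothetical):

```python
# Trigger a warm shutdown on selected workers; Supervisor restarts them
# with the new code. The worker name is hypothetical.
from celery_config import app

app.control.broadcast('shutdown', destination=['customers.host1'])

# Or shut down every worker:
# app.control.broadcast('shutdown')
```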
Updates:
- Mentioned RabbitMQ management plugin
- Added footnotes