Firing up asynchronous tasks in Django with Celery
While working on a new project of mine, I ran into the problem of blocking the Django server for too long. The project is a sort of recommendation engine that does some data crunching by sending hundreds of network requests to different sites. The problem was that these tasks were performed synchronously.
Even if such a site handles only low traffic, most browsers time out and close the connection, which leads to a broken pipe error on the server.
So the solution is to run these tasks asynchronously in a different process rather than using up precious web server time. Two options came to mind for doing this:
- Node.js - A very fast, asynchronous JavaScript runtime, built around callbacks that receive request and response objects.
- Celery - A Python-based task queue, which runs tasks in a separate Python process. Basically, it looks for functions or objects to which a ‘task’ decorator has been applied. When such a task is called asynchronously, it serializes the task's name and the arguments passed to it, stores this message in a broker (a database or message queue), and a worker process picks it up and calls the function. It is much slower than Node, but existing code can be ported easily.
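The mechanism described for Celery can be sketched in plain Python. This is a toy illustration, not Celery's actual implementation: the names `task`, `REGISTRY`, `BROKER`, `enqueue` and `run_worker_once` are made up for this example, and an in-memory `queue.Queue` stands in for the real broker.

```python
import json
import queue

# Hypothetical in-memory stand-ins; a real setup would use a broker such as Redis.
REGISTRY = {}            # task name -> function
BROKER = queue.Queue()   # holds serialized messages


def task(func):
    """Register a function under its name so a worker can look it up later."""
    REGISTRY[func.__name__] = func
    return func


def enqueue(func, *args):
    """Serialize the task name and its arguments, then put the message on the queue."""
    message = json.dumps({"task": func.__name__, "args": list(args)})
    BROKER.put(message)


def run_worker_once():
    """Pop one message, look the function up by name, and call it."""
    message = json.loads(BROKER.get())
    return REGISTRY[message["task"]](*message["args"])


@task
def add(x, y):
    return x + y


enqueue(add, 2, 3)          # returns immediately; nothing has run yet
result = run_worker_once()  # in a real setup this happens in a separate process
print(result)
```

The caller only pays for serializing a small message; the actual work happens whenever a worker gets around to it, which is what frees up the web server.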
I decided to go with Celery for two reasons. The major one was that with Node I would have had to rewrite my models, ORM and data analysis code in JavaScript. Also, I needed some tasks to run sequentially, as they depended on the results of previous tasks, so using Node meant I would have had to change my algorithms as well. So I decided to go for the much slower Celery.
Setting up Celery was a really painful experience and took a whole day. The documentation on their site is a bit confusing, so let me sum it up for you. I used Redis as the database for the queue.
- First, install Celery by running “easy_install Celery”.
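- Then, go to your project folder and create a file called “celeryconfig.py” containing your settings: the broker to use (Redis, in my case) and CELERY_IMPORTS, the list of modules that contain your tasks.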
- Make sure you use absolute module names when importing modules in your project; Celery crashes on relative names.
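- Now, to set up an asynchronous task, go to the module you specified in CELERY_IMPORTS in your config file, apply the ‘task’ decorator to the function you want to run in the background, and call it through the task's delay() method instead of calling it directly.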
- Make sure that the arguments you pass to the task function are simple built-in Python types. This is because they need to be serialized and stored in a database. If you need to use complex objects like ORM models, pass their ids or other parameters from which they can be reconstructed within the task function.
- Now start up your Redis server and fire up a Celery worker by issuing this command inside your project directory, where you created your config file: “celeryd -l info”. The “-l info” option sets the log level to info, which gives you some useful logging output.
- Start your Django server, which now has the power to perform asynchronous tasks.
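For the settings file mentioned in the steps above, a minimal sketch might look like the following. These are not my original settings: the sketch assumes a Redis server on localhost at its default port 6379, and “myproject.tasks” is a hypothetical module name. The setting names are the old-style uppercase ones that go with the “celeryd” command and may differ in your Celery version, so check the documentation for the release you install.

```python
# celeryconfig.py: an illustrative sketch under the assumptions stated above.

# Assumption: Redis runs locally on its default port and database 0 is free.
BROKER_URL = "redis://localhost:6379/0"

# Optional: also keep task results in Redis so they can be fetched later.
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"

# Modules the worker imports at startup to find tasks.
# "myproject.tasks" is a hypothetical absolute module name.
CELERY_IMPORTS = ("myproject.tasks",)
```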
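To illustrate the point about passing simple arguments, here is a small sketch in plain Python, with no Celery or Django involved. The `Article` class and the `ARTICLES` dict are hypothetical stand-ins for an ORM model and its table, and JSON is used as the serializer purely for illustration. It shows that a model-like object cannot be serialized directly, while its id can, and the id is enough to reload the object inside the task.

```python
import json


class Article:
    """Hypothetical stand-in for an ORM model."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


# Hypothetical stand-in for a database table keyed by primary key.
ARTICLES = {1: Article(1, "Hello Celery")}


def analyse_article(article_id):
    """Task body: rebuild the object from its id, then do the work."""
    article = ARTICLES[article_id]  # with Django: Article.objects.get(pk=article_id)
    return len(article.title)


article = ARTICLES[1]

try:
    json.dumps({"args": [article]})  # a model instance is not JSON-serializable
    serializable = True
except TypeError:
    serializable = False

message = json.dumps({"args": [article.pk]})  # a plain integer id serializes fine
(article_id,) = json.loads(message)["args"]   # what the worker would receive
title_length = analyse_article(article_id)
```

With Celery, the same function would carry the task decorator and be queued with something like `analyse_article.delay(article.pk)`.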