Firing up asynchronous tasks in Django with Celery

While working on a new project of mine, I ran into the problem of blocking the Django server for too long. The project is a sort of recommendation engine that does its data crunching by sending hundreds of network requests to different sites. The problem was that these tasks were performed synchronously, inside the request that triggered them.

Even if such a site handles only low traffic, most browsers time out and close the connection before the work finishes, which leads to a broken pipe error on the server.

So the solution is to run these tasks asynchronously in a different process rather than using up precious web server time. Two options came to my mind for doing this:

  • Node.js - A fast, asynchronous JavaScript runtime built around an event loop, non-blocking I/O and callbacks that receive request and response objects.
  • Celery - A Python-based task queue, which runs tasks in a separate Python process. Basically, it looks for functions or objects to which a ‘task’ decorator has been applied. When such a task is called asynchronously, it serializes the task's name and the arguments passed to it, stores this message in a queue backed by a database, and a worker process later picks it up and calls the function. It is much slower than Node, but existing code can be ported easily.
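
To make the Celery description above more concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not Celery's actual implementation: the `task` decorator, `REGISTRY`, `BROKER` and `worker` names are all made up for illustration, an in-memory queue stands in for Redis, and a thread stands in for the separate worker process.

```python
import json
import queue
import threading

REGISTRY = {}            # hypothetical: maps task names to functions
BROKER = queue.Queue()   # stands in for the Redis-backed queue
RESULTS = []             # collected here only so the sketch can be inspected


def task(func):
    """Register func by name and attach a delay() method that enqueues a call."""
    REGISTRY[func.__name__] = func

    def delay(*args, **kwargs):
        # Only the task name and its arguments are serialized,
        # which is why the arguments must be simple types.
        message = json.dumps(
            {"task": func.__name__, "args": args, "kwargs": kwargs}
        )
        BROKER.put(message)

    func.delay = delay
    return func


def worker():
    """Pull messages off the queue and call the matching function."""
    while True:
        message = BROKER.get()
        if message is None:  # sentinel: stop the worker
            break
        payload = json.loads(message)
        func = REGISTRY[payload["task"]]
        RESULTS.append(func(*payload["args"], **payload["kwargs"]))


@task
def add(x, y):
    return x + y


thread = threading.Thread(target=worker)
thread.start()
add.delay(2, 3)    # returns immediately; the worker does the actual call
BROKER.put(None)   # tell the worker to stop once the queue is drained
thread.join()
```

The caller only pays the cost of serializing a small message. The slow work happens wherever the worker runs, which in a real setup is a separate process rather than a thread.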

I decided to go with Celery for two reasons. The major one is that with Node I would have had to rewrite my models, the ORM layer and the data analysis code in JavaScript. I also needed some tasks to run synchronously, because they depended on the results of previous tasks, so using Node meant changing my algorithms as well. So I went with the much slower Celery.

Setting up Celery was a really painful experience and took a whole day. The documentation on their site is a bit confusing, so let me sum it up for you. I used Redis as the database for the queue.

  1. First you need to install by doing an “easy_install Celery”.
  2. Then, go to your project folder and create a file called “celeryconfig.py” and add these settings:
  3. Make sure you use absolute module names when importing modules in your project; Celery crashes on relative imports.
  4. Now, to set up an asynchronous task, go to the module you specified in CELERY_IMPORTS in your config file and do something like this:
  5. Make sure that the arguments you pass to the task function are simple Python built-in types. This is because they need to be serialized and stored in a database. If you need to use complex objects like ORM models, pass their ids or other parameters from which they can be reconstructed within the task function.
  6. Now start up your Redis server and fire up a Celery worker by running this command inside your project directory, where you created your config file: “celeryd -l info”. The “-l info” option sets the log level to info, so the worker prints useful messages about the tasks it receives and runs.
  7. Start your Django server, which now has the power to perform asynchronous tasks.
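
The settings from step 2 are not reproduced above, so here is a rough sketch of what a “celeryconfig.py” for a Redis-backed setup might look like. It assumes an older Celery release that reads uppercase settings from this module, a Redis server on localhost with the default port, and a hypothetical task module called `myproject.tasks`. Setting names have changed between Celery versions, so check the documentation for the version you install.

```python
# celeryconfig.py (sketch, not the original settings from this post)

# Where task messages are queued: assumed to be a local Redis server, database 0.
BROKER_URL = "redis://localhost:6379/0"

# Where task results are stored: assumed to be the same Redis instance.
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"

# Modules the worker imports at startup to find decorated tasks.
# "myproject.tasks" is a hypothetical absolute module name (see step 3).
CELERY_IMPORTS = ("myproject.tasks",)
```

Note the trailing comma in CELERY_IMPORTS: without it, Python treats the parentheses as grouping and the value becomes a plain string instead of a one-element tuple.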