
I have a python application where I want to start doing more work in the background so that it will scale better as it gets busier. In the past I have used Celery for doing normal background tasks, and this has worked well.

The only difference between this application and the others I have done in the past is that I need to guarantee that these messages are processed; they can't be lost.

For this application I'm not too concerned about speed for my message queue; I need reliability and durability first and foremost. To be safe I want to have two queue servers, one a backup of the other, in different data centers in case something goes wrong.

Looking at Celery, it supports a bunch of different backends, some with more features than others. The two most popular look like Redis and RabbitMQ, so I took some time to examine them further.

RabbitMQ: Supports durable queues and clustering, but the problem with the way clustering works today is that if you lose a node in the cluster, all messages on that node are unavailable until you bring the node back online. The cluster doesn't replicate the messages themselves between nodes; it only replicates the metadata about each message and then goes back to the originating node to fetch the message body. If that node isn't running, you are S.O.L. Not ideal.

The way they recommend to get around this is to set up a second server, replicate the file system using DRBD, and then run something like Pacemaker to switch clients to the backup server when needed. This seems pretty complicated; does anyone know of a better way?

Redis: Supports a read slave, which would give me a backup in case of emergencies, but it doesn't support a master-master setup, and I'm not sure whether it handles active failover between master and slave. It doesn't have the same features as RabbitMQ, but it looks much easier to set up and maintain.

Questions:

  1. What is the best way to set up Celery so that it guarantees message processing?

  2. Has anyone done this before? If so, would you mind sharing what you did?

Ken Cochrane
    as for the rabbitmq failover, I'm hearing rumours that something simpler will be available soon! – asksol Aug 05 '11 at 07:02
    Redis can be durable if you set the append_only setting. But redis still doesn't support message acknowledgements, which means that a message is redelivered if the worker doesn't ack it. Celery redis support emulates this, but only as well as is possible to do on the client side, which means that any unacked message may be lost if the worker is killed abruptly or there is a power failure. See http://ask.github.com/celery/faq.html#should-i-use-retry-or-acks-late – asksol Aug 05 '11 at 07:04
    Maybe you can get away with not losing messages if you set CELERY_DISABLE_RATE_LIMITS=True, set CELERYD_PREFETCH_MULTIPLIER=1, set CELERY_ACKS_LATE=True, and run with the solo pool. But would have to verify that. – asksol Aug 05 '11 at 07:05
  • the solo pool may or may not be necessary (-P solo) – asksol Aug 05 '11 at 07:05
  • @asksol thanks for the help I'll check those out. – Ken Cochrane Aug 05 '11 at 10:15
  • What tolerance does your app have for duplicates? If you need guaranteed delivery without duplicates things get harder than if you can tolerate seeing each message at least once. – Malcolm Box Sep 07 '11 at 15:34
  • @MalcolmBox I currently can't have duplicates. – Ken Cochrane Sep 07 '11 at 21:13
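For reference, the settings asksol suggests in the comments above could be sketched in a Celery config module like this (these are Celery 2.x-era setting names; verify them against the docs for your version):

```python
# celeryconfig.py -- sketch of asksol's suggested settings
# (Celery 2.x-era names; check your version's docs before relying on these)

CELERY_ACKS_LATE = True            # ack only after the task finishes, so a
                                   # killed worker leaves the message queued
CELERYD_PREFETCH_MULTIPLIER = 1    # reserve one message per worker process
CELERY_DISABLE_RATE_LIMITS = True  # rate limiting buffers messages in memory
```

The worker would then be started with the solo pool if that turns out to be necessary, e.g. `celeryd --pool=solo`.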

5 Answers


A lot has changed since the OP! There is now an option for high-availability aka "mirrored" queues. This goes pretty far toward solving the problem you described. See http://www.rabbitmq.com/ha.html.
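As a sketch, mirroring could be requested per queue from the Celery side with the RabbitMQ 2.x-era `x-ha-policy` queue argument (newer brokers, 3.0+, configure mirroring through policies via `rabbitmqctl` instead; the queue name here is a placeholder):

```python
# Sketch: ask RabbitMQ to mirror a queue across all cluster nodes,
# assuming the 2.x-era 'x-ha-policy' queue argument. 'tasks' is a
# placeholder queue name.
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('tasks',
          Exchange('tasks', type='direct'),
          routing_key='tasks',
          queue_arguments={'x-ha-policy': 'all'}),
)
```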

Chris Johnson

You might want to check out IronMQ; it covers your requirements (durable, highly available, etc.) and is a cloud-native solution, so there's zero maintenance. And there's a Celery broker for it: https://github.com/iron-io/iron_celery so you can start using it just by changing your Celery config.
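The broker switch might look something like this (the import and URL scheme follow the iron_celery README; the project id and token are placeholders):

```python
# celeryconfig.py sketch -- point Celery at IronMQ, per the iron_celery
# README. PROJECT_ID and TOKEN are placeholders for your credentials.
import iron_celery  # registers the 'ironmq' broker transport

BROKER_URL = 'ironmq://PROJECT_ID:TOKEN@'
```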

Travis Reeder

I suspect that Celery bound to existing backends is the wrong solution for the reliability guarantees you need.

Given that you want a distributed queueing system with strong durability and reliability guarantees, I'd start by looking for such a system (they do exist) and then figuring out the best way to bind to it in Python. That may be via Celery & a new backend, or not.

Malcolm Box
  • Thanks, do you know the names for systems that have a distributed queueing system with strong durability and reliability guarantees? I would like to check them out. – Ken Cochrane Sep 07 '11 at 21:18
  • Amazon SQS is one. Others I don't know - but Google's probably your friend now you know the question to ask – Malcolm Box Sep 07 '11 at 21:21
  • Look at [MQSeries](http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/index.jsp?topic=%2Fcom.ibm.mq.csqzae.doc%2Fic10770_.htm) and similar products. – michaelok Oct 06 '11 at 14:31

I've used Amazon SQS for this purpose and got good results. A message stays in the queue until you delete it, and the service lets your app grow as large as you need.
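A minimal Celery-on-SQS config might be sketched like this (the credentials and queue name are placeholders; note that standard SQS queues deliver at-least-once and don't guarantee ordering, which matters for the duplicate requirement in the question):

```python
# celeryconfig.py sketch -- use Amazon SQS as the Celery broker via
# kombu's SQS transport. Credentials and queue name are placeholders.
BROKER_URL = 'sqs://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@'
CELERY_DEFAULT_QUEUE = 'my-app-tasks'
```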

varela
  • Amazon SQS is slow compared to redis and rabbitMQ, and I don't think it works with celery, but I could be wrong. – Ken Cochrane Sep 01 '11 at 11:01
  • Celery does support AmazonSQS - but this post does not answer the question. Is the order of the messages guaranteed? Can you guarantee that no duplicates will be created/processed in a distributed system, etc. – Sam Redway Jul 27 '17 at 15:03

Is using a distributed rendering system an option? These are normally reserved for HPC, but a lot of the concepts are the same. Check out Qube or Deadline Render. There are other, open-source solutions as well. All have failover in mind, given the high degree of complexity and risk of failure in renders that can take hours per frame of an image sequence.

rjmoggach