knktc's Notes

python, cloud, linux...

Prevent Celery from Hanging When the Broker Is Down

In one project we used Celery with RabbitMQ as the broker, and found that when RabbitMQ went down, Celery could no longer send tasks. Calls such as apply_async and delay would just hang.

I first tried Celery’s documented BROKER_CONNECTION_TIMEOUT setting, but it did not help. The documentation itself explains why:

The broker connection timeout only applies to a worker attempting to connect to the broker. It does not apply to producer sending a task, see broker_transport_options for how to provide a timeout for that situation.

So this setting only helps workers. It does not help when application code is trying to publish a task.

After reading some issue discussions, I found that the practical fix is to pass retry-related options through BROKER_TRANSPORT_OPTIONS, for example:

BROKER_TRANSPORT_OPTIONS = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 0.5,
}

This means:

  • retry at most 3 times
  • make the first retry immediately, with no initial delay
  • lengthen the delay by 0.2 seconds after each further failed attempt
  • never wait more than 0.5 seconds between two retries
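To get a feel for how long a publish call can block with these values, the delays can be computed by hand. The sketch below is a simplified model of a linear backoff with a cap, not Kombu's actual implementation; the helper name retry_delays is made up for illustration, and the time spent on each failed connection attempt is not counted.

```python
def retry_delays(max_retries, interval_start, interval_step, interval_max):
    """Return the delay before each retry under a simple capped linear backoff.

    Simplified model for illustration only (hypothetical helper, not Kombu code).
    """
    delays = []
    for attempt in range(max_retries):
        # The delay grows by interval_step per attempt, capped at interval_max.
        delay = interval_start + attempt * interval_step
        delays.append(min(delay, interval_max))
    return delays


options = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 0.5,
}

delays = retry_delays(**options)
print("delay before each retry:", delays)
print("total time spent waiting:", sum(delays))
```

Under this model the three retries wait about 0, 0.2, and 0.4 seconds, so the caller gets an error after well under a second of sleeping instead of blocking indefinitely.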

After using this configuration in the project, Celery no longer hung when RabbitMQ was down.
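The documentation quoted earlier refers to this setting by its lowercase name, broker_transport_options. As a rough sketch, assuming a newer Celery version that uses lowercase setting names, the same options could be set on the app object. The app name "proj" and the broker URL here are placeholders, not values from the original project.

```python
from celery import Celery

# "proj" and the broker URL are placeholder values for illustration.
app = Celery("proj", broker="amqp://guest:guest@localhost:5672//")

# Same retry options as above, using the lowercase setting name.
app.conf.broker_transport_options = {
    "max_retries": 3,       # give up after 3 retries
    "interval_start": 0,    # first retry happens immediately
    "interval_step": 0.2,   # add 0.2 s to the delay after each failure
    "interval_max": 0.5,    # never wait more than 0.5 s between retries
}
```

With this in place, a call such as delay or apply_async should raise an error once the retries are used up, so the calling code may want to catch that and decide what to do with the task.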

Since Celery uses Kombu underneath, this may also help when Redis is used as the broker.

Reference:

https://github.com/celery/celery/issues/4627#issuecomment-396907957
