In one project we used Celery with RabbitMQ as the broker, and found that when RabbitMQ went down, Celery could no longer send tasks. Calls such as apply_async and delay would just hang.
I first tried Celery’s documented BROKER_CONNECTION_TIMEOUT setting, but it did not help. The documentation itself explains why:
The broker connection timeout only applies to a worker attempting to connect to the broker. It does not apply to producer sending a task, see broker_transport_options for how to provide a timeout for that situation.
So this setting only helps workers. It does not help when application code is trying to publish a task.
After reading some issue discussions, I found that the practical fix is to pass retry-related options through BROKER_TRANSPORT_OPTIONS, for example:
BROKER_TRANSPORT_OPTIONS = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 0.5,
}
This means:
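- retry at most 3 times (max_retries)
- make the first retry immediately, with no wait (interval_start)
- add 0.2 seconds to the wait after each further failure (interval_step)
- never wait longer than 0.5 seconds between two retries (interval_max)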
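To make the four values above concrete, here is a minimal sketch of the wait schedule they imply. It is a simplified model rather than Kombu's actual retry code, and the function name retry_delays is made up for illustration. It assumes each wait grows linearly by the step and is capped at the maximum.

```python
def retry_delays(max_retries, interval_start, interval_step, interval_max):
    """Return the wait (in seconds) before each retry, in order.

    Simplified model of the policy described above, not Kombu's own code:
    the wait starts at interval_start, grows by interval_step per retry,
    and is never allowed to exceed interval_max.
    """
    delays = []
    for attempt in range(max_retries):
        delay = interval_start + attempt * interval_step
        delays.append(min(delay, interval_max))
    return delays


# The values from the configuration discussed in this post.
delays = retry_delays(
    max_retries=3,
    interval_start=0,
    interval_step=0.2,
    interval_max=0.5,
)
print(delays)       # one wait per retry
print(sum(delays))  # total time spent waiting between retries
```

Under this model, the total wait between retries stays well under a second. The time spent on each connection attempt itself is not included, so the real delay before the call fails can be longer.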
After using this configuration in the project, Celery no longer hung when RabbitMQ was down.
Since Celery uses Kombu underneath, this may also help when Redis is used as the broker.
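Once publishing fails quickly instead of hanging, the calling code still has to decide what to do with the error. The sketch below shows one possible pattern using only the standard library. The helper send_or_fallback and the fake publisher are hypothetical, and the built-in ConnectionError stands in for whatever exception your Celery version raises when the broker is unreachable (I believe this is kombu.exceptions.OperationalError, but check against your own setup).

```python
def send_or_fallback(send, on_failure, broker_errors=(ConnectionError,)):
    """Call send(); if it raises a broker error, call on_failure(exc) instead.

    `send` would typically wrap something like `my_task.delay(...)`.
    `broker_errors` is an assumption: pass the exception types that your
    Celery/Kombu version actually raises when the broker is down.
    """
    try:
        return send()
    except broker_errors as exc:
        return on_failure(exc)


def fake_publish():
    # Stand-in for task.delay(...) while the broker is unreachable.
    raise ConnectionError("broker unreachable")


failures = []
result = send_or_fallback(
    send=fake_publish,
    on_failure=lambda exc: failures.append(str(exc)) or "queued-locally",
)
print(result, failures)
```

In a real project the fallback might log the error, store the task for later, or return an error response to the user, depending on how important the task is.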
Reference:
https://github.com/celery/celery/issues/4627#issuecomment-396907957