Adding a message queue feels like an unambiguous win: producers stop waiting on slow consumers, traffic spikes get absorbed, and services stop knowing about each other. All true. But queues quietly hand you a new set of failure modes, and they tend to show up in production rather than in the demo.
At-least-once means "expect repeats"
By default, most brokers (Azure Service Bus, SQS, RabbitMQ) guarantee that a message is delivered at least once, not exactly once. A consumer can process a message and then crash before acknowledging it. The broker never saw the acknowledgment, so it redelivers the message and the work runs a second time.
The implication is simple and non-negotiable: your consumers must be idempotent. Processing the same message twice should be safe. (If you read my note on idempotent APIs, the same idempotency-key thinking applies here — dedupe on a message id.)
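Here is a minimal sketch of that dedupe, assuming every message carries a unique id. The names (Message, IdempotentConsumer, handle) are mine for illustration and not any broker's API, and the in-memory set stands in for what would need to be a durable, shared store in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Message:
    message_id: str  # assumed to be unique per logical message
    body: dict


@dataclass
class IdempotentConsumer:
    handler: Callable[[Message], None]
    # In-memory for illustration only; a restart would forget everything.
    _seen: set = field(default_factory=set)

    def handle(self, message: Message) -> bool:
        """Return True if the handler ran, False if this was a duplicate."""
        if message.message_id in self._seen:
            return False  # already processed: just acknowledge and move on
        self.handler(message)
        # Record the id only after the handler succeeds, so failures get retried.
        self._seen.add(message.message_id)
        return True
```

One caveat with this sketch: a crash between the handler finishing and the id being recorded still repeats the side effect. If that matters, the usual fix is to commit the side effect and the id in the same transaction.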
Poison messages need an exit
Without a cap, one malformed message that always throws is redelivered over and over, tying up consumers (or blocking the queue outright when order is enforced) and burning your error budget. Every consumer needs a dead-letter path:
- Cap the delivery attempts (e.g. 5).
- After the cap, move the message to a dead-letter queue instead of nacking it again.
- Alert on the dead-letter queue, not on individual failures.
This turns "the consumer is in a crash loop" into "there are 3 messages to look at when I'm awake."
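The cap-then-dead-letter flow can be sketched with a plain in-memory queue. This is an illustration under my own assumptions, not broker code: drain, the attempts field, and the list used as a dead-letter queue are all hypothetical, and most brokers can enforce a delivery cap for you through configuration.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 5  # the delivery cap; pick a value that suits your workload


def drain(queue: deque, dead_letters: list, handler: Callable[[str], None]) -> None:
    """Process every message, retrying failures up to MAX_ATTEMPTS times."""
    while queue:
        message = queue.popleft()
        try:
            handler(message["body"])
        except Exception as exc:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Out of attempts: park it for a human instead of retrying again.
                message["last_error"] = repr(exc)
                dead_letters.append(message)
            else:
                queue.append(message)  # requeue for another attempt
```

The alert then watches the size of dead_letters, so a single bad message produces one signal rather than one page per failed attempt.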
Order is a privilege, not a guarantee
With parallel consumers, messages can finish processing in a different order than they were sent. If order matters (say, updates to one account), you need a partition key so that related messages land on the same consumer and are handled one at a time:
session/partition key = accountId
→ all events for account 42 stay in order,
events for different accounts still run in parallel
If order does not matter, don't pay for it — you will throttle your own throughput for nothing.
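To make the routing concrete, here is a rough sketch of how a partition key can map to a lane. Brokers that support sessions or partitions do this for you; partition_for and route are hypothetical helpers of mine, and the accountId field name is just the example from above. I use a hash from hashlib because Python's built-in hash() for strings changes between runs.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a stable partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def route(events: list, partitions: int) -> list:
    """Split events into per-partition lanes, preserving arrival order in each."""
    lanes = [[] for _ in range(partitions)]
    for event in events:
        index = partition_for(str(event["accountId"]), partitions)
        lanes[index].append(event)
    return lanes
```

Each lane is consumed by one worker at a time, so events for a single account keep their order while different lanes run in parallel. Two accounts can share a lane, which costs some parallelism but never breaks ordering.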
Make retries back off
A consumer that retries instantly hammers whatever just failed. Exponential backoff with jitter spreads the load and gives the downstream a chance to recover:
wait = min(cap, base * 2^attempt) + random_jitter
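The same formula as a small Python function, as a sketch. The defaults for base, cap, and jitter are placeholders I picked, not recommendations, and backoff_delay is a hypothetical name.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng if rng is not None else random.Random()
    exponential = min(cap, base * 2 ** attempt)  # doubles each attempt, then flattens at cap
    return exponential + rng.uniform(0.0, jitter)  # jitter de-synchronizes consumers
```

Because the jitter is added after the cap is applied, the longest possible wait is cap + jitter rather than cap. Passing a seeded random.Random makes the delays reproducible in tests.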
A short checklist
- Consumers are idempotent (dedupe on message id).
- Dead-letter queue configured, with an alert.
- Backoff + jitter on retries.
- Partition key chosen deliberately — for ordering, or not at all.
Queues are still a win. They just reward the teams that treat redelivery, poison messages, and ordering as design inputs rather than 2am surprises.