The Invisible Wall
Our system was hitting an invisible wall. Our RabbitMQ cluster had plenty of CPU and memory, yet message throughput had plateaued. Worse, our p99 latencies were spiking under load, causing downstream timeouts. The culprit? A single, high-traffic queue.
In RabbitMQ, a single queue is handled by a single process, effectively bound to one CPU core on one node. No matter how powerful the server, you can only push so many messages through one queue. Scaling up had stopped helping; we needed to scale out.
The Solution: Divide and Conquer with Sharding
Instead of one massive queue, we partitioned our data across multiple smaller queues, a technique known as sharding. This allowed us to parallelize message processing across multiple cores and even multiple nodes.
Here’s the architecture:
graph TD
subgraph Producers
P1(Producer 1)
P2(Producer 2)
end
subgraph RabbitMQ Cluster
X(Exchange);
subgraph Sharded Queues
Q1(Shard 1);
Q2(Shard 2);
Q3(Shard 3);
Q4(...);
end
end
subgraph Consumers
Q1 --> C1(Consumer Group 1);
Q2 --> C2(Consumer Group 2);
Q3 --> C3(Consumer Group 3);
Q4 --> C4(Consumer Group 4);
end
P1 --> X;
P2 --> X;
X -- routing_key ends in .1 --> Q1;
X -- routing_key ends in .2 --> Q2;
X -- routing_key ends in .3 --> Q3;
X -- routing_key ends in .N --> Q4;
How it works:
- Producers: Producers publish messages to a single exchange as usual. The magic is in the `routing_key`.
- Routing: We use a consistent hashing algorithm on an entity ID (e.g., `user_id`) to generate a shard number. This number is appended to the routing key (e.g., `event.created.5`). A producer sketch follows this list.
- Queues: We create a fixed number of queues (`event_queue_1`, `event_queue_2`, ...). The exchange routes the message to the correct queue based on the shard number in the routing key.
- Consumers: Each consumer (or a group of consumers for parallel processing) subscribes to a single shard. This ensures messages for a given entity are still processed in order, while the overall system processes messages in parallel.
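To make the routing concrete, here's a minimal producer sketch using the Python `pika` client. The `events` exchange name, the `event.created.*` key pattern, and the shard count are illustrative assumptions, and a simple stable hash taken modulo the shard count stands in for the consistent-hashing step described above:

```python
# Hypothetical producer: derive a shard from the entity ID and publish
# with the shard number appended to the routing key.
import hashlib

import pika

SHARD_COUNT = 4  # illustrative; must match the number of sharded queues

def shard_for(entity_id: str) -> int:
    # Stable hash so every event for the same entity lands on the same shard,
    # which is what preserves per-entity ordering.
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT + 1

def publish_event(channel, user_id: str, body: bytes) -> None:
    shard = shard_for(user_id)
    channel.basic_publish(
        exchange="events",
        routing_key=f"event.created.{shard}",  # e.g. "event.created.3"
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),  # persistent
    )

if __name__ == "__main__":
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    publish_event(channel, user_id="42", body=b'{"event": "created"}')
    connection.close()
```

Because the shard is derived from `user_id` alone, every event for a given user keeps hitting the same queue, so per-user ordering survives the fan-out.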
Ensuring High Availability with Quorum Queues
Sharding solves the performance bottleneck, but what about reliability? For this, we use Quorum Queues, RabbitMQ's modern, Raft-based solution for data safety.
Unlike classic mirrored queues, Quorum Queues offer stronger data-safety guarantees and are RabbitMQ's recommended choice for highly available queues. One caveat: the queue type cannot be changed by a policy; it has to be chosen when the queue is declared, via the `x-queue-type` argument. So we declare every sharded queue as a quorum queue with an initial replication factor of 3:
# Quorum queues are selected at declaration time via the x-queue-type argument;
# x-quorum-initial-group-size sets the replication factor (here, 3 nodes).
# Repeated for each event_queue_N shard:
rabbitmqadmin declare queue name=event_queue_1 durable=true \
  arguments='{"x-queue-type": "quorum", "x-quorum-initial-group-size": 3}'
Fine-Tuning for Peak Performance
With the new architecture in place, we also tuned our consumers:
- Prefetch Count: We adjusted the consumer prefetch count to find the sweet spot between keeping the consumer busy and preventing it from holding too many unacknowledged messages (a consumer sketch follows this list).
- Concurrency: We scaled the number of consumer instances per shard to match the processing capacity needed for that shard's workload.
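For illustration, a per-shard consumer might look like the sketch below (again `pika`; the prefetch value of 50 is just an assumed starting point, and the right number depends on message size and per-message processing time):

```python
# Hypothetical consumer pinned to one shard, with a bounded prefetch window.
import pika

SHARD = 1  # each consumer instance (or group) handles exactly one shard

def handle(channel, method, properties, body):
    # ... process the message ...
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Prefetch bounds the unacknowledged messages this consumer holds at once:
# high enough to keep it busy, low enough that it doesn't hoard work.
channel.basic_qos(prefetch_count=50)

channel.basic_consume(queue=f"event_queue_{SHARD}", on_message_callback=handle)
channel.start_consuming()
```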
The Impact: Throughput Unlocked
- 2x Throughput: We immediately doubled our sustainable message throughput.
- Stable Latency: p95 and p99 latencies remained stable, even under peak load.
- No More Hotspots: The load was evenly distributed across the cluster, eliminating the single-queue bottleneck.
- Resilient by Design: With Quorum Queues, we could lose a node without losing data or availability.
By moving from a single-queue model to a sharded architecture, we built a message bus that can scale horizontally with our business needs.