What Is Apache Kafka, and How to Choose the Right Partition Count and Replication Factor, With Cluster Guidelines
Apr 21, 2022
Kafka is a distributed storage layer that truly decouples producers and consumers.
Reasons for operating a Kafka infrastructure at the edge include low latency, cost efficiency, cybersecurity, or no internet connectivity.
- A Kafka cluster is composed of multiple brokers (servers).
- Each broker is identified with its ID.
- Each broker contains certain topic partitions!
- After connecting to any broker, you will be connected to the entire cluster!
Partitions Count
More partitions give you:
- Better parallelism and better throughput
- The ability to run more consumers in a group to scale
- The ability to leverage more brokers if you have a large cluster
But they also mean:
- More leader elections for ZooKeeper to perform
- More open files on the Kafka brokers
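To illustrate the point about consumer groups: within one group, each partition is read by at most one consumer, so the partition count caps how many consumers can do useful work. The sketch below is a simplified round-robin model, not Kafka's actual assignor. The function name `assign_partitions` and the consumer names are made up for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over consumers round-robin (simplified model)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


group = ["c1", "c2", "c3", "c4"]  # hypothetical consumer names

# 3 partitions, 4 consumers: one consumer gets nothing to read.
three = assign_partitions(3, group)
# 6 partitions, 4 consumers: every consumer has work.
six = assign_partitions(6, group)

print(three)
print(six)
```

With 3 partitions the fourth consumer stays idle, which is why adding consumers only helps up to the number of partitions.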
Replication factor
- Topics should have a replication factor greater than 1.
- This way, if a broker is down, another broker can serve the data.
A higher replication factor means:
- Better resilience of your system (with a replication factor of N, N-1 brokers can fail)
But also:
- More replication traffic (higher latency if acks=all)
- More disk space used on your system (50% more if RF is 3 instead of 2)
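The arithmetic behind these trade-offs can be checked with a few lines. This is only a back-of-the-envelope sketch: the function names and the 100 GB figure are made up for illustration, and it assumes every replica stores a full copy of the data.

```python
def tolerated_broker_failures(replication_factor):
    """With N replicas of a partition, N-1 of the brokers holding them can fail."""
    return replication_factor - 1


def disk_usage_gb(data_gb, replication_factor):
    """Assumption: each replica keeps a full copy of the topic data."""
    return data_gb * replication_factor


rf2 = disk_usage_gb(100, 2)  # hypothetical 100 GB topic, RF = 2
rf3 = disk_usage_gb(100, 3)  # same topic, RF = 3
extra_percent = (rf3 - rf2) / rf2 * 100

print(f"RF=3 tolerates {tolerated_broker_failures(3)} broker failures")
print(f"RF=3 uses {extra_percent:.0f}% more disk than RF=2")
```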
Partitions Count, Replication Factor
The two most important parameters when creating a topic. These impact the performance and durability of the system overall.
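Changing either one during a topic's lifecycle is risky. Adding partitions changes which partition a given key maps to, which breaks per-key ordering, and raising the replication factor puts extra replication load on the cluster, which may lead to an unexpected performance decrease.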
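One concrete way a partition-count change can hurt is key placement: keyed records are typically routed by hashing the key modulo the number of partitions, so changing that number moves keys to different partitions. The sketch below uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2), and the `user-…` keys are invented for the example.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(100)]  # hypothetical record keys

before = {k: partition_for(k, 3) for k in keys}  # topic with 3 partitions
after = {k: partition_for(k, 4) for k in keys}   # same topic grown to 4

# Keys whose partition changed: their new records land in a different
# partition than their old ones, so per-key ordering is no longer guaranteed.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition")
```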
Partitions Count
These guidelines are a starting point regardless of how big or how small your cluster is. On the producer side, you also need to adjust the partition count for the producer throughput.
- If you have a small cluster of fewer than six brokers, create twice as many partitions as you have brokers (N*2 if N < 6).
- If you have a big cluster of more than 12 brokers, I would say it's not necessary to go to 2X; one partition per broker is enough (N*1 if N > 12).
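The rule of thumb above can be written as a small helper. This is a sketch, not an official formula: the function name `suggest_partition_count` is made up, and because the guideline says nothing about clusters of 6 to 12 brokers, the helper assumes anything between 1x and 2x the broker count is reasonable there.

```python
def suggest_partition_count(num_brokers):
    """Return a (low, high) starting range for a topic's partition count."""
    if num_brokers < 1:
        raise ValueError("num_brokers must be at least 1")
    if num_brokers < 6:
        # Small cluster: N*2
        n = 2 * num_brokers
        return (n, n)
    if num_brokers > 12:
        # Big cluster: N*1
        return (num_brokers, num_brokers)
    # Assumption: the guideline does not cover 6 to 12 brokers,
    # so offer the whole range between N*1 and N*2.
    return (num_brokers, 2 * num_brokers)


small = suggest_partition_count(3)
big = suggest_partition_count(15)
mid = suggest_partition_count(8)
print(small, big, mid)
```

Treat the result as a starting point and adjust it for the consumer parallelism and producer throughput you actually need.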
Cluster guidelines
- A broker should not hold more than 2000 to 4000 partitions (across all topics of that broker).
- A Kafka cluster should have a maximum of 20,000 partitions across all brokers.
- The reason is that if a broker goes down, ZooKeeper needs to perform a lot of leader elections.
- If you need more than 20,000 partitions in your cluster, follow the Netflix model and create more Kafka clusters.
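These limits are easy to turn into a sanity check. The sketch below is a simplified model: `check_cluster` and the sample numbers are hypothetical, it uses the upper per-broker bound of 4000, and it treats the cluster total as the sum of the per-broker counts.

```python
MAX_PER_BROKER = 4000      # upper end of the 2000 to 4000 guideline
MAX_PER_CLUSTER = 20_000   # cluster-wide guideline


def check_cluster(partitions_per_broker):
    """partitions_per_broker maps broker ID -> number of partitions it holds."""
    warnings = []
    for broker_id, count in sorted(partitions_per_broker.items()):
        if count > MAX_PER_BROKER:
            warnings.append(
                f"broker {broker_id} holds {count} partitions "
                f"(limit {MAX_PER_BROKER})"
            )
    total = sum(partitions_per_broker.values())
    if total > MAX_PER_CLUSTER:
        warnings.append(
            f"cluster holds {total} partitions (limit {MAX_PER_CLUSTER})"
        )
    return warnings


ok = check_cluster({1: 1500, 2: 1800, 3: 1200})
overloaded = check_cluster({1: 4500, 2: 1000})
too_big = check_cluster({i: 3500 for i in range(1, 7)})  # 6 brokers, 21,000 total

for line in overloaded + too_big:
    print(line)
```

A cluster-level warning is the signal to consider splitting the workload across several Kafka clusters.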