What Is a Kafka Partition?
A Kafka partition is an ordered, immutable sequence of records to which new messages are continually appended. Each topic in Kafka is split into one or more partitions, and each partition is stored on one or more brokers for redundancy. Key characteristics:
- Immutable log: New records are appended to the end; existing records are never modified.
- Offset-based retrieval: Consumers read messages by offset.
- Per-partition ordering: Kafka guarantees ordering only within a single partition.
Ordering is guaranteed per partition, not across the entire topic. If strict global ordering is required, use a single partition, though this limits parallelism.
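The append-only and offset-based behavior described above can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's storage engine: the `PartitionLog` class and its method names are hypothetical and exist only for this example.

```python
class PartitionLog:
    """Toy in-memory stand-in for a single partition (not Kafka's real storage)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record to the end and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = PartitionLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# A consumer resuming at offset 1 sees the remaining records in append order.
print(log.read(1))  # [(1, 'paid'), (2, 'shipped')]
```

Because each partition is its own log, two records in different partitions have no defined order relative to each other, which is why ordering holds only within a partition.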
Benefits of Partitioning
| Benefit | Description | Example |
|---|---|---|
| Increased Throughput | Parallel writes and reads across multiple partitions | Produce 10,000 messages/sec to a 6-partition topic |
| Fault Tolerance | Replicate each partition across brokers for high availability | Set replication-factor=3 for three-broker redundancy |
| Scalability | Add partitions to a live topic to handle more data without downtime (consumer groups rebalance to pick up the new partitions) | Expand from 4 to 8 partitions during off-peak hours |
| Consumer Parallelism | Multiple consumers in a group can read from different partitions simultaneously | A 5-member consumer group on a 5-partition topic |
Creating a Topic with Multiple Partitions
Use the Kafka CLI (kafka-topics.sh with the --create, --partitions, and --replication-factor options) to create a topic with a specified number of partitions and replication factor.
Partitioning Strategies
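- Key-Based Partitioning: Messages with the same key are routed to the same partition, preserving ordering for that key.
- Round-Robin (No Key): If no key is provided, the producer spreads messages across partitions. Older clients rotate per record in round-robin fashion; since Kafka 2.4 the default partitioner instead sticks to one partition per batch and moves to another when the batch is sent, which still balances load over time.
- Custom Partitioner: Implement org.apache.kafka.clients.producer.Partitioner for bespoke logic, which is useful for geographic or priority-based routing.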
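The three strategies above can be sketched as plain functions. This is an illustration under stated assumptions: `zlib.crc32` stands in for the hash (Kafka's Java client uses murmur2 on the serialized key), and `region_partition` is a hypothetical custom rule, not part of any Kafka API.

```python
import itertools
import zlib


def key_partition(key: bytes, num_partitions: int) -> int:
    """Same key and same partition count always give the same partition.

    zlib.crc32 is a stand-in hash for this sketch only.
    """
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int):
    """Yield 0, 1, ..., n-1, 0, 1, ... for records that carry no key."""
    return itertools.cycle(range(num_partitions))


def region_partition(region: str, num_partitions: int) -> int:
    """Hypothetical custom rule: pin 'eu' to partition 0, hash other regions over the rest."""
    if region == "eu" or num_partitions == 1:
        return 0
    return 1 + zlib.crc32(region.encode()) % (num_partitions - 1)


rr = round_robin(3)
print([next(rr) for _ in range(5)])  # [0, 1, 2, 0, 1]
print(key_partition(b"order-42", 6) == key_partition(b"order-42", 6))  # True
```

Note that `key_partition` depends on `num_partitions`, so the same key may land on a different partition once the partition count changes.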
Scaling Consumer Parallelism
Consumers join a consumer group to share the workload of reading from a topic’s partitions:
- Each partition is consumed by only one consumer in a group.
- If there are more consumers than partitions, the extra consumers remain idle.
- If there are fewer consumers than partitions, some consumers read multiple partitions.
| Partition Count | Consumer Count | Parallelism Achieved |
|---|---|---|
| 3 | 3 | 3-way parallelism |
| 3 | 5 | 3-way (2 idle) |
| 8 | 4 | 4-way |
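The rows in this table can be reproduced with a small assignment sketch. It uses a simple rotate-through-consumers rule as a stand-in; Kafka's real assignors (range, round-robin, sticky, cooperative) differ in detail, and the function names here are hypothetical.

```python
def assign_partitions(partition_count, consumers):
    """Hand out partitions to consumers in turn; returns {consumer: [partitions]}.

    Simplified stand-in, not one of Kafka's actual assignor implementations.
    """
    assignment = {c: [] for c in consumers}
    for p in range(partition_count):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def parallelism(assignment):
    """Count the consumers that own at least one partition."""
    return sum(1 for parts in assignment.values() if parts)


for partitions, members in [(3, 3), (3, 5), (8, 4)]:
    names = [f"c{i}" for i in range(members)]
    result = assign_partitions(partitions, names)
    idle = [c for c, parts in result.items() if not parts]
    print(partitions, members, parallelism(result), idle)
# 3 3 3 []
# 3 5 3 ['c3', 'c4']
# 8 4 4 []
```

In the last case each of the four consumers owns two partitions, so adding consumers up to eight would still raise parallelism.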
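You cannot decrease a topic's partition count after it is created; you can only increase it. Increasing it also changes which partition a given key maps to under hash-based partitioning, so per-key ordering can break across the change. Plan your partition count based on projected throughput.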
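One common rule of thumb for sizing is to take the target throughput and divide it by what a single partition can sustain on the producer side and on the consumer side, then use the larger result. The sketch below assumes you have measured those per-partition rates yourself; the function name, parameters, and the numbers in the example are all hypothetical.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition, min_consumers=1):
    """Rough partition count: enough for producers, consumers, and the planned group size."""
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # Never go below the number of consumers you plan to run in parallel.
    return max(needed_for_producers, needed_for_consumers, min_consumers)


# Assumed measurements: 100 MB/s target, 20 MB/s per partition when producing,
# 10 MB/s per partition when consuming, and a 6-member consumer group.
print(estimate_partitions(100, 20, 10, min_consumers=6))  # 10
```

Treat the result as a starting point and leave headroom for growth, since the count can only go up later.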
Best Practices for Partition Management
- Start with a realistic partition count based on expected throughput and consumer count.
- Use keys for messages where ordering or data locality matters.
- Monitor partition sizes and consumer lag via Kafka's JMX metrics, the kafka-consumer-groups.sh tool, or Confluent Control Center.
- Rebalance consumer groups carefully to avoid extended processing pauses.
- Match the replication factor to your desired fault-tolerance level (commonly ≥ 3).
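As an illustration of the lag monitoring mentioned above, per-partition consumer lag is the log end offset minus the group's committed offset. The sketch below works on plain dictionaries rather than a live cluster; the offsets, the threshold, and the choice to treat a missing commit as offset 0 are assumptions made for the example.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag = log end offset minus the group's committed offset.

    A partition with no committed offset is treated as committed at 0 (an assumption).
    """
    return {p: end - committed_offsets.get(p, 0) for p, end in end_offsets.items()}


def lagging_partitions(lag, threshold):
    """Return the partitions whose lag exceeds the threshold, in sorted order."""
    return sorted(p for p, value in lag.items() if value > threshold)


end_offsets = {0: 1500, 1: 1480, 2: 900}   # hypothetical log end offsets
committed = {0: 1500, 1: 1200}             # partition 2 has no commit yet
lag = consumer_lag(end_offsets, committed)
print(lag)                           # {0: 0, 1: 280, 2: 900}
print(lagging_partitions(lag, 500))  # [2]
```

Lag that keeps growing on a single partition often points to a hot key or a slow consumer rather than a shortage of partitions overall.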