In this article, we will learn about Kafka consumers, the offsets they track while reading data, and consumer groups. We will also see how to start a consumer from the Kafka console.
Consumers read messages from topics. They only have to provide the topic name and one broker to connect to; the Kafka client then takes care of locating the right brokers and fetching the right data from them. Data is read in parallel across all partitions of the topic, but within a single partition it is read sequentially, in offset order. This matters for performance, because more partitions allow more parallel processing.
Kafka consumers organize themselves into consumer groups. Each consumer within a group reads messages from one or more partitions, and no partition is read by two consumers from the same group at the same time. That means having more consumers in a group than the topic has partitions is not useful, as the extra consumers will sit idle.
In the above image, the topic has four partitions. Consumer group A has only two consumers, so each consumer reads from two partitions. Consumer group B has four consumers, so each consumer reads from one partition. Adding a fifth consumer to group B would not help, as there is no spare partition for it to read from.
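The partition sharing described above can be sketched in a few lines of Python. This is only an illustration: the function name assign_partitions and the consumer names are made up for this example, and the simple round-robin rule is an assumption standing in for Kafka's real assignment strategies, which run inside the client and the group coordinator.

```python
def assign_partitions(partitions, consumers):
    """Hypothetical helper: give each partition exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Round-robin is an assumption; it only mimics the "one owner per partition" rule.
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]  # a topic with four partitions

group_a = assign_partitions(partitions, ["a1", "a2"])
group_b = assign_partitions(partitions, ["b1", "b2", "b3", "b4"])
group_b_plus = assign_partitions(partitions, ["b1", "b2", "b3", "b4", "b5"])

print(group_a)       # two partitions per consumer
print(group_b)       # one partition per consumer
print(group_b_plus)  # the fifth consumer ends up with an empty list
```

In this sketch the fifth consumer in the last group owns no partitions, which matches the idle consumer described above.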
Kafka keeps track of the offset up to which each consumer group has read. As a consumer processes data from a topic, it is expected to commit its read position to an internal topic named __consumer_offsets. If the consumer process suddenly dies, it can resume reading from where it left off by using the committed offset.
Because offsets are controlled by the consumer, it can consume records in any order it likes. The consumer can reset its offset to the beginning of the partition and read everything again, or it can skip old messages and start reading from the most recent ones.
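To make the commit, resume, and reset behaviour concrete, here is a small in-memory sketch. ToyConsumer is a hypothetical class written for this article and is not part of any Kafka client; a plain list stands in for one partition, and a plain dict stands in for the __consumer_offsets topic.

```python
class ToyConsumer:
    """In-memory stand-in for one consumer reading one partition (not a Kafka API)."""

    def __init__(self, log, committed):
        self.log = log              # list of records, index = offset
        self.committed = committed  # dict playing the role of __consumer_offsets
        # Resume from the last committed offset, or from 0 if nothing was committed.
        self.position = committed.get("offset", 0)

    def poll(self, max_records=2):
        """Return the next records in offset order and advance the read position."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        """Store the current read position so a restarted consumer can resume."""
        self.committed["offset"] = self.position

    def seek_to_beginning(self):
        self.position = 0

    def seek_to_end(self):
        self.position = len(self.log)


log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {}

first = ToyConsumer(log, offsets)
first_batch = first.poll()     # reads m0 and m1
first.commit()                 # committed offset is now 2
# Imagine the first consumer process dies here.

second = ToyConsumer(log, offsets)
resumed_batch = second.poll()  # continues with m2 and m3

second.seek_to_beginning()
replayed = second.poll()       # reads m0 and m1 again

second.seek_to_end()
skipped = second.poll()        # nothing to read until new messages arrive
```

The second consumer picks up at the committed offset rather than at the start, and moving the position by hand lets it either replay old messages or skip ahead.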
Starting Kafka consumer
Let's see how to start a Kafka consumer from the Kafka console.
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic
By default, the console consumer starts at the end of the topic and shows only messages that arrive after it has started. If you want to read messages from the beginning of the topic, you can pass the '--from-beginning' argument to the console command.
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic \
    --from-beginning
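The way a consumer picks its starting point can be sketched as a small decision function. The function starting_offset and its parameter names are hypothetical and exist only for this illustration. The sketch assumes that a committed offset, when one exists, takes priority, and that '--from-beginning' only matters for a consumer that has no established offset yet.

```python
def starting_offset(committed, log_start, log_end, reset_policy="latest"):
    """Hypothetical helper: decide where a consumer begins reading a partition."""
    if committed is not None:
        # An established offset wins over the reset policy.
        return committed
    if reset_policy == "earliest":
        # Roughly what --from-beginning asks for when no offset exists yet.
        return log_start
    # Default: start at the end and wait for new messages.
    return log_end


print(starting_offset(None, 0, 10))              # no offset, default policy
print(starting_offset(None, 0, 10, "earliest"))  # no offset, read from the beginning
print(starting_offset(7, 0, 10, "earliest"))     # committed offset is used
```

This also suggests why a group that has already committed offsets may not replay old messages even when '--from-beginning' is passed.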
We can place multiple consumers in the same consumer group, and they will read from the topic's partitions in parallel. Let us start a group with two consumers.
kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic first_topic \
    --from-beginning \
    --consumer-property group.id=group1
We can run this command in multiple terminals at the same time. If we start two consumers this way, each of them will process a share of the messages in parallel, as seen in the following image.
It is also possible to read messages from a particular partition. For that, we can use the following command.
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic \
    --from-beginning --partition 0
This command will read data in partition 0 from the beginning.
These are some of the basics of Kafka consumers. We will see how to implement Kafka producers and consumers using the Java and Python APIs in the next few articles.