> Consumers
Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these: the consumer group.
Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.
More commonly, however, we have found that topics have a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
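
The delivery rule described above (one consumer instance per subscribing group receives each message) can be illustrated with a small simulation. This is only a sketch: the `deliver` function and the names `c1`, `c2`, `g`, `g1`, `g2` are hypothetical, and the per-message round-robin inside a group is an assumption made for brevity. Kafka itself balances load through partitions rather than by picking a consumer per message.

```python
from collections import defaultdict
from itertools import cycle


def deliver(messages, group_of):
    """Deliver each message to exactly one consumer in every group.

    group_of maps a (hypothetical) consumer name to its consumer group name.
    Returns a dict: consumer name -> list of messages it received.
    """
    # Collect the members of each group, preserving insertion order.
    members = defaultdict(list)
    for consumer, group in group_of.items():
        members[group].append(consumer)

    # One round-robin iterator per group (a stand-in for real load balancing).
    pickers = {group: cycle(names) for group, names in members.items()}

    received = {consumer: [] for consumer in group_of}
    for message in messages:
        # Every subscribing group gets the message, via one of its members.
        for group in members:
            received[next(pickers[group])].append(message)
    return received


messages = ["m0", "m1", "m2", "m3"]

# Same group for every instance: behaves like a queue, the load is split.
queue_like = deliver(messages, {"c1": "g", "c2": "g"})

# A different group per instance: behaves like publish-subscribe.
pubsub_like = deliver(messages, {"c1": "g1", "c2": "g2"})
```

With a shared group the four messages are split between `c1` and `c2`; with separate groups each consumer receives all four.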
> Kafka has stronger ordering guarantees than a traditional messaging system, too...
> Kafka does it better. By having a notion of parallelism, the partition, within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances than partitions.
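
To see why the number of consumer instances in a group is bounded by the number of partitions, here is a minimal sketch of assigning partitions to the members of one group. The `assign_partitions` function is hypothetical and uses a plain round-robin rule as an assumption; it is not Kafka's actual assignment algorithm, only an illustration of the "each partition has exactly one consumer in the group" constraint.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer of a single group.

    Uses simple round-robin (an illustrative assumption, not Kafka's assignor).
    Returns a dict: consumer name -> list of partition numbers it owns.
    """
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions, two consumers: each consumer reads two partitions.
balanced = assign_partitions(4, ["c1", "c2"])

# Two partitions, three consumers: the third consumer gets nothing to read.
oversubscribed = assign_partitions(2, ["c1", "c2", "c3"])
```

In the second case `c3` ends up with an empty list: since a partition cannot be shared inside a group, an instance beyond the partition count has no work to do.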
Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process.
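
The combination of per-partition ordering and partitioning by key can be sketched as follows. This is a simplified model, not Kafka's producer: `partition_for` and `publish` are hypothetical helpers, and CRC32 is used only as a stand-in for whatever stable hash the real partitioner applies.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(records, num_partitions):
    """Append (key, value) records to per-partition logs in arrival order."""
    logs = [[] for _ in range(num_partitions)]
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


records = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
logs = publish(records, num_partitions=3)

# All records for one key land in one partition, in publish order.
log_a = logs[partition_for("user-a", 3)]
values_a = [value for key, value in log_a if key == "user-a"]

# With a single partition, the one log holds every record in total order.
single_log = publish(records, num_partitions=1)[0]
```

Because a key always hashes to the same partition, the values for `user-a` are read back in the order they were published, even though no order is defined across different partitions. The single-partition case gives a total order at the cost of having only one reader per group.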
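If all consumer instances of a topic share the same consumer group name, they work like a traditional queue.
If all consumer instances of a topic each have a different consumer group name, they work like publish-subscribe, and every message is broadcast to all consumers.
Why use consumer groups at all? The documentation gives the answer: for scalability and fault tolerance.
Kafka's unit of parallelism is the partition.
The documentation states that each partition of a topic is consumed by exactly one consumer within a consumer group, which guarantees the order in which that consumer reads the partition's messages.
"Note however that there cannot be more consumer instances than partitions." In other words, do not run more consumer instances in a consumer group than the topic has partitions, since any extra instance would have no partition to read; an equal number is best.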