Kafka

1. Introduction

Developed at LinkedIn; open-sourced in 2011

Decouples applications

Messaging system

Kafka runs as a cluster of one or more servers, which can span multiple datacenters

Stores streams of records in categories called topics

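Each record consists of a key, a value, and a timestamp. Records are partitioned by key: by default, hash(key) mod the number of partitions selects the partition, so records with the same key land in the same partition.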
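The key-to-partition mapping can be sketched as follows. This is an illustration only, not Kafka's actual partitioner: the Java client hashes keys with murmur2, while this sketch uses CRC32 from the Python standard library as a stand-in. The function name choose_partition and the partition count of 3 are assumptions made for the example.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is a stand-in hash (assumption). It is stable across runs,
    # unlike Python's built-in hash() for bytes and str.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The same key always maps to the same partition.
    for key in (b"user-1", b"user-2", b"user-1"):
        print(key, "->", choose_partition(key, 3))
```

Because the mapping depends on the partition count, adding partitions to a topic later can move a key to a different partition.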

1.1 Why Kafka Is Fast

Persists all data to disk and relies on the OS page cache for high-performance reads and writes. Writes go to the page cache first and are flushed to disk by the OS; reads transfer data from the page cache directly to the socket using the Linux sendfile() system call (zero-copy).
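The read path can be sketched with Python's socket.sendfile(), which uses os.sendfile() where the platform supports it and falls back to plain send() otherwise. This is not Kafka code (the broker runs on the JVM); the helper names, the temporary file standing in for a log segment, and the local socket pair are all assumptions made for the example.

```python
import os
import socket
import tempfile


def send_file_over_socket(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # With os.sendfile() available, the kernel moves the data from the
        # page cache to the socket without copying it into user space.
        return sock.sendfile(f)


def demo() -> bytes:
    """Write a fake log segment, send it through a socket pair, read it back."""
    payload = b"record-0\nrecord-1\nrecord-2\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    writer, reader = socket.socketpair()
    try:
        send_file_over_socket(path, writer)
        writer.close()  # signals end of stream to the reader
        chunks = []
        while True:
            chunk = reader.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        writer.close()
        reader.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```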

1.2

2. Messages

Topic: logical group of records. Producers store their data in a particular topic via brokers. Consumers read data from that topic via brokers. Data stored in a topic is split into partitions and replicated.

Partition: ordered and immutable sequence of messages/records. Each new record is appended to the end of a partition. Within a consumer group, a partition is read by at most one consumer, so the number of partitions caps how many consumers in a group can read in parallel.

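Replica: a copy of a partition stored on another broker. The replication factor of a topic is the number of copies kept of each of its partitions.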

Offset: each record in a partition is assigned a sequential id number, called the offset or index, that uniquely identifies the record within that partition. Each consumer tracks its own offset and can go back in the history by resetting the offset to an earlier position. E.g. Consumer A has offset=9 and Consumer B has offset=5.
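The relationship between a partition and consumer offsets can be sketched with an in-memory model. This is a toy illustration, not a Kafka client: the Partition and Consumer classes and their methods are hypothetical names chosen for the example.

```python
class Partition:
    """Append-only, ordered sequence of records (toy in-memory model)."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append a record and return the offset it was assigned."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position in one partition."""

    def __init__(self, partition: Partition):
        self.partition = partition
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        records = self.partition.read(self.offset, max_records)
        self.offset += len(records)  # advance past what was read
        return records

    def seek(self, offset: int) -> None:
        """Move the read position, e.g. back to replay earlier records."""
        self.offset = offset


if __name__ == "__main__":
    log = Partition()
    for i in range(10):
        log.append(f"m{i}")
    a, b = Consumer(log), Consumer(log)
    a.poll(9)
    b.poll(5)
    print(a.offset, b.offset)  # each consumer keeps an independent offset
    a.seek(2)
    print(a.poll(2))           # replays records from offset 2
```

Reading does not remove records from the partition, which is why two consumers can sit at different offsets and either one can rewind.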

Producer: Kafka client that publishes messages to topics, either one at a time or in batches. Sending single messages may lower latency, but reduces throughput.
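The batching trade-off can be sketched by counting how many send requests a producer issues. This is a simplified model, not the Kafka producer API: BatchingProducer and its send callback are hypothetical, and flushing is triggered by record count only. The real Java producer is tuned through settings such as batch.size and linger.ms.

```python
class BatchingProducer:
    """Buffers records and sends them in groups of batch_size (toy model)."""

    def __init__(self, send, batch_size: int):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send            # callable that takes a list of records
        self._batch_size = batch_size
        self._buffer = []

    def produce(self, record) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one request."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


if __name__ == "__main__":
    requests = []
    producer = BatchingProducer(requests.append, batch_size=3)
    for i in range(7):
        producer.produce(i)
    producer.flush()
    print(len(requests), requests)
```

With batch_size=1 every record becomes its own request, so each record leaves immediately but the per-request overhead is paid seven times instead of three.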

Consumer: Kafka client that reads data from topics. It can read from multiple topics at the same time. Each consumer is assigned one or more partitions, and it is the consumer's responsibility to keep track of its offsets. A consumer can browse through the history of messages by changing its offset. When a new consumer joins a group, the group re-balances: consumption may pause while partitions are reassigned, and existing consumers may end up with different partitions, so the sequence of messages they see can change.
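The effect of a re-balance can be sketched with a round-robin assignment function. This is a simplified stand-in for Kafka's pluggable assignment strategies, not the real group protocol: assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions over consumers round-robin (toy assignor)."""
    ordered = sorted(consumers)
    if not ordered:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


if __name__ == "__main__":
    before = assign_partitions(range(4), ["a", "b"])
    after = assign_partitions(range(4), ["a", "b", "c"])  # "c" joins the group
    print(before)
    print(after)
```

In this sketch, consumer "a" owns partitions 0 and 2 before "c" joins and partitions 0 and 3 afterwards, which illustrates how the stream a consumer sees can change after a re-balance.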

Broker: each node in a Kafka cluster is called a broker, also known as a Kafka server or Kafka node. Topics are distributed across brokers. A single broker hosts partitions of one or more topics. The controller is a broker with a special role: each Kafka cluster has exactly one active controller at a time.

3. Use Cases

  • Messaging

  • Website tracking

  • Metrics

  • Log aggregation

  • Stream processing

  • Event sourcing

  • Commit log

4. Group ID

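Consumers with the same group ID form a consumer group and share the work: each message is delivered to only one consumer in the group. Different group IDs each receive every message.

Queue: all consumers share one group ID, so each message is only processed by a single consumer

Topic (publish/subscribe): each consumer has its own group ID, so each message is processed by all consumers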
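The two delivery modes can be sketched with a small simulation. This is a simplification under stated assumptions: the deliver function is hypothetical, and it hands out individual messages round-robin inside a group, whereas Kafka assigns whole partitions to group members.

```python
from collections import defaultdict


def deliver(messages, consumer_groups):
    """Simulate delivery.

    consumer_groups maps consumer name -> group ID.
    Returns a dict mapping consumer name -> list of messages received.
    """
    members = defaultdict(list)
    for consumer, group in sorted(consumer_groups.items()):
        members[group].append(consumer)

    received = {consumer: [] for consumer in consumer_groups}
    for i, message in enumerate(messages):
        # Every group gets the message once; inside a group exactly
        # one member receives it.
        for consumers in members.values():
            received[consumers[i % len(consumers)]].append(message)
    return received


if __name__ == "__main__":
    msgs = ["m0", "m1", "m2", "m3"]
    # Queue: both consumers share one group ID.
    print(deliver(msgs, {"c1": "g", "c2": "g"}))
    # Publish/subscribe: each consumer has its own group ID.
    print(deliver(msgs, {"c1": "g1", "c2": "g2"}))
```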
