💻 Common Kafka Commands and Core Concepts 📨

Posted by: Truong Phung

Via Viblo Asia

1. Quick Setup

We can quickly start Kafka using Docker Compose by following the Quick Setup Guide. To test the Kafka setup with the CLI (Command Line Interface), follow these steps:

  1. Enter the Kafka Container: Run this from a new terminal window that will serve as the producer terminal:
    docker exec -it dev-kafka bash
    
  2. Produce Messages: Send a message to a topic (replace my_topic with your topic name):
    kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic
    
    Type your message and press Enter to send it.
  3. Consume Messages: Read messages from the topic (execute this in another terminal window for the consumer, after running the same command as in step 1):
    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning
    
  4. List Topics: Verify that your topics were created successfully:
    kafka-topics.sh --bootstrap-server localhost:9092 --list
    
  5. Describe Topic Details: Check topic configuration and partitioning:
    kafka-topics.sh --bootstrap-server localhost:9092 --topic my_topic --describe
    

These commands allow you to interact with Kafka topics, produce and consume messages, and verify your setup.
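
If you prefer to verify the setup from code rather than the CLI, here is a minimal Go sketch using the IBM Sarama client (the same library used later in section 7); it assumes the broker is reachable at localhost:9092:

    package main

    import (
        "fmt"
        "log"

        "github.com/IBM/sarama"
    )

    func main() {
        // Connect to the local broker started by Docker Compose.
        client, err := sarama.NewClient([]string{"localhost:9092"}, sarama.NewConfig())
        if err != nil {
            log.Fatalf("cannot connect to Kafka: %v", err)
        }
        defer client.Close()

        // List topics; my_topic should appear after producing in step 2.
        topics, err := client.Topics()
        if err != nil {
            log.Fatalf("cannot list topics: %v", err)
        }
        fmt.Println("topics:", topics)
    }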

2. Common Kafka Commands

  1. Create a Topic with Replication and Partitions:

    kafka-topics.sh --create \
    --topic my_topic \
    --partitions 3 \
    --replication-factor 2 \
    --bootstrap-server localhost:9092
    
  2. List Topics:

    kafka-topics.sh --list --bootstrap-server localhost:9092
    
  3. Describe a Topic (view partition, replication, and ISR details):

    kafka-topics.sh --describe \
    --topic my_topic \
    --bootstrap-server localhost:9092
    
  4. Produce Messages to a Topic (send messages from CLI):

    kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092
    

    Type messages in the terminal to send them to Kafka (e.g., {"message": "Hello, Kafka KRaft!"}).

  5. Consume Messages from a Topic (reads messages with a specific consumer group; you can run multiple consumer terminals to test this):

    kafka-console-consumer.sh --topic my_topic \
    --from-beginning \
    --group my_consumer_group \
    --bootstrap-server localhost:9092
    
  6. Check Consumer Group Offsets (view offsets and lag for each consumer group):

    kafka-consumer-groups.sh --describe \
    --group my_consumer_group \
    --bootstrap-server localhost:9092
    
  7. Reset Consumer Group Offsets (useful to replay or skip messages; the group must have no active members):

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --group my_consumer_group \
    --topic my_topic \
    --reset-offsets --to-earliest --execute
    
  8. Increase Partitions for a Topic (add partitions for scalability; note that the partition count can only be increased, never decreased):

    kafka-topics.sh --alter \
    --topic my_topic \
    --partitions 5 \
    --bootstrap-server localhost:9092
    
  9. Delete a Topic (remove a topic and its data):

    kafka-topics.sh --delete \
    --topic my_topic \
    --bootstrap-server localhost:9092
    

3. More Advanced Commands

  1. List All Consumer Groups:

    kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
    
  2. View All Configurations of a Topic:

    kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type topics \
    --entity-name my_topic \
    --describe
    
  3. Update Topic Configurations (e.g., setting the retention period; 604800000 ms is 7 days):

    kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type topics \
    --entity-name my_topic \
    --alter --add-config retention.ms=604800000
    
  4. View Broker Configurations:

    kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers \
    --entity-name 1 \
    --describe
    
  5. Update Broker Configurations (e.g., changing log retention size):

    kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers \
    --entity-name 1 \
    --alter --add-config log.retention.bytes=1073741824
    
  6. Delete a Consumer Group (removes a stale or inactive group and its committed offsets; the group must have no active members):

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --delete --group my_consumer_group
    
  7. View Cluster Metadata (get details about brokers, topics, and partitions; note that kafka-metadata-shell.sh reads the KRaft metadata log directly rather than accepting --bootstrap-server, so point it at the snapshot under your log directory):

    kafka-metadata-shell.sh --snapshot /path/to/__cluster_metadata-0/00000000000000000000.log
    
  8. Monitor Kafka Lag for a Specific Consumer Group (useful for tracking how far behind consumers are):

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --describe --group my_consumer_group
    
  9. Enable Log Compaction on a Topic (the broker then compacts the log in the background; newer Kafka versions require kafka-configs.sh for this, since kafka-topics.sh --alter no longer accepts --config):

    kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type topics \
    --entity-name my_topic \
    --alter --add-config cleanup.policy=compact
    
  10. Change the Default Partition Assignment Strategy (note that partition.assignment.strategy is a consumer-level setting, not a broker-level one):

    To change the assignment strategy (e.g., to range, round-robin, or sticky), set the following in the consumer configuration:

    partition.assignment.strategy=org.apache.kafka.clients.consumer.StickyAssignor
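
    In the Go Sarama client (used in section 7), the equivalent choice is made per consumer group. A minimal sketch; the GroupStrategies field assumes a recent Sarama version (v1.37+), while older versions use the singular Rebalance.Strategy field:

    package main

    import "github.com/IBM/sarama"

    // newConsumerConfig selects sticky assignment, which minimizes partition
    // movement across rebalances.
    func newConsumerConfig() *sarama.Config {
        config := sarama.NewConfig()
        config.Consumer.Group.Rebalance.GroupStrategies = []sarama.BalanceStrategy{
            // Alternatives: sarama.NewBalanceStrategyRange(), sarama.NewBalanceStrategyRoundRobin()
            sarama.NewBalanceStrategySticky(),
        }
        return config
    }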
    

4. Kafka Core Concepts

Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications. Here are some key aspects of Kafka:

  1. Core Concepts:
  • Producer: Sends messages (records) to Kafka topics (see the producer sketch after this section).
  • Consumer: Reads messages from Kafka topics.
  • Topic: A category or feed name to which records are sent; it is partitioned for scalability.
  • Partition: A single topic can be divided into multiple partitions, allowing parallel processing and distribution across Kafka brokers.
  • Broker: A server that stores topic partitions and handles requests from clients (producers and consumers).
  2. Message Durability:
  • Kafka ensures data durability by replicating partitions across multiple brokers, reducing the risk of data loss in case of broker failure.
  • Each message in Kafka is stored on disk, providing fault tolerance.
  3. Scalability:
  • Kafka's design allows horizontal scaling by adding more brokers to a cluster and increasing partitions within a topic, distributing the load across brokers.
  4. High Throughput and Low Latency:
  • Kafka is optimized for high throughput, handling millions of messages per second with low latency, making it suitable for real-time data processing.
  5. Offset Management:
  • Consumers track their position in a topic using offsets, which indicate the last message read. This allows consumers to resume from a specific point after a failure.
  6. Data Retention:
  • Kafka retains messages for a configurable period (e.g., days or weeks), even after they are consumed, which allows for replaying or reprocessing past messages.
  7. Stream Processing:
  • Kafka can be integrated with stream processing frameworks like Apache Flink, Kafka Streams, and Apache Spark to process real-time data streams directly from topics.
  8. Use Cases:
  • Kafka is widely used for logging, monitoring, real-time analytics, event sourcing, and building data pipelines between systems.

By offering these capabilities, Kafka provides a robust, scalable, and fault-tolerant solution for managing real-time data streams in distributed environments.
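
To make the producer concept concrete, here is a minimal Go producer sketch using the IBM Sarama library (the same library used in section 7); the broker address localhost:9092 and the topic my_topic follow the CLI examples above:

    package main

    import (
        "log"

        "github.com/IBM/sarama"
    )

    func main() {
        config := sarama.NewConfig()
        config.Producer.RequiredAcks = sarama.WaitForAll // wait for all in-sync replicas
        config.Producer.Return.Successes = true          // required by SyncProducer

        producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, config)
        if err != nil {
            log.Fatalf("failed to start producer: %v", err)
        }
        defer producer.Close()

        // Send one record; with no key set, Sarama's default partitioner
        // picks a partition at random.
        partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
            Topic: "my_topic",
            Value: sarama.StringEncoder(`{"message": "Hello, Kafka KRaft!"}`),
        })
        if err != nil {
            log.Fatalf("failed to send message: %v", err)
        }
        log.Printf("stored in partition %d at offset %d", partition, offset)
    }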

5. Topic Replication and Consumer Groups

1. Topic Replication:

  • Replication ensures high availability and fault tolerance for Kafka topics. Each topic is divided into partitions, and these partitions are replicated across multiple brokers.
  • Replication Factor: This defines the number of copies of a partition. For example, a replication factor of 3 means that there are three copies of each partition on different brokers.
  • Leader and Followers: For each partition, one replica is designated as the leader, and the others are followers. Producers and consumers interact with the leader, while followers replicate the data from the leader. If a broker holding the leader replica fails, one of the follower replicas is automatically promoted to be the new leader, ensuring that the data remains accessible.

2. Consumer Group:

  • A consumer group is a group of consumers that work together to read messages from a Kafka topic in parallel, allowing horizontal scaling.
  • Each consumer in a group reads from a subset of partitions, ensuring that each partition is consumed by only one consumer at a time within the group.
  • Load Balancing: Kafka distributes partitions among consumers in the group, balancing the workload. If a consumer fails or leaves the group, Kafka will rebalance the partitions across the remaining consumers.
  • Offset Management: Each consumer group maintains its own offsets, allowing different groups to consume the same topic independently, each keeping track of where they left off.

Together, replication ensures data durability and availability, while consumer groups provide scalability and fault-tolerant message processing.
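
As a concrete example of setting a replication factor, here is a Go sketch that creates a replicated topic with Sarama's cluster admin API; it assumes a cluster with at least two brokers, since the replication factor cannot exceed the broker count:

    package main

    import (
        "log"

        "github.com/IBM/sarama"
    )

    func main() {
        config := sarama.NewConfig()
        config.Version = sarama.V2_8_0_0 // admin requests need a recent protocol version

        admin, err := sarama.NewClusterAdmin([]string{"localhost:9092"}, config)
        if err != nil {
            log.Fatalf("failed to create cluster admin: %v", err)
        }
        defer admin.Close()

        // Three partitions for parallelism; two copies of each partition,
        // with the leader and follower placed on different brokers.
        err = admin.CreateTopic("my_topic", &sarama.TopicDetail{
            NumPartitions:     3,
            ReplicationFactor: 2,
        }, false)
        if err != nil {
            log.Fatalf("failed to create topic: %v", err)
        }
        log.Println("replicated topic created")
    }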

6. How Kafka Distributes Messages

Kafka distributes messages between consumers in a consumer group based on partition assignment. Here's a brief explanation of how this works:

  1. Partitions and Consumers:
  • Each Kafka topic is divided into multiple partitions. When consumers are part of a consumer group, Kafka assigns each partition to a specific consumer within the group.
  • Each partition is consumed by only one consumer in a given consumer group, ensuring that messages from a partition are processed sequentially by that consumer.
  2. Load Balancing:
  • Kafka uses a partition assignment strategy (such as range, round-robin, or sticky) to distribute partitions evenly among the consumers in a group.
  • If the number of consumers equals the number of partitions, each consumer is assigned exactly one partition.
  • If there are more partitions than consumers, some consumers are responsible for multiple partitions.
  • If there are more consumers than partitions, some consumers remain idle, since each partition can be assigned to only one consumer in the group.
  3. Rebalancing:
  • When a consumer joins or leaves a consumer group, or when partitions change (e.g., a new topic partition is added), Kafka triggers a rebalance.
  • During rebalancing, partitions are redistributed among the consumers, ensuring that each partition remains assigned to only one consumer.

By distributing partitions this way, Kafka allows for parallel processing of messages within a consumer group, improving scalability and load distribution.
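
The sketch below shows how this assignment surfaces in a Go Sarama consumer group: Setup logs the partitions this member was given, ConsumeClaim processes one claimed partition, and the loop around Consume lets the member rejoin after each rebalance (the handler type is illustrative):

    package main

    import (
        "context"
        "log"

        "github.com/IBM/sarama"
    )

    // handler implements sarama.ConsumerGroupHandler.
    type handler struct{}

    func (handler) Setup(s sarama.ConsumerGroupSession) error {
        // e.g. map[my_topic:[0 2]] when this member owns partitions 0 and 2.
        log.Printf("assigned partitions: %v", s.Claims())
        return nil
    }

    func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }

    // ConsumeClaim is called once per assigned partition and reads it sequentially.
    func (handler) ConsumeClaim(s sarama.ConsumerGroupSession, c sarama.ConsumerGroupClaim) error {
        for msg := range c.Messages() {
            log.Printf("partition=%d offset=%d value=%s", msg.Partition, msg.Offset, msg.Value)
            s.MarkMessage(msg, "") // commit the offset after processing
        }
        return nil
    }

    func main() {
        config := sarama.NewConfig()
        config.Consumer.Offsets.Initial = sarama.OffsetOldest

        group, err := sarama.NewConsumerGroup([]string{"localhost:9092"}, "my_consumer_group", config)
        if err != nil {
            log.Fatalf("failed to create consumer group: %v", err)
        }
        defer group.Close()

        for {
            // Consume returns when a rebalance happens; looping makes the
            // member rejoin with its new partition assignment.
            if err := group.Consume(context.Background(), []string{"my_topic"}, handler{}); err != nil {
                log.Fatalf("consume error: %v", err)
            }
        }
    }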

7. Minimizing the Risk of Message Loss

For example, when building Go services with the IBM Sarama library for Kafka, it is crucial to implement strategies that minimize the risk of message loss. Here are some key practices to follow (they are combined into a single config sketch after the list):

  1. Enable Acknowledgments (acks)
  • Configure the producer's acknowledgment setting to ensure message durability. Setting acks=all ensures that all in-sync replicas acknowledge the message before it is considered sent, reducing the risk of loss in case of broker failure.
    config.Producer.RequiredAcks = sarama.WaitForAll
    
  2. Use an Idempotent Producer
  • Enable idempotence in the producer configuration so that retries cannot produce duplicate messages. This gives exactly-once semantics on the producer side; note that Sarama also requires limiting in-flight requests when idempotence is on.
    config.Producer.Idempotent = true
    config.Net.MaxOpenRequests = 1 // required by Sarama when Idempotent is true
    
  3. Configure Message Retries
  • Set up the producer's retry mechanism to resend messages in case of transient errors. Use a reasonable retry count and a backoff strategy to handle temporary failures without losing messages.
    config.Producer.Retry.Max = 5
    config.Producer.Retry.Backoff = 100 * time.Millisecond
    
  4. Enable Message Durability (Replication)
  • Ensure that the Kafka topic is configured with sufficient replication across brokers. A higher replication factor reduces the risk of data loss if a broker fails.
    # Example: create a topic with a replication factor of 3
    kafka-topics.sh --create --topic my-topic --replication-factor 3 --partitions 3 --bootstrap-server localhost:9092
    
  5. Use Reliable Consumer Offset Management
  • For consumers, commit offsets manually after successfully processing a message to avoid skipping unprocessed messages. With Sarama's consumer group API, mark a message only after it has been processed (the standalone OffsetManager serves the same purpose for low-level consumers):
    session.MarkMessage(msg, "") // session is a sarama.ConsumerGroupSession
    
  6. Handle Consumer Failures Gracefully
  • Implement error handling in the consumer logic to manage failures (e.g., retries) while processing messages. This prevents premature offset commits and message loss due to processing errors.
  7. Monitor and Handle Kafka Cluster Failures
  • Set up monitoring and alerting for Kafka broker failures and performance issues. Handle failovers gracefully to prevent message loss when brokers are slow or unresponsive.
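
Putting the producer-side settings above together, a possible Sarama config sketch looks like this (the values are illustrative defaults to tune, not recommendations):

    package main

    import (
        "time"

        "github.com/IBM/sarama"
    )

    // lossMinimizingConfig combines practices 1-3 above into one producer config.
    func lossMinimizingConfig() *sarama.Config {
        config := sarama.NewConfig()
        config.Producer.RequiredAcks = sarama.WaitForAll       // 1: wait for all in-sync replicas
        config.Producer.Idempotent = true                      // 2: no duplicates on retry
        config.Net.MaxOpenRequests = 1                         //    required when Idempotent is true
        config.Producer.Retry.Max = 5                          // 3: bounded retries...
        config.Producer.Retry.Backoff = 100 * time.Millisecond //    ...with backoff
        config.Producer.Return.Successes = true                // needed by SyncProducer
        return config
    }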

By following these practices, you can significantly mitigate the risk of losing messages while using the Sarama library with Kafka in Golang.

If you found this helpful, let me know by leaving a 👍 or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! 😃
