Kafka Development

567 installs128 stars

Summary

This covers the full Kafka stack from broker architecture through producers, consumers, and Kafka Streams. You get specific guidance on partition strategy, offset management, and exactly-once semantics, plus real configuration values like acks=all and enable.idempotence=true. The retry and dead letter queue section is especially practical, recommending patterns like 3 runtime retries per redrive with a maximum of 5 redrives. It also walks through schema evolution with Schema Registry, security with SASL/SSL, and monitoring metrics that actually matter. If you're building event-driven systems and need to make decisions about topic design, consumer groups, or handling late-arriving data in stream processing, this gives you the patterns and tradeoffs.

Install to Claude Code

npx -y skills add mindrally/skills --skill kafka-development --agent claude-code

Installs to .claude/skills

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Vibe Prospecting MCP

Connect Claude to +800M contacts, +150M companies. Find & Enrich leads in chat.

Try For Free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

Put your SEO on autopilot

An agent that runs the SEO playbooks that move rankings and ships PRs you control.

Get founding access →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Vibe Prospecting MCP

Connect Claude to +800M contacts, +150M companies. Find & Enrich leads in chat.

Try For Free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

Put your SEO on autopilot

An agent that runs the SEO playbooks that move rankings and ships PRs you control.

Get founding access →

Files

SKILL.mdView on GitHub

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Vibe Prospecting MCP

Connect Claude to +800M contacts, +150M companies. Find & Enrich leads in chat.

Try For Free →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

Put your SEO on autopilot

An agent that runs the SEO playbooks that move rankings and ships PRs you control.

Get founding access →

First SeenJun 3, 2026

View on GitHub

Kafka Development

This skill provides best practices for Apache Kafka event streaming and distributed messaging systems. Apply these guidelines when building Kafka-based applications.

Core Principles

Kafka is a distributed event streaming platform for high-throughput, fault-tolerant messaging
Unlike traditional pub/sub, Kafka uses a pull model - consumers pull messages from partitions
Design for scalability, durability, and exactly-once semantics where needed
Leave NO todos, placeholders, or missing pieces in the implementation

Workflow: Setting Up a Kafka Producer-Consumer Pipeline

Define the topic — choose a descriptive name, set partition count based on expected consumer parallelism, and configure retention and replication factor.
Design the message schema — register an Avro or JSON schema in Schema Registry; ensure backward compatibility from the start.
Implement the producer — configure acks=all, enable idempotence, select a partition key that distributes evenly, and add error handling with retry logic.
Implement the consumer — set enable.auto.commit=false, pick an appropriate auto.offset.reset policy, process messages idempotently, and commit offsets only after successful processing.
Add observability — instrument producer send-rate, consumer lag, and broker under-replicated-partitions; propagate trace context in message headers.
Test end-to-end — use Testcontainers or an embedded Kafka broker to verify the full produce-consume-commit cycle, including failure and rebalance scenarios.
Deploy and monitor — roll out with lag alerts, dead-letter-topic routing for persistent failures, and dashboards for key broker and client metrics.

Architecture Overview

Core Components

Topics: Categories/feeds for organizing messages
Partitions: Ordered, immutable sequences within topics enabling parallelism
Producers: Clients that publish messages to topics
Consumers: Clients that read messages from topics
Consumer Groups: Coordinate consumption across multiple consumers
Brokers: Kafka servers that store data and serve clients

Key Concepts

Messages are appended to partitions in order
Each message has an offset - a unique sequential ID within the partition
Consumers maintain their own cursor (offset) and can read streams repeatedly
Partitions are distributed across brokers for scalability

Topic Design

Partitioning Strategy

Use partition keys to place related events in the same partition
Messages with the same key always go to the same partition
This ensures ordering for related events
Choose keys carefully - uneven distribution causes hot partitions

Partition Count

More partitions = more parallelism but more overhead
Consider: expected throughput, consumer count, broker resources
Start with number of consumers you expect to run concurrently
Partitions can be increased but not decreased

Topic Configuration

retention.ms: How long to keep messages (default 7 days)
retention.bytes: Maximum size per partition
cleanup.policy: delete (remove old) or compact (keep latest per key)
min.insync.replicas: Minimum replicas that must acknowledge

Producer Best Practices

Reliability Settings

acks=all               # Wait for all replicas to acknowledge
retries=MAX_INT        # Retry on transient failures
enable.idempotence=true # Prevent duplicate messages on retry

Performance Tuning

batch.size: Accumulate messages before sending (default 16KB)
linger.ms: Wait time for batching (0 = send immediately)
buffer.memory: Total memory for buffering unsent messages
compression.type: gzip, snappy, lz4, or zstd for bandwidth savings

Error Handling

Implement retry logic with exponential backoff
Handle retriable vs non-retriable exceptions differently
Log and alert on send failures
Consider dead letter topics for messages that fail repeatedly

Example: Java Producer with Idempotence and Error Handling

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record =
        new ProducerRecord<>("orders", "order-123", "{\"item\":\"widget\",\"qty\":5}");

    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            log.error("Send failed for key=order-123", exception);
            // Route to dead-letter topic or alert
        } else {
            log.info("Delivered to {}-{} offset {}",
                metadata.topic(), metadata.partition(), metadata.offset());
        }
    });
}

Partitioner

Default: hash of key determines partition (null key = round-robin)
Custom partitioners for specific routing needs
Ensure even distribution to avoid hot partitions

Consumer Best Practices

Offset Management

Consumers track which messages they've processed via offsets
auto.offset.reset: earliest (start from beginning) or latest (only new messages)
Commit offsets after successful processing, not before
Use enable.auto.commit=false for exactly-once semantics

Consumer Groups

Consumers in a group share partitions (each partition to one consumer)
More consumers than partitions = some consumers idle
Group rebalancing occurs when consumers join/leave
Use group.instance.id for static membership to reduce rebalances

Processing Patterns

Process messages in order within a partition
Handle out-of-order messages across partitions if needed
Implement idempotent processing for at-least-once delivery
Consider transactional processing for exactly-once

Example: Java Consumer with Manual Offset Commit

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("orders"));

    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            try {
                processOrder(record.key(), record.value());
            } catch (Exception e) {
                log.error("Failed to process offset={} key={}", record.offset(), record.key(), e);
                publishToDeadLetterTopic(record, e);
            }
        }
        consumer.commitSync(); // Commit only after successful processing
    }
}

Timeouts and Failures

Implement processing timeout to isolate slow events
When timeout occurs, set event aside and continue to next message
Maintain overall system performance over processing every single event
Use dead letter queues for messages failing all retries

Error Handling and Retry

Retry Strategy

Allow multiple runtime retries per processing attempt
Example: 3 runtime retries per redrive, maximum 5 redrives = 15 total retries
Runtime retries typically cover 99% of failures
After exhausting retries, route to dead letter queue

Dead Letter Topics

Create dedicated DLT for messages that can't be processed
Include original topic, partition, offset, and error details
Monitor DLT for patterns indicating systemic issues
Implement manual or automated retry from DLT

Schema Management

Schema Registry

Use Confluent Schema Registry for schema management
Producers validate data against registered schemas during serialization
Schema mismatches throw exceptions, preventing malformed data
Provides common reference for producers and consumers

Schema Evolution

Design schemas for forward and backward compatibility
Add optional fields with defaults for backward compatibility
Avoid removing or renaming fields
Use schema versioning and migration strategies

Kafka Streams

State Management

Implement log compaction to maintain latest version of each key
Periodically purge old data from state stores
Monitor state store size and access patterns
Use appropriate storage backends for your scale

Windowing Operations

Handle out-of-order events and skewed timestamps
Use appropriate time extraction and watermarking techniques
Configure grace periods for late-arriving data
Choose window types based on use case (tumbling, hopping, sliding, session)

Security

Authentication

Use SASL/SSL for client authentication
Support SASL mechanisms: PLAIN, SCRAM, OAUTHBEARER, GSSAPI
Enable SSL for encryption in transit
Rotate credentials regularly

Authorization

Use Kafka ACLs for fine-grained access control
Grant minimum necessary permissions per principal
Separate read/write permissions by topic
Audit access patterns regularly

Monitoring and Observability

Key Metrics

Producer: record-send-rate, record-error-rate, batch-size-avg
Consumer: records-consumed-rate, records-lag, commit-latency
Broker: under-replicated-partitions, request-latency, disk-usage

Lag Monitoring

Consumer lag = last produced offset - last committed offset
High lag indicates consumers can't keep up
Alert on increasing lag trends
Scale consumers or optimize processing

Distributed Tracing

Propagate trace context in message headers
Use OpenTelemetry for end-to-end tracing
Correlate producer and consumer spans
Track message journey through the pipeline

Testing

Unit Testing

Mock Kafka clients for isolated testing
Test serialization/deserialization logic
Verify partitioning logic
Test error handling paths

Integration Testing

Use embedded Kafka or Testcontainers
Test full producer-consumer flows
Verify exactly-once semantics if used
Test rebalancing scenarios

Performance Testing

Load test with production-like message rates
Test consumer throughput and lag behavior
Verify broker resource usage under load
Test failure and recovery scenarios

Common Patterns

Event Sourcing

Store all state changes as immutable events
Rebuild state by replaying events
Use log compaction for snapshots
Enable time-travel debugging

CQRS (Command Query Responsibility Segregation)

Separate write (command) and read (query) models
Use Kafka as the event store
Build read-optimized projections from events
Handle eventual consistency appropriately

Saga Pattern

Coordinate distributed transactions across services
Each service publishes events for next step
Implement compensating transactions for rollback
Use correlation IDs to track saga instances

Change Data Capture (CDC)

Capture database changes as Kafka events
Use Debezium or similar CDC tools
Enable real-time data synchronization
Build event-driven integrations

Kafka Development

Install to Claude Code

Kafka Development

Install to Claude Code

Kafka Development

Core Principles

Workflow: Setting Up a Kafka Producer-Consumer Pipeline

Architecture Overview

Core Components

Key Concepts

Topic Design

Partitioning Strategy

Partition Count

Topic Configuration

Producer Best Practices

Reliability Settings

Performance Tuning

Error Handling

Example: Java Producer with Idempotence and Error Handling

Partitioner

Consumer Best Practices

Offset Management

Consumer Groups

Processing Patterns

Example: Java Consumer with Manual Offset Commit

Timeouts and Failures

Error Handling and Retry

Retry Strategy

Dead Letter Topics

Schema Management

Schema Registry

Schema Evolution

Kafka Streams

State Management

Windowing Operations

Security

Authentication

Authorization

Monitoring and Observability

Key Metrics

Lag Monitoring

Distributed Tracing

Testing

Unit Testing

Integration Testing

Performance Testing

Common Patterns

Event Sourcing

CQRS (Command Query Responsibility Segregation)

Saga Pattern

Change Data Capture (CDC)

Recommended

Kafka Development

Core Principles

Workflow: Setting Up a Kafka Producer-Consumer Pipeline

Architecture Overview

Core Components

Key Concepts

Topic Design

Partitioning Strategy

Partition Count

Topic Configuration

Producer Best Practices

Reliability Settings

Performance Tuning

Error Handling

Example: Java Producer with Idempotence and Error Handling

Partitioner

Consumer Best Practices

Offset Management

Consumer Groups

Processing Patterns

Example: Java Consumer with Manual Offset Commit

Timeouts and Failures

Error Handling and Retry

Retry Strategy

Dead Letter Topics

Schema Management

Schema Registry

Schema Evolution