Build high-volume data pipelines that move data between databases, analytics systems, and microservices in real time.
Implement real-time analytics by computing aggregates and metrics over continuous streams of events.
Create event sourcing systems that record every state change as a permanent, replayable log.
Aggregate logs from many services into a single durable location for monitoring and debugging.
Setup requires building from source, standing up distributed infrastructure (a broker cluster, storage, and coordination), and configuring multiple interconnected services.
Apache Kafka is a distributed event streaming platform: a system for reliably moving large volumes of data between services and applications in real time. The core problem it solves arises whenever multiple systems need to share a continuous flow of data: how do you move millions of events per second reliably, durably, and at scale without the sender needing to know about the receiver?

Kafka organizes data into topics, named channels that producers write messages to and consumers read from. A topic is split into partitions, and each partition is an ordered, append-only log stored durably on disk, so consumers can read from any point in history that is still retained, not just in real time. This also makes Kafka resilient: a consumer that goes down and comes back can pick up where it left off. Multiple consumers can independently read the same topic at different speeds without interfering with each other. The system is designed for horizontal scaling: a topic's partitions can be spread across many machines (called brokers), so throughput grows as servers are added.

Kafka was originally developed at LinkedIn to handle the stream of user activity events (page views, clicks, logs) that its infrastructure generated at a rate that overwhelmed traditional message queues. Today it is used for high-volume data pipelines (moving data between databases, analytics systems, and microservices), real-time analytics (computing aggregates over streaming data), event sourcing (recording every state change as a permanent event log), and log aggregation (collecting logs from many services into one place). Kafka also includes Kafka Streams, a library for processing and transforming data streams directly in Java without a separate processing cluster.

The stack is primarily Java with some Scala. It is built with Gradle, runs on the JVM, and is an Apache Software Foundation project used by thousands of companies.
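The topic, partition, and consumer-position model described above can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not Kafka's client API or its implementation: the class `TopicSketch` and its methods `partitionFor`, `send`, `poll`, and `seek` are hypothetical names chosen for illustration, `String.hashCode()` stands in for whatever partitioning function a real producer uses, and each "group" here is simply a named reader with its own position (offset) per partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a partitioned topic. Hypothetical; not the Kafka client API. */
class TopicSketch {
    // One append-only list per partition; a record's index in its list is its offset.
    private final List<List<String>> partitions = new ArrayList<>();
    // Next offset to read, tracked separately for each reader group and partition.
    private final Map<String, int[]> positions = new HashMap<>();

    TopicSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** The same key always maps to the same partition, which keeps per-key order. */
    int partitionFor(String key) {
        // Assumption: hashCode() is a stand-in for a real partitioning function.
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends a record to the key's partition and returns the offset it was stored at. */
    int send(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(value);
        return log.size() - 1;
    }

    /** Returns up to max unread records for this group and advances its position. */
    List<String> poll(String group, int partition, int max) {
        int[] offsets = positions.computeIfAbsent(group, g -> new int[partitions.size()]);
        List<String> log = partitions.get(partition);
        int from = offsets[partition];
        int to = Math.min(log.size(), from + max);
        offsets[partition] = to;
        return new ArrayList<>(log.subList(from, to));
    }

    /** Moves a group's position to an earlier (or later) offset so records can be replayed. */
    void seek(String group, int partition, int offset) {
        positions.computeIfAbsent(group, g -> new int[partitions.size()])[partition] = offset;
    }

    public static void main(String[] args) {
        TopicSketch pageViews = new TopicSketch(3);
        int p = pageViews.partitionFor("user-42");
        pageViews.send("user-42", "view:/home");
        pageViews.send("user-42", "view:/cart");

        // Two groups read the same partition independently, at different speeds.
        System.out.println(pageViews.poll("analytics", p, 1));  // [view:/home]
        System.out.println(pageViews.poll("audit", p, 10));     // [view:/home, view:/cart]
        // "analytics" resumes from where it left off.
        System.out.println(pageViews.poll("analytics", p, 10)); // [view:/cart]
        // Rewinding replays history, because reading never removes records from the log.
        pageViews.seek("analytics", p, 0);
        System.out.println(pageViews.poll("analytics", p, 10)); // [view:/home, view:/cart]
    }
}
```

The point of the sketch is that reading never removes records: each reader only advances its own position, which is why one reader can lag, restart, or replay without affecting another. A real deployment adds what this model leaves out, including replication across brokers, retention limits, and the division of partitions among the members of a consumer group.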
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.