explaingit

apache/kafka

📈 Trending | 32,620 | Java | Audience · developer | Complexity · 4/5 | Active | License | Setup · hard

TLDR

Distributed platform for reliably streaming millions of events per second between services, with durable storage and independent consumers.

Mindmap

mindmap
  root((Kafka))
    What it does
      Event streaming
      Durable message log
      Multi-consumer support
    Architecture
      Topics and partitions
      Brokers and scaling
      Horizontal distribution
    Use cases
      Data pipelines
      Real-time analytics
      Event sourcing
      Log aggregation
    Tech stack
      Java
      Scala
      Gradle
      JVM
    Key features
      Kafka Streams
      Replay from history
      Fault tolerance

Things people build with this

USE CASE 1

Build high-volume data pipelines that move data between databases, analytics systems, and microservices in real time.

USE CASE 2

Implement real-time analytics by computing aggregates and metrics over continuous streams of events.

USE CASE 3

Create event sourcing systems that record every state change as a permanent, replayable log.

USE CASE 4

Aggregate logs from many services into a single durable location for monitoring and debugging.

Tech stack

Java · Scala · Gradle · JVM

Getting it running

Difficulty · hard | Time to first run · 1 day+

Running from this repository means building from source with Gradle, then setting up distributed infrastructure (a broker cluster, storage, and coordination) and configuring multiple interconnected services.

Use freely for any purpose under the Apache License 2.0, including commercial use, as long as you include a copy of the license and state significant changes.

In plain English

Apache Kafka is a distributed event streaming platform: a system for reliably moving large volumes of data between services and applications in real time. The core problem it solves arises whenever multiple systems need to share a continuous flow of data: how do you move millions of events per second reliably, durably, and at scale without the sender needing to know about the receiver?

Kafka organizes data into topics, named channels where producers write messages and consumers read them. Messages are stored durably on disk in an ordered log, so consumers can read from any point in the retained history, not just in real time. This also makes Kafka resilient: if a consumer goes down and comes back, it can pick up where it left off. Multiple consumers can independently read the same topic at different speeds without interfering with each other. The system is designed for horizontal scaling: a topic can be spread across many partitions, stored on different machines (called brokers), so throughput grows as servers are added.

Kafka was originally developed at LinkedIn to handle the stream of user activity events (page views, clicks, logs) that its infrastructure generated at a rate that overwhelmed traditional message queues. Today it is used for high-volume data pipelines (moving data between databases, analytics systems, and microservices), real-time analytics (computing aggregates over streaming data), event sourcing (recording every state change as a permanent event log), and log aggregation (collecting logs from many services into one place).

Kafka also includes Kafka Streams, a library for processing and transforming data streams directly in Java without a separate processing cluster. The stack is primarily Java with some Scala. It is built with Gradle, runs on the JVM, and is an Apache Software Foundation project used by thousands of companies.
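The log, offset, and partition ideas described above can be illustrated with a small in-memory model. This is a minimal sketch using only the Java standard library, not Kafka's client API: the names `PartitionedLogSketch`, `Topic`, `Consumer`, `partitionFor`, `poll`, and `seek` are hypothetical and chosen for illustration, and the `hashCode`-based key-to-partition mapping is an assumed simplification rather than Kafka's actual partitioner. It shows three things: messages with the same key land in the same partition, each consumer tracks its own position, and reading never deletes data, so a consumer can resume or replay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a partitioned topic. Illustrative only; not Kafka's API. */
final class PartitionedLogSketch {

    /** A topic split into a fixed number of append-only partitions. */
    static final class Topic {
        private final List<List<String>> partitions = new ArrayList<>();

        Topic(int partitionCount) {
            for (int i = 0; i < partitionCount; i++) {
                partitions.add(new ArrayList<>());
            }
        }

        /** Assumed simplification: the same key always maps to the same partition. */
        int partitionFor(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        /** Appends a message and returns its offset within the chosen partition. */
        long append(String key, String value) {
            List<String> log = partitions.get(partitionFor(key));
            log.add(value);
            return log.size() - 1;
        }

        /** Reads up to max messages from one partition, starting at the given offset. */
        List<String> read(int partition, long offset, int max) {
            List<String> log = partitions.get(partition);
            int from = (int) Math.min(Math.max(offset, 0), log.size());
            int to = Math.min(from + max, log.size());
            return new ArrayList<>(log.subList(from, to));
        }
    }

    /** Each consumer keeps its own next offset per partition; reads do not remove data. */
    static final class Consumer {
        private final Topic topic;
        private final Map<Integer, Long> nextOffset = new HashMap<>();

        Consumer(Topic topic) {
            this.topic = topic;
        }

        /** Returns the next batch and advances this consumer's position only. */
        List<String> poll(int partition, int max) {
            long offset = nextOffset.getOrDefault(partition, 0L);
            List<String> batch = topic.read(partition, offset, max);
            nextOffset.put(partition, offset + batch.size());
            return batch;
        }

        /** Moves this consumer's position so it can replay from a chosen offset. */
        void seek(int partition, long offset) {
            nextOffset.put(partition, offset);
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic(3);
        int p = topic.partitionFor("user-42");
        topic.append("user-42", "page_view");
        topic.append("user-42", "click");
        topic.append("user-42", "logout");

        Consumer fast = new Consumer(topic);
        Consumer slow = new Consumer(topic);
        System.out.println(fast.poll(p, 10)); // all three events
        System.out.println(slow.poll(p, 1));  // only the first event
        System.out.println(slow.poll(p, 10)); // resumes where it left off
        fast.seek(p, 0);
        System.out.println(fast.poll(p, 10)); // replays from the start
    }
}
```

In a real deployment the partitions live on different brokers and are replicated, and consumer positions are stored durably rather than in a local map; the sketch only captures why independent offsets over an append-only log allow resuming and replaying.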

Copy-paste prompts

Prompt 1
Show me how to set up a Kafka producer and consumer in Java to send and receive messages from a topic.
Prompt 2
How do I use Kafka Streams to filter and transform a stream of events without setting up a separate processing cluster?
Prompt 3
Explain how Kafka partitions work and why spreading a topic across multiple partitions improves throughput.
Prompt 4
Walk me through building a real-time analytics pipeline that reads user activity events from Kafka and computes click-through rates.
Prompt 5
How do I configure Kafka to replay messages from a specific point in history for a consumer that went offline?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.