explaingit

robinhood/faust

6,825
Python
Audience · developer
Complexity · 4/5
Setup · hard

TLDR

Python library for building real-time stream processors on top of Apache Kafka, inspired by Kafka Streams but written in plain Python. Built by Robinhood to handle billions of daily events, it is now deprecated; the community faust-streaming fork is actively maintained.

Mindmap

mindmap
  root((Faust))
    What it does
      Stream processing
      Kafka integration
      Real-time pipelines
    Features
      Distributed Tables
      Time windowing
      RocksDB storage
    Tech Stack
      Python asyncio
      Apache Kafka
      RocksDB
    Status
      Deprecated by Robinhood
      Community fork active
      Python 3.6+ required

Things people build with this

USE CASE 1

Build a real-time event processor in Python that reads from a Kafka topic and reacts to each message as it arrives rather than in batches.

USE CASE 2

Create a distributed key/value store in your stream app that tracks windowed counts like clicks in the last hour with automatic expiry.

USE CASE 3

Process live event streams using familiar Python libraries like Pandas or NumPy alongside the stream processing logic.

USE CASE 4

Replace a scheduled batch job with a continuously running pipeline that updates results in real time.
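
The windowed counts in use case 2 can be sketched in plain Python. This is a minimal illustration of tumbling windows with expiry, not Faust's API: the `WindowedCounter` class, its method names, and the sample timestamps are all hypothetical, and a real Faust Table also persists and replicates its data, which this sketch does not.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour tumbling windows


class WindowedCounter:
    """Hypothetical stand-in for a windowed table; not Faust's API."""

    def __init__(self, size, expires):
        self.size = size          # window length in seconds
        self.expires = expires    # how long a closed window is kept
        self.windows = defaultdict(int)  # (key, window_start) -> count

    def _window_start(self, timestamp):
        # Every timestamp belongs to exactly one fixed-size window.
        return int(timestamp // self.size) * self.size

    def _expire(self, now):
        # Drop windows that closed at least `expires` seconds ago.
        stale = [k for k in self.windows
                 if k[1] + self.size + self.expires <= now]
        for k in stale:
            del self.windows[k]

    def add(self, key, timestamp):
        self._expire(timestamp)
        self.windows[(key, self._window_start(timestamp))] += 1

    def count(self, key, timestamp):
        # Count for the window that contains `timestamp`.
        return self.windows.get((key, self._window_start(timestamp)), 0)


clicks = WindowedCounter(size=WINDOW_SECONDS, expires=WINDOW_SECONDS)
for ts in (10, 20, 3590):          # three clicks in the first hour
    clicks.add("home", ts)
clicks.add("home", 3700)           # one click in the second hour

first_hour = clicks.count("home", 100)
second_hour = clicks.count("home", 3700)

# A click in the third hour triggers expiry of the first window only.
clicks.add("home", 2 * WINDOW_SECONDS + 5)
```

Keying each count by `(key, window_start)` is what lets old windows be dropped independently of newer ones.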

Tech stack

Python · Apache Kafka · RocksDB · asyncio

Getting it running

Difficulty · hard
Time to first run · 1 day+

Requires a running Apache Kafka cluster. This repository is deprecated; use the community faust-streaming fork for new projects.

In plain English

Faust is a Python library for building systems that process continuous streams of data, reading events as they arrive rather than working on batches after the fact. Robinhood built it and used it internally to handle billions of events per day across distributed systems and real-time data pipelines. Robinhood has since deprecated the library and no longer maintains it; an active community-maintained fork, faust-streaming, continues in a separate GitHub repository.

The core idea comes from Kafka Streams, a Java-based stream processing library, but Faust brings that approach to plain Python. You connect it to Apache Kafka, a messaging system that acts as a high-throughput queue, and then write ordinary Python functions that react to each incoming message. Because it uses Python's async features, those functions can also make web requests or run other background work without blocking the stream.

Faust includes a built-in distributed key/value store called Tables. These work like Python dictionaries in your code, but the data is stored on disk using RocksDB (a fast embedded database) and replicated across nodes in your cluster. If one machine fails, another picks up where it left off automatically. Tables also support time-based windowing, so you can track counts like "clicks in the last hour" and let older windows expire on their own.

Because it is just Python, Faust works alongside any library you already use: NumPy, Pandas, Django, Flask, or anything else. Models describe how messages are serialized, using Python type annotations to define the shape of expected data. The library is statically typed and works with the mypy type checker, which can catch errors before you run anything.

Faust requires Python 3.6 or later. Given the deprecation notice, new projects should consider the community fork rather than this repository.
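
The per-message async functions and annotation-based models described above can be illustrated with the standard library alone. This is a rough sketch of the pattern, not Faust code: an `asyncio.Queue` stands in for a Kafka topic, a dataclass stands in for a Faust model, and the names `Click`, `decode`, `enrich`, and `agent` are hypothetical. In Faust itself these roles are played by agents and record models; check the repo for the exact API.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class Click:
    """Hypothetical message model; annotations describe the payload shape."""
    user_id: str
    url: str


def decode(raw: bytes) -> Click:
    # Deserialize one JSON message into the typed model.
    return Click(**json.loads(raw))


async def enrich(click: Click) -> str:
    # Stand-in for non-blocking side work such as a web request.
    await asyncio.sleep(0)
    return f"{click.user_id} -> {click.url}"


async def agent(stream: asyncio.Queue, results: list) -> None:
    # React to each message as it arrives; None ends this demo stream.
    while (raw := await stream.get()) is not None:
        results.append(await enrich(decode(raw)))


async def main() -> list:
    stream: asyncio.Queue = asyncio.Queue()  # stands in for a Kafka topic
    results: list = []
    worker = asyncio.create_task(agent(stream, results))
    for payload in ({"user_id": "u1", "url": "/home"},
                    {"user_id": "u2", "url": "/cart"}):
        await stream.put(json.dumps(payload).encode())
    await stream.put(None)  # signal end of stream
    await worker
    return results


processed = asyncio.run(main())
```

Because the consumer awaits both the queue and the enrichment step, other tasks on the same event loop can run while it waits, which is the property the async design relies on.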

Copy-paste prompts

Prompt 1
Write a Faust app that reads JSON events from a Kafka topic named user-clicks and counts unique users per 1-hour window using a Faust Table.
Prompt 2
How do I define a Faust model with Python type annotations to validate the structure of incoming Kafka messages before processing them?
Prompt 3
Set up a Faust worker that reads order events from Kafka, enriches each one with a REST API call, and writes results to an output topic.
Prompt 4
How do I deploy a Faust app across multiple worker nodes so Tables are automatically partitioned and replicated for fault tolerance?
Prompt 5
Migrate this Faust code to the faust-streaming community fork so I get continued updates and Python 3.10 and later support.

Verify against the repo before relying on details.