Analysis updated 2026-07-03
Stream database changes to search indexes, caches, and analytics stores without writing to multiple systems from application code.
Replay all missed changes to a subscriber that went offline using the infinite look-back feature.
Run the included relay and client examples locally to see a full change-data-capture pipeline working end to end.
Build a high-throughput event streaming layer for thousands of database change events per second at low latency.
| | linkedin/databus | networknt/light-4j | spring-io/initializr |
|---|---|---|---|
| Stars | 3,680 | 3,680 | 3,679 |
| Language | Java | Java | Java |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 3/5 | 3/5 |
| Audience | ops / DevOps | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires manually downloading an Oracle JDBC driver jar directly from Oracle's website before the Gradle build will succeed.
Databus is an open-source system from LinkedIn that solves a specific plumbing problem in large-scale software: keeping multiple databases and data stores synchronized when the original source of truth changes. A typical internet company might store user data in a primary database but also keep copies of it in a search index, a cache, and a separate analytics store. When someone updates their profile, all of those other systems need to hear about the change, and doing that reliably at scale is harder than it sounds.
Databus watches the transaction log of the primary database and streams those changes to any downstream system that subscribes. The alternative is to have application code write to both the database and a messaging system. That approach leaves the two out of sync whenever one write succeeds and the other fails. Databus avoids the problem by treating the database as the single source of truth and deriving everything else from it.
LinkedIn built Databus to feed systems such as its People Search index and Social Graph index, both of which must stay in sync with the primary member profile data. The system is designed to handle thousands of events per second per server with latency in the low milliseconds. It also lets subscribers catch up on historical changes through what the README calls infinite look-back, which matters when a downstream system goes offline and needs to replay everything it missed.
Building the project requires a separate Oracle JDBC driver jar, which you must download directly from Oracle's website before the build will work. Once that prerequisite is in place, the project builds with Gradle. The repository includes example code for both a relay (the component that reads from the database log) and a client (a subscriber that receives and processes the change stream). Running both lets you see the full pipeline working locally.
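The relay/subscriber split can be shown with a minimal, self-contained Java sketch of log-based change data capture. This is not the Databus API. `CdcSketch`, `PrimaryDb`, `Relay`, `Subscriber`, and `ChangeEvent` are hypothetical names, the "transaction log" is an in-memory list, and each event is assumed to carry a monotonically increasing sequence number. The application writes only to the primary store. Downstream views are derived from its log, and a subscriber that was offline catches up by requesting every event after its last checkpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of log-based change data capture; NOT the Databus API. */
public class CdcSketch {

    /** One committed change, ordered by an assumed increasing sequence number. */
    record ChangeEvent(long seq, String key, String value) {}

    /** Stand-in for the primary database: every write is also appended to its transaction log. */
    static class PrimaryDb {
        final Map<String, String> rows = new HashMap<>();
        final List<ChangeEvent> txLog = new ArrayList<>();

        void write(String key, String value) {
            rows.put(key, value);
            txLog.add(new ChangeEvent(txLog.size() + 1, key, value));
        }
    }

    /** Stand-in for a relay: reads the log and serves events newer than a checkpoint. */
    static class Relay {
        private final PrimaryDb db;

        Relay(PrimaryDb db) { this.db = db; }

        /** Returns every event with seq > sinceSeq, so a returning subscriber can replay what it missed. */
        List<ChangeEvent> eventsSince(long sinceSeq) {
            List<ChangeEvent> out = new ArrayList<>();
            for (ChangeEvent e : db.txLog) {
                if (e.seq() > sinceSeq) {
                    out.add(e);
                }
            }
            return out;
        }
    }

    /** A downstream store (search index, cache, ...) that tracks its own checkpoint. */
    static class Subscriber {
        final Map<String, String> view = new HashMap<>();
        long checkpoint = 0; // sequence number of the last applied event

        void pull(Relay relay) {
            for (ChangeEvent e : relay.eventsSince(checkpoint)) {
                view.put(e.key(), e.value()); // upsert, so re-applying an event is harmless
                checkpoint = e.seq();
            }
        }
    }

    public static void main(String[] args) {
        PrimaryDb db = new PrimaryDb();
        Relay relay = new Relay(db);
        Subscriber searchIndex = new Subscriber();
        Subscriber cache = new Subscriber();

        // The application writes only to the primary store; no dual write to a queue.
        db.write("member:1", "Alice, Engineer");
        searchIndex.pull(relay);
        cache.pull(relay);

        // The cache goes offline while two more changes commit.
        db.write("member:1", "Alice, Staff Engineer");
        db.write("member:2", "Bob, Designer");
        searchIndex.pull(relay);

        // When the cache returns, it replays everything after its checkpoint.
        cache.pull(relay);
        System.out.println(cache.view.equals(db.rows));
        System.out.println(cache.checkpoint);
    }
}
```

Running `main` prints `true` and then `3`. The returning cache ends up with the same contents as the primary store and has applied all three logged changes. Keeping the checkpoint on the subscriber side and making each apply an idempotent upsert means a retry or partial replay cannot corrupt the view. One simplification to keep in mind: this relay keeps the whole log forever. A real relay typically holds only a bounded window of recent events, so deep look-back needs some other source of history, which the sketch does not model.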
The architecture is described in more detail in a 2012 research paper presented at the ACM Symposium on Cloud Computing. Full documentation lives on the project's GitHub wiki. The project is licensed under Apache 2.0.
An open-source change-data-capture system from LinkedIn that watches a database's transaction log and streams every change to downstream systems like search indexes and caches in near real-time. Handles thousands of events per second with millisecond latency.
Mainly Java. The stack also includes Gradle and the Oracle JDBC driver.
Apache 2.0, free to use, modify, and distribute for any purpose including commercial, with attribution.
Setup difficulty is rated hard, with roughly an hour or more to a first successful run.
Mainly aimed at ops and DevOps engineers.
Verify against the repo before relying on details.