explaingit

linkedin/databus

Analysis updated 2026-07-03

Stars · 3,680 | Language · Java | Audience · ops, devops | Complexity · 4/5 | License · Apache 2.0 | Setup · hard

TLDR

An open-source change-data-capture system from LinkedIn that watches a database's transaction log and streams every change to downstream systems like search indexes and caches in near real-time. Handles thousands of events per second with millisecond latency.

Mindmap

mindmap
  root((databus))
    What it does
      Watch database transaction log
      Stream changes to subscribers
      Keep downstream stores in sync
    Key concepts
      Change data capture
      Infinite look-back replay
      Low millisecond latency
    Components
      Relay reads DB log
      Client subscribes to stream
      Example code included
    Who it's for
      Backend and data engineers
      Large-scale data platforms
      Search and cache sync

What do people build with it?

USE CASE 1

Stream database changes to search indexes, caches, and analytics stores without writing to multiple systems from application code.

USE CASE 2

Replay all missed changes to a subscriber that went offline using the infinite look-back feature.

USE CASE 3

Run the included relay and client examples locally to see a full change-data-capture pipeline working end to end.

USE CASE 4

Build a high-throughput event streaming layer for thousands of database change events per second at low latency.
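
The first use case, one change stream feeding several downstream stores, can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `FanOutSketch`, `RowChange`, and `ChangeStream` are hypothetical names, not part of the Databus API, and the `publish` call stands in for a log reader noticing a committed write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: these types are illustrative and not part of Databus.
class FanOutSketch {

    // A single row change as it might be read from a transaction log.
    static final class RowChange {
        final String key;
        final String value;

        RowChange(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Delivers each change to every registered downstream consumer, in order.
    static final class ChangeStream {
        private final List<Consumer<RowChange>> consumers = new ArrayList<>();

        void subscribe(Consumer<RowChange> consumer) {
            consumers.add(consumer);
        }

        void publish(RowChange change) {
            for (Consumer<RowChange> consumer : consumers) {
                consumer.accept(change);
            }
        }
    }

    public static void main(String[] args) {
        // Two stand-ins for downstream stores.
        Map<String, String> searchIndex = new HashMap<>();
        Map<String, String> cache = new HashMap<>();

        ChangeStream stream = new ChangeStream();
        stream.subscribe(c -> searchIndex.put(c.key, c.value.toLowerCase()));
        stream.subscribe(c -> cache.put(c.key, c.value));

        // The application writes only to the primary database; this call
        // stands in for the log reader picking up that committed write.
        stream.publish(new RowChange("member:42", "Staff Engineer"));

        System.out.println("search index: " + searchIndex);
        System.out.println("cache: " + cache);
    }
}
```

The point of the design is that application code never writes to the search index or cache directly; each store derives its state from the same stream of changes.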

What is it built with?

Java, Gradle, Oracle JDBC

How does it compare?

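|                  | linkedin/databus | networknt/light-4j | spring-io/initializr |
|------------------|------------------|--------------------|----------------------|
| Stars            | 3,680            | 3,680              | 3,679                |
| Language         | Java             | Java               | Java                 |
| Setup difficulty | hard             | moderate           | moderate             |
| Complexity       | 4/5              | 3/5                | 3/5                  |
| Audience         | ops, devops      | developer          | developer            |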

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires manually downloading an Oracle JDBC driver jar directly from Oracle's website before the Gradle build will succeed.

Apache 2.0, free to use, modify, and distribute for any purpose including commercial, with attribution.

In plain English

Databus is an open-source system built by LinkedIn that solves a specific plumbing problem in large-scale software: keeping multiple databases and data stores synchronized when the original source of truth changes. In a typical internet company, you might store user data in a primary database but also keep copies of that data in a search index, a cache, and a separate analytics store. When someone updates their profile, all those other systems need to hear about the change. Doing that reliably at scale is harder than it sounds.

Databus solves this by watching the transaction log of the primary database and streaming those changes out to any downstream system that subscribes. Rather than having application code write to both the database and a messaging system simultaneously (which creates consistency problems if one write fails), Databus treats the database as the single source of truth and derives everything else from it. LinkedIn built this to feed systems like their People Search index and Social Graph index, both of which need to stay in sync with the primary member profile data.

The system is designed to handle thousands of events per second per server with low millisecond latency, and it supports subscribers catching up on historical changes through what the README calls infinite look-back. This matters for scenarios where a downstream system goes offline and needs to replay everything it missed.

Building the project requires a separate Oracle JDBC driver jar that must be downloaded directly from Oracle's website before the build will work. Once that prerequisite is in place, the project builds with Gradle. The repository includes example code for both a relay (the component that reads from the database log) and a client (a subscriber that receives and processes the change stream). Running both lets you see the full pipeline working locally.

The architecture is described in more detail in a 2012 research paper presented at the ACM Symposium on Cloud Computing. Full documentation lives on the project's GitHub wiki. The project is licensed under Apache 2.0.
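
The catch-up behaviour described above can be illustrated with a small sketch. It assumes a simplified model in which every change carries an increasing sequence number and a subscriber remembers the last one it applied. The names `LookBackSketch`, `Relay`, `Subscriber`, `eventsAfter`, and `catchUp` are hypothetical and are not the Databus API; the real relay and client are considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are illustrative and not the Databus API.
class LookBackSketch {

    // One captured change, tagged with an increasing sequence number.
    static final class ChangeEvent {
        final long sequence;
        final String key;
        final String value;

        ChangeEvent(long sequence, String key, String value) {
            this.sequence = sequence;
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the relay: an append-only, in-memory log of changes.
    static final class Relay {
        private final List<ChangeEvent> log = new ArrayList<>();

        void append(String key, String value) {
            log.add(new ChangeEvent(log.size() + 1, key, value));
        }

        // Every event with a sequence number greater than the checkpoint.
        List<ChangeEvent> eventsAfter(long checkpoint) {
            List<ChangeEvent> pending = new ArrayList<>();
            for (ChangeEvent event : log) {
                if (event.sequence > checkpoint) {
                    pending.add(event);
                }
            }
            return pending;
        }
    }

    // Stand-in for a client: applies changes and tracks its checkpoint.
    static final class Subscriber {
        private final Map<String, String> store = new HashMap<>();
        private long checkpoint = 0;

        // Applies everything missed since the checkpoint; returns the count.
        int catchUp(Relay relay) {
            List<ChangeEvent> pending = relay.eventsAfter(checkpoint);
            for (ChangeEvent event : pending) {
                store.put(event.key, event.value);
                checkpoint = event.sequence;
            }
            return pending.size();
        }

        String get(String key) {
            return store.get(key);
        }

        long checkpoint() {
            return checkpoint;
        }
    }

    public static void main(String[] args) {
        Relay relay = new Relay();
        Subscriber searchIndex = new Subscriber();

        relay.append("member:42", "Engineer");
        searchIndex.catchUp(relay);

        // The subscriber is "offline" while these two changes are captured.
        relay.append("member:42", "Senior Engineer");
        relay.append("member:7", "Designer");

        int replayed = searchIndex.catchUp(relay);
        System.out.println("replayed " + replayed
                + " events, checkpoint now " + searchIndex.checkpoint());
    }
}
```

Because the subscriber owns its checkpoint, the relay in this sketch does not need to know which subscribers exist or how far behind each one is.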

Copy-paste prompts

Prompt 1
I downloaded the Oracle JDBC jar and built Databus with Gradle. How do I configure the relay to connect to my primary database log and start streaming change events?
Prompt 2
Walk me through running the Databus example relay and client locally so I can see the full change-data-capture pipeline in action.
Prompt 3
How does Databus compare to Debezium for change-data-capture in a Java stack? When would I pick Databus over a Kafka Connect Debezium connector?
Prompt 4
I want to use Databus to keep a search index in sync with a primary database. Describe the sequence of events from a database row update to the search index seeing the change.
Prompt 5
How does Databus infinite look-back work and how do I configure a client to replay all changes starting from a specific timestamp?

Frequently asked questions

What is databus?

An open-source change-data-capture system from LinkedIn that watches a database's transaction log and streams every change to downstream systems like search indexes and caches in near real-time. Handles thousands of events per second with millisecond latency.

What language is databus written in?

Mainly Java. The stack also includes Gradle and Oracle JDBC.

What license does databus use?

Apache 2.0, free to use, modify, and distribute for any purpose including commercial, with attribution.

How hard is databus to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is databus for?

Mainly ops and DevOps teams.

Verify against the repo before relying on details.