twitter/the-algorithm

Analysis updated 2026-06-20

★ 73,112 | Scala | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    What it Does
      Feed ranking pipeline
      Candidate selection
      Content filtering
    Key Components
      SimClusters interests
      TwHIN relationships
      Tweepcred reputation
    Tech Stack
      Scala
      Python
      Rust Navi server
    Audience
      ML researchers
      Algorithm engineers
    Use Cases
      Study recommendations
      Research transparency


What do people build with it?

USE CASE 1

Study how a production-scale social media recommendation feed algorithm is structured across candidate selection, ranking, and filtering stages.

USE CASE 2

Research how interest communities are detected in large social graphs using the SimClusters algorithm.

USE CASE 3

Understand how user reputation scoring works at scale using the Tweepcred page-rank-style system.
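To make "page-rank-style" concrete, here is a minimal sketch of textbook PageRank run over a small follow graph. It is a stand-in for illustration only: Tweepcred's actual formula, inputs, and adjustments are not described on this page, and the function name `pagerank`, the `follows` mapping, and the damping value of 0.85 are all assumptions.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a follow graph.

    `follows` maps each user to the list of users they follow.
    An edge u -> v passes a share of u's score to v.
    This is NOT Tweepcred itself, only the general technique.
    """
    # Collect every user that appears as a follower or a followee.
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    n = len(users)
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user starts each round with the "teleport" baseline.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user (follows nobody): spread score evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical three-user graph: a and b follow c, c follows a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, `c` is followed by two accounts and ends up with the highest score, while `b`, which nobody follows, ends up with the lowest.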

What is it built with?

Scala, Python, Rust, Bazel

How does it compare?

	twitter/the-algorithm	apache/spark	lichess-org/lila
Stars	73,112	43,240	18,184
Language	Scala	Scala	Scala
Setup difficulty	hard	hard	hard
Complexity	5/5	5/5	5/5
Audience	researcher	data	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

This is not a standalone runnable app. It requires X's internal infrastructure and is intended as reference and study material only.

No license information is specified in the repository description.

In plain English

This is the source code for the recommendation algorithm that powers X (formerly Twitter). Its job is to decide which posts appear in your "For You" feed, which notifications you receive, and what shows up when you search or explore the platform. In short, it answers the question: out of hundreds of millions of posts, which ones should this specific user see right now?

The system works in several stages. First, candidate sources gather a large pool of potentially relevant posts from both accounts you follow and accounts you don't. Then ranking models score each candidate based on factors like how likely you are to engage with it, how reputable the author is, and whether it matches your interests. Finally, filtering layers remove content that violates policies or legal requirements before the final feed is assembled and delivered to you.

Key internal components include SimClusters (which groups users into interest communities), TwHIN (which builds relationship maps between users and posts), and a page-rank-style reputation scorer called Tweepcred.

You would look at this repository if you are a researcher studying recommendation systems, a developer curious about how large-scale feed algorithms are structured, or someone interested in transparency around algorithmic content selection. It is not a standalone runnable application but rather a collection of services and machine learning jobs that require the broader X infrastructure to operate.

The primary languages are Scala and Python, with some Rust for high-performance model serving (a component called Navi). Build tooling uses Bazel. This is reference and study material, not a plug-and-play product.
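The three stages described above (candidate gathering, ranking, filtering) can be sketched in a few lines of Python. This is a toy illustration of the overall shape, not code from the repository: the `Candidate` fields, the linear `score` function, and its weights are all hypothetical, and the real ranking models are far more complex than a weighted sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: str
    author_id: str
    in_network: bool          # True if the viewer follows the author
    engagement_prob: float    # hypothetical model output in [0, 1]
    author_reputation: float  # hypothetical reputation score in [0, 1]
    interest_match: float     # hypothetical interest similarity in [0, 1]
    violates_policy: bool = False


def gather_candidates(in_network, out_of_network):
    """Stage 1: merge both sources, dropping duplicate posts."""
    seen = set()
    merged = []
    for cand in list(in_network) + list(out_of_network):
        if cand.post_id not in seen:
            seen.add(cand.post_id)
            merged.append(cand)
    return merged


def score(cand, weights=(0.6, 0.2, 0.2)):
    """Stage 2: toy linear score; the weights are made up."""
    w_eng, w_rep, w_int = weights
    return (w_eng * cand.engagement_prob
            + w_rep * cand.author_reputation
            + w_int * cand.interest_match)


def build_feed(in_network, out_of_network, limit=3):
    """Run all three stages and return the top `limit` posts."""
    candidates = gather_candidates(in_network, out_of_network)
    ranked = sorted(candidates, key=score, reverse=True)
    # Stage 3: drop anything flagged as violating policy.
    allowed = [c for c in ranked if not c.violates_policy]
    return allowed[:limit]
```

Filtering runs after ranking here only to mirror the order given in the description; a real system may also filter earlier to avoid scoring posts that can never be shown.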

Copy-paste prompts

Prompt 1

Explain how the SimClusters component in twitter/the-algorithm groups users into interest communities, and how I could apply the same concept to my own recommendation system.

Prompt 2

Break down the For You feed ranking pipeline in the-algorithm: what signals are used to score each candidate post and in what order?

Prompt 3

I'm building a recommendation system from scratch. Using twitter/the-algorithm as a reference, how should I structure the candidate retrieval and ranking stages?

Prompt 4

How does Tweepcred calculate author reputation in the-algorithm, and how does that score influence which posts get surfaced?

Frequently asked questions

What is the-algorithm?

It is the source code for X's (formerly Twitter's) recommendation system, which decides which posts appear in your For You feed. The system is a multi-stage pipeline covering candidate selection, ranking, and filtering.

What language is the-algorithm written in?

Mainly Scala. The stack also includes Python and Rust.

What license does the-algorithm use?

No license information is specified in the repository description.

How hard is the-algorithm to set up?

Setup difficulty is rated hard, with a day or more to a first successful run.

Who is the-algorithm for?

Mainly researchers studying recommendation systems.

Verify against the repo before relying on details.