explaingit

twitter/the-algorithm-ml

10,566 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

TLDR

Twitter's open-sourced machine learning models that power the For You feed, including the Heavy Ranker that decides what content appears on your home timeline and TwHIN embeddings that represent users and content as numerical vectors.

Mindmap

mindmap
  root((the-algorithm-ml))
    What it does
      Feed ranking
      Recommendations
      User embeddings
    Models
      Heavy Ranker
      TwHIN embeddings
    Tech Stack
      Python
      PyTorch
      torchrec
    Requirements
      Linux only
      Nvidia GPU

Things people build with this

USE CASE 1

Study how Twitter's For You feed ranking model selects and orders content to show to users

USE CASE 2

Use TwHIN embeddings as a starting point for building your own social media recommendation system

USE CASE 3

Adapt the Heavy Ranker model architecture for a custom large-scale content ranking task

Tech stack

Python, PyTorch, torchrec

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a Linux machine with an Nvidia GPU; torchrec does not run on other platforms without workarounds.
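Before starting the setup, it can help to confirm that a machine roughly matches these requirements. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, not part of the repository, and finding `nvidia-smi` on the PATH is only a hint that an Nvidia driver is installed, not proof that torchrec will work.

```python
import platform
import shutil


def check_environment():
    """Report whether this machine looks like Linux with an Nvidia driver present."""
    is_linux = platform.system() == "Linux"
    # nvidia-smi ships with the Nvidia driver; its presence on PATH is only a hint.
    has_nvidia_smi = shutil.which("nvidia-smi") is not None
    return {"linux": is_linux, "nvidia_smi": has_nvidia_smi}


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If either check reports `missing`, expect to need workarounds that the README does not cover.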

In plain English

This repository contains open-sourced machine learning models that Twitter uses to power parts of its recommendation system. The code covers two specific models: the Heavy Ranker that decides what shows up in the For You feed on the home timeline, and TwHIN embeddings, which are a way of representing Twitter users and content as numerical vectors for use in recommendation tasks. A research paper on TwHIN is linked from the README for anyone who wants the technical background.

The project is written in Python and is intended to run inside a Python virtual environment on Linux machines. It also depends on torchrec, a library for large-scale recommendation systems that works best with an Nvidia GPU. If you do not have a Linux machine with an Nvidia GPU, running this code locally will likely require extra workarounds the README does not cover.

Setup is handled by a single shell script, and each sub-project within the repository has its own README with more specific instructions for running that model. The top-level README is brief and points readers to those individual sub-project folders for details.

The README is sparse overall. It identifies what is included and how to get started at a high level, but does not describe in plain terms how the ranking or embedding models work, what inputs they take, or how they were trained. Readers who want deeper context would need to explore the sub-project folders and the linked research paper directly.

This repository is primarily useful to people with a machine learning background who want to study or adapt the actual models Twitter uses. It is not a product users interact with directly, and it is not a tool for general-purpose use without significant technical knowledge.
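To make "representing users and content as numerical vectors" more concrete, here is a toy sketch of how embedding vectors can be compared to order candidate content for a user. It is an illustration only: the three-dimensional vectors, the names `post_a` to `post_c`, and the choice of cosine similarity are all assumptions made for this example, not values or scoring logic taken from the repository or the TwHIN paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(user_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar to the user vector."""
    scored = [(name, cosine_similarity(user_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real embeddings have many more dimensions.
user = [0.9, 0.1, 0.0]
posts = {
    "post_a": [1.0, 0.0, 0.0],
    "post_b": [0.0, 1.0, 0.0],
    "post_c": [0.5, 0.5, 0.0],
}

ranking = rank_candidates(user, posts)
print([name for name, _ in ranking])  # ['post_a', 'post_c', 'post_b']
```

The point of the sketch is only that once users and content live in the same vector space, "how relevant is this item to this user" can be reduced to a numeric comparison. How the repository's models actually produce and consume such vectors is described in the sub-project READMEs and the linked paper.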

Copy-paste prompts

Prompt 1
Walk me through how Twitter's Heavy Ranker model from this repo decides what content to show in the For You feed.
Prompt 2
Show me how TwHIN embeddings work and how I can use them to represent users and content in a recommendation system.
Prompt 3
Set up the twitter/the-algorithm-ml environment on a Linux machine with an Nvidia GPU and run the Heavy Ranker training script.
Prompt 4
Explain the architecture of the Heavy Ranker model in this repo and what input features it takes during training.

Verify against the repo before relying on details.