facebookresearch/reagent

Analysis updated 2026-07-03

★ 3,699 | Python | Audience · researcher | Complexity · 4/5 | License · see repository | Setup · moderate

Mindmap

mindmap
  root((ReAgent))
    What it does
      Offline RL training
      Policy evaluation
      Recommendation optimization
    Algorithms
      Q-learning variants
      Policy gradient
      Contextual bandits
    Tech stack
      Python
      PyTorch
      Docker
    Status
      Archived by Facebook
      Succeeded by Pearl

What do people build with it?

USE CASE 1

Study how to train reinforcement learning models on historical data when live simulation is not possible.

USE CASE 2

Use the policy evaluation tools to test a new decision-making policy against old data before deploying it.

USE CASE 3

Explore Q-learning and policy gradient algorithms in a production-scale Python/PyTorch codebase.

USE CASE 4

Reference the distributed training setup for handling large recommendation datasets with reinforcement learning.
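Use case 2, testing a new policy against old data, can be illustrated with inverse propensity scoring (IPS), a standard textbook estimator for this kind of evaluation. The sketch below is not taken from ReAgent's code. The record layout, the `ips_estimate` function, the `always_show_video` policy, and the sample log are all hypothetical, and the sketch assumes each logged record stores the probability with which the old policy chose its action.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (context, action_taken, reward, logging_propensity).
# logging_propensity is the probability the OLD policy assigned to action_taken.
LoggedRecord = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LoggedRecord],
    target_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward a new policy would earn, using only old logs."""
    if not logs:
        raise ValueError("need at least one logged record")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging propensity must be positive")
        # Reweight each reward by how much more (or less) likely the new
        # policy is to take the logged action than the old policy was.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def always_show_video(context: str, action: str) -> float:
    """Hypothetical new policy: deterministically picks 'video'."""
    return 1.0 if action == "video" else 0.0


# Toy log collected by an old policy that chose uniformly between two actions.
logs = [
    ("user_a", "video", 1.0, 0.5),
    ("user_b", "article", 0.0, 0.5),
    ("user_c", "video", 0.0, 0.5),
    ("user_d", "article", 1.0, 0.5),
]

estimate = ips_estimate(logs, always_show_video)
print(f"Estimated reward of the new policy: {estimate:.2f}")
```

Records where the new policy would not have taken the logged action get zero weight, so the estimate relies only on the two "video" records. This is also why IPS estimates become noisy when the new policy differs sharply from the one that collected the data.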

What is it built with?

Python, PyTorch, Docker

How does it compare?

	facebookresearch/reagent	opengeos/leafmap	stability-ai/stable-audio-tools
Stars	3,699	3,699	3,699
Language	Python	Python	Python
Setup difficulty	moderate	easy	hard
Complexity	4/5	2/5	4/5
Audience	researcher	data	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate Time to first run · 1h+

A Docker install is available, but the project is archived and no longer maintained. New users should consider the successor project, Pearl, instead.

Open-source and free to use. The exact license terms are in the repository.

In plain English

ReAgent is an archived open-source platform built by Facebook for applying reinforcement learning to real-world problems at scale. Reinforcement learning is a type of AI training where a system learns by taking actions and receiving feedback, rather than being given labeled examples. ReAgent was built to make this approach practical for large-scale recommendation and optimization tasks, such as deciding what to show users in a feed or how to allocate resources.

The platform is no longer actively maintained. Facebook's team has moved to a successor project called Pearl, and the README directs users there for ongoing support. ReAgent is preserved as a reference and for teams that built on top of it.

When it was active, ReAgent provided a full pipeline: taking raw data, transforming it into training-ready inputs, training AI models using a variety of reinforcement learning algorithms, and then serving those models in production. It was designed for situations where you cannot run a live simulation, so it trained on batches of previously collected data rather than interacting with an environment in real time. It also included tools for evaluating a new policy using old data, which is important when you cannot test a policy by actually deploying it.

The platform supported a wide range of algorithms, including several variants of Q-learning for environments with discrete or continuous actions, policy gradient methods, and contextual bandit approaches for simpler decision problems where only a single decision is made rather than a sequence.

ReAgent was written in Python and used PyTorch for model training. It supported distributed training for handling large datasets. Installation was possible via Docker or manually, with detailed instructions available in the repository's documentation folder.
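The idea of training on previously collected data, described above, can be shown with a toy batch Q-learning loop. This is a minimal sketch and not ReAgent's implementation: it swaps the PyTorch neural networks for a plain Python lookup table, and the `batch_q_learning` function, the state and action names, and the logged transitions are all made up for illustration. The point is that the loop only replays a fixed log and never calls an environment.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical log format: (state, action, reward, next_state).
# next_state is None when the logged episode ended after this step.
Transition = Tuple[str, str, float, Optional[str]]


def batch_q_learning(
    transitions: List[Transition],
    actions: List[str],
    gamma: float = 0.9,
    learning_rate: float = 0.5,
    sweeps: int = 200,
) -> Dict[Tuple[str, str], float]:
    """Learn Q-values by replaying a fixed log; no live environment is used."""
    q: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                target = reward
            else:
                # Bootstrap from the best known action in the next state.
                target = reward + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q


def greedy_action(
    q: Dict[Tuple[str, str], float], state: str, actions: List[str]
) -> str:
    """Pick the action with the highest learned Q-value in this state."""
    return max(actions, key=lambda a: q[(state, a)])


# Toy log: showing news first earns nothing now but leads to a better state.
logged = [
    ("home", "show_news", 0.0, "reading"),
    ("home", "show_ads", 0.1, None),
    ("reading", "show_news", 1.0, None),
    ("reading", "show_ads", 0.2, None),
]
actions = ["show_news", "show_ads"]

q_table = batch_q_learning(logged, actions)
best_at_home = greedy_action(q_table, "home", actions)
print(best_at_home, round(q_table[("home", "show_news")], 3))
```

In this toy log the learned policy prefers `show_news` at `home`, because the discounted future reward outweighs the small immediate reward from `show_ads`. A contextual bandit would see only the immediate reward of a single decision, which is the distinction the paragraph above draws between the two families.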

Copy-paste prompts

Prompt 1

I'm studying facebookresearch/reagent. Explain how offline reinforcement learning works here. How does it train without a live environment?

Prompt 2

Using ReAgent's architecture as a reference, how would I build a batch reinforcement learning pipeline in PyTorch for a recommendation system?

Prompt 3

Show me how ReAgent evaluates a new policy using old logged data. What is counterfactual policy evaluation, and how is it implemented here?

Prompt 4

I want to understand the difference between the Q-learning and contextual bandit approaches in ReAgent. When would I use each one?

Prompt 5

How does ReAgent handle distributed training for large datasets? Describe the data pipeline from raw logs to trained model.

Frequently asked questions

What is reagent?

An archived Facebook platform for applying reinforcement learning to large-scale recommendation and optimization problems using previously collected data, now superseded by Pearl.

What language is reagent written in?

Mainly Python. The stack also includes PyTorch and Docker.

What license does reagent use?

Open-source and free to use. The exact license terms are in the repository.

How hard is reagent to set up?

Setup difficulty is rated moderate, with an hour or more to a first successful run.

Who is reagent for?

Mainly researchers.

Verify against the repo before relying on details.