dlr-rm/stable-baselines3

★ 13,253 · Python · Audience: researcher · Complexity: 3/5 · Setup: moderate

Mindmap

mindmap
  root((stable-baselines3))
    What it does
      RL algorithm library
      Train by trial and error
      Built on PyTorch
    Interface
      scikit-learn style
      create then learn
      then predict
    Tracking
      TensorBoard logs
      Weights and Biases
      Hugging Face sharing
    Ecosystem
      SB3 Contrib extra algos
      SBX JAX fast variant
      RL Baselines3 Zoo
    Requirements
      Python 3.10 plus
      PyTorch 2.3 plus

Things people build with this

USE CASE 1

Train a game-playing agent using a standard reinforcement learning algorithm like PPO or SAC in under 20 lines of code

USE CASE 2

Reproduce published reinforcement learning research results using well-tested algorithm implementations

USE CASE 3

Build a custom robotic control policy by training an agent in a simulation environment and monitoring its training curves in TensorBoard

USE CASE 4

Use a pre-trained SB3 model from Hugging Face as a starting point for a new robotics or game AI project

Tech stack

Python · PyTorch · TensorBoard

Getting it running

Difficulty: moderate · Time to first run: 30 min

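Requires Python 3.10+ and PyTorch 2.3+. A GPU is optional and mostly helps with image-based (CNN) policies; for small MLP policies, on-policy algorithms such as PPO often run as well or better on the CPU.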

In plain English

Stable Baselines3 (SB3) is a Python library that provides clean, tested implementations of reinforcement learning algorithms. Reinforcement learning is a branch of machine learning in which a software agent learns by trial and error: it takes actions in an environment, receives a reward that scores how well it did, and gradually learns to make better decisions. SB3 is built on PyTorch and is aimed at researchers and practitioners who want reliable starting points for their own experiments.

The library is developed by the German Aerospace Center (DLR) Robotics and Mechatronics Center and is the third generation of the Stable Baselines project. Its goal is to make published research results easier to reproduce and to give people a solid foundation for new ideas, instead of reimplementing the same algorithms from scratch each time.

SB3 offers a consistent interface across all its algorithms, in a style similar to the scikit-learn machine learning library that many Python developers already know: you create a model, call learn() to train it, and then call predict() to run it. Training progress can be tracked with TensorBoard. The library supports custom environments, custom policies, and custom callbacks, and it works in Jupyter notebooks.

According to the README, SB3 itself is now in a stable maintenance phase focused on bug fixes. Newer, experimental algorithms are released in a companion package called SB3 Contrib. A JAX-based variant called SBX offers much faster training at the cost of fewer features. A training framework called RL Baselines3 Zoo adds hyperparameter tuning, pre-trained agents, and experiment management on top of SB3.

The library requires Python 3.10 or newer and PyTorch 2.3 or newer, and it can be installed with pip. Integrations are also available with Weights and Biases for experiment tracking and with Hugging Face for sharing trained models.

Copy-paste prompts

Prompt 1

Train a PPO agent in the CartPole-v1 Gymnasium environment using Stable Baselines3 and plot the reward curve with TensorBoard

Prompt 2

Show me how to define a custom Gymnasium environment with a continuous action space in Python and train a SAC agent on it using Stable Baselines3

Prompt 3

Use Stable Baselines3 to load a pre-trained model from Hugging Face and evaluate it on a new environment

Prompt 4

Add a custom callback to a Stable Baselines3 training run that saves a checkpoint every 10,000 steps

Prompt 5

Compare training speed between Stable Baselines3 PPO and the SBX JAX variant on the same environment

Verify against the repo before relying on details.