explaingit

thu-ml/tianshou

10,685 · Python · Audience: researcher · Complexity: 3/5 · Setup: moderate

TLDR

Tianshou is a Python reinforcement learning library built on PyTorch that provides both high-level training APIs and low-level algorithm customization for researchers and practitioners.

Mindmap

mindmap
  root((tianshou))
    What it does
      RL agent training
      Algorithm library
      Env simulation
    Algorithms
      DQN and Rainbow
      PPO and SAC
      Offline BCQ and CQL
    Features
      High-level trainer
      Parallel environments
      Multi-agent support
    Tech
      Python
      PyTorch
      Gymnasium

Things people build with this

USE CASE 1

Train a DQN or PPO agent on a Gymnasium environment using Tianshou's high-level trainer with a few lines of Python.

USE CASE 2

Implement and test a custom reinforcement learning algorithm by plugging into Tianshou's lower-level procedural API.

USE CASE 3

Run parallel environment rollouts using Tianshou's vectorized env support to speed up data collection.

USE CASE 4

Apply offline RL algorithms like BCQ or CQL to a logged dataset without live environment interaction.

Tech stack

Python · PyTorch · Gymnasium

Getting it running

Difficulty · moderate
Time to first run · 30 min

Version 2 is not backward compatible with v1; users migrating from v1 must follow the migration guide in the changelog.

In plain English

Tianshou is a Python library for building and training reinforcement learning agents. Reinforcement learning is a branch of machine learning where a program learns to make decisions by trying things in an environment and getting feedback in the form of rewards or penalties. Tianshou is built on top of PyTorch, a popular framework for machine learning in Python, and it connects with Gymnasium, a standard library for simulation environments.

The library is aimed at two groups: researchers who want to experiment with or modify learning algorithms at a low level, and practitioners who want to apply existing algorithms to their own problems without writing everything from scratch. To serve both, Tianshou offers two layers of interface: a high-level API for straightforward training workflows, and a lower-level procedural API for deeper customization.

Tianshou ships with a large collection of implemented algorithms covering most major families of reinforcement learning techniques, including Q-learning variants like DQN and Rainbow, policy gradient methods like PPO and SAC, and offline learning algorithms like BCQ and CQL. It also includes support for multi-agent settings, model-based approaches, and imitation learning. Environments can run in parallel to speed up data collection, and it integrates with fast environment libraries like EnvPool for further acceleration.

Version 2, released recently, is a complete redesign of the library's internal structure. It separates the concepts of learning algorithms and policies into distinct components, clarifies the class hierarchy between different algorithm types, and updates the naming of parameters to be more consistent. The release is not backward compatible with earlier versions, so users coming from version 1 need to follow the migration guide in the changelog.

The project is maintained at Tsinghua University and is open source. Documentation, tutorials, and benchmark results for standard environments are available on the project's website.
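To make the "try things, get rewards" idea concrete, here is a minimal sketch of tabular Q-learning in plain Python. It does not use Tianshou or Gymnasium at all: the five-cell corridor environment and the names `step`, `train`, and `greedy_policy` are hypothetical, chosen only for illustration. Tianshou's DQN-family algorithms build on the same update rule but replace the table with a neural network.

```python
import random

# Hypothetical toy environment (not part of Tianshou or Gymnasium):
# a corridor of 5 cells. The agent starts in cell 0 and receives a
# reward of 1.0 only when it reaches the last cell.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # q[state][action] is the current estimate of long-term reward.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap the episode length
            # Explore with probability epsilon, or when the estimates tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Best action for every non-terminal cell."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent is never told that moving right is correct. It discovers this because the reward at the end of the corridor propagates backwards through the value estimates, one update at a time.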

Copy-paste prompts

Prompt 1
Using Tianshou, write a Python script that trains a PPO agent on the CartPole-v1 Gymnasium environment and prints the average reward each epoch.
Prompt 2
How do I set up Tianshou's vectorized environments with 8 parallel workers to speed up training for a custom Gymnasium env?
Prompt 3
I'm migrating a Tianshou v1 project to v2. What are the key breaking changes to the Policy and Trainer classes I need to update?
Prompt 4
Show me how to implement a custom replay buffer in Tianshou v2 that prioritizes transitions by TD error.

Verify against the repo before relying on details.