Train a DQN or PPO agent on a Gymnasium environment using Tianshou's high-level trainer with a few lines of Python.
Implement and test a custom reinforcement learning algorithm by plugging into Tianshou's lower-level procedural API.
Run parallel environment rollouts using Tianshou's vectorized env support to speed up data collection.
Apply offline RL algorithms like BCQ or CQL to a logged dataset without live environment interaction.
Version 2 is not backward compatible with v1; users migrating from v1 must follow the migration guide in the changelog.
Tianshou is a Python library for building and training reinforcement learning agents. Reinforcement learning is a branch of machine learning where a program learns to make decisions by trying things in an environment and getting feedback in the form of rewards or penalties. Tianshou is built on top of PyTorch, a popular framework for machine learning in Python, and it connects with Gymnasium, a standard library for simulation environments.

The library is aimed at two groups: researchers who want to experiment with or modify learning algorithms at a low level, and practitioners who want to apply existing algorithms to their own problems without writing everything from scratch. To serve both, Tianshou offers two layers of interface: a high-level API for straightforward training workflows, and a lower-level procedural API for deeper customization.

Tianshou ships with a large collection of implemented algorithms covering most major families of reinforcement learning techniques, including Q-learning variants like DQN and Rainbow, policy gradient methods like PPO and SAC, and offline learning algorithms like BCQ and CQL. It also includes support for multi-agent settings, model-based approaches, and imitation learning. Environments can run in parallel to speed up data collection, and the library integrates with fast environment libraries like EnvPool for further acceleration.

Version 2, released recently, is a complete redesign of the library's internal structure. It separates the concepts of learning algorithms and policies into distinct components, clarifies the class hierarchy between different algorithm types, and updates the naming of parameters to be more consistent. The release is not backward compatible with earlier versions, so users coming from version 1 need to follow the migration guide in the changelog.

The project is maintained at Tsinghua University and is open source.
Documentation, tutorials, and benchmark results for standard environments are available on the project's website.
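To make the reward-feedback loop and the split between a policy and a learning algorithm more concrete, here is a minimal sketch in plain Python using only the standard library. It does not use Tianshou, and every name in it (`ChainEnv`, `EpsilonGreedyPolicy`, `QLearning`, `train`) is hypothetical and chosen for illustration; Tianshou's actual classes and signatures should be taken from its documentation. The technique shown is ordinary tabular Q-learning on a toy environment, under the assumption that the policy only selects actions while a separate object owns the update rule.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a line of states, reward 1.0 at the right end."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clamped at the left wall).
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + delta))
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


class EpsilonGreedyPolicy:
    """Chooses actions from a Q-table; contains no learning logic."""

    def __init__(self, n_states, n_actions, epsilon, rng):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = rng

    def greedy_action(self, state):
        # Deterministic argmax; ties resolve to the lowest action index.
        values = self.q[state]
        return values.index(max(values))

    def act(self, state):
        # With probability epsilon pick any action; otherwise pick among
        # the highest-valued actions, breaking ties at random.
        values = self.q[state]
        best = max(values)
        candidates = [a for a, v in enumerate(values) if v == best]
        if self.rng.random() < self.epsilon:
            candidates = list(range(self.n_actions))
        return self.rng.choice(candidates)


class QLearning:
    """Owns the update rule and writes into the policy's Q-table."""

    def __init__(self, policy, alpha=0.5, gamma=0.9):
        self.policy = policy
        self.alpha = alpha
        self.gamma = gamma

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target; no bootstrapping past a terminal state.
        bootstrap = 0.0 if done else self.gamma * max(self.policy.q[next_state])
        target = reward + bootstrap
        q_sa = self.policy.q[state][action]
        self.policy.q[state][action] = q_sa + self.alpha * (target - q_sa)


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    policy = EpsilonGreedyPolicy(env.n_states, 2, epsilon=0.2, rng=rng)
    algorithm = QLearning(policy)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = policy.act(state)                          # try something
            next_state, reward, done = env.step(action)         # get feedback
            algorithm.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return policy


policy = train()
# Greedy action for each non-terminal state after training.
greedy_path = [policy.greedy_action(s) for s in range(4)]
print(greedy_path)
```

Keeping action selection and the update rule in separate objects means the update rule could be swapped without touching how actions are chosen. Whether this mirrors Tianshou's own design in detail is an assumption to check against the repository.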
Verify against the repo before relying on details.