datawhalechina/easy-rl

★ 14,151 | Jupyter Notebook | Audience · researcher | Complexity · 3/5 | License | Setup · moderate

Mindmap

mindmap
  root((Easy-RL))
    What it does
      RL textbook
      Chinese language
      Code examples
    Algorithms
      Q-learning and DQN
      PPO
      Actor-Critic
    Format
      13 chapters
      Jupyter Notebooks
      Exercises
    Access
      Free PDF download
      Print edition
      Online version


Things people build with this

USE CASE 1

Work through 13 structured chapters to learn reinforcement learning from Markov decision processes to actor-critic methods.

USE CASE 2

Run companion Jupyter Notebooks that implement Q-learning, DQN, and PPO and watch them train on game environments.

USE CASE 3

Use the exercises at the end of each chapter to test your understanding before moving to the next algorithm.

USE CASE 4

Download the free PDF from GitHub releases to study offline or purchase the print edition from a Chinese publisher.

Tech stack

Python, Jupyter Notebook

Getting it running

Difficulty · moderate | Time to first run · 1h+

The content is in Chinese, so readers who do not read Chinese will need a translation tool. The code in the notebooks is readable regardless of language.

Creative Commons BY-NC-SA 4.0: free to share and adapt with attribution, but not for commercial purposes, and derivatives must use the same license.

In plain English

Easy-RL, nicknamed the "Mushroom Book," is a Chinese-language textbook and tutorial series on reinforcement learning, a branch of machine learning where a program learns to make decisions by trying actions and receiving rewards or penalties. The mushroom name is a nod to Super Mario: the idea is that reading this book gives you a power-up, letting you explore reinforcement learning with growing confidence rather than being overwhelmed by its mathematical complexity.

The content is drawn from several well-known Chinese-language university lecture series, primarily a deep reinforcement learning course by Professor Hung-yi Lee of National Taiwan University, known for making technical subjects accessible through game-based examples such as teaching an AI to play Atari games. Additional chapters pull from an introductory reinforcement learning course by Professor Bolei Zhou and a hands-on practical series by a world-champion reinforcement learning practitioner.

The textbook has 13 chapters. It starts with fundamental concepts, moves through Markov decision processes (a mathematical framework for sequential decision-making), and then covers specific algorithms and topics: Q-learning, DQN (and variants such as Double DQN and Dueling DQN), policy gradient methods, the PPO algorithm, actor-critic methods, imitation learning, and handling sparse rewards. Each chapter comes with exercises, and most include companion Jupyter Notebook code files so readers can run the algorithms and see them working directly.

The book has been published in print by a Chinese publisher (People's Posts and Telecommunications Press) and is available from major Chinese book retailers. A free PDF version is available from the GitHub releases page, and a continuously updated online version is hosted on the project's website. All content is released under a Creative Commons BY-NC-SA 4.0 license, meaning it can be freely shared and adapted for non-commercial purposes with attribution, provided derivatives use the same license.
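To make the "trying actions and receiving rewards" idea concrete, here is a minimal sketch of tabular Q-learning, one of the algorithms the book covers. It is not taken from the easy-rl notebooks: the corridor environment, the function names `train_q_learning` and `greedy_policy`, and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. The agent starts in state 0 and the episode
    ends with reward 1 when it reaches the last state.
    Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy selection; ties are broken randomly so the
            # untrained agent still explores.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Environment step: the left wall keeps the agent in state 0.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # The goal is terminal, so it contributes no future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])

            # Q-learning update: move the estimate toward the TD target.
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the preferred action name for each non-terminal state."""
    return ["left" if left > right else "right" for left, right in q[:-1]]
```

In this toy setup, `greedy_policy(train_q_learning())` should prefer "right" in every non-terminal state, since moving right is the only way to reach the reward. The book's own notebooks use richer game environments and neural-network variants such as DQN, so treat this only as a warm-up.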

Copy-paste prompts

Prompt 1

I am studying the easy-rl textbook on reinforcement learning. Explain the difference between Q-learning and DQN as the book covers it, and show me a simple Python implementation of Q-learning for a grid world.

Prompt 2

Help me run the DQN notebook from easy-rl on a CartPole environment. What packages do I need to install and what does the training loop look like?

Prompt 3

I am on the PPO chapter in easy-rl. Explain the clipped surrogate objective in plain English and show me what it looks like in Python code.

Prompt 4

Show me how actor-critic methods differ from pure policy gradient as explained in easy-rl, with a minimal code example showing both the actor and critic networks updating together.


Verify against the repo before relying on details.