explaingit

shangtongzhang/reinforcement-learning-an-introduction

Analysis updated 2026-06-24

14,646 stars · Python · Audience: researcher · Complexity: 4/5 · Setup: moderate

TLDR

Python implementations of every figure and worked example from the Sutton and Barto Reinforcement Learning textbook, second edition, organised by chapter.

Mindmap

mindmap
  root((reinforcement-learning-an-introduction))
    Inputs
      Book chapter number
      Python scripts
      Algorithm parameters
    Outputs
      Reproduced figures
      Plot images
      Algorithm runs
    Use Cases
      Study RL from the book
      Reproduce textbook plots
      Tweak RL hyperparameters
      Build intuition for algorithms
    Tech Stack
      Python
      NumPy
      Matplotlib


What do people build with it?

USE CASE 1

Reproduce a specific figure from the Sutton and Barto book to verify your understanding

USE CASE 2

Modify the Q-learning or Sarsa scripts to test new hyperparameters on cliff walking

USE CASE 3

Use the bandit, gridworld, and blackjack code as starter templates for your own RL coursework

USE CASE 4

Compare Monte Carlo, TD, and n-step methods side by side on the same task
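
As a rough illustration of what use cases 2 and 4 involve, the sketch below contrasts the Sarsa and Q-learning update targets as they are defined in the textbook. It is not code from the repository: the function names and the numbers are hypothetical, chosen only to show why the two methods can behave differently when the behaviour policy explores.

```python
def sarsa_target(reward, next_q_values, next_action, gamma=1.0):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q_values[next_action]


def q_learning_target(reward, next_q_values, gamma=1.0):
    """Off-policy target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(next_q_values)


def td_update(q_value, target, alpha=0.5):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)


# Hypothetical action values for the next state (four moves), where the
# exploring behaviour policy happened to pick action 2, a very bad move.
next_q = [-10.0, -12.0, -100.0, -11.0]
reward = -1.0

sarsa = sarsa_target(reward, next_q, next_action=2)
q_learn = q_learning_target(reward, next_q)
print("Sarsa target:", sarsa)
print("Q-learning target:", q_learn)
print("Updated estimate (Q-learning):", td_update(0.0, q_learn))
```

Sarsa's target reflects the exploratory move that was actually taken, while Q-learning's target ignores it and assumes the best move. Experimenting with `alpha`, `gamma`, and the exploration rate is the kind of hyperparameter tweaking the use cases describe.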

What is it built with?

Python, NumPy, Matplotlib

How does it compare?

| | shangtongzhang/reinforcement-learning-an-introduction | coderamp-labs/gitingest | comfy-org/comfyui-manager |
|---|---|---|---|
| Stars | 14,646 | 14,650 | 14,634 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | easy | moderate |
| Complexity | 4/5 | 2/5 | 2/5 |
| Audience | researcher | vibe coder | vibe coder |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: moderate · Time to first run: 30 min

The README excerpt analysed shows no requirements.txt or install guide, so you have to infer the dependencies (NumPy and Matplotlib, going by the tech stack) and keep the book at hand to use the code meaningfully.
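
Since the dependencies have to be inferred, a quick check like the one below can confirm they are importable before you run any script. This is a minimal sketch: the package list is an assumption based only on the tech stack named on this page, the helper name is hypothetical, and the repository may need more packages than these.

```python
import importlib.util

# Assumed dependency list, taken from the tech stack named on this page;
# the repository itself may require additional packages.
ASSUMED_DEPS = ["numpy", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_DEPS)
    if missing:
        print("Install before running the scripts:", ", ".join(missing))
    else:
        print("All assumed dependencies are importable.")
```

If anything is reported missing, installing it with your usual package manager should be enough to get started; if a script then fails on another import, add that package to the list.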

In plain English

This repository is a Python recreation of the code examples and figures from the textbook Reinforcement Learning: An Introduction (Second Edition) by Richard Sutton and Andrew Barto. The book is a well-known starting point for reinforcement learning, a branch of machine learning in which an agent learns to make decisions by trying actions and observing the rewards or penalties that follow. The repository does not teach reinforcement learning from scratch; it assumes you have the book open beside you and want runnable code for its examples and plots.

The README is almost entirely a table of contents organised chapter by chapter, matching the structure of the book. For each chapter it lists the figures and worked examples that have been reproduced, with links to the resulting plot images stored in the repository.

The covered chapters include Tic-Tac-Toe in Chapter 1, multi-armed bandits in Chapter 2, gridworld dynamic programming in Chapters 3 and 4, Monte Carlo methods and blackjack in Chapter 5, temporal-difference learning with Sarsa, Q-learning, and cliff walking in Chapter 6, n-step methods in Chapter 7, Dyna and planning in Chapter 8, function approximation in Chapter 9, and the Mountain Car task in Chapter 10.

A short note in the README asks readers to open GitHub issues rather than email the author if they hit bugs or confusion in the code, and states that the repository does not include solutions to the book's exercises. The part of the README that was shown contains no installation instructions, no usage examples, and no description of dependencies. It functions as an index that maps each book figure to a script and an image, not as a guided tutorial.
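
To make the "trying actions and observing rewards" idea concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, the topic of Chapter 2. It is written for this page and is not taken from the repository: the function name, the Gaussian reward model with unit variance, and the parameter values are all assumptions made for illustration.

```python
import random


def run_bandit(true_means, epsilon, steps, seed=0):
    """Run epsilon-greedy with sample-average value estimates.

    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # current value estimate per arm
    counts = [0] * n_arms       # number of pulls per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.0, 1.0, 2.0], epsilon=0.1, steps=2000)
    print("estimates:", [round(e, 2) for e in estimates])
    print("pulls per arm:", counts)
```

Varying `epsilon` changes how often the agent explores, which is the kind of experiment the book's bandit figures are built around.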

Copy-paste prompts

Prompt 1
Walk me through the Sarsa vs Q-learning code in reinforcement-learning-an-introduction and explain why their cliff walking results differ
Prompt 2
Adapt the Mountain Car tile coding script from Chapter 10 to use neural network function approximation instead
Prompt 3
Run the multi-armed bandit examples and plot how the epsilon-greedy parameter changes regret
Prompt 4
Extend the blackjack Monte Carlo code so it also evaluates a soft policy with importance sampling
Prompt 5
Show me how the Chapter 8 Dyna script builds and uses its model of the environment

Frequently asked questions

What is reinforcement-learning-an-introduction?

Python implementations of every figure and worked example from the Sutton and Barto Reinforcement Learning textbook, second edition, organised by chapter.

What language is reinforcement-learning-an-introduction written in?

Mainly Python. The stack also includes NumPy and Matplotlib.

How hard is reinforcement-learning-an-introduction to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is reinforcement-learning-an-introduction for?

Mainly researchers.


Verify against the repo before relying on details.