morvanzhou/reinforcement-learning-with-tensorflow

★ 9,453 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((repo))
    What it does
      RL algorithm tutorials
      Progressive difficulty
      Video companions
    Algorithms
      Q-learning and Sarsa
      Deep Q Networks
      Policy Gradients
      A3C and PPO
    Experiments
      Robot arm sim
      LunarLander
      BipedalWalker
    Audience
      ML beginners
      Python developers
      Chinese learners

Things people build with this

USE CASE 1

Work through progressive tutorials to learn core reinforcement learning algorithms like Q-learning, DQN, and Actor-Critic from scratch.

USE CASE 2

Run pre-built experiments applying RL algorithms to a simulated robot arm, 2D car, or standard OpenAI Gym environments like LunarLander.

USE CASE 3

Use the tutorial scripts as starting templates to build your own RL agent for a custom environment.
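As a rough illustration of what a "custom environment" could look like, here is a minimal sketch. The `CorridorEnv` class and `run_random_episode` helper are hypothetical and do not come from the repo. The sketch assumes the classic Gym convention in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; newer Gym and Gymnasium releases use a different return shape, so check which version the tutorial scripts expect.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, reward 1.0 at the last cell."""

    def __init__(self, length=6, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.n_actions = 2  # 0 = move left, 1 = move right
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply one action; return (observation, reward, done, info)."""
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_random_episode(env, seed=0):
    """Drive the environment with random actions until the episode ends."""
    rng = random.Random(seed)
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(rng.randrange(env.n_actions))
        total_reward += reward
    return total_reward, env.steps
```

An agent written against this interface only needs `reset()`, `step()`, and the number of actions, which is why a tutorial script's training loop can often be pointed at a new environment with few changes.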

Tech stack

Python | TensorFlow | OpenAI Gym

Getting it running

Difficulty · moderate | Time to first run · 30 min

There is no packaged install: clone the repo and run the individual Python scripts directly. TensorFlow and OpenAI Gym must be installed separately.
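Before running a tutorial script, it can help to confirm that both dependencies are importable. The following is a small sketch, not part of the repo: the `find_missing` helper is hypothetical, and it only checks that the `tensorflow` and `gym` modules can be found, not that their versions match what the scripts expect.

```python
import importlib.util


def find_missing(modules=("tensorflow", "gym")):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("TensorFlow and Gym are importable.")
```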

In plain English

This repository is a collection of tutorials on reinforcement learning, a branch of machine learning where an AI agent learns by trial and error: it takes actions, receives rewards or penalties, and over time figures out which actions lead to the best outcomes. The tutorials are written in Python using TensorFlow, and they progress from simple starting examples to more recent, advanced methods.

The creator, MorvanZhou, originally produced these materials in Chinese and also offers companion videos on YouTube and a dedicated Chinese tutorial site called Mofan Python. English-language video explanations are available via a YouTube playlist linked from the README.

The tutorial list covers a wide range of standard reinforcement learning algorithms. It starts with basic methods like Q-learning and Sarsa (table-based approaches where the agent stores a value for each action in each situation), then moves into Deep Q Networks (which replace the table with a neural network so the agent can handle more complex situations). From there it covers more specialized variations such as Double DQN, Prioritized Experience Replay, Dueling DQN, Policy Gradients, Actor-Critic, Deep Deterministic Policy Gradient, A3C (a faster parallel training method), Dyna-Q, Proximal Policy Optimization, and a curiosity-driven learning model.

Alongside the algorithm tutorials, the repository includes several experiment folders where these methods are applied to specific challenges: a simulated 2D car, a robot arm, and two standard benchmark environments from OpenAI Gym, BipedalWalker and LunarLander. These serve as practical demonstrations of how the algorithms behave on concrete tasks.

The project is primarily a learning resource rather than a production library. There is no packaged install; you work directly with the Python scripts in each tutorial folder. The README includes donation links for those who find the tutorials useful.
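To make the table-based idea concrete, here is a minimal sketch of tabular Q-learning written for this summary, not taken from the repo. It assumes a hypothetical 1-D corridor where the agent starts at cell 0 and receives a reward of 1.0 on reaching the last cell; the function names and the hyperparameter values (`alpha`, `gamma`, `epsilon`) are illustrative choices.

```python
import random


def train_q_table(n_states=6, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor; the goal is the last cell."""
    rng = random.Random(seed)
    # q[state][action]; action 0 = move left, action 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: explore with probability epsilon, and pick
            # randomly when both actions currently look equally good.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            move = 1 if action == 1 else -1
            next_state = min(max(state + move, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning target: reward plus discounted best value of the
            # next state (zero once the goal, a terminal state, is reached).
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state: 0 = left, 1 = right."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

The table `q` is the whole "memory" of the agent. A Deep Q Network replaces this lookup with a neural network that maps a state to one value per action, which is what makes larger state spaces tractable.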

Copy-paste prompts

Prompt 1

Using the morvanzhou reinforcement-learning-with-tensorflow tutorial scripts, help me set up a basic Q-learning agent for a simple grid environment and explain each step.

Prompt 2

Walk me through the difference between the DQN and Double DQN tutorials in this repo. What problem does Double DQN fix, and what changes in the code?

Prompt 3

Help me adapt the Actor-Critic tutorial code from this repo to train on a custom OpenAI Gym environment I created.

Prompt 4

Show me how to run the robot arm experiment from this RL tutorial repo and explain how to read the reward curve during training.

Verify against the repo before relying on details.