wangshusen/drl

★ 4,605 · Audience · researcher · Complexity · 1/5 · Setup · easy

Mindmap

mindmap
  root((drl course))
    Topics
      Value-based methods
      Policy gradient
      Actor-critic
      Multi-agent RL
    Key Algorithms
      Q-learning SARSA
      DQN double DQN
      REINFORCE A2C
    Special Topics
      AlphaGo case study
      Imitation learning
    Format
      PDF slides
      Chinese video lectures



Things people build with this

USE CASE 1

Study the theory of Q-learning, SARSA, and DQN through structured university lecture slides.

USE CASE 2

Learn how policy gradient methods like REINFORCE and A2C work from clearly organized course materials.

USE CASE 3

Understand multi-agent reinforcement learning and imitation learning through dedicated lecture sections.

Tech stack

PDF

Getting it running

Difficulty · easy · Time to first run · 5 min

In plain English

This repository is a course on deep reinforcement learning, organized as a series of lecture slides with accompanying video recordings. The videos are in Chinese, and the slides are PDF files available directly from the repository. It is structured as a university-style curriculum rather than runnable code.

Reinforcement learning is a branch of machine learning in which a software agent learns to make decisions by trying things out and receiving feedback, much as a person learns a game by playing it repeatedly. "Deep" reinforcement learning means the agent uses a neural network to process what it observes and decide what to do next, which lets it handle far more complex situations than earlier table-based approaches, which must store a separate estimate for every situation.

The course moves through eight major topic areas. It opens with a conceptual overview of how reinforcement learning works, covering the main families of approaches: value-based methods, policy-based methods, and actor-critic methods. It also includes a session on AlphaGo to show how these ideas apply to a well-known real-world system.

From there it goes into TD (temporal-difference) learning, a technique for estimating how good a given situation is by combining the reward observed over one or a few steps with the agent's own estimate of what follows, rather than waiting until the end of a game or task. This section covers SARSA, Q-learning, and multi-step methods.

Later sections go deeper into value-based approaches, including experience replay and double DQN, and then into policy gradient methods, including REINFORCE and A2C. A section on continuous action spaces covers scenarios where the agent does not pick from a fixed list of options but chooses values along a range. The final sections introduce multi-agent settings, where several agents interact, and imitation learning, where an agent learns by observing examples rather than by trial and error.

This is a self-study or classroom resource. There is no software to install, and no coding exercises are included in the repository itself.
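The repository ships no code, so the following is only an illustrative sketch of the TD idea described above, not material taken from the course. It shows a single tabular update for Q-learning and for SARSA. The function names, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions chosen for this example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """One Q-learning step (hypothetical helper, not from the course).

    Off-policy: the target bootstraps from the best action in next_state,
    regardless of which action the agent actually takes next.
    """
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9, done=False):
    """One SARSA step (hypothetical helper, not from the course).

    On-policy: the target bootstraps from the action actually chosen next.
    """
    next_value = 0.0 if done else q[(next_state, next_action)]
    td_target = reward + gamma * next_value
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    actions = ["left", "right"]
    q = defaultdict(float)          # unseen (state, action) pairs start at 0
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 2.0

    # Same transition, two different targets:
    print(q_learning_update(q, "s0", "right", 1.0, "s1", actions))
    print(sarsa_update(q, "s0", "left", 1.0, "s1", "left"))
```

The two functions differ only in how the next-step value is chosen, which is the distinction between the off-policy and on-policy variants of TD learning. Both update the estimate after a single step instead of waiting for the episode to finish.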

Copy-paste prompts

Prompt 1

Explain the difference between value-based and policy-based reinforcement learning as covered in the wangshusen/drl course.

Prompt 2

What is TD learning and how does it relate to Q-learning, based on the wangshusen/drl lecture slides?

Prompt 3

How does experience replay improve DQN, according to the course materials in wangshusen/drl?

Prompt 4

Explain actor-critic reinforcement learning and when to use it over REINFORCE, based on this course.

Prompt 5

Using the wangshusen/drl course content, describe how AlphaGo applies deep reinforcement learning to the game of Go.


Verify against the repo before relying on details.