andri27-ts/reinforcement-learning

★ 4,716
Jupyter Notebook
Audience · researcher
Complexity · 3/5
Setup · moderate

Mindmap

mindmap
  root((RL course))
    Course structure
      60-day challenge
      8 weekly modules
      Jupyter notebooks
    Algorithms
      Q-learning and DQN
      Policy gradients
      Actor-critic PPO
    Tools
      PyTorch
      OpenAI Gym
      Atari environments
    Audience
      ML learners
      Python developers

Things people build with this

USE CASE 1

Work through hands-on reinforcement learning projects one week at a time, from basic Q-learning to advanced policy gradient methods.

USE CASE 2

Study working code implementations of DQN, actor-critic, and PPO to understand how each algorithm works in practice.

USE CASE 3

Use the weekly project structure as a guided path to go from reinforcement learning beginner to implementing RL algorithms from scratch.
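Use case 2 mentions DQN. One DQN component that is easy to study on its own is the experience replay buffer: the agent stores past transitions and trains on random mini-batches drawn from them instead of on consecutive steps. Below is a minimal standard-library sketch, assuming transitions are stored as (state, action, reward, next_state, done) tuples. The `ReplayBuffer` class and `Transition` field names are hypothetical and not taken from the course notebooks, where sampled batches would then be converted to PyTorch tensors.

```python
import random
from collections import deque, namedtuple

# Field names are illustrative; they are not taken from the course notebooks.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once capacity is reached
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling uniformly (without replacement) breaks the correlation between
        # consecutive environment steps that purely online updates would see
        batch = self.rng.sample(list(self.memory), batch_size)
        # Transpose a list of Transitions into one Transition of per-field tuples,
        # ready to be stacked into tensors (one batch of states, one of actions, ...)
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.memory)


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.push(state=t, action=t % 2, reward=1.0, next_state=t + 1, done=False)
print(len(buf))  # 3: transitions from t=0 and t=1 were evicted
batch = buf.sample(2)
print(batch.state)  # two distinct states, both drawn from {2, 3, 4}
```

`sample` raises `ValueError` if asked for more transitions than are stored, so training loops typically wait until `len(buf)` reaches the batch size before the first update.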

Tech stack

Python, PyTorch, Jupyter Notebook, OpenAI Gym

Getting it running

Difficulty · moderate
Time to first run · 1h+

Requires Python, PyTorch, and OpenAI Gym; some Gym environments need additional system dependencies for rendering.

In plain English

This repository is a self-paced course in deep reinforcement learning, structured as a 60-day challenge. It is designed for people who already have basic Python skills and some familiarity with machine learning, and want to go deeper into the techniques behind AI systems that learn from trial and error, such as the ones powering AlphaGo and competitive game-playing bots.

The course runs for eight weeks. Each week introduces a new set of algorithms, starting with the core ideas of how an agent explores an environment and learns from rewards, then moving through increasingly advanced techniques: Q-learning, deep Q-networks, policy gradient methods, actor-critic approaches, and Proximal Policy Optimization. Week 6 covers evolution strategies and genetic algorithms, and Week 7 introduces model-based reinforcement learning.

Lectures come from two main sources: David Silver's DeepMind course and Berkeley's Deep Reinforcement Learning course, both freely available on YouTube. Each week pairs those video lectures with working Python code in Jupyter notebooks, using PyTorch and OpenAI Gym environments such as Atari games and robotics simulations. The code implementations are the central deliverable. Each week's project asks you to run and study an algorithm on a specific environment, and several weeks include suggestions for extending the code on your own.

There is also a Slack community, with access by invitation via email, for participants going through the challenge at the same time. The author has since published a book based on this material called Reinforcement Learning Algorithms with Python, which covers the same topics in more depth across 13 chapters. The book is referenced in the README for anyone who wants a more structured written resource alongside the code.
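To make "learns from rewards" concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-state corridor. The `step` environment, its reward, and the hyperparameters are assumptions for illustration, not code from the course. After each move the agent nudges Q(s, a) toward r + γ·max Q(s′, ·), and an ε-greedy rule keeps it exploring. DQN uses the same kind of target but approximates the table with a neural network.

```python
import random

# Hypothetical 5-state corridor (not from the course): the agent starts in state 0,
# and reaching state 4 gives reward +1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # Q-table, all zeros
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon (or on a tie), else exploit
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted greedy value of the next state
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]: step right from every non-terminal state
```

Because the only reward sits at the right end of the corridor, the learned values approach 0.9^k for a state k steps from the goal, and the greedy policy steps right everywhere.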

Copy-paste prompts

Prompt 1

I'm on Week 3 of this reinforcement learning course covering DQN. Show me how to implement an experience replay buffer in PyTorch for an Atari game.

Prompt 2

Walk me through the actor-critic algorithm from Week 4. Explain what the actor and critic networks do and how they train together.

Prompt 3

Implement the PPO clipped objective update step in PyTorch, matching the approach in the Week 5 notebooks of this course.

Prompt 4

I want to extend the Week 7 model-based RL code. How do I add a simple world model that predicts the next state given the current state and action?
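Prompt 3 asks for the PPO clipped objective. As a reference point, here is a minimal sketch of that loss written with the standard library only, so it runs without PyTorch; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions, not taken from the course notebooks. For each sample it forms the probability ratio r = π_new(a|s) / π_old(a|s) from log-probabilities, clips it to [1 - ε, 1 + ε], and keeps the smaller of r·A and clip(r)·A, so the policy gains nothing from pushing the ratio outside that range.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch (a loss to minimise).

    Assumes the three sequences have equal length.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped objectives
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


print(ppo_clip_loss([math.log(1.5)], [0.0], [2.0]))   # ≈ -2.4: gain capped at (1 + 0.2) * 2.0
print(ppo_clip_loss([math.log(0.5)], [0.0], [-1.0]))  # ≈ 0.8: clipped term 0.8 * -1.0 is the smaller one
```

In a PyTorch notebook the same arithmetic would typically run on tensors (`torch.exp`, `torch.clamp`, `torch.min`, then `.mean()`), and the resulting loss is what gets back-propagated through the policy network.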

Verify against the repo before relying on details.