explaingit

michaeltmatthews/purejaxgcrl

15 | Python | Audience · researcher | Complexity · 4/5 | Setup · moderate

TLDR

Research code for training a single AI agent to pursue hundreds of different goals at once, from mining resources in a Minecraft-like game to navigating a gridworld. Includes multiple learning algorithms with GPU-accelerated JAX implementations.

Mindmap

mindmap
  root((purejaxgcrl))
    What It Does
      Multi-goal RL training
      Single agent many objectives
    Environments
      Craftax game
      Gridworld
    Algorithms
      LEO and Dual LEO
      PPO and PQN baselines
      Hindsight Experience Replay
    Tech Stack
      Python
      JAX GPU compute
    Audience
      ML researchers
      RL practitioners

Things people build with this

USE CASE 1

Train a single AI agent to handle 512 different objectives in a Minecraft-style game without retraining per goal.

USE CASE 2

Benchmark goal-conditioned RL algorithms like LEO, Dual LEO, PPO, PQN, and HER on shared environments.

USE CASE 3

Reproduce ICML 2026 paper results using pre-tuned default settings for each algorithm.

Tech stack

Python, JAX

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires Python 3.10+ and a specific JAX version. GPU hardware is recommended for Craftax experiments.

No license specified in the repository.

In plain English

This is the official code release for a research paper called "Goal-Conditioned Agents that Learn Everything All at Once," accepted at the ICML 2026 machine learning conference. The project is about training AI agents that can pursue many different goals at the same time rather than being trained for one specific objective.

The term goal-conditioned reinforcement learning refers to a family of techniques where an agent learns to behave differently depending on which goal it is currently trying to achieve. Instead of training one specialized agent per task, you train a single agent that reads a goal as input and adapts its behavior accordingly. The challenge is making that work reliably when the number of possible goals is large.

The research tests these ideas in two environments. The first is Craftax, a game inspired by Minecraft where an agent can mine resources, craft tools, and explore. The paper defines 136 distinct goals for the simpler version of that game and 512 goals for the full version. The second is a simple grid-based world included for quick experiments where results are easier to interpret. The repository says a capable agent can be trained on the gridworld in under a minute.

The code implements several learning algorithms, including two called PPO and PQN that serve as baselines, a method called Hindsight Experience Replay, and the new methods introduced in the paper called LEO and Dual LEO. Each algorithm is contained in a single Python file so that researchers can read and modify them without navigating a complex codebase. All implementations use JAX, a Python library developed at Google that makes numerical computation run very fast, particularly on graphics hardware.

Installation requires Python 3.10 or later and a specific version of JAX. The default settings for each script are pre-set to the values that performed best in the paper's experiments. Trained agents can be visualized after training using an included renderer.
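To make the goal-conditioned idea concrete, here is a minimal sketch in plain Python (standard library only, not JAX, and not taken from the repository). It shows a policy that takes both a state and a goal as input, and one common form of hindsight relabeling, in which a trajectory that missed its intended goal is rewritten as if the last state it reached had been the goal. Everything here is an assumption made for illustration: the 5x5 grid, the names `step`, `policy`, `rollout`, and `relabel_with_final_state`, and the hand-written greedy policy that stands in for a learned network. The repository's own environments and relabeling scheme may differ.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (row, col) position on the grid
# One transition: (state, action, next_state, goal, reward)
Transition = Tuple[State, int, State, State, float]

# Hypothetical action set: index -> (d_row, d_col)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GRID_SIZE = 5  # illustrative grid size, not the repository's


def step(state: State, action: int) -> State:
    """Move one cell, clipping at the grid boundary."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    return (row, col)


def policy(state: State, goal: State) -> int:
    """Goal-conditioned policy: the chosen action depends on the goal too.

    Hand-written stand-in for a learned network. It picks the action
    that most reduces the Manhattan distance to the goal.
    """
    def distance_after(action: int) -> int:
        nxt = step(state, action)
        return abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])

    return min(range(len(ACTIONS)), key=distance_after)


def reward(next_state: State, goal: State) -> float:
    """Sparse reward: 1 only when the goal is reached."""
    return 1.0 if next_state == goal else 0.0


def rollout(start: State, goal: State, max_steps: int) -> List[Transition]:
    """Run the policy toward `goal` for at most `max_steps` steps."""
    state = start
    trajectory: List[Transition] = []
    for _ in range(max_steps):
        action = policy(state, goal)
        nxt = step(state, action)
        trajectory.append((state, action, nxt, goal, reward(nxt, goal)))
        state = nxt
        if state == goal:
            break
    return trajectory


def relabel_with_final_state(trajectory: List[Transition]) -> List[Transition]:
    """Hindsight relabeling: treat the last state reached as the goal."""
    achieved = trajectory[-1][2]
    return [
        (s, a, nxt, achieved, reward(nxt, achieved))
        for (s, a, nxt, _goal, _reward) in trajectory
    ]


if __name__ == "__main__":
    # Same state, different goals, different actions.
    print(policy((0, 0), (0, 4)), policy((0, 0), (4, 0)))

    # Too few steps to reach (4, 4), so every original reward is 0.
    failed = rollout((0, 0), (4, 4), max_steps=3)
    print([t[4] for t in failed])

    # After relabeling, the final transition earns a reward.
    print([t[4] for t in relabel_with_final_state(failed)])
```

The relabeled trajectory gives a sparse-reward learner a successful example even though the original attempt failed, which is the intuition behind Hindsight Experience Replay. A learned agent would replace the greedy `policy` with a network that receives the goal as part of its input.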

Copy-paste prompts

Prompt 1
I want to train a goal-conditioned RL agent using JAX on the Craftax environment with the LEO algorithm from purejaxgcrl. Walk me through the file structure and how to run the main training script.
Prompt 2
How does purejaxgcrl implement Hindsight Experience Replay? Show me the relevant code and explain how goals are relabeled after each episode.
Prompt 3
I want to add a new environment to purejaxgcrl. What interface does it need to implement to work with the existing LEO training loop?
Prompt 4
Compare the LEO and Dual LEO algorithms in purejaxgcrl. What is the architectural difference and when would I choose one over the other?

Verify against the repo before relying on details.