oliverz-dot/explore-execute-chain

★ 25 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((E2C))
    What it does
      Splits reasoning phases
      Reduces token cost 8x
      Domain adaptation
    Phases
      Short exploration 1k tokens
      Long execution 10k tokens
    Training
      Supervised fine-tuning
      Two-stage GRPO RL
      Domain adaptation step
    Use Cases
      Math competition problems
      Medical question answering
      Custom domain fine-tuning

Things people build with this

USE CASE 1

Run inference with pretrained E2C checkpoints to reproduce paper results on math and medical reasoning benchmarks.

USE CASE 2

Fine-tune a language model on a new domain using only the short exploration segments, cutting token cost by roughly 97% versus standard fine-tuning.

USE CASE 3

Evaluate how E2C-style reasoning compares to standard chain-of-thought on custom benchmark tasks using the included evaluation scripts.
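
The "roughly 97%" saving in use case 2 can be sanity-checked against the 3.5 percent token share quoted later on this page for exploration-only fine-tuning. The sketch below is illustrative arithmetic only, not code from the repository. The function names and the one-million-token corpus size are hypothetical.

```python
def token_reduction(fraction_used: float) -> float:
    """Share of tokens saved when only `fraction_used` of them are trained on."""
    if not 0.0 <= fraction_used <= 1.0:
        raise ValueError("fraction_used must be between 0 and 1")
    return 1.0 - fraction_used


def exploration_only_tokens(full_tokens: int, fraction_used: float = 0.035) -> int:
    """Tokens needed for exploration-only adaptation, given a full fine-tuning budget.

    The 0.035 default is the approximate share quoted on this page, not a
    value read from the repository.
    """
    return round(full_tokens * fraction_used)


if __name__ == "__main__":
    # Hypothetical corpus of one million tokens for standard fine-tuning.
    print(exploration_only_tokens(1_000_000))
    print(f"{token_reduction(0.035):.1%} fewer tokens")
```

A 3.5 percent share corresponds to a 96.5 percent reduction, which the use case rounds to 97%.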

Tech stack

Python · PyTorch · Hugging Face

Getting it running

Difficulty · hard | Time to first run · 1h+

Inference requires at least 16 GB of GPU memory. Full training needs 4 GPUs with 40 GB each for SFT and 8 GPUs for the RL step.

The explanation does not mention a license for this project.

In plain English

Explore-Execute Chain (E2C) is a research project about making AI language models reason more efficiently. The central idea is to split the reasoning process into two distinct phases within a single model: a short exploration phase where the model sketches a high-level plan and picks the best approach (around 1,000 tokens), followed by a longer execution phase where it carries out that plan step by step (around 10,000 tokens).

The benefit of this split is efficiency. When looking for the best answer by trying multiple possibilities at test time, the search only needs to cover the short exploration phase rather than full reasoning chains, which the authors report makes test-time compute about 8 times cheaper. When adapting the model to a new subject area (such as medical question answering), only the exploration segments need fine-tuning, using roughly 3.5 percent of the tokens that a standard full fine-tuning approach would require.

The repository includes pretrained model checkpoints and training datasets hosted on Hugging Face, as well as scripts for inference, training, and evaluation. Running inference requires a machine with at least 16 GB of GPU memory. Training from scratch requires significantly more: the supervised fine-tuning step is designed for 4 GPUs with 40 GB each, and the reinforcement learning step for 8. An interactive demo lets you test the model on eight built-in problems spanning math, medical, and code domains, or supply your own.

Training follows three steps: supervised fine-tuning on exploration-execution pairs, reinforcement learning using a two-stage GRPO process, and an optional lightweight adaptation step for new domains. Evaluation scripts cover 16 benchmarks across math and medical reasoning. The paper reports that the approach matches or beats standard methods on math competition problems while using around 7 times fewer tokens at test time.

This is a research codebase tied to a specific paper and pretrained models. It is aimed at machine learning practitioners who want to reproduce the paper results or experiment with the E2C training approach on their own data and compute.
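
The test-time saving described above can be illustrated with back-of-envelope arithmetic. The sketch below is not taken from the repository. It assumes the approximate phase lengths quoted on this page (1,000 exploration tokens, 10,000 execution tokens), a hypothetical candidate count `n`, and that only one selected plan is executed. The function names are hypothetical. Real savings depend on actual sequence lengths and on how the plan is selected, so the numbers here need not match the figures reported in the paper.

```python
EXPLORE_TOKENS = 1_000   # approximate exploration length quoted on this page
EXECUTE_TOKENS = 10_000  # approximate execution length quoted on this page


def full_chain_cost(n: int) -> int:
    """Tokens generated when each of n candidates is a full reasoning chain."""
    return n * (EXPLORE_TOKENS + EXECUTE_TOKENS)


def explore_then_execute_cost(n: int) -> int:
    """Tokens generated when n short explorations are sampled and one is executed."""
    return n * EXPLORE_TOKENS + EXECUTE_TOKENS


def saving_factor(n: int) -> float:
    """How many times cheaper the explore-then-execute search is for n candidates."""
    return full_chain_cost(n) / explore_then_execute_cost(n)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, full_chain_cost(n), explore_then_execute_cost(n),
              round(saving_factor(n), 2))
```

Under these assumptions the factor is 1 for a single candidate, grows with `n`, and approaches 11, the ratio of full chain length to exploration length.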

Copy-paste prompts

Prompt 1

I want to run the explore-execute-chain pretrained model on a custom math problem. Show me the Python code to load the checkpoint from Hugging Face and generate an E2C response.

Prompt 2

Help me adapt the E2C training pipeline to a new domain. I have 1,000 labeled examples. Which training step handles domain adaptation and what command do I run?

Prompt 3

Explain the two-stage GRPO training process in explore-execute-chain: what does each stage optimize and why are two stages needed instead of one?

Prompt 4

I want to reproduce the AIME benchmark results from the E2C paper. Which evaluation script do I run and what GPU setup is required?

Prompt 5

Show me how to run the interactive demo that tests E2C on the eight built-in problems spanning math, medical, and code domains.

Verify against the repo before relying on details.