karpathy/autoresearch

Analysis updated 2026-05-18

★ 79,286 | Python | Audience · researcher | Complexity · 3/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    What it does
      AI runs experiments
      Modifies training code
      Tracks validation metric
      Five-minute budget per run
    How it works
      prepare.py data setup
      train.py agent edits
      program.md instructions
      Keeps or discards changes
    Use cases
      Study autonomous research
      Tinker with AI loops
      Experiment overnight
      Compare architectures fairly
    Tech stack
      Python
      PyTorch
      NVIDIA GPU
      uv package manager
    Requirements
      Single GPU
      Python 3.10+
      uv project manager

What do people build with it?

USE CASE 1

Run autonomous machine-learning experiments overnight and wake up to a log of what the AI discovered.

USE CASE 2

Study how an AI agent iterates on real training pipelines and makes architectural decisions.

USE CASE 3

Compare different model architectures and hyperparameters fairly under a fixed time budget.

USE CASE 4

Tinker with autonomous AI research loops without manual intervention between runs.

What is it built with?

Python, PyTorch, NVIDIA GPU, uv

How does it compare?

	karpathy/autoresearch	vllm-project/vllm	infiniflow/ragflow
Stars	79,286	79,191	79,820
Language	Python	Python	Python
Setup difficulty	hard	hard	hard
Complexity	3/5	4/5	4/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires an NVIDIA GPU with CUDA support and PyTorch compilation. The autonomous agent loop may take hours to produce meaningful results.
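The requirements listed on this page (Python 3.10 or newer, the uv project manager, an NVIDIA GPU) can be checked before a first run. The following is a minimal sketch using only the standard library. The `preflight` helper is hypothetical and not part of the repository, and finding `nvidia-smi` on the PATH is only a proxy for a working NVIDIA driver; it does not confirm that PyTorch can actually use CUDA.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated on this page


def preflight(version=None, which=shutil.which):
    """Return a dict of requirement name -> bool.

    Hypothetical helper for illustration; `version` and `which` are
    injectable so the check can be exercised without real hardware.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    return {
        "python>=3.10": version >= MIN_PYTHON,
        # uv is the project manager named in the requirements.
        "uv on PATH": which("uv") is not None,
        # nvidia-smi ships with the NVIDIA driver; its presence is a
        # proxy only and says nothing about CUDA support in PyTorch.
        "nvidia-smi on PATH": which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Any line reported as MISSING points at a requirement to resolve before starting the agent loop.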

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

autoresearch is a small experimental setup that lets an AI agent run machine-learning research on a single GPU automatically, overnight. The idea is to give the agent a working but simplified language-model training pipeline and let it experiment by itself: it modifies the training code, runs a short training job, checks whether the result improved, keeps or discards the change, and repeats. You wake up the next day to a log of experiments and, hopefully, a better model. The training code is a simplified single-GPU implementation drawn from a related project called nanochat.

The repository is deliberately tiny and centers on three files. prepare.py handles one-time data preparation (it downloads training data and trains a tokenizer) and provides runtime utilities; the agent does not touch this file. train.py is the single file the agent edits. It contains the full model, optimizer, and training loop, so architecture, hyperparameters, batch size, and similar choices are all fair game. program.md is a short instructions file that you, the human, edit to set up your "research org"; it is what you point the agent at to start a run.

Each training run uses a fixed five-minute wall-clock time budget, no matter the hardware. The metric tracked is val_bpb (validation bits per byte), where lower is better. The fixed budget means roughly twelve experiments per hour and around a hundred while you sleep, and it lets architectural changes be compared fairly.

Someone would use autoresearch to tinker with autonomous AI research loops or to study how an agent iterates on a real training pipeline. Requirements are a single NVIDIA GPU, Python 3.10 or newer, and the uv project manager.
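The modify, run, compare, keep-or-discard cycle described above can be sketched as a simple greedy loop. This is an illustration under stated assumptions, not the repository's implementation: `research_loop`, `runs_in`, `bits_per_byte`, and the toy `propose`/`evaluate` functions are all hypothetical names, and the toy evaluation stands in for a real five-minute training run. The sketch also shows the budget arithmetic (twelve runs per hour) and the standard conversion from summed cross-entropy in nats to bits per byte.

```python
import math
import random
from typing import Callable

TIME_BUDGET_S = 5 * 60  # fixed wall-clock budget per run, as described above


def runs_in(hours: float, budget_s: int = TIME_BUDGET_S) -> int:
    """Number of fixed-budget runs that fit in a time span (ignores overhead)."""
    return int(hours * 3600 // budget_s)


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats) over a text to bits per byte."""
    return total_loss_nats / (math.log(2) * total_bytes)


def research_loop(
    baseline: dict,
    propose: Callable[[dict], dict],
    evaluate: Callable[[dict], float],
    n_runs: int,
) -> tuple[dict, float, list[tuple[int, float, bool]]]:
    """Greedy keep-or-discard loop: keep a change only if val_bpb drops."""
    best_cfg, best_bpb = baseline, evaluate(baseline)
    log = []
    for i in range(n_runs):
        candidate = propose(best_cfg)   # stand-in for the agent's edit
        bpb = evaluate(candidate)       # stand-in for one timed training run
        kept = bpb < best_bpb           # lower val_bpb is better
        if kept:
            best_cfg, best_bpb = candidate, bpb
        log.append((i, bpb, kept))
    return best_cfg, best_bpb, log


# Toy stand-ins (assumptions): the "metric" is minimized at lr = 0.3.
def toy_evaluate(cfg: dict) -> float:
    return 1.0 + (cfg["lr"] - 0.3) ** 2


def make_toy_propose(seed: int) -> Callable[[dict], dict]:
    rng = random.Random(seed)
    return lambda cfg: {"lr": cfg["lr"] + rng.uniform(-0.1, 0.1)}


if __name__ == "__main__":
    cfg, bpb, log = research_loop(
        {"lr": 1.0}, make_toy_propose(0), toy_evaluate, runs_in(1)
    )
    print(f"{len(log)} runs, {sum(k for _, _, k in log)} kept, best = {bpb:.4f}")
```

Because every candidate gets the same budget, the comparison `bpb < best_bpb` is like-for-like. An eight-hour night gives `runs_in(8)` = 96 runs, which matches the "around a hundred" figure.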

Copy-paste prompts

Prompt 1

Set up autoresearch to run overnight experiments on my GPU. What do I need to edit in program.md to get started?

Prompt 2

How do I modify train.py so the AI agent can experiment with different batch sizes and learning rates?

Prompt 3

Show me how to interpret the validation bits per byte metric and the experiment logs autoresearch produces.

Prompt 4

I want the AI to try different model architectures. Which file should I focus on and what constraints should I set?

Prompt 5

Walk me through what happens in one five-minute training run and how the agent decides to keep or discard a change.

Frequently asked questions

What is autoresearch?

An experimental system that lets an AI agent automatically run machine-learning research overnight on a single GPU, modifying training code and iterating to improve model performance.

What language is autoresearch written in?

Mainly Python. The stack also includes PyTorch, an NVIDIA GPU, and the uv package manager.

What license does autoresearch use?

License could not be detected automatically. Check the repository's LICENSE file before use.

How hard is autoresearch to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is autoresearch for?

Mainly researchers.

Verify against the repo before relying on details.