kyegomez/openmythos

★ 12,560 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    What It Is
      Research library
      Recurrent-depth transformer
      Claude architecture hypothesis
    Architecture
      Looped shared layers
      State updated each pass
      No chain-of-thought tokens
    Model Sizes
      1B to 1T parameters
      Pre-configured presets
      Attention style options
    Training
      Single and multi-GPU
      FineWeb-Edu dataset
      AdamW optimizer


Things people build with this

USE CASE 1

Train a recurrent-depth transformer on a single GPU or across multiple GPUs using the included 3B-parameter training script.

USE CASE 2

Experiment with looped-layer architectures as a research alternative to standard one-pass transformers.

USE CASE 3

Use pre-configured model presets from 1B to 1 trillion parameters to prototype experiments without writing architecture code.

Tech stack

Python · PyTorch

Getting it running

Difficulty · hard | Time to first run · 1 day+

Training billion-parameter models requires high-end GPU hardware with sufficient VRAM; it is not something you can run on a standard laptop.
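For a rough sense of scale, the sketch below estimates the memory needed for weights, gradients, and optimizer state alone. It assumes a common mixed-precision AdamW accounting of about 16 bytes per parameter (16-bit weights and gradients, plus 32-bit master weights and two 32-bit AdamW moment buffers). That figure is an assumption, not a measurement of the OpenMythos training script, and it ignores activations, which add more on top. The function name `training_state_gib` is hypothetical.

```python
# Back-of-the-envelope estimate of training memory, excluding activations.
# ASSUMPTION: 16 bytes per parameter, a common accounting for mixed-precision
# AdamW: 2 (16-bit weights) + 2 (16-bit grads) + 4 (fp32 master weights)
# + 4 + 4 (fp32 AdamW first and second moments). The real script may differ.
ASSUMED_BYTES_PER_PARAM = 16


def training_state_gib(num_params, bytes_per_param=ASSUMED_BYTES_PER_PARAM):
    """Return the estimated weights + grads + optimizer state size in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for label, num_params in [("1B", 1_000_000_000), ("3B", 3_000_000_000)]:
        print(f"{label}: about {training_state_gib(num_params):.1f} GiB before activations")
```

Under this assumption, even the smallest presets need tens of GiB before any activations are stored, which is why a single consumer GPU is usually not enough without sharding or offloading.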

In plain English

OpenMythos is a Python library that implements a theoretical guess at how the Claude AI model (made by Anthropic) might be built internally. The author starts from a hypothesis that Claude uses a specific architecture called a Recurrent-Depth Transformer, then builds a working version of that architecture from scratch using publicly available research papers. The project is explicitly marked as independent and not affiliated with Anthropic.

The central idea of a Recurrent-Depth Transformer is that instead of stacking hundreds of unique layers once, a smaller set of layers is run repeatedly in a loop. Each pass through the loop updates an internal state, and the original input signal is re-injected at every step to keep the model from losing track of what it was asked. This looped processing happens entirely inside a single forward pass, with no intermediate text outputs, meaning the model can do more "thinking" without generating any visible chain-of-thought tokens.

The library is installable via pip and provides pre-configured model sizes ranging from 1 billion to 1 trillion parameters. Each size preset specifies the model's internal dimensions, number of expert modules, number of loop iterations, and context length. The attention mechanism can be switched between two styles: one that reduces memory by using fewer key-value heads, and one that compresses key-value representations using a low-rank factorization technique.

A training script for the 3 billion parameter variant is included, targeting a dataset called FineWeb-Edu. It supports both single-GPU and multi-GPU training, uses the AdamW optimizer, and trains in lower-precision floating point to reduce memory use. The documentation folder includes a full API reference and a guide on recommended training datasets.

This repository is a research and experimentation tool, not a finished product. It is useful for developers and researchers interested in exploring alternative transformer architectures inspired by speculation about frontier AI model internals.
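The looped processing described above can be illustrated with a toy sketch in plain Python. It is not the OpenMythos API: `shared_block` and `recurrent_depth_forward` are hypothetical names, the "layer" is a single tiny dense map with `tanh`, and adding the input to the state is just one simple way to re-inject it. The point is only to show one set of weights being reused on every pass while the state evolves.

```python
import math
import random


def shared_block(state, weights):
    """One shared 'layer': a small dense map followed by tanh.

    The same `weights` are reused on every loop iteration.
    """
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]


def recurrent_depth_forward(embedded_input, weights, num_loops):
    """Run the shared block `num_loops` times inside one forward pass.

    ASSUMPTION: the input is re-injected by simple addition to the state;
    a real implementation may combine them differently.
    """
    state = [0.0] * len(embedded_input)
    for _ in range(num_loops):
        # Re-inject the original input so the state keeps track of it.
        mixed = [s + x for s, x in zip(state, embedded_input)]
        state = shared_block(mixed, weights)
    return state  # no intermediate outputs are emitted along the way


random.seed(0)
dim = 4
weights = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
x = [0.1, -0.2, 0.3, 0.4]

shallow = recurrent_depth_forward(x, weights, num_loops=1)
deep = recurrent_depth_forward(x, weights, num_loops=8)

if __name__ == "__main__":
    print("1 loop :", [round(v, 4) for v in shallow])
    print("8 loops:", [round(v, 4) for v in deep])
```

Raising `num_loops` adds compute per forward pass without adding parameters, which is the trade-off the looped design explores.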

Copy-paste prompts

Prompt 1

Set up kyegomez/openmythos and train the 3B parameter recurrent-depth model on the FineWeb-Edu dataset on a single GPU.

Prompt 2

Explain how the recurrent-depth loop in OpenMythos differs from a standard transformer forward pass and what problems it might solve.

Prompt 3

Modify the OpenMythos 1B preset to run with 8 loop iterations instead of the default and compare the output perplexity.

Prompt 4

What datasets does the OpenMythos documentation recommend for training the smaller 1B parameter variant from scratch?


Verify against the repo before relying on details.