explaingit

sapientinc/hrm-text

617 · Python

TLDR

HRM-Text is a code release that lets a small team pretrain a 1 billion parameter language model from scratch for roughly $1000 in GPU rental.

In plain English

HRM-Text is a code release that lets a small team pretrain a 1 billion parameter language model from scratch for roughly $1000 in GPU rental. The headline claim in the README is that the same approach reaches benchmark numbers comparable to much larger projects while using 130 to 600 times less compute and 150 to 900 times less data. HRM stands for hierarchical recurrent model, the architecture the authors are pushing as an alternative to a standard transformer of the same size.

The repository ships the full pretraining stack: a hierarchical recurrent architecture, a sequence packing trick called PrefixLM, FlashAttention 3 attention kernels, distributed training via PyTorch FSDP2, evaluation scripts for common benchmarks, and a tool to export the trained checkpoint into Hugging Face Transformers format. The README is explicit that the attention path needs Hopper-class GPUs such as the H100, since it relies on FlashAttention 3.

Two reference runs are documented. The L size has 600 million parameters and trains on a single node of 8 H100s in about 50 hours, with reported scores including 77.6% on GSM8k and 56.6% on MMLU. The XL size has 1 billion parameters and trains on two nodes of 8 H100s each in about 46 hours, scoring 84.7% on GSM8k and 60.7% on MMLU. The pricing math assumes $2 per H100 hour.

The workflow walks the user through preparing tokenized data with a companion repo called data_io, running training in a published Docker image, checking NCCL communication for multi-node setups, logging to Weights and Biases, launching with torchrun, evaluating against benchmarks like GSM8k, MATH, MMLU, and ARC, and finally exporting to the Hugging Face format. The README also lists alternative baseline architectures included for comparison, such as a standard transformer, a tiny recursive model, and a universal transformer.
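The pricing math for the two reference runs can be reproduced with a few lines of arithmetic. The sketch below is not code from the repository: the `ReferenceRun` class and its field names are made up for illustration, and the inputs are only the node counts, approximate training hours, and $2 per H100 hour rate quoted in this summary.

```python
# Rental-cost arithmetic for the two reference runs described in the summary.
# ReferenceRun and its fields are hypothetical names, not part of hrm-text.
from dataclasses import dataclass

PRICE_PER_H100_HOUR = 2.0  # USD, the rate the summary says the README assumes


@dataclass(frozen=True)
class ReferenceRun:
    name: str
    nodes: int
    gpus_per_node: int
    hours: float  # approximate wall-clock training time

    @property
    def gpu_hours(self) -> float:
        # Total H100-hours = number of GPUs in the job times wall-clock hours.
        return self.nodes * self.gpus_per_node * self.hours

    @property
    def cost_usd(self) -> float:
        return self.gpu_hours * PRICE_PER_H100_HOUR


RUNS = [
    ReferenceRun("L (600M)", nodes=1, gpus_per_node=8, hours=50),
    ReferenceRun("XL (1B)", nodes=2, gpus_per_node=8, hours=46),
]

for run in RUNS:
    print(f"{run.name}: {run.gpu_hours:.0f} H100-hours, about ${run.cost_usd:,.0f}")
```

With these inputs the L run comes to 400 H100-hours (about $800) and the XL run to 736 H100-hours (about $1,472). Both training times are approximate, so the totals are estimates that sit on either side of the roughly $1000 headline figure.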

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.