Reproduce a training dataset in which each math problem comes with only as much reasoning as it actually requires.
Download and evaluate an already-trained 4B or 7B reasoning model without rebuilding the dataset from scratch.
Study how to measure problem difficulty by finding the minimum reasoning snippet needed to reach the right answer.
Run standard math benchmark evaluations on models trained with adaptive hint-length data.
Requires GPU servers and experience running AI model servers locally. This is not a plug-and-play tool; it is aimed at researchers with an ML infrastructure background.
This is a research project on training AI reasoning models more effectively with fewer, more carefully chosen examples. The core idea is that problems differ in how much step-by-step reasoning they need, so the training data should reflect that instead of treating every problem the same way.

The method runs two AI models on the same set of math problems. One is a "thinking" model that writes out long, detailed reasoning chains; the other is a simpler "instruct" model that tries to answer directly. For each problem, the code finds the shortest snippet of the thinking model's reasoning chain that the instruct model needs in order to reach the correct answer. The length of that minimum snippet serves as a measure of the problem's difficulty, and it determines how much step-by-step reasoning goes into the final training example.

Problems fall into three categories: easy ones that the instruct model solves with no hint, medium ones that need a small reasoning snippet, and hard ones that need the full reasoning chain. The result is a 1,000-example training dataset in which easy problems get short answers and hard problems get long ones, rather than padding every example with unnecessary reasoning or cutting every example short.

The repository includes the raw problem set, the finished 1K training dataset, all the scripts needed to reproduce the dataset from scratch, and evaluation code that tests the trained models on standard math benchmarks. Two trained models are available for download: a 4-billion-parameter version and a 7-billion-parameter version.

This is primarily a research artifact for people working on AI model training, not a general-purpose tool. Running the data construction pipeline requires access to GPU servers and some familiarity with running model servers locally.
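The difficulty measurement described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: it assumes a "snippet" is a prefix of the thinking model's chain, measured in words at a fixed grid of fractions, and that correctness is an exact string match. The names `solve`, `prefix_hint`, `min_hint_fraction`, `categorize`, and `build_target`, and the fraction grid, are all hypothetical; the actual pipeline may use tokens, sentences, a different search, or answer normalization.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: solve(problem, hint) returns the instruct model's final answer.
# In a real pipeline this would wrap a call to a locally served instruct model.
Solver = Callable[[str, str], str]


def prefix_hint(chain: str, fraction: float) -> str:
    """Return the first `fraction` of the reasoning chain, measured in words."""
    words = chain.split()
    return " ".join(words[: int(len(words) * fraction)])


def min_hint_fraction(
    problem: str,
    chain: str,
    answer: str,
    solve: Solver,
    fractions: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),  # hypothetical grid
) -> Optional[float]:
    """Smallest prefix fraction of `chain` that lets `solve` reach `answer`.

    Scans candidate fractions from shortest to longest, so it does not assume
    that a longer hint always helps. Returns None if even the full chain fails.
    """
    for fraction in sorted(fractions):
        hint = prefix_hint(chain, fraction)
        # Assumption: exact match after stripping whitespace counts as correct.
        if solve(problem, hint).strip() == answer.strip():
            return fraction
    return None


def categorize(fraction: Optional[float]) -> Optional[str]:
    """Map the minimum hint fraction to the three difficulty buckets."""
    if fraction is None:
        return None      # unsolved even with the full chain (assumed to be dropped)
    if fraction == 0.0:
        return "easy"    # instruct model solves it with no hint
    if fraction < 1.0:
        return "medium"  # a partial snippet is enough
    return "hard"        # the full reasoning chain is necessary


def build_target(chain: str, answer: str, fraction: float) -> str:
    """Hypothetical training target: the needed reasoning prefix plus the answer."""
    reasoning = prefix_hint(chain, fraction)
    return f"{reasoning}\n\nAnswer: {answer}" if reasoning else f"Answer: {answer}"


# Toy usage with a fake solver that only succeeds once the hint contains "two".
chain = "step one step two step three step four"
needs_two = lambda problem, hint: "42" if "two" in hint.split() else "0"
frac = min_hint_fraction("toy problem", chain, "42", needs_two)
print(frac, categorize(frac))  # prints "0.5 medium"
```

A linear scan costs one instruct-model call per candidate length; a binary search would need fewer calls but only works if success is monotone in hint length, which this sketch does not assume.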
By redai-infra on gitmyhub.
Verify against the repo before relying on details.