Analysis updated 2026-05-18
Give Claude Code a structured research workflow so it catches data leakage and reproducibility issues automatically during an ML experiment.
Log recurring agent mistakes to LESSONS.md so the same failure (like reporting a single-seed result as a finding) never repeats across sessions.
Scaffold a new ML experiment directory with reproducibility guardrails already in place before writing any code.
Use the literature-review reference doc to build a comparison matrix from papers the agent actually reads, not citations it invents.
| | toadoum/ai-research-skill | captaingrock/krea2trainer | codenamekt/hexus |
|---|---|---|---|
| Stars | 7 | 7 | 7 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | moderate |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | researcher | designer | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires an AI coding agent (Claude Code, Codex, or OpenClaw) to be installed already. The skill itself is just a set of files you copy into the agent's skills directory.
AI Research Skill is a set of files you drop into your coding agent's skills directory to give it structured guidance for every stage of a machine learning research project. It works with Claude Code, Codex, and OpenClaw by placing a SKILL.md file where those agents automatically look for instructions, so the guidance loads, with no extra setup, whenever your task looks like research.
The skill walks the agent through seven research stages: turning a vague idea into a testable hypothesis, doing a literature review from papers actually read rather than recalled, reproducing the strongest known baseline before claiming any improvement, designing experiments with proper seed fixing and data-leakage checks, running configurations so every result is reproducible, analyzing results across multiple seeds rather than a single lucky run, and writing a paper where every claim is backed by a number.
What makes this project different from a static prompt is a self-improving loop. When the agent catches or makes a research mistake, it runs a small Python script that appends the lesson to a LESSONS.md file in your project. At the start of every future task, the agent reads that file and applies what it has already learned. If the same mistake appears three times, the lesson is promoted into SKILL.md itself. The loop cannot be used to make the agent less rigorous: a built-in guardrail refuses any lesson that would weaken research integrity by fabricating, hiding, or cherry-picking results.
The repository also includes reference documents for literature review, experiment design, and paper writing, plus a script that scaffolds a reproducible experiment directory with the correct folder structure. The project is licensed under MIT and aimed at researchers who run ML experiments with AI coding agents and want the agent to catch common mistakes (data leakage, single-seed results, fabricated citations) before they cost weeks of work.
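The self-improving loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names `log_lesson` and `should_promote`, the `- [key] lesson` line format, and the regex-based guardrail are all assumptions made for this example. Only the LESSONS.md file name, the three-occurrence promotion threshold, and the refusal of integrity-weakening lessons come from the description.

```python
import re
from collections import Counter
from pathlib import Path

PROMOTE_AFTER = 3  # from the description: promoted after three occurrences

# Hypothetical, deliberately naive guardrail: reject lessons that *permit*
# fabricating, hiding, or cherry-picking. The real check is not documented here.
WEAKENS_INTEGRITY = re.compile(
    r"\b(?:ok|okay|fine|acceptable|allowed)\s+to\s+(?:fabricate|hide|cherry-pick)",
    re.IGNORECASE,
)


def log_lesson(lessons_file: Path, key: str, lesson: str) -> int:
    """Append one lesson line and return how often `key` has been logged."""
    lesson = " ".join(lesson.split())  # keep each lesson on a single line
    if WEAKENS_INTEGRITY.search(lesson):
        raise ValueError("refused: lesson would weaken research integrity")
    with lessons_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{key}] {lesson}\n")
    # Count occurrences of each key across the whole file (assumed format).
    logged = lessons_file.read_text(encoding="utf-8")
    keys = re.findall(r"^- \[([^\]]+)\]", logged, flags=re.MULTILINE)
    return Counter(keys)[key]


def should_promote(count: int) -> bool:
    """True once a lesson has recurred often enough to move into SKILL.md."""
    return count >= PROMOTE_AFTER
```

A caller would log a lesson with something like `log_lesson(Path("LESSONS.md"), "single-seed", "Report mean and std over several seeds.")` and check `should_promote` on the returned count. Keying lessons by a short identifier is one simple way to detect "the same mistake"; the actual project may match lessons differently.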
A portable SKILL.md that gives Claude Code, Codex, and OpenClaw a full ML research workflow with guardrails against data leakage, fabricated citations, and single-seed results, plus a self-improving loop that logs and learns from past mistakes.
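To illustrate the guardrail against single-seed results, here is a small sketch of reporting a metric across several seeds instead of one run. The `run_experiment` function is a hypothetical stand-in that returns a synthetic score; in a real project it would train and evaluate a model. Nothing here is taken from the repository's code.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for a training run; returns a synthetic score."""
    rng = random.Random(seed)  # local RNG so each seed is reproducible
    return 0.80 + rng.gauss(0.0, 0.01)


def summarize(seeds: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the score over all seeds."""
    if len(seeds) < 2:
        # A single seed gives no estimate of run-to-run variance.
        raise ValueError("need at least two seeds to report a spread")
    scores = [run_experiment(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    mean, std = summarize([0, 1, 2, 3, 4])
    print(f"score: {mean:.4f} +/- {std:.4f} over 5 seeds")
```

Refusing to summarize fewer than two seeds is a design choice for this sketch: it makes the single-seed case fail loudly instead of producing a number that looks like a finding.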
Mainly Python. The stack also includes SKILL.md and AGENTS.md files.
MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Aimed mainly at researchers.
Verify against the repo before relying on details.