Drop a research scaffold into an ML repo so a coding agent can run baseline-first experiments
Track ideas, runs, and reports in a structured .researchloop folder with a runs.jsonl ledger
Have an agent propose two to four grounded experiments before any code runs
Run a budgeted autonomous research loop in PyTorch or Hugging Face projects
This is alpha software, and breaking changes are still possible between minor versions, so pin a version and expect to wire it up to a coding agent before it does useful work.
AutoResearch-AI is an open-source npm package that installs research scaffolding into a machine learning repository so that coding agents such as Codex, Claude Code, Hermes, and Cursor can carry out research workflows in a disciplined way. The author describes the project as alpha and pre-1.0, with breaking changes still possible between minor versions, so production users are told to pin a specific version and watch the changelog before upgrading.
The package name is autoresearch-ai and the main CLI command is autoresearch, with researchloop kept as a legacy alias. You install it with npm install -g autoresearch-ai, or you clone the repo and run npm link for local development. The quick start commands cover the full loop: initialise the scaffold for an agent, set a goal with a metric and direction, inspect the repo, scan papers, propose ideas, generate prompts, run baselines and experiments, compare results, and produce reports.
Running autoresearch init creates a hidden .researchloop directory. It holds the agent instructions, the baseline notes, the active goal and plan, a repo profile, a team folder, adapters, and a scratchpad. The scratchpad contains a thread log, a runs.jsonl ledger, a memory file, and folders for ideas, papers, variants, and sweeps.
The README is clear that the package does not promise to train models for you. What it provides is the operating system around research: constraints, a baseline-first habit, structured experiment logs, idea files, and reproducible reports.
The expected interaction with a topic is also spelled out. When given a topic, the agent should first check whether a usable baseline exists and is documented, and propose writing baseline.md if it does not. Only after that does it offer three modes: propose, which suggests two to four grounded experiments; novel, which reasons about genuinely different hypotheses; and autonomous, which runs the loop inside an agreed budget.
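To make the ledger and the baseline-first habit concrete, here is a minimal Python sketch of how an append-only runs.jsonl file and a baseline check could work. It is an illustration under stated assumptions, not the package's actual implementation: the record fields (run_id, kind, metric, value), the direction values "minimize" and "maximize", and all function names are hypothetical, and the real runs.jsonl schema should be checked against the repo.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def append_run(ledger: Path, run_id: str, kind: str, metric: str, value: float) -> None:
    """Append one run to the ledger as a single JSON line (hypothetical schema)."""
    record = {"run_id": run_id, "kind": kind, "metric": metric, "value": value}
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_runs(ledger: Path) -> List[Dict]:
    """Read every non-empty line of the ledger; a missing file means no runs yet."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def has_baseline(runs: List[Dict]) -> bool:
    """Baseline-first gate: experiments should wait until a baseline run is logged."""
    return any(run["kind"] == "baseline" for run in runs)


def best_run(runs: List[Dict], metric: str, direction: str) -> Optional[Dict]:
    """Pick the best run for a metric, honouring the goal's direction."""
    candidates = [run for run in runs if run["metric"] == metric]
    if not candidates:
        return None
    pick = min if direction == "minimize" else max
    return pick(candidates, key=lambda run: run["value"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The exact location of runs.jsonl inside .researchloop is assumed here.
        ledger = Path(tmp) / ".researchloop" / "scratchpad" / "runs.jsonl"
        if not has_baseline(load_runs(ledger)):
            append_run(ledger, "run-001", "baseline", "val_loss", 0.52)
        append_run(ledger, "run-002", "experiment", "val_loss", 0.47)
        print(best_run(load_runs(ledger), "val_loss", "minimize"))
```

One JSON object per line keeps the ledger append-only and easy to diff, which suits a log that an agent writes to repeatedly. The values in the demo are made-up placeholders, not results from the project.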
The author reports local testing on a MacBook covering the init, inspect, prompt, doctor, and report commands, with detection of PyTorch and Hugging Face projects, plus a tiny synthetic training run completed on Apple's MPS backend. Target users are PhD students, small AI labs, independent researchers, and companies doing model, prompt, or eval optimisation work.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.