Analysis updated 2026-05-18
Stop an AI builder model from grading its own code by routing all judgments through a separate architect model.
Freeze acceptance gates before coding starts so success criteria cannot shift after the builder sees the results.
Keep AI coding session state in a docs folder in the repo instead of losing it in chat history.
| | jumperz11/judge-loop | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
|---|---|---|---|
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | hard | hard |
| Complexity | 3/5 | 4/5 | 3/5 |
| Audience | developer | researcher | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires access to two separate AI models (one for judging, one for building) and a manual prompt-copy workflow between them.
JudgeLoop is a workflow protocol for software built by AI coding agents. It addresses a specific problem: when you use one AI model to both write code and check whether the code is correct, the model is grading its own work, which is unreliable. JudgeLoop separates those two jobs across different models. The intended setup uses one model (the README names Anthropic's Fable) as the architect and judge, and a different model (with Codex as the suggested default) as the builder. Before any coding starts, the architect defines a set of pass/fail gates for the current work slice, such as a specific endpoint returning the correct status code or a test suite passing with zero failures. These gates are frozen in files in the repository before the builder touches any code. The builder then writes code and reports raw evidence back to the repo, meaning actual command output and exit codes, not opinions. The architect reviews that evidence against the frozen gates and issues a verdict of pass or continue. Everything is stored in a docs folder inside your project so the state is part of the repository rather than living in a chat history that disappears. A small command-line tool called judgeloop provides init and doctor commands to set up and validate the folder structure. The workflow is intentionally manual: you copy prompts into the architect model, paste the output into the builder, and review the verdict yourself. The protocol is designed to make expensive, capable models focus only on judgment and planning while cheaper or faster models handle the typing. The project is early stage and described as a usable manual kit. It is MIT licensed.
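The gate-and-evidence cycle described above can be sketched in a few lines of Python. This is an illustration only, not JudgeLoop's actual file format or CLI behavior: the `Gate` schema, the `freeze`, `collect_evidence`, and `judge` function names, and the use of a SHA-256 digest to detect changed gates are all assumptions made for the example. The two gate commands are stand-ins that simply exit with a fixed code.

```python
import hashlib
import json
import subprocess
import sys
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Gate:
    """A pass/fail criterion fixed before coding starts (hypothetical schema)."""
    name: str
    command: List[str]
    expected_exit_code: int = 0


def freeze(gates):
    """Return a SHA-256 digest of the gate definitions so later edits are detectable."""
    payload = json.dumps([asdict(g) for g in gates], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def collect_evidence(gate):
    """Builder side: run the gate command and record raw output and exit code only."""
    result = subprocess.run(gate.command, capture_output=True, text=True)
    return {
        "gate": gate.name,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


def judge(gates, frozen_digest, evidence):
    """Judge side: compare evidence with the frozen gates; return 'pass' or 'continue'."""
    if freeze(gates) != frozen_digest:
        raise ValueError("gates changed after they were frozen")
    by_name = {record["gate"]: record for record in evidence}
    for gate in gates:
        record = by_name.get(gate.name)
        if record is None or record["exit_code"] != gate.expected_exit_code:
            return "continue"
    return "pass"


# Stand-in commands: the first exits 0, the second exits 1.
gates = [
    Gate("tests_pass", [sys.executable, "-c", "raise SystemExit(0)"]),
    Gate("lint_clean", [sys.executable, "-c", "raise SystemExit(1)"]),
]
digest = freeze(gates)                               # frozen before any code is written
evidence = [collect_evidence(g) for g in gates]      # raw exit codes and output
verdict = judge(gates, digest, evidence)
print(verdict)  # continue
```

In this sketch the verdict depends only on recorded exit codes, so the builder's own opinion of its work never enters the decision. In the workflow described above, the gates and the evidence would live in files in the repository's docs folder rather than in memory.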
A repo-based workflow protocol that separates AI code generation from AI code review, using one model as architect/judge and another as builder, with frozen pass/fail gates stored in the repository.
Mainly Python. The stack also includes Markdown and a CLI.
MIT license: use, modify, and distribute freely for any purpose, including commercial use.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.