jumperz11/judge-loop

Analysis updated 2026-05-18

★ 1 | Python | Audience · developer | Complexity · 3/5 | License · MIT | Setup · moderate

Mindmap

mindmap
  root((JudgeLoop))
    Roles
      Fable architect judge
      Codex default builder
      Swap any LLM
    Protocol
      Freeze gates first
      Builder writes evidence
      Judge reviews gates
    Repo State
      docs/HANDOFF.md
      docs/gates/
      docs/lanes/
    CLI
      judgeloop init
      judgeloop doctor


What do people build with it?

USE CASE 1

Stop an AI builder model from grading its own code by routing all judgments through a separate architect model.

USE CASE 2

Freeze acceptance gates before coding starts so success criteria cannot shift after the builder sees the results.

USE CASE 3

Keep AI coding session state in a docs folder in the repo instead of losing it in chat history.
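The "frozen gates" idea in use case 2 can be pictured as a tamper check on the gate files. What follows is a minimal sketch, not JudgeLoop's actual mechanism: it assumes gates are Markdown files under docs/gates/ (the path shown in the mindmap) and that "freezing" means recording a SHA-256 digest per file. The helper names `freeze_gates`, `changed_gates`, and `demo`, and the file name `slice-01.md`, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def freeze_gates(gates_dir: Path) -> dict[str, str]:
    """Record a SHA-256 digest for every gate file (hypothetical helper)."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(gates_dir.glob("*.md"))
    }


def changed_gates(gates_dir: Path, frozen: dict[str, str]) -> list[str]:
    """Return names of gate files added, removed, or edited since the freeze."""
    current = freeze_gates(gates_dir)
    return sorted(
        name
        for name in frozen.keys() | current.keys()
        if frozen.get(name) != current.get(name)
    )


def demo() -> tuple[list[str], list[str]]:
    """Freeze one gate in a throwaway directory, then edit it."""
    with tempfile.TemporaryDirectory() as tmp:
        gates_dir = Path(tmp) / "docs" / "gates"  # assumed layout
        gates_dir.mkdir(parents=True)
        gate = gates_dir / "slice-01.md"  # hypothetical gate file
        gate.write_text("GET /health returns 200\n")

        frozen = freeze_gates(gates_dir)
        before = changed_gates(gates_dir, frozen)  # nothing edited yet

        # Loosening the criterion after the freeze should be detected.
        gate.write_text("GET /health returns 200 or 204\n")
        after = changed_gates(gates_dir, frozen)
        return before, after


if __name__ == "__main__":
    print(demo())
```

In a real project the digests would presumably be committed alongside the gates, so that any later edit to the success criteria shows up in review.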

What is it built with?

Python · Markdown · CLI

How does it compare?

	jumperz11/judge-loop	a-bissell/unleash-lite	abhiinnovates/whatsapp-hr-assistant
Stars	1	1	1
Language	Python	Python	Python
Setup difficulty	moderate	hard	hard
Complexity	3/5	4/5	3/5
Audience	developer	researcher	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires access to two separate AI models (one for judging, one for building) and a manual prompt-copy workflow between them.

MIT license: use, modify, and distribute freely for any purpose, including commercial use.

In plain English

JudgeLoop is a workflow protocol for software built by AI coding agents. It addresses a specific problem: when one AI model both writes code and checks whether that code is correct, the model is grading its own work, which is unreliable. JudgeLoop splits those two jobs across different models. The intended setup uses one model (the README names Anthropic's Fable) as the architect and judge, and a different model (Codex is the suggested default) as the builder.

Before any coding starts, the architect defines a set of pass/fail gates for the current work slice, such as a specific endpoint returning the correct status code or a test suite passing with zero failures. These gates are frozen in files in the repository before the builder touches any code. The builder then writes code and reports raw evidence back to the repo: actual command output and exit codes, not opinions. The architect reviews that evidence against the frozen gates and issues a verdict of pass or continue.

Everything is stored in a docs folder inside your project, so the state is part of the repository rather than living in a chat history that disappears. A small command-line tool called judgeloop provides init and doctor commands to set up and validate the folder structure. The workflow is intentionally manual: you copy prompts into the architect model, paste the output into the builder, and review the verdict yourself. The protocol is designed so that expensive, capable models focus only on judgment and planning while cheaper or faster models handle the typing. The project is early stage and is described as a usable manual kit. It is MIT-licensed.
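The gate, evidence, and verdict cycle described above can be sketched in a few lines. This is an illustration only: in JudgeLoop the judgment is made by an architect model and a human reviewer, not by a function, and the `Gate`, `Evidence`, and `judge` names, their fields, and the example commands are all assumptions introduced here. The sketch shows only the mechanical core, comparing reported exit codes against gates that were fixed beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a gate cannot be mutated after creation
class Gate:
    gate_id: str
    command: str
    expected_exit_code: int = 0


@dataclass(frozen=True)
class Evidence:
    gate_id: str
    exit_code: int
    output: str  # raw command output, not the builder's opinion


def judge(gates: list[Gate], evidence: list[Evidence]) -> str:
    """Return "pass" only if every gate has matching evidence, else "continue"."""
    by_gate = {item.gate_id: item for item in evidence}
    for gate in gates:
        report = by_gate.get(gate.gate_id)
        if report is None or report.exit_code != gate.expected_exit_code:
            return "continue"
    return "pass"


# Hypothetical gates for one work slice.
GATES = [
    Gate("tests", "pytest -q"),
    Gate("health", "curl -f http://localhost:8000/health"),
]

if __name__ == "__main__":
    partial = [Evidence("tests", 0, "12 passed")]
    complete = partial + [Evidence("health", 0, '{"status": "ok"}')]
    print(judge(GATES, partial))   # one gate has no evidence yet
    print(judge(GATES, complete))  # every gate has matching evidence
```

Missing evidence is treated the same as failing evidence, which mirrors the idea that the builder cannot pass a gate by simply not reporting on it.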

Copy-paste prompts

Prompt 1

How do I set up JudgeLoop in my project and define frozen gates for a code slice before the builder starts coding?

Prompt 2

How does the JudgeLoop verdict cycle work, specifically the flow between the architect prompt and the builder evidence report?

Prompt 3

Can I use a different builder LLM instead of Codex with JudgeLoop, and what does the builder need to produce?

Frequently asked questions

What is judge-loop?

A repo-based workflow protocol that separates AI code generation from AI code review, using one model as architect/judge and another as builder, with frozen pass/fail gates stored in the repository.

What language is judge-loop written in?

Mainly Python. The stack also includes Markdown and a CLI.

What license does judge-loop use?

MIT license: use, modify, and distribute freely for any purpose, including commercial use.

How hard is judge-loop to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is judge-loop for?

Mainly developers.


Verify against the repo before relying on details.