wisoba/deadbranchbench

Analysis updated 2026-05-18

★ 0 | Python | Audience · researcher | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((deadbranchbench))
    What it does
      Record agent events
      Label branches live or dead
      Compute waste metrics
    Key Metrics
      Dead Branch Ratio
      Dead Branch Cost
      Time To Death
      Failed Task Spend
    Tech Stack
      Python CLI
      JSONL event capture
      HTML reports
    Use Cases
      Benchmark agent efficiency
      Find expensive dead paths
      Detect confidently wrong work
    Audience
      AI agent researchers
      Developer tool builders


What do people build with it?

USE CASE 1

Measure what fraction of an AI coding agent's token spend went to dead-end attempts that contributed nothing to the final answer.

USE CASE 2

Detect runs where an agent passed its own internal checks but failed an external test evaluator, revealing confidently wrong work.

USE CASE 3

Generate an HTML report with a branch tree and cost breakdown to understand where an agent wastes the most compute.
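The "confidently wrong" check from use case 2 can be sketched in a few lines. This is a minimal illustration, not DeadBranchBench's own code: the JSONL field names (`run_id`, `internal_checks_passed`, `external_eval_passed`) and the function name are hypothetical and should be replaced with whatever the real event schema uses.

```python
import json

# Hypothetical per-run summaries, one JSON object per line (JSONL).
# The field names are assumptions made for this sketch only.
SAMPLE_RUNS = """\
{"run_id": "r1", "internal_checks_passed": true, "external_eval_passed": true}
{"run_id": "r2", "internal_checks_passed": true, "external_eval_passed": false}
{"run_id": "r3", "internal_checks_passed": false, "external_eval_passed": false}
"""

def confidently_wrong_runs(jsonl_text):
    """Return IDs of runs that passed internal checks but failed the external evaluator."""
    flagged = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        run = json.loads(line)
        if run["internal_checks_passed"] and not run["external_eval_passed"]:
            flagged.append(run["run_id"])
    return flagged

flagged_runs = confidently_wrong_runs(SAMPLE_RUNS)
print(flagged_runs)
```

Here only `r2` is flagged: `r3` also failed the external evaluator, but the agent's own checks already caught that failure, so it is wrong without being confidently wrong.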

What is it built with?

Python, JSONL, CLI

How does it compare?

	wisoba/deadbranchbench	0xhassaan/nn-from-scratch	a-little-hoof/dsr
Stars	0	0	0
Language	Python	Python	Python
Setup difficulty	moderate	moderate	hard
Complexity	3/5	4/5	5/5
Audience	researcher	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Week 1 scope: event schema and CLI only. No pruning or agent intelligence built in yet.

No license information is mentioned in the README.

In plain English

When an AI agent works on a coding task, it tries multiple approaches. Some of those attempts succeed and contribute to the final result; others fail, get discarded, or turn out to have been unnecessary. DeadBranchBench is a Python tool for measuring what that wasted effort costs in tokens, tool calls, retries, and computation time.

The main concept is a "dead branch": a path the agent took that contributed nothing to the final result. The tool records what an agent does as a series of events, organizes them into a tree of branches, and then lets a human reviewer label each branch as live (contributed to the solution), support (failed but produced useful information), deferred (preserved for possible future use), or dead (consumed cost with no measurable output). From those labels, the tool computes metrics such as the Dead Branch Ratio, the fraction of total cost that went to dead work.

The tool also handles a subtler failure mode: an agent that appeared to succeed by its own internal checks but still failed an external evaluator. A run can have zero dead branches by its own accounting and yet fail a test suite. This distinction matters when benchmarking agents on real tasks, where finishing a run is not the same as producing a correct result.

In practice you run the tool from the command line. You observe a command or script, capture events as a JSONL file, build a trace skeleton, label the branches interactively, compute the metrics, and optionally export an HTML report showing the branch tree, cost breakdown, and top waste contributors. The tool does not prune branches or optimize the agent; it only measures and reports.

This is an early-stage project aimed at researchers and developers who build or evaluate AI coding agents and want objective data on where compute is being wasted.
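To make the Dead Branch Ratio concrete, here is a minimal sketch of the arithmetic, assuming each reviewed branch carries one label and one cost figure. The `Branch` class, its field names, and the choice of tokens as the cost unit are illustrative assumptions, not the tool's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    # Hypothetical shape for a labeled branch; not DeadBranchBench's schema.
    branch_id: str
    label: str    # "live", "support", "deferred", or "dead"
    cost: float   # assumed here to be tokens spent on the branch

def dead_branch_ratio(branches):
    """Fraction of total cost spent on branches labeled "dead"."""
    total = sum(b.cost for b in branches)
    if total == 0:
        return 0.0  # no recorded cost, so nothing was wasted
    dead = sum(b.cost for b in branches if b.label == "dead")
    return dead / total

branches = [
    Branch("b1", "live", 1200),
    Branch("b2", "dead", 600),
    Branch("b3", "support", 200),
]
ratio = dead_branch_ratio(branches)
print(f"Dead Branch Ratio: {ratio:.2f}")  # 600 / 2000 -> 0.30
```

In this sketch, support and deferred branches count toward total cost but not toward dead cost, which follows the label definitions above. Whether the real metric weights them the same way should be checked against the repo.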

Copy-paste prompts

Prompt 1

I want to measure dead branch cost for my AI coding agent using DeadBranchBench. How do I instrument my agent and capture the event stream?

Prompt 2

How do I use DeadBranchBench to label branches interactively after a run and compute the Dead Branch Ratio?

Prompt 3

I have a trace from DeadBranchBench. How do I generate an HTML report and interpret the branch tree and cost breakdown?

Prompt 4

How do I add an external task evaluator to a DeadBranchBench trace so I can detect confidently wrong agent runs?

Frequently asked questions

What is deadbranchbench?

A benchmarking tool that records AI agent work as events, labels branches as live or dead after review, and measures how much compute cost went to wasted effort.

What language is deadbranchbench written in?

Mainly Python. The stack also includes JSONL for event capture and a command-line interface.

What license does deadbranchbench use?

No license information is mentioned in the README.

How hard is deadbranchbench to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is deadbranchbench for?

Mainly researchers, along with developers who build or evaluate AI coding agents.


Verify against the repo before relying on details.