evilsocket/audit

Analysis updated 2026-06-24

★ 397 | Python | Audience · developer | Complexity · 4/5 | License · MIT | Setup · moderate

Mindmap

mindmap
  root((audit))
    Inputs
      Git repo
      Claude OAuth login
      YAML config
    Outputs
      Schema validated report
      Proof of concept exploits
      Per finding traces
    Stages
      Recon
      Hunt
      Validate
      Gapfill
      Dedupe
      Trace
      Feedback
      Report
    Tech Stack
      Python
      Claude Agent SDK
      Opus 4.7
      Sonnet 4.6
      OpenRouter

What do people build with it?

USE CASE 1

Run a multi-stage, Claude-powered security audit on a code repository from the command line

USE CASE 2

Reproduce found bugs against a live deployment with network access locked to the target host

USE CASE 3

Override per-stage models and route through OpenRouter or a custom Anthropic-compatible gateway
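The host restriction in use case 2 can be pictured as an exact-match allowlist check. This is a minimal sketch only: the summary does not say how audit enforces the restriction, and the function name `is_allowed` and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, target_host: str) -> bool:
    """Return True only if the URL's host is exactly the target host.

    Hypothetical helper for illustration; not taken from the audit codebase.
    """
    host = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host == target_host.lower()


# Exact matching rejects look-alike hosts that merely start with the target name.
print(is_allowed("https://staging.example.com/login", "staging.example.com"))
print(is_allowed("https://staging.example.com.evil.net/", "staging.example.com"))
```

Comparing the parsed hostname rather than searching the URL string avoids accepting URLs that only contain the target name somewhere in the path or as a prefix of another domain.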

What is it built with?

Python, Claude Agent SDK, YAML

How does it compare?

	evilsocket/audit	avaturn-live/avtr-1	kasothaphie/genrecon
Stars	397	362	478
Language	Python	Python	Python
Setup difficulty	moderate	hard	hard
Complexity	4/5	4/5	5/5
Audience	developer	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Needs the Claude Agent SDK and a Claude Pro or Max login. A real audit can fan out to dozens of hunt tasks, so set cost caps before the first run.
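To see why the caps matter, here is a minimal sketch of a concurrency limit combined with a dollar budget. It is not audit's runner: the names `Budget`, `BudgetExceeded`, and `run_hunts` are hypothetical, the flat per-task cost is an assumption, and the agent call is replaced by a placeholder.

```python
import asyncio


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the cap."""


class Budget:
    """Hypothetical running dollar budget, checked before each task starts."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(f"cap of ${self.cap_usd:.2f} reached")
        self.spent_usd += cost_usd


async def run_hunts(tasks, budget, max_concurrency, cost_per_task):
    """Run tasks with at most max_concurrency in flight; skip tasks over budget."""
    sem = asyncio.Semaphore(max_concurrency)
    done = []

    async def worker(name):
        async with sem:
            budget.charge(cost_per_task)  # refuse the task before doing any work
            await asyncio.sleep(0)        # placeholder for the real agent call
            done.append(name)

    # return_exceptions=True lets remaining tasks finish after a budget refusal.
    await asyncio.gather(*(worker(t) for t in tasks), return_exceptions=True)
    return done


if __name__ == "__main__":
    budget = Budget(cap_usd=3.0)
    finished = asyncio.run(run_hunts([f"hunt-{i}" for i in range(5)], budget, 2, 1.0))
    print(len(finished), "tasks ran;", budget.spent_usd, "USD spent")
```

With five tasks at an assumed 1 USD each and a 3 USD cap, only three tasks run. A real runner would measure actual token cost per call rather than a flat estimate.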

MIT license, so you can use and modify it freely as long as you keep the copyright notice.

In plain English

audit is a command-line agent that looks for security bugs in a code repository by running Claude through an eight-stage pipeline. Instead of asking one large model to find everything, the project breaks the work into many small, focused agents that ask narrow questions, then has a second agent on a different model try to disprove the first one. A finding passes the gate only if the agent can show that an attacker-controlled input actually reaches the suspect code path.

The pipeline is a from-scratch reimplementation of an architecture Cloudflare described in a blog post called Project Glasswing. The eight stages are: Recon to map the repo and spawn hunt tasks, Hunt to attack one bug class at a time and build proofs of concept, Validate to adversarially re-read with a different model, Gapfill to re-queue thin areas, Dedupe to cluster findings by root cause, Trace to prove reachability, Feedback to spawn new hunts from confirmed bugs, and Report to emit a schema-validated final document.

The project is MIT licensed and is built on the official Claude Agent SDK. By default it bills against the user's existing Claude Pro or Max subscription through the same OAuth login the regular Claude CLI uses, so no separate API key is needed.

The README documents cost controls: you can cap concurrency, cap initial hunt fanout, and set a dollar budget that the runner enforces between and within stages. A typical codebase produces 15 to 50 hunt tasks and 25 or more findings to validate, so the controls matter.

There is an optional mode where the agents reproduce findings against a live deployment of the target instead of a local proof of concept, with network access restricted to that one host. Other supported setups include routing through OpenRouter or a custom Anthropic-compatible gateway, and per-stage model overrides in a YAML config. By default the Recon, Validate, and Trace stages use Opus 4.7, while Hunt, Gapfill, Dedupe, Feedback, and Report use Sonnet 4.6.
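The default stage-to-model split and the idea of per-stage overrides can be sketched as a lookup table plus a merge. This is illustrative only: the dictionary keys, the `resolve_models` helper, and the model labels are assumptions, not the project's real YAML schema or model identifiers.

```python
# Stage order as listed in the summary above.
STAGES = ["recon", "hunt", "validate", "gapfill",
          "dedupe", "trace", "feedback", "report"]

# Defaults as described: Recon, Validate, and Trace on Opus 4.7; the rest on Sonnet 4.6.
# The label strings are placeholders, not real API model IDs.
DEFAULT_MODELS = {
    stage: "Opus 4.7" if stage in {"recon", "validate", "trace"} else "Sonnet 4.6"
    for stage in STAGES
}


def resolve_models(overrides=None):
    """Merge hypothetical per-stage overrides over the defaults."""
    models = dict(DEFAULT_MODELS)
    for stage, model in (overrides or {}).items():
        if stage not in models:
            raise ValueError(f"unknown stage: {stage!r}")
        models[stage] = model
    return models


if __name__ == "__main__":
    # Example: swap only the Hunt stage; every other stage keeps its default.
    for stage, model in resolve_models({"hunt": "some-cheaper-model"}).items():
        print(f"{stage:9s} {model}")
```

Rejecting unknown stage names means a typo in an override fails loudly instead of silently leaving the default in place.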

Copy-paste prompts

Prompt 1

Run audit on this repo against my Claude Pro subscription with a 20 dollar budget cap and 4 concurrent hunts

Prompt 2

Edit the audit YAML config so Recon uses Opus 4.7 and Hunt uses a cheaper Sonnet variant through OpenRouter

Prompt 3

Explain the Trace stage in audit and how it proves attacker reachability before a finding is kept

Prompt 4

Compare evilsocket audit to Cloudflare Project Glasswing and list what is faithful versus what is reimagined

Frequently asked questions

What is audit?

audit is a CLI security audit agent that runs Claude through an eight-stage pipeline (Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, Report) to find and prove bugs in a repository.

What language is audit written in?

Mainly Python. The stack also includes the Claude Agent SDK and YAML.

What license does audit use?

MIT license, so you can use and modify it freely as long as you keep the copyright notice.

How hard is audit to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is audit for?

Mainly developers.

Verify against the repo before relying on details.