caspian-detector/caspian

★ 1 | Audience · researcher | Complexity · 4/5 | Active | License | Setup · hard

Mindmap

mindmap
  root((caspian))
    Inputs
      Multi-agent run logs
      TAMAS benchmark
      ACIArena benchmark
      OpenAI API key
    Outputs
      Cascade detection scores
      Attribution per agent
      Precision recall F1 AUROC
    Use Cases
      Detect prompt injection chains
      Audit AutoGen and CrewAI runs
      Reproduce paper experiments
    Tech Stack
      Python
      OpenAI API
      AutoGen
      CrewAI
      MetaGPT

Things people build with this

USE CASE 1

Reproduce the CASPIAN paper results on the TAMAS and ACIArena benchmarks.

USE CASE 2

Add CASPIAN as a runtime monitor to an AutoGen pipeline to flag suspicious agent-to-agent influence.

USE CASE 3

Compare CASPIAN against simpler single-agent guardrails on a custom multi-agent app.

USE CASE 4

Use the attribution output to debug which agent in a CrewAI workflow amplified a bad instruction.

Tech stack

Python · OpenAI · AutoGen · CrewAI · MetaGPT

Getting it running

Difficulty · hard | Time to first run · 1 day+

Needs Python deps, an OpenAI API key, and downloading two separate benchmark suites before any experiment will run.

MIT license: you can use, modify, and redistribute the code freely with attribution.

In plain English

CASPIAN is research code from a group at Virginia Tech that tries to spot a specific kind of failure in systems where several AI language model agents talk to each other. In these multi-agent setups, one agent can be tricked or fed bad input, and its output then flows on to the next agent, then the next, until the whole group ends up producing something harmful or wrong. The authors call this a cascade attack, and their point is that the early warning signs are spread out across many small messages between agents, so any single check looking at one agent in isolation will miss it.

The tool watches the agents while they run and builds a picture of who is influencing whom over time. It does this with a method the paper calls late interaction conditional transfer entropy, which is a statistical way of measuring how much one agent's output is shaping another's. From that picture it tries to do two things at once: notice when a cascade is starting, and point at which agents kicked it off, which ones passed it along, and which ones made it worse. The authors say it does this with under one percent extra latency, meaning the monitoring is fast enough to run online while the agents are working.

The README is mostly a setup and experiments guide for researchers who want to reproduce the results. It walks through cloning the repo, installing Python dependencies, setting an OpenAI API key, and pulling down two benchmark suites called TAMAS and ACIArena, which contain hundreds of scripted attack and benign scenarios across different agent frameworks like AutoGen, CrewAI, MetaGPT, and LLMDebate. Once the benchmarks are in place there are long lists of command-line invocations for running the full attack matrix, running attack-only or benign-only subsets, limiting the number of scenarios for debugging, and so on.
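
To make the transfer entropy idea mentioned above concrete, here is a minimal sketch of the plain textbook quantity, not the paper's late interaction conditional variant. It assumes each agent's messages have already been reduced to one discrete symbol per turn (an assumption made only for this illustration), uses a history length of one step, and does not condition on any third agent. The function name and the toy series are hypothetical and do not come from the CASPIAN repo.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    Textbook pairwise transfer entropy on discrete symbols. This is an
    illustration only, not CASPIAN's conditional late-interaction method.
    """
    if len(source) != len(target):
        raise ValueError("series must have equal length")
    n = len(target) - 1
    if n < 1:
        raise ValueError("need at least two time steps")

    triples = Counter()   # (y_next, y_prev, x_prev)
    pairs_yx = Counter()  # (y_prev, x_prev)
    pairs_yy = Counter()  # (y_next, y_prev)
    singles = Counter()   # y_prev
    for t in range(n):
        y_next, y_prev, x_prev = target[t + 1], target[t], source[t]
        triples[(y_next, y_prev, x_prev)] += 1
        pairs_yx[(y_prev, x_prev)] += 1
        pairs_yy[(y_next, y_prev)] += 1
        singles[y_prev] += 1

    te = 0.0
    for (y_next, y_prev, x_prev), count in triples.items():
        p_joint = count / n
        # p(y_next | y_prev, x_prev): prediction using the source's past
        p_with_source = count / pairs_yx[(y_prev, x_prev)]
        # p(y_next | y_prev): prediction from the target's own past only
        p_self_only = pairs_yy[(y_next, y_prev)] / singles[y_prev]
        te += p_joint * log2(p_with_source / p_self_only)
    return te


# Toy data (hypothetical): agent B repeats agent A's symbol one turn later.
agent_a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
agent_b = [0] + agent_a[:-1]

te_ab = transfer_entropy(agent_a, agent_b)  # influence of A on B
te_ba = transfer_entropy(agent_b, agent_a)  # influence of B on A
print(f"TE(A->B) = {te_ab:.3f} bits, TE(B->A) = {te_ba:.3f} bits")
```

Because B simply echoes A with a one-turn delay, the A to B estimate comes out much larger than the B to A estimate, which is the kind of asymmetry an influence graph is built from. With only a few dozen samples a plug-in estimate like this is biased upward, so the small nonzero B to A value should not be read as real influence.
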
After the runs finish, a separate set of evaluation commands computes detection metrics like precision, recall, F1, and AUROC, plus attribution metrics that measure how well the tool identified the origin, amplifier, and bridge agents in each cascade. The repo is in active development, has very few stars, and points at an arXiv paper that is still listed as coming soon. It is released under the MIT license and is clearly aimed at security and machine learning researchers rather than general users.
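
As a reference point for the detection metrics named above, the following sketch computes precision, recall, F1, and AUROC from per-scenario labels and detector scores using only the standard library. It is not the repo's evaluation script; the function names, the 0.5 threshold, and the example scores are assumptions chosen for illustration.

```python
def precision_recall_f1(labels, scores, threshold=0.5):
    """Threshold the scores, then compute precision, recall, and F1."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(labels, scores):
    """AUROC as the fraction of (attack, benign) pairs ranked correctly.

    Ties count as half a correct pair. Both classes must be present.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one attack and one benign run")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical results: 1 = attack scenario, 0 = benign scenario.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

precision, recall, f1 = precision_recall_f1(labels, scores)
auc = auroc(labels, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auroc={auc:.3f}")
```

AUROC compares attack scores against benign scores, so it is undefined on a benign-only or attack-only subset; the sketch raises an error in that case rather than returning a misleading number.
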

Copy-paste prompts

Prompt 1

Clone CASPIAN, install its Python deps, set my OpenAI key, and pull the TAMAS and ACIArena benchmarks ready to run.

Prompt 2

Run the full attack matrix in CASPIAN limited to 20 scenarios per framework so I can sanity-check the pipeline overnight.

Prompt 3

Show me how to compute precision, recall, F1, and AUROC on a benign-only subset using the evaluation scripts in this repo.

Prompt 4

Wrap the CASPIAN monitor as a callback I can plug into a CrewAI run to flag cascade attacks while it executes.

Prompt 5

Explain the late interaction conditional transfer entropy method in CASPIAN with a small numerical example.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.