Catch agent loops and redundant tool calls before they burn tokens
Compress long agent traces by dropping stale tool-result blocks
Build a memory of past agent patterns recalled by semantic search
Replay a failed session and ask the model to simulate a fork point
Installation is a plain pip install, but the most useful features need an LLM provider key and the ONNX embedding extra.
agentmw is middleware for AI agents. When developers build programs that let a language model act over many turns (for example, calling tools, reading files, then answering), the model can get stuck in loops, repeat the same tool call, contradict itself, give up halfway, or make up facts. agentmw sits between your code and the model, watches each session, and tries to catch these failures while the run is still in progress. It is open source under the Apache-2.0 license, and the README states that it works with any model and any framework.
The library is built in layers. An LLM monitor sends recent turns to a provider (Ollama, OpenAI, Anthropic, or OpenRouter) to classify them as a loop, a redundant tool call, a contradiction, abandonment, or a hallucination. A heuristic monitor runs the same checks with regex rules, serving both as a prefilter and as a fallback. A compression layer trims stale tool-result blocks from the conversation while keeping the most recent ones. A reasoning library stored in SQLite remembers patterns from past runs and recalls them by meaning, using a small local embedding model (BGE-small ONNX, 30 MB).
There is also a time-travel command-line tool that walks through a saved trace, identifies the "point of no return" where the agent went off track, counts how many tokens were wasted, and can ask the provider to simulate what would have happened on the other branch.
After each session, a background extractor distills one to three reusable patterns and adds them to the reasoning library, so the system builds its own memory of what worked and what did not. A circuit breaker short-circuits provider calls after three failures within 30 seconds and then waits out a 60-second cooldown, so a failing monitor never slows down the main client.
Installation is via pip, with optional extras for semantic recall and for the MCP server (Model Context Protocol, the standard that clients such as Claude Desktop and Cursor use to talk to external tools).
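The LLM monitor's job, sending recent turns to a provider and reading back a failure label, can be sketched as below. This is an illustration under stated assumptions, not agentmw's implementation: the prompt wording, the JSON reply shape, the extra "ok" label, and the names `build_classifier_prompt` and `classify_turns` are all hypothetical. The provider call is injected as a plain `ask_llm(prompt) -> str` function so the sketch stays provider-agnostic, and any error or malformed reply falls back to a heuristic checker, mirroring the prefilter/fallback role described above.

```python
import json
from typing import Callable

# Hypothetical label set: the five failure modes named above plus "ok".
LABELS = {"loop", "redundant_tool_call", "contradiction", "abandonment", "hallucination", "ok"}


def build_classifier_prompt(turns: list[str], last_n: int = 6) -> str:
    """Format the last `last_n` turns into a classification prompt."""
    recent = "\n".join(f"[{i}] {t}" for i, t in enumerate(turns[-last_n:]))
    allowed = ", ".join(sorted(LABELS))
    return (
        'Classify the agent\'s recent turns. Reply only with JSON such as {"labels": ["loop"]}, '
        f"using only these labels: {allowed}.\n\n{recent}"
    )


def classify_turns(
    turns: list[str],
    ask_llm: Callable[[str], str],
    heuristic: Callable[[list[str]], set[str]],
) -> set[str]:
    """Ask the provider for failure labels; fall back to the heuristic on any error."""
    try:
        reply = json.loads(ask_llm(build_classifier_prompt(turns)))
        # Keep only known labels so a chatty model cannot inject arbitrary strings.
        labels = {label for label in reply.get("labels", []) if label in LABELS}
    except Exception:
        # Provider down, non-JSON reply, or unexpected shape: use the cheap checks instead.
        return heuristic(turns)
    return labels - {"ok"}
```

Filtering against a fixed label set keeps downstream code simple: callers only ever see one of the five failure names, whichever path produced them.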
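A regex-and-counting prefilter in the spirit of the heuristic monitor might look like this. It is a minimal sketch, assuming tool calls are available as (name, arguments) pairs; the `ToolCall` type, the window size, the thresholds, and the give-up phrases in `ABANDON_RE` are hypothetical choices, not agentmw's actual rules. A call repeated twice in the window counts as redundant, and one repeated three or more times counts as a loop.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical give-up phrases; real rules would be broader.
ABANDON_RE = re.compile(
    r"\b(?:i give up|i (?:can(?:no|')t|am unable to) (?:continue|proceed))\b",
    re.IGNORECASE,
)


@dataclass
class ToolCall:
    name: str
    args: dict


def call_key(call: ToolCall) -> str:
    # Canonical JSON so that argument order does not hide a repeat.
    return call.name + ":" + json.dumps(call.args, sort_keys=True)


def heuristic_flags(
    calls: list[ToolCall],
    texts: list[str],
    window: int = 6,
    loop_threshold: int = 3,
) -> set[str]:
    """Cheap checks over the last `window` tool calls and assistant messages."""
    flags: set[str] = set()
    counts = Counter(call_key(c) for c in calls[-window:])
    top = max(counts.values(), default=0)
    if top >= 2:
        flags.add("redundant_tool_call")  # same call issued twice
    if top >= loop_threshold:
        flags.add("loop")  # same call issued repeatedly
    if any(ABANDON_RE.search(t) for t in texts[-window:]):
        flags.add("abandonment")
    return flags
```

For example, three identical `read_file` calls followed by "I give up." yield all three flags, while two distinct calls with a neutral reply yield none. Checks like these cost microseconds, which is why they suit a prefilter that decides whether an LLM classification is worth paying for.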
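The compression step, trimming stale tool-result blocks while keeping the newest ones, can be sketched over Anthropic-style message lists, where tool output arrives as `{"type": "tool_result", "tool_use_id": ..., "content": ...}` blocks. This is an assumption-laden illustration, not agentmw's code: `compress_tool_results`, `keep_last`, the placeholder text, and the character-based size ratio are all hypothetical. Instead of deleting stale blocks outright, the sketch replaces their content, because Anthropic's Messages API expects each `tool_use` to be answered by a matching `tool_result`.

```python
import copy
import json

PLACEHOLDER = "[stale tool result elided]"


def _size(messages: list[dict]) -> int:
    # Character count of the serialised conversation; a rough proxy for tokens.
    return len(json.dumps(messages))


def compress_tool_results(messages: list[dict], keep_last: int = 2) -> tuple[list[dict], float]:
    """Elide all but the newest `keep_last` tool_result blocks.

    Returns the compressed copy and the ratio compressed_size / original_size.
    """
    out = copy.deepcopy(messages)  # never mutate the caller's history
    positions = []  # (message index, block index) of every tool_result, oldest first
    for i, msg in enumerate(out):
        content = msg.get("content")
        if isinstance(content, list):
            for j, block in enumerate(content):
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    positions.append((i, j))
    stale = positions[:-keep_last] if keep_last > 0 else positions
    for i, j in stale:
        # Keep the block (and its tool_use_id) so tool_use/tool_result pairs stay matched.
        out[i]["content"][j]["content"] = PLACEHOLDER
    before, after = _size(messages), _size(out)
    return out, (after / before if before else 1.0)
```

A ratio below 1.0 means the trace shrank; with four 1,000-character tool results and `keep_last=2`, roughly half the payload disappears while the two most recent results survive verbatim.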
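The shape of a SQLite-backed pattern library with similarity recall can be shown with the standard `sqlite3` module. This is a sketch under loud assumptions: the table layout, the `PatternStore` class, and especially `toy_embed` are hypothetical. `toy_embed` is a hashed bag-of-words stand-in for BGE-small, so it matches on shared words rather than true meaning; a real embedding model would slot into the same place. Vectors are L2-normalised, so the dot product equals cosine similarity.

```python
import hashlib
import json
import math
import re
import sqlite3

DIM = 256


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised.

    This is lexical, not semantic; swap in a real model to match by meaning.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class PatternStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS patterns ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
        )

    def save(self, text: str) -> None:
        self.db.execute(
            "INSERT INTO patterns (text, embedding) VALUES (?, ?)",
            (text, json.dumps(toy_embed(text))),
        )
        self.db.commit()

    def recall(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return up to k (cosine similarity, pattern) pairs, best first."""
        q = toy_embed(query)
        scored = []
        for text, emb in self.db.execute("SELECT text, embedding FROM patterns"):
            vec = json.loads(emb)
            # Vectors are unit length, so the dot product is the cosine similarity.
            scored.append((sum(a * b for a, b in zip(q, vec)), text))
        scored.sort(reverse=True)
        return scored[:k]
```

A brute-force scan over every stored vector is fine for a library of a few thousand patterns; larger stores would want an approximate nearest-neighbour index.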
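The time-travel tool's "point of no return" and wasted-token count need some concrete definition; the source does not give one, so the sketch below assumes a simple, hypothetical rule: the point of no return is the first turn of the final unbroken run of monitor-flagged turns, and every token spent from there to the end of the trace is counted as wasted. The `Turn` record and both function names are illustrative, not agentmw's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    index: int
    tokens: int    # tokens consumed by this turn (prompt + completion)
    flagged: bool  # True if any monitor flagged this turn


def point_of_no_return(turns: list[Turn]) -> Optional[int]:
    """Index of the first turn in the trailing run of flagged turns.

    Returns None if the trace ends on an unflagged turn (the agent recovered).
    """
    start: Optional[int] = None
    for turn in turns:
        if turn.flagged:
            if start is None:
                start = turn.index  # a new flagged run begins here
        else:
            start = None  # the agent recovered; forget the earlier run
    return start


def wasted_tokens(turns: list[Turn], pnr: Optional[int]) -> int:
    """Tokens spent from the point of no return to the end of the trace."""
    if pnr is None:
        return 0
    return sum(t.tokens for t in turns if t.index >= pnr)
```

For a five-turn trace flagged as `[no, yes, no, yes, yes]` with 100, 200, 300, 400, and 500 tokens, the point of no return is turn 3 and 900 tokens are wasted; the early flag at turn 1 does not count because the agent recovered at turn 2. A fork simulation would then hand the provider the turns before that index and ask how the run might have continued.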
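The circuit-breaker policy described above (open after three failures within 30 seconds, stay open for 60 seconds) can be sketched as a small sliding-window class. The class and exception names are hypothetical and this is not agentmw's code; the clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the provider while the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        window_s: float = 30.0,
        cooldown_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests can control time
        self.failures: deque = deque()  # timestamps of recent failures
        self.open_until: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        now = self.clock()
        if self.open_until is not None:
            if now < self.open_until:
                raise CircuitOpenError("provider circuit open; skipping call")
            # Cooldown over: close the breaker and start counting failures afresh.
            self.open_until = None
            self.failures.clear()
        try:
            return fn()
        except Exception:
            self._record_failure(self.clock())
            raise

    def _record_failure(self, now: float) -> None:
        self.failures.append(now)
        # Forget failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.open_until = now + self.cooldown_s
```

A monitor would wrap each provider request in `breaker.call(...)` and, on `CircuitOpenError`, fall straight through to the heuristic checks, so the main client never waits on a provider that is known to be failing.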
The agentmw command-line tool includes a demo, a config viewer, memory save and recall, session listing, timeline and replay, an extract command, and a stats command. From Python, you wrap an ordinary client such as anthropic.Anthropic with the wrap function and then call it as usual. After the call, a trace object reports which monitors triggered, the compression ratio, and which past patterns were recalled. Configuration is layered: defaults first, then a TOML file, then environment variables, then explicit arguments, with each later source overriding the earlier ones.
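The "wrap the client, call it as usual, inspect a trace afterwards" pattern can be illustrated with a small proxy. This is a generic sketch of the middleware idea, not agentmw's `wrap` function, whose real signature and trace fields may differ: `wrap_client`, `WrappedClient`, `Trace`, and the `monitor` callback are hypothetical names. It intercepts `client.messages.create(...)`, the call shape used by the Anthropic Python SDK, forwards it unchanged, and records call count, latency, and any monitor flags; every other attribute passes through to the original client.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Trace:
    calls: int = 0
    latency_s: float = 0.0
    monitors_triggered: list = field(default_factory=list)


class _MessagesProxy:
    """Stands in for client.messages and records every create() call."""

    def __init__(self, inner: Any, trace: Trace, monitor: Callable[[list], list]) -> None:
        self._inner = inner
        self._trace = trace
        self._monitor = monitor

    def create(self, **kwargs: Any) -> Any:
        start = time.perf_counter()
        response = self._inner.create(**kwargs)  # the real provider call, unchanged
        self._trace.latency_s += time.perf_counter() - start
        self._trace.calls += 1
        self._trace.monitors_triggered.extend(self._monitor(kwargs.get("messages", [])))
        return response


class WrappedClient:
    def __init__(self, client: Any, monitor: Callable[[list], list]) -> None:
        self._client = client
        self.trace = Trace()
        self.messages = _MessagesProxy(client.messages, self.trace, monitor)

    def __getattr__(self, name: str) -> Any:
        # Anything we do not intercept is forwarded to the original client.
        return getattr(self._client, name)


def wrap_client(client: Any, monitor: Callable[[list], list] = lambda messages: []) -> WrappedClient:
    return WrappedClient(client, monitor)
```

With the real SDK this would read `client = wrap_client(anthropic.Anthropic())` followed by an ordinary `client.messages.create(model=..., max_tokens=..., messages=[...])`; the calling code does not change, which is the point of putting the monitoring in a wrapper.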
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.