Drop into an existing OpenAI Python app with two lines to block jailbreaks and homoglyph injection attacks.
Wrap a custom tool calling function with secure_agent to gate subprocess and eval calls before they execute.
Use the encrypted memory store to keep conversation history off disk in plain text for a desktop assistant.
Trial attack presets in the local browser dashboard to evaluate firewall coverage before a launch.
Just pip install plus a small setup script to set the security question, then two lines of code to enable.
AgentShield is a Python library that sits between an application and the AI model it talks to, watching the prompts and responses for signs of attack. The README describes it as a firewall for AI agents, meant to catch jailbreaks, prompt injections, and similar tricks before they reach the model, without forcing the developer to rewrite their existing code. The project lists several defense layers. There are pattern matchers that look for known jailbreak phrasings, base64 or hex evasion tricks, and zero-width characters. A homoglyph normalizer converts visually similar letters from other alphabets back to plain Latin, so an attacker cannot hide the word ignore by swapping in Greek or Cyrillic lookalikes. A small machine learning classifier, built from a TF-IDF vectorizer and a logistic regression model, is used to flag injection attempts it has not seen before. A time-based tracker watches request patterns per user to spot brute-force probing. There are also pieces that protect the host application itself. A tool-calling guard checks arguments before letting code call things like subprocess or eval. A data masking layer redacts API keys and other secrets in outgoing text. An encrypted local memory store, using AES-256 through Fernet, keeps saved conversations and credentials from sitting on disk in plain text, with the encryption key derived from a security question through PBKDF2. Installation is a pip install of the agentshield-firewall package. After running a small setup script to configure the security question, the developer adds two lines, an import and a call to agentshield.init, and the library monkey-patches the OpenAI client and outgoing HTTP requests so calls to AI endpoints are automatically inspected. A decorator named secure_agent is offered for wrapping specific functions instead. The README also describes a local browser dashboard for trying attack presets and several test scripts for the firewall, the auth layer, and the auto-protect hooks. The project is released under the Apache 2.0 license.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.