explaingit

rion0709/agentshield

1 · Python · Audience · developer · Complexity · 3/5 · Active · License · Setup · easy

TLDR

Python firewall library for AI agents that monkey-patches the OpenAI client to inspect prompts and responses for jailbreaks, prompt injection, secret leaks, and unsafe tool calls.

Mindmap

mindmap
  root((agentshield))
    Inputs
      LLM prompts
      Tool call arguments
      User identifiers
    Outputs
      Blocked or sanitised calls
      Masked secrets
      Encrypted memory store
    Use Cases
      Block prompt injection
      Mask API keys in outputs
      Rate limit abusive users
      Encrypt local chat history
    Tech Stack
      Python
      scikit-learn
      TF-IDF
      AES-256
      PBKDF2
      Fernet

Things people build with this

USE CASE 1

Drop into an existing OpenAI Python app with two lines to block jailbreaks and homoglyph injection attacks.
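Homoglyph attacks work because lookalike letters from other scripts have different code points, so a plain keyword filter never sees the word it is looking for. Below is a minimal sketch of the normalize-then-match idea. It assumes a small hand-written lookalike table. `HOMOGLYPHS`, `normalize_homoglyphs`, and `looks_like_jailbreak` are hypothetical names, not AgentShield's API:

```python
import unicodedata

# Hypothetical lookalike table: a few Cyrillic and Greek letters that render
# like Latin ones. A real table would cover far more code points.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u03b9": "i",  # Greek small iota
    "\u03bf": "o",  # Greek small omicron
}

# Invisible characters that can split a keyword without changing how it looks.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_homoglyphs(text: str) -> str:
    """Fold lookalike letters to Latin and drop zero-width characters."""
    # NFKC folds compatibility forms such as fullwidth Latin letters
    # into their plain equivalents before the table lookup.
    text = unicodedata.normalize("NFKC", text)
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text if ch not in ZERO_WIDTH)


def looks_like_jailbreak(prompt: str) -> bool:
    """Hypothetical single-phrase check run on the normalized prompt."""
    return "ignore previous instructions" in normalize_homoglyphs(prompt).lower()
```

In this sketch, the normalized string is used only for matching. A caller can still forward the original prompt unchanged, so legitimate Greek or Cyrillic text is not rewritten.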

USE CASE 2

Wrap a custom tool-calling function with secure_agent to gate subprocess and eval calls before they execute.
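A gate of this kind can be sketched as a decorator that inspects string arguments before the tool body runs. This is not AgentShield's secure_agent. `guarded_tool`, `ToolCallBlocked`, `run_query`, and the deny-list patterns are all hypothetical:

```python
import functools
import re
import subprocess
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical deny-list; a real guard would be policy-driven and stricter.
DANGEROUS_PATTERNS = [
    re.compile(r"rm\s+-rf"),
    re.compile(r"[;&|`$]"),  # shell chaining and substitution characters
    re.compile(r"__import__|\beval\(|\bexec\("),
]


class ToolCallBlocked(PermissionError):
    """Raised when a tool argument matches a dangerous pattern."""


def guarded_tool(func: F) -> F:
    """Check every string argument before the wrapped tool runs."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for value in (*args, *kwargs.values()):
            if isinstance(value, str):
                for pattern in DANGEROUS_PATTERNS:
                    if pattern.search(value):
                        raise ToolCallBlocked(
                            f"{func.__name__}: argument matches {pattern.pattern!r}"
                        )
        return func(*args, **kwargs)

    return cast(F, wrapper)


@guarded_tool
def run_query(command: str) -> str:
    """Hypothetical tool: run a command without a shell and return stdout."""
    result = subprocess.run(command.split(), capture_output=True, text=True, check=False)
    return result.stdout
```

Passing an argument list to subprocess.run without shell=True already avoids shell injection. The decorator adds a second, policy-level check that fails before any process starts.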

USE CASE 3

Use the encrypted memory store so a desktop assistant never writes conversation history to disk in plain text.
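The key handling behind such a store can be sketched with the standard library. PBKDF2 stretches the security-question answer into 32 bytes, which are then base64-encoded in the form Fernet expects. The function names, the normalization rule, the salt size, and the iteration count are assumptions, not AgentShield's settings:

```python
import base64
import hashlib
import os

# Hypothetical work factor; AgentShield's real iteration count is not stated.
DEFAULT_ITERATIONS = 600_000


def new_salt() -> bytes:
    """Random per-store salt. It is not secret and is stored beside the data."""
    return os.urandom(16)


def derive_fernet_key(answer: str, salt: bytes, iterations: int = DEFAULT_ITERATIONS) -> bytes:
    """Stretch a security-question answer into a Fernet-compatible key."""
    # Hypothetical normalization so "Rex " and "rex" unlock the same store.
    secret = answer.strip().lower().encode("utf-8")
    raw = hashlib.pbkdf2_hmac("sha256", secret, salt, iterations, dklen=32)
    # Fernet keys are 32 raw bytes in URL-safe base64 (44 characters).
    return base64.urlsafe_b64encode(raw)
```

If the `cryptography` package is installed, the derived key can be passed to `Fernet(key)`, whose `encrypt` and `decrypt` methods handle the ciphertext. Trimming and lowercasing the answer makes unlocking more forgiving, at the cost of a slightly smaller guess space.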

USE CASE 4

Run attack presets in the local browser dashboard to evaluate firewall coverage before a launch.

Tech stack

Python · scikit-learn · TF-IDF · Fernet · AES

Getting it running

Difficulty · easy · Time to first run · 30 min

A pip install, a small setup script that sets the security question, and then two lines of code to enable protection.
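According to the README, enabling protection takes an import plus a call to agentshield.init, which monkey-patches the OpenAI client. The general technique can be sketched as follows. `Completions`, `install_guard`, and `simple_inspector` are hypothetical stand-ins, not the OpenAI SDK or AgentShield's internals:

```python
import functools
from typing import Any, Callable, Dict, List


class Completions:
    """Stand-in for an SDK resource class; NOT the real OpenAI client."""

    def create(self, *, messages: List[Dict[str, str]], **kwargs: Any) -> str:
        return f"model reply to {len(messages)} message(s)"


class PromptBlocked(RuntimeError):
    """Raised when the inspector rejects a prompt."""


def simple_inspector(text: str) -> bool:
    """Hypothetical rule: allow anything except one well-known jailbreak phrase."""
    return "ignore previous instructions" not in text.lower()


def install_guard(cls: type, inspect: Callable[[str], bool]) -> None:
    """Monkey-patch cls.create so every call is inspected before it runs."""
    original = getattr(cls, "create")
    if getattr(original, "_agent_guarded", False):
        return  # already patched; avoid wrapping twice

    @functools.wraps(original)
    def guarded(self: Any, *args: Any, **kwargs: Any) -> Any:
        for message in kwargs.get("messages", []):
            content = message.get("content", "")
            if not inspect(content):
                raise PromptBlocked(f"blocked prompt: {content[:40]!r}")
        return original(self, *args, **kwargs)

    guarded._agent_guarded = True  # type: ignore[attr-defined]
    setattr(cls, "create", guarded)


# The "enable" step: existing call sites now go through the guard.
install_guard(Completions, simple_inspector)
```

The patch replaces the method on the class. As a result, code that already calls `create` is protected without any edits, which is the property the two-line setup relies on. The `_agent_guarded` flag prevents a second call from wrapping the method twice.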

Apache 2.0 license: free to use commercially, with an express patent grant from contributors, as long as you keep the license notice and state any changes you make.

In plain English

AgentShield is a Python library that sits between an application and the AI model it talks to, inspecting prompts and responses for signs of attack. The README describes it as a firewall for AI agents, meant to catch jailbreaks, prompt injections, and similar tricks before they reach the model, without forcing the developer to rewrite existing code.

The project lists several defense layers. Pattern matchers look for known jailbreak phrasings, base64 or hex evasion tricks, and zero-width characters. A homoglyph normalizer converts visually similar letters from other alphabets back to plain Latin, so an attacker cannot hide the word "ignore" by swapping in Greek or Cyrillic lookalikes. A small machine learning classifier, built from a TF-IDF vectorizer and a logistic regression model, flags injection attempts that do not match any known pattern. A time-based tracker watches request patterns per user to spot brute-force probing.

Other pieces protect the host application itself. A tool-calling guard checks arguments before letting code call things like subprocess or eval. A data masking layer redacts API keys and other secrets in outgoing text. An encrypted local memory store prevents saved conversations and credentials from sitting on disk in plain text. The README describes its encryption as AES-256 through Fernet, although the Fernet format itself pairs AES-128-CBC encryption with an HMAC-SHA256 signature under a single 256-bit key. The encryption key is derived from a security question through PBKDF2.

Installation is a pip install of the agentshield-firewall package. After running a small setup script to configure the security question, the developer adds two lines of code: an import and a call to agentshield.init. The library then monkey-patches the OpenAI client and outgoing HTTP requests, so calls to AI endpoints are inspected automatically. Alternatively, a decorator named secure_agent can wrap specific functions.

The README also describes a local browser dashboard for trying attack presets, along with test scripts for the firewall, the auth layer, and the auto-protect hooks.
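The data masking layer described above can be approximated with regular expressions keyed on well-known secret prefixes. The sketch below assumes four formats: OpenAI-style `sk-` keys, Stripe `sk_live_`/`sk_test_` keys, AWS `AKIA` access key IDs, and GitHub `ghp_` tokens. AgentShield's actual pattern list is not described, and `SECRET_PATTERNS` and `mask_secrets` are hypothetical names:

```python
import re

# Hypothetical redaction rules based on publicly documented key prefixes.
SECRET_PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "stripe_secret_key": re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{10,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def mask_secrets(text: str) -> str:
    """Replace each recognized secret with a labeled placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

Prefix-based rules are cheap and produce few false positives. However, they miss secrets that lack a recognizable prefix, such as generic passwords.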
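The README does not say how the per-user, time-based tracker decides that a user is probing. A common approach is a sliding window that caps requests per user over a recent time span. `SlidingWindowLimiter` and its parameters are hypothetical, not AgentShield's implementation:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical tracker: at most `limit` requests per user per `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record and allow the request, or refuse it if the user is over the limit."""
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Injecting the clock makes the limiter testable without sleeping. Because rejected requests are not recorded, a user who backs off regains access once their old timestamps leave the window.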
The project is released under the Apache 2.0 license.

Copy-paste prompts

Prompt 1
Install agentshield-firewall, run the setup script, and add agentshield.init to my existing OpenAI Python app.
Prompt 2
Wrap my run_query function with the secure_agent decorator from AgentShield so subprocess calls are gated.
Prompt 3
Tune the TF-IDF plus logistic regression classifier in AgentShield by adding 20 of my own jailbreak attempts as training data.
Prompt 4
Configure the data masking layer in AgentShield to redact AWS access keys and Stripe secret keys in outbound responses.
Prompt 5
Use the AgentShield dashboard to run the homoglyph attack preset against my Claude integration and report any prompts that get through.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.