Analysis updated 2026-05-18
Strip names, emails, and credit card numbers from user input before sending it to an external AI like ChatGPT or Claude.
Run a local OpenAI-compatible proxy that automatically redacts PII from all requests and restores original values in responses.
Batch-process files containing personal data to produce sanitized versions safe for AI analysis.
Define custom regex patterns for internal identifiers like customer IDs to extend the standard detection rules.
| | one-million-lines/privacy-pii-redactor | adeliox/klein-head-swap | ats4321/ragit |
|---|---|---|---|
| Stars | 4 | 4 | 4 |
| Language | Python | Python | Python |
| Setup difficulty | easy | moderate | moderate |
| Complexity | 2/5 | 3/5 | 2/5 |
| Audience | developer | designer | developer |
Figures from each repo's GitHub metadata at analysis time.
Redis is optional and only needed for reversible mapping storage; the core library and CLI work without it by keeping the mapping in memory.
Privacy-First PII Redactor is a tool that sits between your application and an external AI service. Before your prompts leave your system, the tool scans them for private information, replaces each piece with a labeled placeholder like EMAIL_1 or CREDIT_CARD_1, and forwards the cleaned version. Optionally, it stores a mapping and restores the original values into whatever the AI sends back.
The detection system combines three layers. Regular expressions catch clearly structured data: email addresses, phone numbers, credit card numbers (with Luhn validation), IBAN bank account numbers, social security numbers, IP addresses, and more. Microsoft Presidio (an open-source library from Microsoft) and spaCy (a natural language processing library) add a second and third layer that can recognize names, organization names, and locations even without predictable formatting. When the three layers detect overlapping matches, the tool resolves conflicts by priority: regex wins over Presidio, which wins over spaCy.
You can use this tool in three ways. As a Python library, you import it and call redact() on any string, getting back both the cleaned text and a mapping of what was replaced. As a command-line tool, you can process files or strings from the terminal. As a REST API running locally via Docker, you send text to a /v1/redact endpoint and receive cleaned text in return. There is also an LLM proxy mode where the tool acts as an OpenAI-compatible endpoint: it redacts your request, forwards it to the actual AI provider, and puts original values back into the response before returning it to you.
The README includes an important warning: detection is probabilistic. Names and physical addresses are harder to catch reliably than emails or credit card numbers. The tool reduces risk but does not guarantee compliance with privacy regulations like GDPR or HIPAA.
The license is MIT, allowing free use including commercial applications.
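The priority rule described above (regex over Presidio over spaCy) can be illustrated with a small sketch. This is not the repo's code: the `Match` class, the `PRIORITY` table, and `resolve_overlaps` are hypothetical names, and the tie-break on span length is an assumption.

```python
from dataclasses import dataclass

# Hypothetical priority table: a lower number wins. The layer names follow the
# description above; the numeric values are assumptions for illustration.
PRIORITY = {"regex": 0, "presidio": 1, "spacy": 2}


@dataclass(frozen=True)
class Match:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "EMAIL", "PERSON"
    source: str  # "regex", "presidio", or "spacy"


def resolve_overlaps(matches):
    """Keep the highest-priority match wherever spans overlap."""
    # Visit high-priority matches first; among equals, prefer longer spans.
    ranked = sorted(
        matches,
        key=lambda m: (PRIORITY[m.source], -(m.end - m.start), m.start),
    )
    kept = []
    for m in ranked:
        # Half-open spans are disjoint when one ends before the other starts.
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


text = "Mail jane@acme.io or call Jane Doe"
candidates = [
    Match(5, 17, "EMAIL", "regex"),
    Match(5, 9, "PERSON", "spacy"),       # overlaps the email, lower priority
    Match(26, 34, "PERSON", "presidio"),
    Match(26, 30, "PERSON", "spacy"),     # overlaps the Presidio match
]
resolved = resolve_overlaps(candidates)
```

In this sketch the spaCy hit on "jane" inside the email address is dropped because the regex match covers the same characters, and the Presidio match on the full name wins over the shorter spaCy match.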
A Python library and REST API that strips names, emails, credit card numbers, and other private data from text before it reaches an external AI, with optional mapping storage to restore values in the response.
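To make the redact-then-restore round trip concrete, here is a minimal, regex-only sketch that assumes placeholders of the form EMAIL_1 and CREDIT_CARD_1. The function names `redact_sketch`, `restore_sketch`, and `luhn_valid` and both patterns are hypothetical; the library's actual `redact()` signature and detection rules may differ, so check the repo.

```python
import re

# Simplified patterns for illustration only; real detectors are stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_sketch(text):
    """Replace emails and Luhn-valid card numbers with numbered placeholders."""
    mapping = {}   # placeholder -> original value
    counters = {}  # label -> last index used

    def placeholder(label, value):
        # Reuse the placeholder if the same value was already seen.
        for key, original in mapping.items():
            if original == value and key.startswith(label + "_"):
                return key
        counters[label] = counters.get(label, 0) + 1
        key = f"{label}_{counters[label]}"
        mapping[key] = value
        return key

    def replace_card(m):
        # Digit runs that fail the checksum are left untouched.
        return placeholder("CREDIT_CARD", m.group()) if luhn_valid(m.group()) else m.group()

    text = CARD_RE.sub(replace_card, text)
    text = EMAIL_RE.sub(lambda m: placeholder("EMAIL", m.group()), text)
    return text, mapping


def restore_sketch(text, mapping):
    """Put original values back into a response that echoes placeholders."""
    # Longer keys first, so EMAIL_10 is not partially replaced by EMAIL_1.
    for key in sorted(mapping, key=len, reverse=True):
        text = text.replace(key, mapping[key])
    return text


prompt = "Bill 4111 1111 1111 1111 to ana@example.com, order 1234567890123"
cleaned, mapping = redact_sketch(prompt)
reply = restore_sketch("Sent receipt to EMAIL_1 for CREDIT_CARD_1.", mapping)
```

The order number in the example is a 13-digit run that fails the Luhn check, so this sketch leaves it in place; that is the kind of false positive the checksum is meant to filter out.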
Mainly Python. The stack also includes FastAPI and Microsoft Presidio.
Use freely for any purpose, including commercial, as long as you keep the copyright notice (MIT License).
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.