explaingit

one-million-lines/privacy-pii-redactor

Analysis updated 2026-05-18

Stars: 4 · Language: Python · Audience: developer · Complexity: 2/5 · License: MIT · Setup: easy

TLDR

A Python library and REST API that strips names, emails, credit card numbers, and other private data from text before it reaches an external AI, with optional mapping storage to restore values in the response.

Mindmap

mindmap
  root((PII Redactor))
    What it does
      Strip PII from prompts
      Replace with placeholders
      Restore after LLM reply
    Detection Layers
      Regex patterns
      Microsoft Presidio
      spaCy NER
    PII Types
      Emails phones cards
      Names organizations
      SSN IBAN IP address
    Usage Modes
      Python library
      CLI tool
      REST API Docker
      LLM proxy mode


What do people build with it?

USE CASE 1

Strip names, emails, and credit card numbers from user input before sending it to an external AI like ChatGPT or Claude.

USE CASE 2

Run a local OpenAI-compatible proxy that automatically redacts PII from all requests and restores original values in responses.

USE CASE 3

Batch-process files containing personal data to produce sanitized versions safe for AI analysis.

USE CASE 4

Define custom regex patterns for internal identifiers like customer IDs to extend the standard detection rules.
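Extending the regex layer with an internal identifier amounts to registering one more labeled pattern. The following is a minimal sketch, assuming a simple label-to-regex registry. `PATTERNS`, `add_custom_pattern`, `redact`, and the `CUSTOMER_ID` label are hypothetical, and the project may configure custom recognizers differently. The `CUS-[0-9]{6}` pattern is taken from the example prompt further down this page.

```python
import re

# Hypothetical registry (not the project's API): placeholder label -> compiled pattern.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def add_custom_pattern(label: str, regex: str) -> None:
    """Register an extra detector, e.g. for internal customer IDs."""
    PATTERNS[label] = re.compile(regex)


def redact(text: str) -> str:
    """Replace every match with LABEL_n, numbering each label separately."""
    counters: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        # Bind label as a default argument so each closure keeps its own value.
        def repl(match: re.Match[str], label: str = label) -> str:
            counters[label] = counters.get(label, 0) + 1
            return f"{label}_{counters[label]}"

        text = pattern.sub(repl, text)
    return text


# \b keeps the pattern from matching inside longer tokens such as XCUS-1234567.
add_custom_pattern("CUSTOMER_ID", r"\bCUS-[0-9]{6}\b")
print(redact("Ticket from CUS-004217 (bob@example.com)"))
```

With the custom pattern registered, the call above should print `Ticket from CUSTOMER_ID_1 (EMAIL_1)`.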

What is it built with?

Python, FastAPI, Microsoft Presidio, spaCy, Redis, Docker

How does it compare?

Repository | one-million-lines/privacy-pii-redactor | adeliox/klein-head-swap | ats4321/ragit
Stars | 4 | 4 | 4
Language | Python | Python | Python
Setup difficulty | easy | moderate | moderate
Complexity | 2/5 | 3/5 | 2/5
Audience | developer | designer | developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: easy · Time to first run: about 5 minutes

Redis is optional and is only needed to store reversible mappings. The core library and CLI work without it by keeping mappings in memory.
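The optional Redis dependency suggests a small storage interface behind the reversible mapping, with an in-memory default. The following is a minimal sketch under that assumption. `MappingStore`, `InMemoryMappingStore`, the generated `mapping_id`, and the TTL-based expiry are all hypothetical. A Redis-backed class could implement the same two methods, for example with redis-py's `set(name, value, ex=ttl)` and `get(name)`.

```python
import json
import time
import uuid
from typing import Optional, Protocol


class MappingStore(Protocol):
    """Hypothetical interface: save a placeholder->value mapping, load it by ID."""

    def save(self, mapping: dict[str, str], ttl_seconds: int) -> str: ...
    def load(self, mapping_id: str) -> Optional[dict[str, str]]: ...


class InMemoryMappingStore:
    """Process-local store. Mappings are lost when the process exits."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}

    def save(self, mapping: dict[str, str], ttl_seconds: int = 3600) -> str:
        mapping_id = uuid.uuid4().hex
        expires_at = time.monotonic() + ttl_seconds
        # Serialize to JSON, as a Redis-backed store would, so both behave alike.
        self._data[mapping_id] = (expires_at, json.dumps(mapping))
        return mapping_id

    def load(self, mapping_id: str) -> Optional[dict[str, str]]:
        entry = self._data.get(mapping_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if time.monotonic() >= expires_at:
            del self._data[mapping_id]  # lazily evict an expired mapping
            return None
        return json.loads(payload)


store: MappingStore = InMemoryMappingStore()
mid = store.save({"EMAIL_1": "alice@example.com"}, ttl_seconds=60)
print(store.load(mid))        # {'EMAIL_1': 'alice@example.com'}
print(store.load("unknown"))  # None
```

Hiding storage behind two methods lets the rest of the pipeline stay unchanged whether the mapping is held in memory or in Redis.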

Use freely for any purpose, including commercial, as long as you keep the copyright notice (MIT License).

In plain English

Privacy-First PII Redactor is a tool that sits between your application and an external AI service. Before your prompts leave your system, the tool scans them for private information, replaces each item with a labeled placeholder such as EMAIL_1 or CREDIT_CARD_1, and forwards the cleaned version. Optionally, it stores a mapping and restores the original values in whatever the AI sends back.

The detection system combines three layers. Regular expressions catch clearly structured data: email addresses, phone numbers, credit card numbers (with Luhn checksum validation), IBAN bank account numbers, social security numbers, IP addresses, and more. Microsoft Presidio (an open-source PII detection library from Microsoft) and spaCy (a natural language processing library) form the second and third layers, recognizing names, organizations, and locations that follow no predictable format. When the layers produce overlapping matches, the tool resolves the conflict by priority: regex wins over Presidio, which wins over spaCy.

You can use the tool in three ways. As a Python library, you import it and call redact() on any string, getting back both the cleaned text and a mapping of what was replaced. As a command-line tool, you process files or strings from the terminal. As a REST API running locally via Docker, you send text to the /v1/redact endpoint and receive cleaned text in return. There is also an LLM proxy mode in which the tool acts as an OpenAI-compatible endpoint: it redacts your request, forwards it to the actual AI provider, and puts the original values back into the response before returning it to you.

The README includes an important warning: detection is probabilistic. Names and physical addresses are harder to catch reliably than emails or credit card numbers. The tool reduces risk but does not guarantee compliance with privacy regulations such as GDPR or HIPAA. The license is MIT, which allows free use, including in commercial applications.
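The detect, prioritize, replace, and restore loop described above can be sketched in standard-library Python. This is an illustrative sketch, not the project's code. Two regex detectors (email, and Luhn-validated card numbers) stand in for the full regex layer. The `ner_spans` argument is a hypothetical stand-in for spans that Presidio or spaCy would return. `Span`, `redact`, and `restore` here are not the library's real signatures.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass

# Lower number = higher priority, mirroring the described order: regex > Presidio > spaCy.
PRIORITY = {"regex": 0, "presidio": 1, "spacy": 2}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str   # e.g. "EMAIL", "CREDIT_CARD", "PERSON"
    source: str  # "regex", "presidio" or "spacy"


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    total = 0
    for i, d in enumerate(int(c) for c in reversed(number) if c.isdigit()):
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def regex_spans(text: str) -> list[Span]:
    spans = [Span(m.start(), m.end(), "EMAIL", "regex") for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_ok(m.group()):  # drop digit runs that fail the checksum
            spans.append(Span(m.start(), m.end(), "CREDIT_CARD", "regex"))
    return spans


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep higher-priority spans; drop any span that overlaps one already kept."""
    kept: list[Span] = []
    for s in sorted(spans, key=lambda s: (PRIORITY[s.source], s.start)):
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def redact(text: str, ner_spans: Iterable[Span] = ()) -> tuple[str, dict[str, str]]:
    """Return the cleaned text and a placeholder -> original value mapping."""
    spans = resolve_overlaps(regex_spans(text) + list(ner_spans))
    counters: dict[str, int] = {}
    mapping: dict[str, str] = {}
    out, pos = [], 0
    for s in spans:
        counters[s.label] = counters.get(s.label, 0) + 1
        placeholder = f"{s.label}_{counters[s.label]}"
        mapping[placeholder] = text[s.start:s.end]
        out.append(text[pos:s.start])
        out.append(placeholder)
        pos = s.end
    out.append(text[pos:])
    return "".join(out), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put original values back, longest placeholder first (EMAIL_10 before EMAIL_1)."""
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


text = "Bob wrote from alice@example.com, card 4111 1111 1111 1111."
i = text.index("alice")
ner = [Span(0, 3, "PERSON", "presidio"), Span(i, i + 5, "PERSON", "spacy")]
redacted, mapping = redact(text, ner)
print(redacted)  # PERSON_1 wrote from EMAIL_1, card CREDIT_CARD_1.
```

In this example, the spaCy-style PERSON span over "alice" is discarded because it overlaps the higher-priority regex EMAIL match, while the non-overlapping Presidio-style span for "Bob" is kept.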

Copy-paste prompts

Prompt 1
Use privacy-pii-redactor to strip emails, names, and credit card numbers from this customer support log before sending it to an AI: [paste log].
Prompt 2
Set up the privacy-pii-redactor Docker API on localhost:8000 and show me how to POST text to /v1/redact then restore the original values using the mapping_id.
Prompt 3
Configure privacy-pii-redactor as an OpenAI-compatible LLM proxy so all my app's ChatGPT calls automatically have PII stripped before leaving the server.
Prompt 4
Write a YAML config for privacy-pii-redactor that adds a custom recognizer for internal customer IDs matching the pattern CUS-[0-9]{6}.
Prompt 5
Use the pii-redactor CLI to scan a text file for PII and output a JSON detection report without modifying the file.

Frequently asked questions

What is privacy-pii-redactor?

A Python library and REST API that strips names, emails, credit card numbers, and other private data from text before it reaches an external AI, with optional mapping storage to restore values in the response.

What language is privacy-pii-redactor written in?

Mainly Python. The stack also includes FastAPI, Microsoft Presidio, spaCy, Redis, and Docker.

What license does privacy-pii-redactor use?

Use freely for any purpose, including commercial, as long as you keep the copyright notice (MIT License).

How hard is privacy-pii-redactor to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is privacy-pii-redactor for?

Mainly developers.


Verify against the repo before relying on details.