explaingit

douglasmun/aws-cdr-gateway

Analysis updated 2026-05-18

Stars · 4
Language · Python
Audience · ops devops
Complexity · 4/5
Setup · hard

TLDR

A serverless AWS pipeline that strips macros, scripts, and active content from Office files, PDFs, and images before they reach your application. Also runs as a local HTTP service with no cloud account needed.

Mindmap

mindmap
  root((aws-cdr-gateway))
    What it does
      Strip macros scripts
      Clean Office PDFs images
      Quarantine unknown files
      Fail-closed policy
    Deployment
      AWS Lambda SAM
      Terraform option
      Local FastAPI mode
      Docker sidecar
    File Formats
      Office OOXML variants
      PDF pikepdf
      Images via Pillow
      Legacy OLE quarantine
    Security
      Decompression bomb guard
      ReDoS hardening
      Fault isolation

What do people build with it?

USE CASE 1

Automatically clean uploaded files in an S3-backed application to remove macros and scripts before storing them.

USE CASE 2

Run a local HTTP file-sanitizing sidecar next to a web app that accepts uploads, in any programming language.

USE CASE 3

Quarantine uploaded files that cannot be proven safe using a fail-closed AWS pipeline.

USE CASE 4

Sanitize PDF and Office uploads in CI or on-premises without requiring AWS credentials.
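
For the S3-backed use cases above, the Lambda entry point first has to pull the bucket and key out of the triggering event. The sketch below assumes the standard EventBridge "Object Created" event shape that S3 emits. The names `parse_s3_event` and `S3Object` are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Object:
    """Location and size of an uploaded object (hypothetical helper type)."""
    bucket: str
    key: str
    size: int


def parse_s3_event(event: dict) -> S3Object:
    """Extract the uploaded object's location from an EventBridge S3 event.

    Raises ValueError for anything that is not an S3 "Object Created" event,
    so the caller can fail closed instead of guessing.
    """
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        raise ValueError("not an S3 Object Created event")
    detail = event.get("detail") or {}
    try:
        bucket = detail["bucket"]["name"]
        key = detail["object"]["key"]
    except (KeyError, TypeError) as exc:
        raise ValueError("event is missing bucket or key") from exc
    size = int(detail["object"].get("size", 0))
    return S3Object(bucket=bucket, key=key, size=size)
```

Rejecting malformed events with an exception, rather than returning a default, keeps this step consistent with a fail-closed pipeline.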

What is it built with?

Python, AWS Lambda, FastAPI, EventBridge, S3, pikepdf, Pillow, Terraform

How does it compare?

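|                  | douglasmun/aws-cdr-gateway | adeliox/klein-head-swap | ats4321/ragit |
|------------------|----------------------------|-------------------------|---------------|
| Stars            | 4                          | 4                       | 4             |
| Language         | Python                     | Python                  | Python        |
| Setup difficulty | hard                       | moderate                | moderate      |
| Complexity       | 4/5                        | 3/5                     | 2/5           |
| Audience         | ops devops                 | designer                | developer     |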

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1h+

Deploying the cloud pipeline requires live AWS credentials and either SAM CLI or Terraform. The local FastAPI service runs without AWS and starts in minutes.
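
Once the local service is running, an application in any language can send it a file over HTTP. The sketch below shows one way to do that from Python using only the standard library. The route `/sanitize`, the form field name `file`, and the multipart upload format are assumptions made for illustration; check the repository for the actual endpoint and request shape.

```python
import uuid
import urllib.request

# Hypothetical values: the real route and form field name are defined by the
# repository and should be checked there.
SANITIZE_PATH = "/sanitize"
FORM_FIELD = "file"


def build_sanitize_request(base_url: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a single file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FORM_FIELD}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + SANITIZE_PATH,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def sanitize(base_url: str, filename: str, data: bytes, timeout: float = 30.0) -> bytes:
    """Send the file and return the response body; HTTP errors raise."""
    request = build_sanitize_request(base_url, filename, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A caller would treat any exception from `sanitize` as "do not store this file", which mirrors the fail-closed behavior of the cloud pipeline.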

In plain English

This tool automatically cleans uploaded files before your application trusts them. It strips potentially dangerous active content from Office documents, PDFs, and images, then routes the cleaned version to a safe storage location, or sends the original to a quarantine bucket if it cannot be made safe. The process is called Content Disarm and Reconstruction, or CDR.

The cloud version runs as a serverless AWS function. When someone uploads a file to an S3 storage bucket, the upload triggers the pipeline automatically. Word documents, Excel spreadsheets, PDFs, and common image formats each pass through a format-specific cleaning routine. For Office files, it removes macros, embedded scripts, external data connections, and other components that could execute code when someone opens the document. For PDFs, it strips JavaScript, auto-open actions, embedded files, and form actions. For images, it re-encodes the file from scratch to remove metadata that could carry malicious content. Files in older formats such as the original .doc, .xls, and .ppt are sent straight to quarantine because their internal structure makes safe reconstruction too risky.

The tool follows a fail-closed principle: anything it cannot confirm as safe goes to quarantine. It never labels a file as sanitized unless the cleaning has been positively verified.

A second mode runs the same cleaning engine as a plain HTTP service on your own computer or server, with no AWS account required. You send it a file via a web request and it returns the cleaned version. This makes it useful as a sidecar next to any web application that accepts uploads, regardless of the programming language the main application uses.

The repository includes a test suite of 227 tests, deployment guides for AWS using either SAM or Terraform, Docker and Kubernetes deployment instructions for the local service, and documentation comparing its coverage against other known file security tools.
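
The fail-closed routing described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's implementation: the `route` function, the `Verdict` enum, and the injected `sanitize` and `verify` callables are hypothetical names. The only external fact used is the 8-byte signature of the legacy OLE compound file format.

```python
from enum import Enum
from typing import Callable, Optional

# Signature of the OLE2 compound file format used by legacy .doc/.xls/.ppt.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


class Verdict(Enum):
    SANITIZED = "sanitized"
    QUARANTINED = "quarantined"


def route(
    data: bytes,
    sanitize: Callable[[bytes], bytes],
    verify: Callable[[bytes], bool],
) -> tuple[Verdict, Optional[bytes]]:
    """Return SANITIZED with clean bytes only when every step succeeds."""
    # Legacy OLE containers are never reconstructed; quarantine immediately.
    if data.startswith(OLE_MAGIC):
        return Verdict.QUARANTINED, None
    try:
        cleaned = sanitize(data)
        # The verifier must positively confirm that the output is clean.
        if verify(cleaned) is True:
            return Verdict.SANITIZED, cleaned
    except Exception:
        # Any failure while cleaning or verifying falls through to quarantine.
        pass
    return Verdict.QUARANTINED, None
```

The design choice worth noting is that quarantine is the default return value. The sanitized outcome is reachable through exactly one path, so a new error case cannot accidentally produce a "clean" label.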

Copy-paste prompts

Prompt 1
Deploy aws-cdr-gateway to AWS using SAM CLI pointing at my S3 bucket so uploaded Office files are automatically stripped of macros before storage.
Prompt 2
Run the aws-cdr-gateway local CDR service with Docker and POST a PDF to it to strip JavaScript and auto-open actions.
Prompt 3
Set up aws-cdr-gateway as a Docker sidecar in a Node.js upload service so every file is disarmed before it reaches my database.
Prompt 4
Explain what active content aws-cdr-gateway removes from an Excel .xlsx file and show the test that verifies VBA macro removal.
Prompt 5
Deploy aws-cdr-gateway using Terraform instead of SAM and configure the EventBridge trigger for the S3 CDR Lambda.

Frequently asked questions

What language is aws-cdr-gateway written in?

Mainly Python. The stack also includes AWS Lambda and FastAPI.

How hard is aws-cdr-gateway to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is aws-cdr-gateway for?

Mainly ops and DevOps engineers.

Verify against the repo before relying on details.