Self-host a serverless image watermark cleanup API on Modal
Remove stamped text from images you have permission to edit
Call a deployed cleanup endpoint from a browser image picker
Run a deterministic image edit pipeline with custom prompts and seeds
Requires a Hugging Face token in a .env file and a Modal account, and the first GPU call takes 30 to 60 seconds while weights download.
This repository is a small Python tool for removing visible overlays such as watermarks and stamped text from images. The author frames it as an authorized cleanup tool: it is meant for images you own or have permission to edit, not for stripping watermarks off other people's work. The README repeats this several times, and the command-line interface requires an --authorized flag before it will run a cleanup. The license is MIT.
Under the hood, the image editing is done by Qwen-Image-Edit-2511, a model from Alibaba, accessed through the Hugging Face Diffusers library. The interesting part of the project is how it is deployed. Rather than asking the user to set up a GPU machine, the code is wrapped in a FastAPI service that runs on Modal, a serverless platform that gives the function an H100 GPU on demand. A configuration setting keeps one container warm so the model stays loaded between calls and responses come back quickly.
The project offers two ways to use it. The first is a command-line tool installed as watermark. It has subcommands to sync a Hugging Face token into a Modal secret, run the API in development mode, deploy it as a warm service, and call a deployed endpoint with either a local image path or an image URL. A separate browser-images subcommand looks at an open browser surface and points out candidate images for cleanup.
The second way is to call the deployed FastAPI service directly. The service exposes a health endpoint plus a cleanup endpoint and an edit endpoint, the latter two sharing the same handler. Requests pass the image as a base64 string along with a prompt, a seed, the number of inference steps, two guidance values, and an output format. The response returns the edited image as base64 plus model metadata and the latency.
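The request and response shape described above can be sketched in a few lines of Python. This is only an illustration: the summary does not give the service's field names, so every key here (image_base64, prompt, seed, steps, guidance_scale, true_cfg_scale, output_format) and both helper functions are hypothetical, and the real names should be checked against the repository before use.

```python
import base64
import json

# Output formats the summary lists for the service.
ALLOWED_FORMATS = ("png", "jpeg", "webp")


def build_cleanup_request(image_bytes, *, prompt, seed, steps,
                          guidance_scale, true_cfg_scale, output_format):
    """Build a JSON-serializable request body.

    All key names are assumptions made for this sketch; the deployed
    service may use different ones.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return {
        # The image travels as a base64 string, as the summary describes.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        # Reusing the same seed is what makes a run repeatable.
        "seed": seed,
        "steps": steps,
        # The two guidance values; both names are hypothetical.
        "guidance_scale": guidance_scale,
        "true_cfg_scale": true_cfg_scale,
        "output_format": output_format,
    }


def decode_cleanup_response(body):
    """Return the edited image bytes from a response dict (key name assumed)."""
    return base64.b64decode(body["image_base64"])


def to_json(request_body):
    """Serialize the request body for an HTTP POST."""
    return json.dumps(request_body)
```

The resulting JSON string could then be posted to the deployed cleanup endpoint with any HTTP client; the endpoint path and any authentication are left out because the summary does not state them.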
The project expects a .env file containing a Hugging Face API token; the variable can also be named HF_TOKEN or HUGGING_FACE_HUB_TOKEN if that is more familiar. Modal authentication is handled by running modal setup. On the first GPU call the Qwen weights are downloaded into a Modal volume, so later calls reuse the cache. The README quotes a first-call latency of 30 to 60 seconds and a follow-up latency of 5 to 15 seconds.
Features listed include PNG, JPEG, and WebP input and output, deterministic results through a configurable seed, custom prompts beyond the default overlay cleanup wording, and the option to point at remote URLs as well as local files. The README closes with an ethical use note that restates the authorization requirement: do not use this to remove watermarks from copyrighted or third-party content without permission.
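As a rough illustration of the token lookup, the sketch below parses .env-style text and returns the first token it finds under the two alias names mentioned above. It is not the repository's loader: parse_dotenv and find_token are hypothetical helpers, the README's primary variable name is not given in this summary so only the two aliases are checked, and preferring the process environment over the file is an assumption.

```python
import os

# Only the alias names that the summary mentions.
TOKEN_NAMES = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def find_token(dotenv_values, environ=None, names=TOKEN_NAMES):
    """Return the first non-empty token, or None if no name is set.

    The process environment is checked before the .env values; that
    ordering is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    for source in (environ, dotenv_values):
        for name in names:
            token = source.get(name)
            if token:
                return token
    return None
```

A caller would read the .env file's text, pass it through parse_dotenv, and stop with a clear error if find_token returns None, before any Modal secret is created.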
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.