Convert a scanned textbook PDF into a hosted MkDocs Material site
Stitch a multi-part PDF book into one chapter tree
Filter QR codes from extracted images with the qr_filter plugin
Deploy the generated site to Cloudflare Pages via the cf_pages plugin
Before minerupress can produce a site, you need MkDocs Material plus either existing MinerU output or a way to run MinerU yourself, such as a separately installed MinerU CLI.
MineruPress is a Python tool that bridges two existing tools to turn long PDF books into publishable websites. The first tool, MinerU, takes a PDF and produces a JSON file plus a folder of images. MineruPress takes that output and shapes it into a clean folder of Markdown chapters and image assets for the second tool, MkDocs Material, which builds a static documentation-style site you can host. The author pitches it for scanned textbooks, course handouts, internal manuals, and any other long PDF you want to migrate to a searchable site.
Getting started is short. You install the pip package with the all extra, install MkDocs and the Material theme, then run minerupress init to scaffold a fresh book workspace. The workspace contains a book.yml for chapter configuration, an mkdocs.yml for the site, a Makefile of common commands, and a .env.example for sensitive variables. From there, if you already have MinerU results, you drop them into resources/mineru and run minerupress export. If you only have a PDF, you set the source mode to the MinerU cloud API or to a local MinerU CLI you installed separately, and run minerupress fetch instead.
The README is specific about the problems MineruPress is trying to solve. A long book often comes back from MinerU as a heap of JSON, images, and loose text. One logical book can be split across several PDF parts that still need to map onto the same chapter list. Chapter boundaries are derived from headings first, with hand-written regex only as a fallback, so tables of contents, appendices, bilingual headings, and project-style course materials get handled. Exports are idempotent: the Markdown and image folders are rebuilt each time so stale files do not bleed into a new build.
Differences between books, like filtering QR code images or adding spaces between Chinese and English text, are handled by plugins rather than being hard-coded.
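The heading-first chapter detection described above can be sketched roughly as follows. This is a minimal illustration, not MineruPress's actual code: it assumes each extracted block is a dict with a "text" field and an optional "text_level" field marking top-level headings, and the fallback pattern CHAPTER_RE and the function names are hypothetical.

```python
import re

# Hypothetical fallback pattern for lines such as "Chapter 3 Sorting".
# It is an assumption for this sketch, not a pattern taken from MineruPress.
CHAPTER_RE = re.compile(r"^\s*chapter\s+\d+\b", re.IGNORECASE)


def find_chapter_starts(blocks):
    """Return the indices of blocks that start a chapter.

    Heading metadata is tried first; the regex is consulted only when
    no block carries a top-level heading marker.
    """
    by_heading = [i for i, b in enumerate(blocks) if b.get("text_level") == 1]
    if by_heading:
        return by_heading
    return [i for i, b in enumerate(blocks) if CHAPTER_RE.match(b.get("text", ""))]


def split_chapters(blocks):
    """Group blocks into chapters; blocks before the first start form front matter."""
    starts = find_chapter_starts(blocks)
    if not starts:
        return [blocks] if blocks else []
    chapters = []
    if starts[0] > 0:
        chapters.append(blocks[:starts[0]])
    # Pair each start index with the next one (or the end of the list).
    for begin, end in zip(starts, starts[1:] + [len(blocks)]):
        chapters.append(blocks[begin:end])
    return chapters
```

Trying structural metadata before any pattern matching keeps per-book regex tuning to the cases where the extractor found no usable headings at all.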
Three plugins ship with the tool: qr_filter uses OpenCV, cjk_spacing uses the pangu library, and cf_pages deploys the built site to Cloudflare Pages. You can write your own by subclassing ExportPlugin. The repo also ships an agent skill folder so AI agents can drive the same workflow, and the project is licensed under the Apache License 2.0.
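To give a feel for the subclassing pattern, here is a rough sketch of a custom plugin. The real ExportPlugin interface is not described in this summary, so the base class below is a stand-in defined only for this example, and the hook name process_markdown and the helper run_plugins are hypothetical. Check the repo for the actual class and its methods.

```python
class ExportPlugin:
    """Stand-in base class for this sketch only; NOT the real minerupress class."""

    def process_markdown(self, text: str) -> str:  # hypothetical hook name
        return text


class CollapseBlankLines(ExportPlugin):
    """Example plugin: squeeze runs of blank lines down to a single blank line."""

    def process_markdown(self, text: str) -> str:
        out = []
        for line in text.split("\n"):
            is_blank = line.strip() == ""
            # Skip a blank line when the previously kept line was also blank.
            if is_blank and out and out[-1] == "":
                continue
            out.append("" if is_blank else line)
        return "\n".join(out)


def run_plugins(text, plugins):
    """Apply each plugin in order (hypothetical driver for the sketch)."""
    for plugin in plugins:
        text = plugin.process_markdown(text)
    return text
```

For instance, run_plugins("a\n\n\n\nb\n", [CollapseBlankLines()]) leaves one blank line between the two lines of text.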
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.