explaingit

aronnaxlin/minerupress

23 · Python · Audience · writer · Complexity · 3/5 · Active · License · Setup · moderate

TLDR

Python CLI that turns MinerU PDF parsing output into a clean Markdown and image tree ready for MkDocs Material to publish as a searchable static site.

Mindmap

mindmap
  root((minerupress))
    Inputs
      PDF files
      MinerU JSON
      MinerU images
    Outputs
      Markdown chapters
      Image assets
      Static site
    Use Cases
      Publish scanned textbooks
      Migrate manuals to web
      Build course sites
    Tech Stack
      Python
      MkDocs Material
      MinerU
      OpenCV

Things people build with this

USE CASE 1

Convert a scanned textbook PDF into a hosted MkDocs Material site

USE CASE 2

Stitch a multi-part PDF book into one chapter tree

USE CASE 3

Filter QR codes from extracted images with the qr_filter plugin

USE CASE 4

Deploy the generated site to Cloudflare Pages via the cf_pages plugin

Tech stack

Python · MkDocs · MinerU · OpenCV

Getting it running

Difficulty · moderate · Time to first run · 1h+

Before minerupress can produce a site, you need MkDocs Material plus one source of MinerU output: existing MinerU results, access to the MinerU cloud API, or a separately installed MinerU CLI.

Apache 2.0: a permissive license that lets you use, modify, and distribute the code, including for commercial purposes, as long as you preserve the copyright and license notices.

In plain English

MineruPress is a Python tool that bridges two existing tools to turn long PDF books into publishable websites. The first tool, MinerU, takes a PDF and produces a JSON file plus a folder of images. MineruPress reshapes that output into a clean folder of Markdown chapters and image assets for the second tool, MkDocs Material, which builds a static, documentation-style site you can host. The author pitches it for scanned textbooks, course handouts, internal manuals, and any other long PDF you want to migrate to a searchable site.

The setup itself is only a few steps. You install the pip package with the "all" extra, install MkDocs and the Material theme, then run minerupress init to scaffold a fresh book workspace. The workspace contains a book.yml for chapter configuration, an mkdocs.yml for the site, a Makefile of common commands, and a .env.example for sensitive variables. If you already have MinerU results, you drop them into resources/mineru and run minerupress export. If you only have a PDF, you set the source mode to either the MinerU cloud API or a separately installed local MinerU CLI, and run minerupress fetch instead.

The README spells out the problems MineruPress is trying to solve. A long book often comes back from MinerU as a heap of JSON, images, and loose text. One logical book can be split across several PDF parts that still need to map onto the same chapter list. Chapter boundaries are derived from headings first, with hand-written regex only as a fallback, so that tables of contents, appendices, bilingual headings, and project-style course materials are handled. Exports are idempotent: the Markdown and image folders are rebuilt on every run, so stale files from an earlier build never bleed into a new one. Per-book differences, such as filtering out QR code images or adding spaces between Chinese and English text, are handled by plugins rather than hard-coded.
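The heading-first chapter detection, multi-part stitching, and rebuild-from-scratch export described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not minerupress's code: the Block and Chapter types, the split_chapters and export_chapters helpers, the numbered 01.md file naming, and matching headings against a configured list of chapter titles are all hypothetical, and real MinerU JSON carries far more structure than a text/is-heading pair.

```python
import re
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Block:
    # Hypothetical stand-in for one parsed text block; MinerU's real JSON is richer.
    text: str
    is_heading: bool = False


@dataclass
class Chapter:
    title: str
    blocks: list = field(default_factory=list)


def split_chapters(parts, chapter_titles, fallback_pattern=None):
    """Map blocks from one or more PDF parts onto a single chapter list.

    A heading whose text matches a configured chapter title starts a chapter;
    the optional regex is consulted only when that heading match fails.
    Blocks seen before the first chapter (e.g. a cover page) are dropped.
    """
    wanted = {title.strip().lower(): title for title in chapter_titles}
    fallback = re.compile(fallback_pattern) if fallback_pattern else None
    chapters, current = [], None
    for part in parts:  # parts must be supplied in reading order
        for block in part:
            text = block.text.strip()
            title = wanted.get(text.lower()) if block.is_heading else None
            if title is None and fallback is not None and fallback.match(text):
                title = text
            if title is not None:
                current = Chapter(title)
                chapters.append(current)
            elif current is not None:
                # A chapter can continue across a part boundary.
                current.blocks.append(text)
    return chapters


def export_chapters(chapters, out_dir):
    """Rebuild out_dir from scratch so files from an earlier run cannot linger."""
    out = Path(out_dir)
    if out.exists():
        shutil.rmtree(out)  # wipe the previous build before writing anything
    out.mkdir(parents=True)
    for number, chapter in enumerate(chapters, start=1):
        body = "\n\n".join(chapter.blocks)
        page = out / f"{number:02d}.md"
        page.write_text(f"# {chapter.title}\n\n{body}\n", encoding="utf-8")
    return sorted(p.name for p in out.iterdir())


if __name__ == "__main__":
    part1 = [Block("Cover"), Block("Chapter 1 Basics", is_heading=True), Block("a")]
    part2 = [Block("b"), Block("Appendix A"), Block("c")]
    chapters = split_chapters([part1, part2], ["Chapter 1 Basics"], r"Appendix [A-Z]\b")
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        docs.mkdir()
        (docs / "stale.md").write_text("left over from an old build")
        print(export_chapters(chapters, docs))  # ['01.md', '02.md']
```

Deleting and recreating the output folder is the simplest way to make exports idempotent: running the export twice yields the same tree, and a chapter dropped from the configuration cannot leave an orphaned page behind.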
Three plugins ship with the tool: qr_filter uses OpenCV to filter QR code images, cjk_spacing uses the pangu library to add spaces between Chinese and English text, and cf_pages deploys the built site to Cloudflare Pages. You can write your own by subclassing ExportPlugin. The repo also ships an agent skill folder so AI agents can drive the same workflow. The project is licensed under the Apache License 2.0.

Copy-paste prompts

Prompt 1
Run minerupress init for a new book workspace and explain the book.yml and mkdocs.yml it creates
Prompt 2
Take this PDF and use minerupress fetch with the MinerU cloud API mode to produce a Markdown tree
Prompt 3
Write a custom ExportPlugin for minerupress that strips page-number footers from each chapter
Prompt 4
Configure the cjk_spacing plugin in minerupress for a Chinese textbook with inline English terms
Prompt 5
Deploy my minerupress output to Cloudflare Pages using the cf_pages plugin

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.