explaingit

ninehills/pdf2md

36 · Go

TLDR

pdf2md is a command-line tool, written in Go, that converts PDF documents into Markdown text.

In plain English

pdf2md is a command-line tool, written in Go, that converts PDF documents into Markdown text. It does this by running a vision language model, a kind of AI that reads images and outputs text, over each page of the PDF. The whole pipeline runs locally on your machine, with the model itself executing inside Docker containers that the tool starts for you.

The project supports three different models you can pick between at the command line. The default is dots-ocr, a layout-aware OCR model from RedNote that runs on the vLLM inference engine. The second is logics-parsing-v2 from Alibaba, also on vLLM, which parses pages into structured HTML before converting to Markdown. The third is paddleocr-vl-1.5-gguf, which uses a two-stage pipeline: a separate ONNX container runs PaddleOCR's PP-DocLayoutV3 to detect bounding boxes and labels on each page, then a llama.cpp container runs the PaddleOCR-VL vision-language model on each cropped region, and the tool merges the results into Markdown and JSON.

The binary is pure Go with no Python, no onnxruntime install, and no CUDA install required on the host. The only prerequisites are Docker and the nvidia-container-toolkit for GPU inference. Releases are precompiled for six platform combinations: Linux, macOS, and Windows, each on amd64 and arm64. Building from source is the usual git clone and go build.

Usage is straightforward: ./pdf2md paper.pdf converts a PDF with the default model, and flags let you pick the model, set the output directory, change the rendering DPI (default 200), set the concurrency level (default 16), pick which Docker images and ports to use, and point at a local model weights directory. Model weights are downloaded from Hugging Face if not already present.
The code is organized into small focused Go packages under pkg/, including pdf for page rendering through go-fitz, docker for container management, inference for the VLM HTTP client, layoutclient for the ONNX container client, htmlmd for HTML to Markdown conversion, and markdown for assembling the final output. The README reports 78 tests across 13 packages.
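A concurrency level such as the default of 16 usually means a cap on how many page requests are in flight against the inference server at once. The sketch below shows one common way to do that in Go, with a buffered channel as a semaphore. It is an illustration under that assumption, not the project's implementation: convertPage and convertAll are hypothetical names, and convertPage only stands in for the real model call.

```go
package main

import (
	"fmt"
	"sync"
)

// convertPage is a hypothetical stand-in for the per-page model request.
func convertPage(page int) string {
	return fmt.Sprintf("<!-- page %d -->", page)
}

// convertAll processes pages 1..n with at most limit calls in flight
// and returns the results in page order.
func convertAll(n, limit int) []string {
	out := make([]string, n)
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are busy
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this page is done
			out[i] = convertPage(i + 1)
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	for _, md := range convertAll(3, 16) {
		fmt.Println(md)
	}
}
```

Each goroutine writes to its own slice index, so the output stays in page order without extra locking.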

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.