explaingit

ninehills/pdf2md

Analysis updated 2026-06-24

36 stars · Go · Audience: developer · Complexity: 4/5 · Setup: hard

TLDR

Command-line tool in Go that converts PDF documents to Markdown by running vision language models locally inside Docker containers it spins up.

Mindmap

mindmap
  root((pdf2md))
    Inputs
      PDF file
      Model flag
      DPI setting
      Output directory
    Outputs
      Markdown text
      JSON structure
    Use Cases
      OCR scanned papers
      Convert reports to MD
      Extract structured docs
    Tech Stack
      Go
      Docker
      vLLM
      llama.cpp
      ONNX


What do people build with it?

USE CASE 1

Batch-convert academic PDFs to Markdown for a knowledge base

USE CASE 2

Run layout-aware OCR on scanned documents locally without cloud APIs

USE CASE 3

Extract structured HTML or Markdown from mixed-content reports

What is it built with?

Go, Docker, vLLM, llama.cpp, ONNX, CUDA

How does it compare?

Repository | ninehills/pdf2md | 732124645/promptops | aasixh/devgrep
Stars | 36 | 31 | 27
Language | Go | Go | Go
Setup difficulty | hard | easy | easy
Complexity | 4/5 | 3/5 | 2/5
Audience | developer | developer | developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: hard · Time to first run: 1h+

Requires Docker plus nvidia-container-toolkit and a GPU for the VLM inference containers.

In plain English

pdf2md is a command-line tool, written in Go, that converts PDF documents into Markdown text. It does this by running a vision language model (VLM), a kind of AI that reads images and outputs text, over each page of the PDF. The whole pipeline runs locally on your machine, with the model itself executing inside Docker containers that the tool starts for you.

The project supports three models, selectable at the command line. The default is dots-ocr, a layout-aware OCR model from RedNote that runs on the vLLM inference engine. The second is logics-parsing-v2 from Alibaba, also on vLLM, which parses pages into structured HTML before converting them to Markdown. The third is paddleocr-vl-1.5-gguf, which uses a two-stage pipeline: a separate ONNX container runs PaddleOCR's PP-DocLayoutV3 to detect bounding boxes and labels on each page, then a llama.cpp container runs the PaddleOCR-VL vision-language model on each cropped region, and the tool merges the results into Markdown and JSON.

The binary is pure Go: no Python, no onnxruntime install, and no CUDA install is required on the host. The only prerequisites are Docker and the nvidia-container-toolkit for GPU inference. Releases are precompiled for six platform combinations: Linux, macOS, and Windows, each on amd64 and arm64. Building from source is the usual git clone followed by go build.

Usage is straightforward: ./pdf2md paper.pdf converts a PDF with the default model. Flags let you pick the model, set the output directory, change the rendering DPI (default 200), set the concurrency level (default 16), choose which Docker images and ports to use, and point at a local model weights directory. Model weights are downloaded from Hugging Face if they are not already present.
The code is organized into small focused Go packages under pkg/, including pdf for page rendering through go-fitz, docker for container management, inference for the VLM HTTP client, layoutclient for the ONNX container client, htmlmd for HTML to Markdown conversion, and markdown for assembling the final output. The README reports 78 tests across 13 packages.

Copy-paste prompts

Prompt 1
Show me how to run pdf2md on a single PDF with the default dots-ocr model and a custom output directory
Prompt 2
Compare the three pdf2md models dots-ocr, logics-parsing-v2, and paddleocr-vl-1.5-gguf for accuracy and speed
Prompt 3
Walk me through the two-stage PaddleOCR-VL pipeline in pdf2md and where the ONNX container fits in
Prompt 4
Build a wrapper script around pdf2md that processes a folder of PDFs in parallel

Frequently asked questions

What is pdf2md?

Command-line tool in Go that converts PDF documents to Markdown by running vision language models locally inside Docker containers it spins up.

What language is pdf2md written in?

Mainly Go. The stack also includes Docker, vLLM, llama.cpp, and ONNX.

How hard is pdf2md to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is pdf2md for?

Mainly developers.



Verify against the repo before relying on details.