liumengxuan04/translate-paper-pdf-to-md

Analysis updated 2026-06-24

★ 20 · Python · Audience: researcher · Complexity: 3/5 · Setup: moderate

Mindmap

mindmap
  root((translate-paper-pdf-to-md))
    Inputs
      English PDF paper
      Translation preferences
      Optional crop spec JSON
    Outputs
      Target language Markdown
      Cropped figure assets
      Validation report
    Use Cases
      Read foreign papers
      Localize research notes
      Re-edit translated drafts
    Tech Stack
      Python
      Codex
      pdftotext
      ImageMagick

What do people build with it?

USE CASE 1

Translate an English research PDF into Chinese Markdown for study

USE CASE 2

Extract figures and tables from a paper into an assets folder

USE CASE 3

Validate that a translated Markdown paper has all images and citations

USE CASE 4

Add a Codex skill that asks for tone and terminology before translating

What is it built with?

Python, Codex, pdftotext, pdftocairo, pdfimages, ImageMagick

How does it compare?

	liumengxuan04/translate-paper-pdf-to-md	chloeqxq/macd	demiurg92/design-continuity-guard
Stars	20	20	20
Language	Python	Python	Python
Setup difficulty	moderate	hard	easy
Complexity	3/5	4/5	1/5
Audience	researcher	researcher	vibe coder

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: moderate · Time to first run: about 30 min

Requires Codex, plus the command-line tools pdfinfo, pdftotext, pdftocairo, pdfimages, and ImageMagick's convert on your PATH.
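Before the first run, it can help to confirm those tools are actually reachable. The sketch below uses only Python's standard library (`shutil.which`). The function name `missing_tools` is hypothetical and not part of the repository.

```python
import shutil

# Command-line tools named in the setup notes. Some ImageMagick 7 installs
# expose "magick" rather than "convert"; adjust the list if that applies.
REQUIRED_TOOLS = ["pdfinfo", "pdftotext", "pdftocairo", "pdfimages", "convert"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required command-line tools found.")
```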

In plain English

This repository is a skill for Codex, OpenAI's coding agent, that turns an English academic PDF paper into a Markdown document in another language, most often Chinese. The README is written in Chinese, with a linked English version. The author is clear that this is not a one-click machine-translation pipeline. It is meant for people who want to read, study, or re-edit a paper, so the workflow asks questions and produces an editable result rather than dumping raw translated text.

The skill keeps the structure of the original paper: section hierarchy, figure and table numbering, equation labels, citations, acknowledgements, and the reference list. Figures and complex tables that cannot be rebuilt in Markdown are cropped from the PDF pages and saved to an assets folder, while simpler tables and equations are rewritten as Markdown tables and LaTeX expressions.

Before translating, the skill asks for your preferences: target language or region, the paper's field, tone and intended reader, terminology choices, and how to handle figures and tables.

Installation means copying or symlinking the skill folder into the Codex skills directory. You then invoke the skill by name in a Codex prompt, for example by asking it to translate a PDF at a given path into Chinese Markdown. Two helper Python scripts ship with the repository: extract_pdf_assets.py extracts text, layout, and page images from the PDF and can crop figures from a JSON spec, and validate_markdown_assets.py checks that the final Markdown contains all expected images, references, figure numbers, table numbers, and equation tags.

The author estimates roughly US$0.70 for a full pass on a 23-page paper, with the caveat that real costs vary with the model, paper length, figure density, and retries. The Python scripts need only the standard library, but you must have the command-line tools pdfinfo, pdftotext, pdftocairo, pdfimages, and ImageMagick's convert installed.
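The author's single data point (about US$0.70 for a 23-page paper) works out to roughly three cents per page. The sketch below turns that into a naive linear estimate. It assumes cost scales with page count alone, which the author explicitly warns is not true in practice, so treat the result as a ballpark only. Both function names are hypothetical.

```python
def cost_per_page(total_usd, pages):
    """Average cost per page for one full translation pass."""
    return total_usd / pages


def rough_estimate(pages, total_usd=0.70, reference_pages=23):
    """Naive linear extrapolation from the README's 23-page example.

    Ignores model choice, figure density, and retries, all of which the
    author says change the real cost.
    """
    return pages * cost_per_page(total_usd, reference_pages)


if __name__ == "__main__":
    print(f"{cost_per_page(0.70, 23):.3f} USD per page")
    print(f"{rough_estimate(40):.2f} USD for a 40-page paper (very rough)")
```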
The repository has 20 stars and is written in Python.
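The extraction script's crop step takes a JSON spec, but the README summary does not show its schema. The sketch below invents a simple one (a `dpi` value plus a list of named page regions) purely for illustration; it is not the repository's format. It turns each region into a pdftocairo command using that tool's documented options (`-png`, `-singlefile`, `-r`, `-f`/`-l` for the page range, and `-x`/`-y`/`-W`/`-H` for the crop box, given in pixels at the chosen resolution). The function name `crop_commands` is hypothetical.

```python
import json
import shlex
from pathlib import Path

# Hypothetical crop-spec layout (NOT the repository's actual schema):
# {"dpi": 200, "crops": [{"name": "fig1", "page": 3,
#                         "x": 120, "y": 340, "width": 900, "height": 600}]}
# Coordinates are pixels at the chosen dpi, matching how pdftocairo's
# -x/-y/-W/-H crop options work for image output.


def crop_commands(pdf_path, spec, out_dir="assets"):
    """Build one pdftocairo command per crop region; nothing is executed."""
    dpi = spec.get("dpi", 150)
    commands = []
    for crop in spec["crops"]:
        page = str(crop["page"])
        # With -singlefile, pdftocairo writes <out_base>.png (no page suffix).
        out_base = str(Path(out_dir) / crop["name"])
        commands.append([
            "pdftocairo", "-png", "-singlefile",
            "-r", str(dpi),
            "-f", page, "-l", page,  # render only this one page
            "-x", str(crop["x"]), "-y", str(crop["y"]),
            "-W", str(crop["width"]), "-H", str(crop["height"]),
            str(pdf_path), out_base,
        ])
    return commands


if __name__ == "__main__":
    example = json.loads(
        '{"dpi": 200, "crops": [{"name": "fig1", "page": 3,'
        ' "x": 120, "y": 340, "width": 900, "height": 600}]}'
    )
    for cmd in crop_commands("paper.pdf", example):
        # Inspect first; run later with subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to check coordinates before rendering anything.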

Copy-paste prompts

Prompt 1

Install the translate-paper-pdf-to-md skill into my Codex skills directory and show me how to invoke it on a PDF

Prompt 2

Use the translate-paper-pdf-to-md skill to convert paper.pdf into formal academic Chinese Markdown for a distributed systems audience

Prompt 3

Run extract_pdf_assets.py on this PDF with a crop spec for figures 1 to 4 and tables 1 and 2

Prompt 4

Validate paper_zh.md with validate_markdown_assets.py and list any missing image links or references

Prompt 5

Adapt this skill to target Japanese Markdown output while keeping the same figure cropping workflow
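Prompt 4 relies on the repository's validate_markdown_assets.py. The sketch below is not that script; it is a minimal, hypothetical illustration of the kinds of checks the README describes: local image links that resolve, figure and table numbers without gaps, and a surviving reference list. It assumes English-style "Figure N" / "Table N" labels, so a Chinese translation would need extra patterns (for example "图 3" or "表 2"). The function and pattern names are invented for illustration.

```python
import re
from pathlib import Path

# Hypothetical checks, loosely modeled on what the README says the
# repository's validator covers; this is not validate_markdown_assets.py.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
NUMBERED = re.compile(r"\b(Figure|Table)\s+(\d+)")
REFERENCES_HEADING = re.compile(r"^#{1,6}\s+(References|参考文献)\s*$", re.MULTILINE)


def check_markdown(text, base_dir="."):
    """Return a list of human-readable problems found in a translated paper."""
    problems = []
    base = Path(base_dir)
    # 1. Every local image link must point at an existing file.
    for target in IMAGE_LINK.findall(text):
        if "://" not in target and not (base / target).exists():
            problems.append(f"missing image: {target}")
    # 2. Figure and table numbers should run 1..N without gaps.
    for kind in ("Figure", "Table"):
        seen = {int(n) for k, n in NUMBERED.findall(text) if k == kind}
        if seen:
            gaps = sorted(set(range(1, max(seen) + 1)) - seen)
            problems.extend(f"{kind} {n} never mentioned" for n in gaps)
    # 3. The reference list should survive translation.
    if not REFERENCES_HEADING.search(text):
        problems.append("no References heading found")
    return problems


if __name__ == "__main__":
    sample = "# Intro\nSee Figure 1 and Figure 3.\n![fig](assets/fig1.png)\n"
    for problem in check_markdown(sample):
        print(problem)
```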

Frequently asked questions

What is translate-paper-pdf-to-md?

A Codex skill that translates English academic PDFs into target-language Markdown while preserving sections, figures, tables, equations, and references.

What language is translate-paper-pdf-to-md written in?

Mainly Python. The stack also includes Codex, pdftotext, pdftocairo, pdfimages, and ImageMagick.

How hard is translate-paper-pdf-to-md to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is translate-paper-pdf-to-md for?

Mainly researchers.

Verify against the repo before relying on details.