Translate an English research PDF into Chinese Markdown for study
Extract figures and tables from a paper into an assets folder
Validate that a translated Markdown paper has all images and citations
Add a Codex skill that asks for tone and terminology before translating
Requires Codex, plus the command-line tools pdfinfo, pdftotext, pdftocairo, pdfimages, and ImageMagick's convert on PATH.
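A quick way to confirm the command-line prerequisites before invoking the skill is a PATH check. The sketch below is not part of the repository: the helper name `missing_tools` is hypothetical, and it assumes ImageMagick exposes its `convert` binary under that exact name on your system.

```python
import shutil

# Tool names come from the requirement line above; "convert" is ImageMagick's CLI.
REQUIRED_TOOLS = ["pdfinfo", "pdftotext", "pdftocairo", "pdfimages", "convert"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tool names that cannot be resolved on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.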
This repository is a skill for Codex, OpenAI's coding agent, that helps you turn an English academic PDF paper into a Markdown document in another language, most often Chinese. The README is written in Chinese, with an English version linked. The author is clear that this is not a one-click machine translation pipeline. It is meant for people who actually want to read, study, or re-edit a paper, so the workflow asks questions and produces an editable result rather than dumping raw translated text.

The skill keeps the structure of the original paper. It preserves section hierarchy, figure and table numbering, equation labels, citations, acknowledgements, and the reference list. Figures and complex tables that cannot be rebuilt as Markdown are cropped from the PDF pages and saved into an assets folder, while simpler tables and equations are rewritten as Markdown tables and LaTeX expressions. Before translation starts, the skill asks for preferences: the target language or region, the paper's field, the tone and intended reader, terminology choices, and how to handle figures and tables.

Installation is a copy or symlink of the skill folder into the Codex skills directory. You then invoke the skill by name in a Codex prompt, for example by asking it to translate a PDF at a given path into Chinese Markdown. Two helper Python scripts ship with the repository: one extracts text, layout, and page images from the PDF and can crop figures based on a JSON spec, and the other validates that the final Markdown has all expected images, references, figure numbers, table numbers, and equation tags.

The author gives a rough cost estimate of about 0.7 US dollars for a full pass on a 23-page paper, with the caveat that real costs vary by model, paper length, figure density, and retries. The Python scripts need only the standard library, but you are expected to have command-line tools such as pdfinfo, pdftotext, pdftocairo, pdfimages, and ImageMagick's convert installed.
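The validation step described above can be approximated with the standard library alone. What follows is a minimal sketch, not the repository's script: the function names, the regular expressions, and the assumption that equation labels appear as LaTeX `\tag{...}` are all illustrative choices, and the real validator may check more or work differently.

```python
import re
from pathlib import Path

# Markdown image syntax: ![alt](target "optional title")
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Assumed convention: equation labels are written as \tag{...}
EQUATION_TAG = re.compile(r"\\tag\{([^}]+)\}")
REMOTE_URL = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def missing_images(markdown: str, base_dir: Path) -> list[str]:
    """Return local image targets referenced in the Markdown that are not on disk."""
    missing = []
    for target in IMAGE_LINK.findall(markdown):
        if REMOTE_URL.match(target):
            continue  # remote images cannot be checked locally
        if not (base_dir / target).is_file():
            missing.append(target)
    return missing


def missing_equation_tags(markdown: str, expected: list[str]) -> list[str]:
    """Return the expected equation tags that never appear in the Markdown."""
    found = set(EQUATION_TAG.findall(markdown))
    return [tag for tag in expected if tag not in found]
```

A caller would read the translated Markdown, pass the directory that contains it as `base_dir`, and treat any non-empty result as a validation failure. The list of expected tags would have to come from the source paper.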
The repository has 20 stars and is written in Python.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.