explaingit

funstory-ai/babeldoc

8,457 Python Audience · researcher Complexity · 2/5 Setup · moderate

TLDR

Python tool that translates PDF documents into bilingual output with the original and translated text side by side, focused on academic papers and scientific content.

Mindmap

mindmap
  root((babeldoc))
    What it does
      Translate PDFs
      Bilingual output
      Preserve formulas
    Use modes
      Hosted service
      Self-hosted web UI
      Command-line tool
      Python library
    Content focus
      Scientific papers
      Academic documents
    Integrations
      Zotero plugin
      OpenAI API

Things people build with this

USE CASE 1

Translate an English scientific paper to Chinese while preserving layout, formulas, and technical notation, producing a bilingual PDF.

USE CASE 2

Process multiple PDF files in batch from the command line, restricting translation to specific page ranges.

USE CASE 3

Embed BabelDOC into a Python script or pipeline to automate PDF translation for a research workflow.
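A minimal sketch of the batch and pipeline automation in use cases 2 and 3, assuming an inbox folder of PDFs and an output folder for finished translations. The names `pending_pdfs`, `run_batch`, and `translate` are hypothetical and introduced here for illustration only; `translate` stands in for whatever wraps the BabelDOC CLI or Python API, whose actual interface should be checked in the repo.

```python
from pathlib import Path
from typing import Callable


def pending_pdfs(inbox: Path, done: Path) -> list[Path]:
    """Return PDFs in `inbox` that have no same-named file in `done` yet."""
    finished = {p.name for p in done.glob("*.pdf")}
    return sorted(p for p in inbox.glob("*.pdf") if p.name not in finished)


def run_batch(
    inbox: Path,
    done: Path,
    translate: Callable[[Path, Path], None],
) -> list[Path]:
    """Call `translate(source, destination)` once per pending PDF.

    `translate` is a hypothetical hook supplied by the caller; it is
    expected to write the translated file to `destination`.
    """
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf in pending_pdfs(inbox, done):
        translate(pdf, done / pdf.name)
        processed.append(pdf)
    return processed
```

Running `run_batch` on a schedule (or in a loop with a sleep) gives a simple folder watcher: files already present in the output folder are skipped, so each PDF is translated once.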

Tech stack

Python

Getting it running

Difficulty · moderate Time to first run · 30 min

Requires an OpenAI API key or compatible LLM service to perform translations.

In plain English

BabelDOC is a Python library and command-line tool that translates PDF documents and produces bilingual output, meaning you get both the original and the translated text together in one file for side-by-side comparison. It focuses on scientific papers and academic documents, where preserving the layout and correctly handling formulas and technical notation matters. The primary translation direction is English to Chinese, though basic support for other language combinations is included.

Translation requires access to a large language model API, such as OpenAI's GPT models or a compatible service. You supply the API key when running the tool, and BabelDOC handles parsing the PDF, sending the text to the model, and reassembling the output.

You can use BabelDOC in three ways. A hosted online service at Immersive Translate provides a free quota of 1,000 pages per month for straightforward use without any setup. A self-hosted option called PDFMathTranslate-next bundles BabelDOC with a web interface and supports a wider range of translation services. The command-line interface and Python API let you embed the library directly into your own programs or scripts.

The command-line tool accepts one or more PDF files, a source language, and a target language. Options let you restrict translation to specific pages, control how the bilingual output is arranged (original and translated side by side, or on alternating pages), and manage watermarks on the output. The tool also integrates with Zotero, a popular academic reference manager, through third-party plugins.

The README notes the CLI is primarily for debugging and that most end users are better served by the hosted service or the self-hosted web interface.
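The command-line usage described above can be sketched as a small Python helper that assembles the argument list for one run. This is an illustration under stated assumptions, not the project's documented interface: the flag names (`--openai`, `--openai-model`, `--openai-api-key`, `--lang-in`, `--lang-out`, `--pages`, `--files`) and the default model name are assumptions to confirm with `babeldoc --help`, and `build_babeldoc_command` is a hypothetical helper.

```python
def build_babeldoc_command(files, api_key, lang_in="en", lang_out="zh",
                           pages=None, model="gpt-4o-mini"):
    """Assemble an argument list for the `babeldoc` CLI.

    All flag names here are assumptions; verify them against
    `babeldoc --help` before running the command.
    """
    if not files:
        raise ValueError("at least one PDF file is required")

    cmd = [
        "babeldoc",
        "--openai",                    # assumed flag: use an OpenAI-compatible service
        "--openai-model", model,       # placeholder model name
        "--openai-api-key", api_key,
        "--lang-in", lang_in,          # source language
        "--lang-out", lang_out,        # target language
    ]
    if pages:
        cmd += ["--pages", pages]      # e.g. "5-15" to limit translation
    for path in files:
        cmd += ["--files", str(path)]  # assumed: one --files flag per PDF
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.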

Copy-paste prompts

Prompt 1
I have a 20-page English physics paper in PDF format and want a bilingual English-Chinese version. Give me the BabelDOC CLI command using an OpenAI-compatible API key.
Prompt 2
How do I use BabelDOC to translate only pages 5 through 15 of a PDF and arrange the output so original and translated pages alternate?
Prompt 3
I want to integrate BabelDOC into a Python script that watches a folder and auto-translates any new PDF dropped in. Write the basic script.
Prompt 4
BabelDOC integrates with Zotero through third-party plugins. How do I set one up so papers I save to Zotero get automatically translated?

Verify against the repo before relying on details.