Convert academic papers, legal documents, or manuals from PDF to HTML so they display in any browser without a PDF plugin.
Embed existing PDF content into a website as HTML, preserving the original multi-column layout, equations, and fonts.
Process documents with Chinese or Japanese characters, complex equations, or multi-column magazine layouts into searchable web pages.
Requires building from source with Poppler and FontForge as system dependencies; build instructions are on the project wiki, not the README.
pdf2htmlEX is a command-line tool that converts PDF files into HTML pages while preserving the original text, fonts, and layout. Unlike basic PDF-to-text converters that strip out formatting, it produces HTML output that looks nearly identical to the original document: text stays positioned correctly on the page, fonts are embedded, and visual elements like figures and mathematical formulas are retained. The HTML it generates uses standard web technologies, so the result opens in any browser without plugins.
You can produce a single self-contained HTML file or a version that loads pages on demand, which lets large documents start displaying before the entire file has downloaded. The output file size is often comparable to the original PDF, sometimes smaller.
The tool handles a range of document types that are normally difficult to convert: academic papers with complex equations and column layouts, magazines with multi-column formatting, documents containing Chinese and Japanese characters, and files with unusual fonts. Demos linked from the README show converted versions of a 16th-century Bible, a LaTeX cheat sheet, a scientific paper, and a Linux magazine issue.
The project depends on two other open-source tools: Poppler for reading PDF files and FontForge for handling fonts. The README notes the project is no longer under active development and has been seeking a new maintainer since 2016. Download and build instructions are on the project wiki rather than in the README itself. The license is GPLv3 for the overall package, with some components released under looser terms.
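As a rough illustration of the two output modes mentioned above (one self-contained file versus pages loaded on demand), the following Python sketch wraps the command line. The helper names `build_command` and `convert` are hypothetical, and the flags `--dest-dir`, `--zoom`, and `--split-pages` are assumed from commonly documented pdf2htmlEX usage; check `pdf2htmlEX --help` on your build before relying on them.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(pdf: str, dest_dir: str = ".", split_pages: bool = False,
                  zoom: Optional[float] = None) -> List[str]:
    """Assemble an argument list for pdf2htmlEX (hypothetical helper).

    Flag names are assumptions; verify them against `pdf2htmlEX --help`.
    """
    cmd = ["pdf2htmlEX", "--dest-dir", dest_dir]
    if zoom is not None:
        # Scale the rendered pages by the given factor.
        cmd += ["--zoom", str(zoom)]
    if split_pages:
        # Write pages as separate files so they can be loaded on demand.
        cmd += ["--split-pages", "1"]
    cmd.append(pdf)
    return cmd


def convert(pdf: str, dest_dir: str = ".", split_pages: bool = False,
            zoom: Optional[float] = None) -> None:
    """Run the conversion if the pdf2htmlEX binary is available on PATH."""
    if shutil.which("pdf2htmlEX") is None:
        raise FileNotFoundError("pdf2htmlEX not found on PATH; build it first")
    subprocess.run(build_command(pdf, dest_dir, split_pages, zoom), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run it.
    print(build_command("paper.pdf", "out", split_pages=True, zoom=1.3))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact invocation without having the tool installed.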
Verify against the repo before relying on details.