Extract structured data from scanned invoices, contracts, or academic papers into JSON automatically.
Build a document processing pipeline that converts photographed pages into clean Markdown for further analysis.
Parse mixed documents containing text, tables, and math formulas into a machine-readable format.
Run fast parallel processing on large batches of digital PDFs to extract their layout and content.
Requires a GPU, and the pre-trained model weights must be downloaded from Hugging Face before running inference.
Dolphin is an AI model from ByteDance that reads document images and converts them into structured, machine-readable output. Given a PDF, a scanned page, or a photo of a document, Dolphin analyzes it and produces a clean representation of the text, tables, formulas, code blocks, and layout, including the correct reading order.

The problem it addresses is that documents come in many forms: some are digital files with the text already embedded, while others are photographs or scans in which the content exists only as pixels. Earlier tools often handled only one of these types well. Dolphin-v2, the current version, first classifies the kind of document it is looking at and then applies a parsing strategy suited to that class. Photographed documents are processed as a whole page, while digital documents are broken into individual elements that are parsed in parallel, which is faster. The model can identify up to 21 types of document elements, extract attribute fields, handle mathematical formulas and code, and output results as JSON or Markdown. The Dolphin paper was accepted at ACL 2025, a major natural language processing research conference.

To run it, developers clone the repository, install the Python dependencies, and download the pre-trained model weights from Hugging Face. Inference can be run on a single image, an entire directory, or a PDF file. Faster inference is also supported through vLLM and TensorRT-LLM, two tools for accelerating model inference and serving.

This is a research model and developer tool, not a finished consumer product. It is most useful for teams building document processing pipelines, for example to extract structured data from invoices, academic papers, contracts, or scanned records.
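The classify-then-route design described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the page dictionary, the `Element` record, the `is_photo` flag standing in for the model's document-type classification, and the stub parsers `parse_whole_page` and `parse_element` are placeholders, not Dolphin's actual API or output schema. Pages arrive as pre-segmented toy data so the example runs without a model. It shows only the control flow: one holistic pass for photographed pages, concurrent per-element parsing for digital pages, then reassembly in reading order and serialization to Markdown or JSON.

```python
from __future__ import annotations

import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass


@dataclass
class Element:
    order: int    # position in reading order
    kind: str     # hypothetical labels such as "title", "text", "table", "formula"
    content: str  # recognized content (plain text, LaTeX, Markdown table, ...)


def classify(page: dict) -> str:
    """Hypothetical stand-in for the first stage: decide how the page is parsed."""
    return "photographed" if page.get("is_photo") else "digital"


def parse_whole_page(page: dict) -> list[Element]:
    """Hypothetical stand-in for one holistic model pass over a photographed page."""
    return [Element(o, k, raw.strip()) for o, k, raw in page["regions"]]


def parse_element(region: tuple[int, str, str]) -> Element:
    """Hypothetical stand-in for parsing a single element of a digital page."""
    order, kind, raw = region
    return Element(order, kind, raw.strip())


def parse_page(page: dict, max_workers: int = 4) -> list[Element]:
    if classify(page) == "photographed":
        elements = parse_whole_page(page)
    else:
        # Elements are independent of each other, so they can be parsed concurrently.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            elements = list(pool.map(parse_element, page["regions"]))
    # Reassemble by the explicit reading-order index, whichever path produced the elements.
    return sorted(elements, key=lambda e: e.order)


def to_markdown(elements: list[Element]) -> str:
    parts = []
    for e in elements:
        if e.kind == "title":
            parts.append(f"# {e.content}")
        elif e.kind == "formula":
            parts.append(f"$$\n{e.content}\n$$")
        else:  # plain text, and tables assumed to be Markdown already
            parts.append(e.content)
    return "\n\n".join(parts)


def to_json(elements: list[Element]) -> str:
    return json.dumps([asdict(e) for e in elements], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    page = {
        "is_photo": False,
        "regions": [
            (1, "text", "  Dolphin parses document images.  "),
            (0, "title", "Introduction"),
            (2, "formula", "E = mc^2"),
        ],
    }
    elements = parse_page(page)
    print(to_markdown(elements))
    print(to_json(elements))
```

Threads are a reasonable choice in this sketch only because, in a real pipeline, each element call would spend most of its time waiting on the model. Dolphin's own parallel element parsing may be implemented differently (for example, by batching element crops through the model), so read this as an illustration of the routing idea rather than of the repository's code.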
Verify against the repo before relying on details.