rednote-hilab/dots.ocr

★ 8,612 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((dots.ocr))
    What it does
      Document parsing
      Table extraction
      Layout understanding
    Input types
      PDFs
      Scanned documents
      Web screenshots
      Natural scene photos
    Outputs
      Structured text
      SVG from diagrams
    Features
      Multi-language
      Local inference
      Open weights


Things people build with this

USE CASE 1

Extract tables and headings from scanned PDFs into machine-readable output without a paid OCR API.

USE CASE 2

Parse web page screenshots or charts into clean text and SVG code for further processing.

USE CASE 3

Recognize text in natural scene photos where it appears on signs, products, or backgrounds.

USE CASE 4

Run document parsing locally to keep sensitive files off third-party servers using the downloaded model weights.

Tech stack

Python

Getting it running

Difficulty · moderate | Time to first run · 30 min

Model weights must be downloaded from HuggingFace before first use. A GPU is recommended for acceptable speed.
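As a rough illustration of the download step, here is a minimal sketch using the `huggingface_hub` package. The repo id `rednote-hilab/dots.ocr`, the `weights` folder, and the helper names are assumptions made for this example, not something this page confirms, so check the project's README for the exact instructions. The target folder name deliberately avoids periods, a cautious choice because some loaders treat the folder name as a Python module name.

```python
from pathlib import Path

# Assumption: the weights are published under this HuggingFace repo id.
REPO_ID = "rednote-hilab/dots.ocr"


def local_weights_dir(repo_id: str, root: str = "weights") -> Path:
    """Build a local folder path for the weights, with periods replaced."""
    name = repo_id.split("/")[-1].replace(".", "_")
    return Path(root) / name


def download_weights(repo_id: str = REPO_ID, root: str = "weights") -> Path:
    """Download every file in the model repo into a local folder."""
    # Imported here so the path helper works without the dependency installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    target = local_weights_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_weights()` once fetches the files over the network. Later runs can point the model loader at the returned folder and stay offline.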

In plain English

dots.ocr is an AI model that reads documents and extracts their content in a structured way. Unlike simpler tools that just pull out raw text, it understands page layout: it can identify headings, tables, columns, and figures, and reproduce the document in a clean, machine-readable format. It was built by the AI research team at Xiaohongshu (the Chinese social media platform known as Little Red Book).

The model handles a wide range of document types and can recognize scripts from many languages, not just Latin or Chinese text. It also goes beyond standard document parsing: it can take a chart or diagram and convert it into SVG code, parse web page screenshots, and spot text that appears in natural scenes rather than printed pages. This makes it more general than tools focused only on PDFs or scanned books.

The project has gone through several versions. The original dots.ocr model was based on a relatively small 1.7 billion parameter language model, and the team later released dots.ocr-1.5 and then rebranded it as dots.mocr. The model weights are hosted on HuggingFace and can be downloaded for local use. A live demo is available on the project's website so you can test it without any setup.

The README includes detailed benchmark comparisons against other document-parsing systems, showing how dots.mocr scores on standardized tests for academic paper parsing, table recognition, multi-column layouts, old scanned documents, and more. The numbers place it among the higher-performing models of its size class on most of these tests, though very large commercial models still score higher on some benchmarks.

If you work with a lot of documents, PDFs, or scanned files and need to extract their content programmatically, this project offers a local, open-weight model you can run without sending data to a third-party API.
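To make "structured, machine-readable output" more concrete, here is a minimal sketch of how a script might consume layout-aware results. It assumes each page comes back as a list of elements with `category`, `bbox`, and `text` fields. These field names, the category labels, and the sample data are hypothetical, so verify the real output schema in the repo before relying on them.

```python
from collections import defaultdict

# Hypothetical sample page: field names and category labels are assumptions.
# Each bbox is [x1, y1, x2, y2] in pixels.
page = [
    {"category": "Table", "bbox": [40, 180, 560, 320],
     "text": "<table><tr><td>Q2</td><td>1.2M</td></tr></table>"},
    {"category": "Title", "bbox": [40, 30, 560, 70], "text": "Quarterly report"},
    {"category": "Text", "bbox": [40, 90, 560, 160], "text": "Revenue grew in Q2."},
]


def group_by_category(elements):
    """Collect the text of each element under its layout category."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element["category"]].append(element["text"])
    return dict(grouped)


def top_to_bottom(elements):
    """Sort elements by the top edge, then the left edge, of their boxes."""
    return sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))


tables = group_by_category(page).get("Table", [])
ordered_labels = [e["category"] for e in top_to_bottom(page)]
```

The top-to-bottom sort is a deliberately naive stand-in for reading order. It would interleave the columns of a multi-column page, which is the kind of case a layout-aware model is meant to handle for you.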

Copy-paste prompts

Prompt 1

Show me how to download the dots.mocr model from HuggingFace and run it on a PDF to extract its text and tables as structured output.

Prompt 2

Write Python code using dots.ocr to process a folder of scanned invoices and output a CSV with the extracted fields.

Prompt 3

How does dots.ocr handle multi-column layouts? Show me an example parsing a two-column academic paper.

Prompt 4

Set up dots.ocr in a Python script that watches a folder and parses any new PDF dropped into it automatically.

Prompt 5

Compare dots.mocr to Tesseract for a scanned document with mixed English and Chinese text. When should I pick each?

Verify against the repo before relying on details.