paddlepaddle/paddleocr

★ 77,178 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((PaddleOCR))
    What it does
      Text from images
      Full-page PDF parsing
      Table and chart detection
    Features
      100 plus languages
      PP-StructureV3
      Markdown and JSON output
    Integration
      RAGFlow
      Dify
      LLM pipelines
    Hardware
      CPU
      NVIDIA GPU
      AI accelerators


Things people build with this

USE CASE 1

Extract text from scanned invoices, ID cards, or photographed documents across 100+ languages.

USE CASE 2

Parse full PDF pages into structured Markdown or JSON for feeding into an LLM or RAG pipeline.

USE CASE 3

Automate data extraction from tables and forms in scanned documents for downstream processing.

USE CASE 4

Build a document search engine by OCR-indexing a large archive of scanned PDFs.
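
For the search-engine use case, the indexing half can be sketched without PaddleOCR at all. The example below is a minimal illustration, assuming the OCR text for each page has already been extracted; the page IDs, sample strings, and the `build_index` and `search` helpers are hypothetical names, not part of PaddleOCR.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page IDs whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page IDs that contain every word of the query (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical OCR output, keyed by "file#page".
pages = {
    "scan_001.pdf#1": "Invoice 2041 total due 300 EUR",
    "scan_002.pdf#1": "Delivery note for invoice 2041",
    "scan_003.pdf#1": "Meeting minutes",
}
index = build_index(pages)
print(sorted(search(index, "invoice 2041")))
```

The `\w+` tokenizer is deliberately naive: it works for space-delimited languages but would need a proper word segmenter for Chinese or Japanese text, and a real archive would more likely use an existing search engine than an in-memory dictionary.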

Tech stack

Python, PaddlePaddle

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires PaddlePaddle, installed via pip. GPU support additionally requires a CUDA-compatible NVIDIA GPU and a matching PaddlePaddle GPU build.
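
A quick way to confirm which build ended up in your environment is to ask PaddlePaddle itself. This is a minimal sketch; `describe_paddle_install` is a hypothetical helper name, and it assumes `paddle.device.is_compiled_with_cuda()` is available in your PaddlePaddle version.

```python
import importlib.util


def describe_paddle_install() -> str:
    """Report whether PaddlePaddle is importable and whether it is a CUDA build."""
    if importlib.util.find_spec("paddle") is None:
        return "paddle not installed"

    # Imported lazily so the script still runs where PaddlePaddle is absent.
    import paddle

    # Assumption: this call reports whether the installed wheel was built with CUDA.
    build = "GPU (CUDA)" if paddle.device.is_compiled_with_cuda() else "CPU-only"
    return f"paddle {paddle.__version__}, {build} build"


print(describe_paddle_install())
```

A CPU-only result on a machine with an NVIDIA GPU usually means the CPU wheel was installed instead of the GPU one.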

In plain English

PaddleOCR is an open-source OCR (Optical Character Recognition) toolkit developed by Baidu's PaddlePaddle team. OCR is the technology that reads text from images, scanned documents, and PDFs, converting visual text into machine-readable data. The problem PaddleOCR addresses is that many real-world documents (invoices, ID cards, books, street signs, handwritten notes) exist only as images or PDFs rather than structured text, which makes them inaccessible to software that needs to process or search their content.

The toolkit does more than read individual characters. Its document parsing pipeline, PP-StructureV3, analyzes a full page: it detects text blocks, tables, charts, figures, and headers, then outputs the entire document as structured Markdown or JSON, formats that LLMs (large language models) can consume directly. A vision-language model, PaddleOCR-VL-1.5, handles complex real-world documents that are skewed, poorly lit, warped, or photographed from a screen rather than scanned cleanly. The system supports over 100 languages, including Chinese, Japanese, and Arabic, as well as mixed multilingual documents.

It's designed for both research and production: it runs on CPUs, NVIDIA GPUs, and specialized AI accelerators, and has been integrated into AI frameworks such as Dify and RAGFlow (tools for building AI pipelines with document retrieval). You would use PaddleOCR when you need to extract text from documents at scale, process PDFs for AI systems, build a document search engine, automate data extraction from forms, or create RAG (Retrieval-Augmented Generation) pipelines that search through document archives.

The tech stack is Python, built on the PaddlePaddle deep learning framework. It runs on Linux, Windows, and macOS, supports multiple hardware backends, and is installed via pip.
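
The two workflows described above, plain text recognition and full-page parsing, might look roughly like this in Python. Treat it as a sketch under assumptions rather than the project's reference usage: it assumes the PaddleOCR 3.x interface (`PaddleOCR(...).predict(...)`, `PPStructureV3().predict(...)`, `save_to_markdown`) and dict-like results with `rec_texts`, `rec_scores`, and `rec_polys` keys. The helper names and the `min_score` threshold are hypothetical, and older releases expose a different API.

```python
from __future__ import annotations

import json
from typing import Any, Mapping


def records_from_result(res: Mapping[str, Any], min_score: float = 0.5) -> list[dict]:
    """Flatten one OCR result into {text, score, box} records, dropping low scores.

    Assumes parallel `rec_texts`, `rec_scores` and `rec_polys` entries
    (PaddleOCR 3.x naming; verify against your installed version).
    """
    records = []
    for text, score, poly in zip(res["rec_texts"], res["rec_scores"], res["rec_polys"]):
        if score >= min_score:
            # Polygons may be NumPy arrays; convert them to plain lists for JSON.
            box = poly.tolist() if hasattr(poly, "tolist") else list(poly)
            records.append({"text": text, "score": float(score), "box": box})
    return records


def ocr_image_to_json(image_path: str, out_path: str, lang: str = "en") -> None:
    """Recognize text in one image and write the records to a JSON file."""
    from paddleocr import PaddleOCR  # lazy import: requires `paddleocr` to be installed

    ocr = PaddleOCR(lang=lang)
    records = []
    for res in ocr.predict(image_path):
        records.extend(records_from_result(res))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


def document_to_markdown(doc_path: str, out_dir: str) -> None:
    """Parse a page image or PDF with PP-StructureV3 and save Markdown per page."""
    from paddleocr import PPStructureV3  # lazy import, as above

    pipeline = PPStructureV3()
    for res in pipeline.predict(doc_path):
        res.save_to_markdown(save_path=out_dir)
```

Calling `ocr_image_to_json("invoice.png", "invoice.json")` or `document_to_markdown("report.pdf", "output")` with your own file paths downloads the required models on first use, so the first run is noticeably slower than later ones.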

Copy-paste prompts

Prompt 1

Help me run PaddleOCR on a folder of scanned invoice images to extract all text and export the results as a CSV file.

Prompt 2

Set up PP-StructureV3 in PaddleOCR to parse a multi-page PDF into Markdown, preserving tables and headings for use in an LLM pipeline.

Prompt 3

Show me how to integrate PaddleOCR into a RAGFlow pipeline so scanned PDFs are parsed and indexed for AI question-answering.

Prompt 4

Write a Python script using PaddleOCR to process a batch of images, detect text regions, and output bounding boxes and recognized text to a JSON file.


Verify against the repo before relying on details.