explaingit

hiroi-sora/umi-ocr

44,291 | Python | Audience · vibe coder | Complexity · 3/5 | Quiet | License | Setup · moderate

TLDR

Free, offline OCR tool that extracts text from images and documents on your computer without uploading anything to the internet.

Mindmap

mindmap
  root((repo))
    What it does
      Screenshot OCR
      Batch image processing
      Document OCR
      QR/barcode reading
    Key features
      Ignore zones
      Multiple output formats
      Searchable PDFs
      19 barcode protocols
    How to use
      Windows and Linux
      No installation needed
      Hotkey capture
      HTTP API and CLI
    Tech stack
      Python backend
      Qt and QML UI
      PaddleOCR engine
      RapidOCR support

Things people build with this

USE CASE 1

Use a hotkey to quickly copy text from screenshots or from windows that don't allow text selection.

USE CASE 2

Process hundreds of scanned images or photos in bulk to extract and organize text into CSV or JSON files.

USE CASE 3

Convert PDF documents into searchable PDFs with an invisible text layer while keeping the original page images.

USE CASE 4

Extract data from QR codes, barcodes, and other machine-readable codes in images without external services.

Tech stack

Python, Qt, QML, PaddleOCR, RapidOCR

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires Python environment setup and downloading OCR model files on first run.

Free and open-source; you can use, modify, and distribute it freely.

In plain English

Umi-OCR is a free, open-source, fully offline OCR (Optical Character Recognition) tool that extracts text from images and documents without sending anything to the internet. OCR is the technology that reads text embedded in a picture: for example, turning a screenshot of a webpage into editable text, or extracting data from a scanned form. Most commercial OCR services require uploading your files to a remote server, which raises both cost and privacy concerns. Umi-OCR avoids this by running entirely on your own machine.

The tool offers four main modes. Screenshot OCR lets you capture any area of your screen with a hotkey and immediately get the text, making it ideal for copying from windows or applications that don't allow text selection. Batch OCR processes folders of image files (JPEG, PNG, WEBP, TIFF, and others) in bulk, outputting results as plain text, Markdown, CSV, or JSONL. Document OCR handles PDF, EPUB, XPS, and other document formats, optionally generating a searchable dual-layer PDF where the original page image is preserved with an invisible text layer beneath it. A QR/barcode feature reads or generates codes from images, supporting 19 protocols including QR Code, EAN, PDF417, and Data Matrix.

A particularly practical feature is the "ignore zone". When batch-processing images that all share the same watermark or header/footer position, you draw rectangles over those areas and Umi-OCR automatically discards text found there without affecting the rest of the page.

The application runs offline on Windows and Linux, requires no installation (just extract and launch), and auto-detects your system language. Internally it uses PaddleOCR or RapidOCR as the recognition engine, with support for multiple languages. The UI is built with Qt/QML and the back end is Python. An HTTP API and command-line interface are available for integrating Umi-OCR into automated workflows or scripts.
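To make the HTTP API integration concrete, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption to verify against the repo's API documentation: the local address and port (`127.0.0.1:1224`), the `/api/ocr` route, the `base64` request field, and a response shaped like `{"code": 100, "data": [{"text": ...}]}`. The helper names (`build_payload`, `extract_text`, `ocr_image`) are hypothetical and exist only for this illustration.

```python
import base64
import json
import urllib.request

# Assumption: Umi-OCR's HTTP service is enabled and listening here.
# Confirm the real host, port, and route in the app settings and API docs.
OCR_URL = "http://127.0.0.1:1224/api/ocr"


def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in a JSON-serialisable body (assumed field name)."""
    return {"base64": base64.b64encode(image_bytes).decode("ascii")}


def extract_text(response: dict) -> str:
    """Join recognised lines from the assumed response shape.

    Assumed: code 100 means success and "data" is a list of {"text": ...}.
    Any other shape yields an empty string.
    """
    data = response.get("data")
    if response.get("code") != 100 or not isinstance(data, list):
        return ""
    return "\n".join(item.get("text", "") for item in data)


def ocr_image(path: str, url: str = OCR_URL, timeout: float = 30.0) -> str:
    """Send one image file to the local service and return its text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.loads(resp.read().decode("utf-8")))
```

With Umi-OCR running and its HTTP service enabled, `ocr_image("screenshot.png")` would return the recognised text if the assumptions above hold. Keeping payload construction and response parsing in separate functions makes it easy to adjust either one once you have checked the actual API.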

Copy-paste prompts

Prompt 1
How do I set up Umi-OCR on Windows and use the screenshot hotkey to capture text from my screen?
Prompt 2
Show me how to batch process a folder of 100 PNG images with Umi-OCR and export the results as a CSV file.
Prompt 3
How do I use the ignore zone feature to remove watermarks or headers from all images in a batch job?
Prompt 4
Can I integrate Umi-OCR into my Python script using the HTTP API? Show me an example.
Prompt 5
How do I convert a PDF into a searchable PDF with Umi-OCR while keeping the original page layout?
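For Prompt 3, it can help to see what an ignore zone does conceptually. The sketch below is not Umi-OCR's code. It assumes each recognised text block carries an axis-aligned bounding box, and that a block is discarded only when its box lies entirely inside an ignore rectangle; the actual rule in Umi-OCR may differ. The `Rect`, `TextBlock`, and `apply_ignore_zones` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


@dataclass(frozen=True)
class TextBlock:
    text: str
    box: Rect


def apply_ignore_zones(blocks, zones):
    """Keep only blocks that are not fully covered by any ignore zone.

    Assumption: full containment is the discard rule (not mere overlap).
    """
    return [b for b in blocks if not any(z.contains(b.box) for z in zones)]


# Illustrative data: a header watermark and a line of body text.
blocks = [
    TextBlock("CONFIDENTIAL", Rect(10, 10, 200, 40)),
    TextBlock("Invoice total: 42", Rect(10, 300, 400, 330)),
]
zones = [Rect(0, 0, 800, 50)]  # one rectangle drawn over the header strip
kept = apply_ignore_zones(blocks, zones)
```

Here `kept` holds only the body line, because the watermark's box sits wholly inside the header rectangle. The same zones can be reused for every image in a batch as long as the unwanted text stays in the same position.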

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.