explaingit

hiroi-sora/umi-ocr

Analysis updated 2026-06-20

43,964 stars · Python · Audience · general · Complexity · 2/5 · License · Setup · easy

TLDR

Free, fully offline OCR tool that extracts text from images, screenshots, PDFs, and scanned documents on your own machine, with no internet connection required.

Mindmap

mindmap
  root((Umi-OCR))
    What it does
      Screenshot OCR
      Batch image OCR
      Document OCR
      QR barcode reader
    Tech stack
      Python backend
      PaddleOCR engine
      Qt QML UI
    Use cases
      Copy locked text
      Bulk image export
      Searchable PDFs
    Features
      Fully offline
      Ignore zones
      HTTP API
      CLI support

What do people build with it?

USE CASE 1

Capture any area of your screen with a hotkey and instantly copy the text from apps that don't allow text selection.

USE CASE 2

Batch-process a folder of scanned images and export all extracted text as plain text, CSV, or Markdown files.

USE CASE 3

Convert scanned PDFs into searchable dual-layer PDFs where the original page image is preserved with an invisible text layer.

USE CASE 4

Read QR codes and barcodes from images, supporting 19 protocols including QR Code, EAN, and Data Matrix.
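Use case 2's export step is easy to picture in code. The sketch below is not Umi-OCR's exporter: it assumes a hypothetical `results` dict mapping each image file name to its recognized lines (the tool's real internal format and file layout are not described on this page) and shows how the same results could be serialized as plain text, CSV, and JSONL using only the Python standard library. The function names are illustrative.

```python
import csv
import io
import json

# HYPOTHETICAL input shape: file name -> recognized lines, in reading order.
# This is an assumption for illustration, not Umi-OCR's internal format.
results = {
    "scan_001.png": ["Invoice #1001", "Total: 42.00"],
    "scan_002.png": ["Invoice #1002", "Total: 17.50"],
}


def to_plain_text(results: dict[str, list[str]]) -> str:
    """One block per image: a file-name header followed by its lines."""
    blocks = ["\n".join([f"== {name} ==", *lines]) for name, lines in results.items()]
    return "\n\n".join(blocks) + "\n"


def to_csv(results: dict[str, list[str]]) -> str:
    """One row per recognized line: file name, line index, text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["file", "line", "text"])
    for name, lines in results.items():
        for i, text in enumerate(lines):
            writer.writerow([name, i, text])
    return buf.getvalue()


def to_jsonl(results: dict[str, list[str]]) -> str:
    """One JSON object per image, one object per output line."""
    return "".join(
        json.dumps({"file": name, "text": "\n".join(lines)}, ensure_ascii=False) + "\n"
        for name, lines in results.items()
    )
```

CSV suits spreadsheets (one row per line of text), while JSONL keeps each image's text together and is easy to stream into other tools.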

What is it built with?

Python · PaddleOCR · RapidOCR · Qt · QML

How does it compare?

| | hiroi-sora/umi-ocr | zhayujie/cowagent | safishamsi/graphify |
|---|---|---|---|
| Stars | 43,964 | 44,075 | 43,819 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | general | developer | developer |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy · Time to first run · 5 min
Free to use, modify, and distribute for any purpose including commercial use.

In plain English

Umi-OCR is a free, open-source, fully offline OCR (Optical Character Recognition) tool that extracts text from images and documents without sending anything to the internet. OCR is the technology that reads text embedded in a picture: for example, turning a screenshot of a webpage into editable text, or extracting data from a scanned form. Most commercial OCR services require uploading your files to a remote server, which raises both cost and privacy concerns. Umi-OCR avoids this by running entirely on your own machine.

The tool offers four main modes. Screenshot OCR lets you capture any area of your screen with a hotkey and immediately get the text, which makes it ideal for copying from windows or applications that don't allow text selection. Batch OCR processes folders of image files (JPEG, PNG, WEBP, TIFF, and others) in bulk and outputs the results as plain text, Markdown, CSV, or JSONL. Document OCR handles PDF, EPUB, XPS, and other document formats, and can optionally generate a searchable dual-layer PDF in which the original page image is preserved with an invisible text layer beneath it. Finally, a QR/barcode feature reads or generates codes from images, supporting 19 protocols including QR Code, EAN, PDF417, and Data Matrix.

A particularly practical feature is the "ignore zone". When batch-processing images that all share a watermark or a header/footer in the same position, you draw rectangles over those areas and Umi-OCR automatically discards text found there without affecting the rest of the page.

The application runs offline on Windows and Linux, requires no installation (just extract and launch), and auto-detects your system language. Internally it uses PaddleOCR or RapidOCR as the recognition engine, with support for multiple languages. The UI is built with Qt/QML and the back end is written in Python. An HTTP API and a command-line interface are available for integrating Umi-OCR into automated workflows or scripts.
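The ignore-zone filter described above can be sketched as a simple geometric test. This is a minimal illustration, not Umi-OCR's code: it assumes each recognized text block comes with an axis-aligned bounding box in pixel coordinates, and it assumes a block is discarded only when its box lies entirely inside an ignore rectangle. `Rect`, `TextBlock`, and `apply_ignore_zones` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


@dataclass(frozen=True)
class TextBlock:
    """One recognized line of text and its bounding box (hypothetical type)."""
    text: str
    box: Rect


def apply_ignore_zones(blocks: list[TextBlock], zones: list[Rect]) -> list[TextBlock]:
    """ASSUMED rule: drop a block only if its box is fully inside some zone."""
    return [b for b in blocks if not any(z.contains(b.box) for z in zones)]


# The same watermark zone is reused for every image in the batch.
watermark_zone = Rect(0, 0, 400, 60)

page = [
    TextBlock("CONFIDENTIAL - ACME Corp", Rect(10, 10, 380, 50)),
    TextBlock("Quarterly report", Rect(10, 100, 300, 130)),
    TextBlock("Revenue grew 12%", Rect(10, 140, 320, 170)),
]

kept = apply_ignore_zones(page, [watermark_zone])
print([b.text for b in kept])  # ['Quarterly report', 'Revenue grew 12%']
```

Full containment keeps a line that merely touches a zone's edge; a stricter variant could drop any block that overlaps a zone, at the risk of removing real content near the watermark.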

Copy-paste prompts

Prompt 1
I have a folder of scanned invoice images. How do I configure Umi-OCR to batch-process them and export the results as CSV?
Prompt 2
Show me how to call the Umi-OCR HTTP API from a Python script to extract text from images in an automated workflow.
Prompt 3
How do I set up ignore zones in Umi-OCR to automatically skip the watermark region on all my batch-processed images?
Prompt 4
How do I use Umi-OCR to create a searchable PDF from a stack of scanned document pages?
Prompt 5
What languages does Umi-OCR support for text recognition and how do I switch between them?
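For Prompt 2, here is a minimal client sketch using only the standard library. The endpoint, port, payload field, and success code below are assumptions about Umi-OCR's HTTP API rather than facts confirmed by this page, so check them against the repo's HTTP API documentation before relying on them. `build_request`, `parse_response`, and `ocr_file` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# ASSUMPTIONS (verify against Umi-OCR's own HTTP API docs before use):
# - the local server listens on 127.0.0.1:1224
# - POST /api/ocr accepts JSON {"base64": "<base64-encoded image bytes>"}
# - a successful reply looks like {"code": 100, "data": [{"text": ...}, ...]}
API_URL = "http://127.0.0.1:1224/api/ocr"


def build_request(image_bytes: bytes, url: str = API_URL) -> urllib.request.Request:
    """Wrap raw image bytes in a JSON POST request."""
    payload = {"base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: bytes) -> str:
    """Join recognized lines; raise on any code other than the assumed 100."""
    reply = json.loads(body)
    if reply.get("code") != 100:
        raise RuntimeError(f"OCR failed or found no text: {reply}")
    return "\n".join(item["text"] for item in reply["data"])


def ocr_file(path: str) -> str:
    """Send one image file to a locally running Umi-OCR instance."""
    with open(path, "rb") as f:
        request = build_request(f.read())
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_response(response.read())
```

With Umi-OCR running locally, `print(ocr_file("invoice.png"))` would send one image (a hypothetical file name) and print the recognized text. Splitting request building from response parsing keeps both halves testable without a live server.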

Frequently asked questions

What is umi-ocr?

Free, fully offline OCR tool that extracts text from images, screenshots, PDFs, and scanned documents on your own machine, with no internet connection required.

What language is umi-ocr written in?

Mainly Python. The stack also includes PaddleOCR, RapidOCR, and Qt/QML.

What license does umi-ocr use?

Free to use, modify, and distribute for any purpose including commercial use.

How hard is umi-ocr to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is umi-ocr for?

Mainly a general, non-developer audience.

Verify against the repo before relying on details.