Analysis updated 2026-06-20
Capture any area of your screen with a hotkey and instantly copy the text from apps that don't allow text selection.
Batch-process a folder of scanned images and export all extracted text as plain text, CSV, or Markdown files.
Convert scanned PDFs into searchable dual-layer PDFs where the original page image is preserved with an invisible text layer.
Read QR codes and barcodes from images, supporting 19 protocols including QR Code, EAN, and Data Matrix.
| | hiroi-sora/umi-ocr | zhayujie/cowagent | safishamsi/graphify |
|---|---|---|---|
| Stars | 43,964 | 44,075 | 43,819 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | general | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Umi-OCR is a free, open-source, fully offline OCR (Optical Character Recognition) tool that extracts text from images and documents without sending anything to the internet. OCR is the technology that reads text embedded in a picture: for example, turning a screenshot of a webpage into editable text, or extracting data from a scanned form. Most commercial OCR services require uploading your files to a remote server, which raises both cost and privacy concerns. Umi-OCR avoids this by running entirely on your own machine.

The tool offers four main modes. Screenshot OCR lets you capture any area of your screen with a hotkey and immediately get the text, which makes it ideal for copying from windows or applications that don't allow text selection. Batch OCR processes folders of image files (JPEG, PNG, WEBP, TIFF, and others) in bulk, outputting results as plain text, Markdown, CSV, or JSONL. Document OCR handles PDF, EPUB, XPS, and other document formats, optionally generating a searchable dual-layer PDF where the original page image is preserved with an invisible text layer beneath it. A QR/barcode feature reads codes from images or generates new ones, supporting 19 protocols including QR Code, EAN, PDF417, and Data Matrix.

A particularly practical feature is the "ignore zone": when batch-processing images that all share the same watermark or header/footer position, you draw rectangles over those areas and Umi-OCR automatically discards text found there without affecting the rest of the page.

The application runs offline on Windows and Linux, requires no installation (just extract and launch), and auto-detects your system language. Internally it uses PaddleOCR or RapidOCR as the recognition engine, with support for multiple languages. The UI is built with Qt/QML and the back end is Python. An HTTP API and a command-line interface are available for integrating Umi-OCR into automated workflows or scripts.
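The "ignore zone" idea can be illustrated with a short, self-contained sketch. This is not Umi-OCR's actual code: the `Rect` and `TextBlock` types, the `drop_ignored` function, and the rule that a text block is discarded only when its bounding box lies entirely inside a zone are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


@dataclass(frozen=True)
class TextBlock:
    """One recognized piece of text plus its bounding box (hypothetical type)."""
    text: str
    box: Rect


def drop_ignored(blocks, ignore_zones):
    """Keep only blocks that are not fully covered by any ignore zone.

    Assumption: partial overlap does NOT discard a block. The real tool
    may apply a different rule.
    """
    return [
        block
        for block in blocks
        if not any(zone.contains(block.box) for zone in ignore_zones)
    ]


# Example: a header watermark in the top 40 pixels of every page.
header_zone = Rect(left=0, top=0, right=1000, bottom=40)
page_blocks = [
    TextBlock("CONFIDENTIAL", Rect(10, 5, 200, 30)),        # inside the zone
    TextBlock("Section title", Rect(10, 30, 200, 60)),      # only partly inside
    TextBlock("Invoice total: 42", Rect(10, 100, 300, 130)),  # outside the zone
]
kept = drop_ignored(page_blocks, [header_zone])
kept_texts = [block.text for block in kept]
```

Because the same zone rectangles are applied to every image, this approach only works when the watermark or header sits at the same position across the whole batch, which matches the use case described above.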
Free, fully offline OCR tool that extracts text from images, screenshots, PDFs, and scanned documents on your own machine, with no internet connection required.
Mainly Python. The stack also includes PaddleOCR and RapidOCR.
Free to use, modify, and distribute for any purpose including commercial use.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Audience: mainly general users.
Verify against the repo before relying on details.