explaingit

andrisgauracs/interfaze_ocr_viewer

16 | HTML | Audience · data | Complexity · 2/5 | Active | Setup · easy

TLDR

Browser-based viewer that overlays OCR bounding boxes and confidence scores from interfaze.ai onto a PDF for visual review.

Mindmap

mindmap
  root((interfaze ocr viewer))
    Inputs
      PDF file
      OCR JSON
      Confidence ranges
    Outputs
      Overlay boxes
      Hover text
      Zoom view
    Use Cases
      Audit OCR output
      Spot low confidence
      Toggle line word view
    Tech Stack
      HTML
      JavaScript
      Python HTTP
      PDF.js

Things people build with this

USE CASE 1

Review interfaze.ai OCR output by overlaying bounding boxes on the original PDF.

USE CASE 2

Spot low-confidence words using the red, yellow, and green color coding.

USE CASE 3

Toggle between line-level and word-level boxes to inspect tokenization choices.

USE CASE 4

Match a custom OCR pipeline's JSON to the documented format and reuse this viewer.
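Before pointing the viewer at JSON from another pipeline, it can help to check the nesting first. The following is a minimal sketch, assuming a layout with a top-level `sections` list (one entry per page), each section holding `lines`, and each line holding `bounds` and `words`. The helper name `shape_problems` and the word-level key `confidence` are hypothetical; verify the real key names against the README.

```python
# Hypothetical key name for a word's confidence; check the README for the real one.
CONFIDENCE_KEY = "confidence"


def shape_problems(doc):
    """Return a list of human-readable problems; an empty list means the nesting looks right."""
    sections = doc.get("sections") if isinstance(doc, dict) else None
    if not isinstance(sections, list):
        return ["top-level 'sections' must be a list (one entry per page)"]

    problems = []
    for page, section in enumerate(sections, start=1):
        lines = section.get("lines") if isinstance(section, dict) else None
        if not isinstance(lines, list):
            problems.append(f"page {page}: 'lines' must be a list")
            continue
        for i, line in enumerate(lines):
            where = f"page {page}, line {i}"
            if not isinstance(line, dict):
                problems.append(f"{where}: must be an object")
                continue
            if "bounds" not in line:
                problems.append(f"{where}: missing 'bounds'")
            words = line.get("words")
            if not isinstance(words, list):
                problems.append(f"{where}: 'words' must be a list")
                continue
            for j, word in enumerate(words):
                for key in ("bounds", CONFIDENCE_KEY):
                    if not isinstance(word, dict) or key not in word:
                        problems.append(f"{where}, word {j}: missing '{key}'")
    return problems
```

The function only checks structure, not values, so a document that passes may still need its coordinates or confidence scale adjusted to match what the viewer expects.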

Tech stack

HTML · JavaScript · Python · PDF.js

Getting it running

Difficulty · easy | Time to first run · 5 min

Local only; needs Python 3 to start the static server and an OCR JSON matching the documented interfaze.ai shape.

In plain English

This repo is a small viewer that runs in the browser and helps a person check the results of an OCR job. OCR, or optical character recognition, is the process where software looks at a PDF or image and tries to read the text from it. The viewer is built specifically for output from a service called interfaze.ai, but the layout it expects is documented in the README so other tools could match it.

What the viewer does is straightforward. It renders a PDF page by page, then draws the OCR boxes on top of each page so you can see exactly where the OCR software thought each line or word was located. Hovering over a box shows the recognized text and the confidence score. The boxes are color coded: green when the OCR was at least 70 percent confident, yellow from 40 up to 70 percent, and red below 40 percent. There is a toggle to switch between showing boxes for whole lines and boxes for individual words, and a zoom control that goes from 50 to 300 percent.

The README spells out the JSON format it expects. The top-level object has a sections array, with each section matching one page of the PDF. Each section has lines, and each line has a bounds rectangle, an average confidence value, and an array of words with their own bounds and confidence. Page width, page height, and total pages live at the top level too.

Running the viewer is local and minimal. You clone the repo, open a terminal in the folder, start the built-in Python HTTP server with python3 -m http.server 8765, then open http://localhost:8765 in any modern browser. From the page, you drag in or pick a PDF and its matching OCR JSON, click View document, and the overlay appears. A button in the toolbar lets you swap in a different file pair without restarting the server.

The requirements are short: Python 3, which already ships with macOS, a modern browser like Chrome, Firefox, Safari, or Edge, and a PDF plus its OCR JSON from interfaze.ai.
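The JSON nesting and the color thresholds described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the viewer's own code: only the sections, lines, words nesting and the 40 and 70 percent cutoffs come from the description. The exact key names (`page_width`, `average_confidence`, `text`, and so on), the `x`/`y`/`width`/`height` bounds layout, and the 0 to 1 confidence scale are guesses to be checked against the README.

```python
import json

# Cutoffs from the description: green at 70% or more, yellow from 40% up to 70%, red below 40%.
# Assumption: confidence is stored as a fraction in [0, 1], not as a percentage.
GREEN_MIN = 0.70
YELLOW_MIN = 0.40


def confidence_color(confidence):
    """Map a confidence value to the color bucket used for its box."""
    if confidence >= GREEN_MIN:
        return "green"
    if confidence >= YELLOW_MIN:
        return "yellow"
    return "red"


# Minimal one-page document. All key names and the bounds layout are hypothetical.
minimal_doc = {
    "page_width": 612,
    "page_height": 792,
    "total_pages": 1,
    "sections": [  # one section per PDF page
        {
            "lines": [
                {
                    "text": "Hello world",
                    "bounds": {"x": 72, "y": 90, "width": 140, "height": 14},
                    "average_confidence": 0.66,
                    "words": [
                        {"text": "Hello",
                         "bounds": {"x": 72, "y": 90, "width": 60, "height": 14},
                         "confidence": 0.93},
                        {"text": "world",
                         "bounds": {"x": 140, "y": 90, "width": 72, "height": 14},
                         "confidence": 0.39},
                    ],
                }
            ]
        }
    ],
}


def word_colors(doc):
    """Return (text, color) pairs for every word, in page and line order."""
    pairs = []
    for section in doc["sections"]:
        for line in section["lines"]:
            for word in line["words"]:
                pairs.append((word["text"], confidence_color(word["confidence"])))
    return pairs


# Serialize the way a file on disk would look.
minimal_json = json.dumps(minimal_doc, indent=2)
```

Calling `word_colors(minimal_doc)` pairs each word with the bucket its box would fall into under these assumptions, which is a quick way to sanity-check a converted file before loading it in the browser.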

Copy-paste prompts

Prompt 1
Clone interfaze_ocr_viewer, serve it with python3 -m http.server 8765, and load a PDF plus its OCR JSON.
Prompt 2
Convert my Tesseract OCR output into the JSON layout interfaze_ocr_viewer expects so I can reuse the viewer.
Prompt 3
Modify interfaze_ocr_viewer to add a CSV export of all words below 40 percent confidence.
Prompt 4
Change the confidence color thresholds in interfaze_ocr_viewer from 40 and 70 percent to 50 and 80.
Prompt 5
Walk through the sections, lines, words JSON shape interfaze_ocr_viewer expects and show a minimal one-page example.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.