getomni-ai/zerox

★ 12,230 | TypeScript | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((zerox))
    What it does
      PDF to Markdown
      Structured extraction
      Image-based OCR
    AI Providers
      OpenAI
      Azure OpenAI
      AWS Bedrock
      Google Gemini
    Interfaces
      Node.js package
      Python package
    Use Cases
      Invoice parsing
      Research papers
      Table extraction


Things people build with this

USE CASE 1

Convert a batch of PDF invoices into structured JSON by giving Zerox a schema of the fields you want extracted.
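As a rough sketch of what such a schema might look like, the block below describes three invoice fields in a JSON-Schema-like shape and checks an extracted result against it. The names `FieldSchema`, `ObjectSchema`, `invoiceSchema`, and `missingFields` are illustrative assumptions and not part of Zerox; the exact schema format the library accepts should be checked in the README.

```typescript
// Hypothetical field description, loosely modelled on JSON Schema.
interface FieldSchema {
  type: "string" | "number";
  description: string;
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, FieldSchema>;
  required: string[];
}

// Assumed example schema: the three fields an invoice workflow might ask for.
const invoiceSchema: ObjectSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string", description: "Invoice identifier" },
    date: { type: "string", description: "Issue date as printed" },
    total: { type: "number", description: "Total amount due" },
  },
  required: ["invoiceNumber", "date", "total"],
};

// Return the required fields that are absent or have the wrong primitive type.
function missingFields(
  schema: ObjectSchema,
  extracted: Record<string, unknown>
): string[] {
  return schema.required.filter((name) => {
    const field = schema.properties[name];
    return field === undefined || typeof extracted[name] !== field.type;
  });
}
```

A check like this is useful after extraction because a vision model can omit a field or return a number as a string; flagging those cases early keeps bad rows out of downstream systems.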

USE CASE 2

Turn PDF research papers or reports into Markdown so you can feed them into an AI chatbot or search index.

USE CASE 3

Process scanned documents with complex tables into clean Markdown that preserves the table structure accurately.

USE CASE 4

Run parallel page processing on large documents to speed up conversion of multi-hundred-page PDFs.
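The idea behind a concurrency setting can be sketched as a small worker pool: at most `limit` page conversions are in flight at once, and results come back in page order. This is a minimal illustration under assumptions, not Zerox's internal implementation; `mapWithConcurrency` and `convertPage` are hypothetical names, and `convertPage` is a stand-in for the real vision-model call.

```typescript
// Run `worker` over `items`, keeping at most `limit` calls in flight,
// and return results in the original item order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const item = items[index] as T;
      results[index] = await worker(item, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical stand-in for the per-page model call; a real implementation
// would send the page image to a vision model and return its Markdown.
async function convertPage(pageNumber: number): Promise<string> {
  return `## Page ${pageNumber}`;
}
```

Calling `mapWithConcurrency(pageNumbers, 5, convertPage)` would then process five pages at a time. Capping the number of in-flight requests matters mostly because AI providers rate-limit requests, so an unbounded `Promise.all` over hundreds of pages is likely to fail.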

Tech stack

TypeScript · Python · Node.js

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires GraphicsMagick and Ghostscript (Node.js version) or Poppler (Python version), plus an API key for an AI provider.

In plain English

Zerox is a library for turning documents into text that AI systems can easily read and work with. The core problem it solves is that PDFs and other document formats often have complex layouts, tables, charts, and mixed content that traditional text extraction tools struggle to handle accurately. Zerox gets around this by converting each page of a document into an image and then sending those images to an AI vision model, which reads the visual content and returns it as Markdown, a simple text format that preserves headings, tables, and lists.

The workflow is straightforward: you point the library at a file (PDF, Word document, or image), it converts the file into a sequence of page images, sends each image to an AI model with a request to describe the content as Markdown, and then combines all the responses into a single output. You can also use it to extract structured data by providing a schema, which tells the AI exactly which fields to pull from the document and what format they should be in.

Zerox is available as both a Node.js package (installed via npm) and a Python package (installed via pip). Both versions support several AI providers, including OpenAI, Azure OpenAI, AWS Bedrock, and Google Gemini. The Node.js version has a few additional features not yet in the Python version, such as structured per-page extraction, automatic orientation correction, and edge trimming. Processing multiple pages in parallel is supported in both versions via a concurrency setting.

Installation requires a couple of supporting tools for the PDF conversion step. The Node.js version needs GraphicsMagick and Ghostscript; the Python version needs Poppler. All of these are standard open-source utilities available through package managers on most operating systems.

The README includes example code showing how to call the library with a file URL or a local path, a full list of configuration options, and a sample of the structured output the library returns for each page. A hosted demo is available on the Omni AI website for trying out the OCR without installing anything.
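The last step of the workflow, combining per-page responses into a single Markdown output, might look roughly like the sketch below. The `PageResult` shape and the `combinePages` helper are assumptions made for illustration; the actual per-page output format is documented in the README.

```typescript
// Assumed shape of one page's response; not taken from the library's types.
interface PageResult {
  page: number; // 1-based page number
  content: string; // Markdown returned by the model for that page
}

// Order pages by page number, drop empty ones, and join them with blank lines
// so that block-level Markdown from adjacent pages does not run together.
function combinePages(pages: PageResult[]): string {
  return [...pages]
    .sort((a, b) => a.page - b.page)
    .map((p) => p.content.trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Sorting before joining matters when pages are processed in parallel, since responses can then arrive in any order.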

Copy-paste prompts

Prompt 1

Show me how to use the Zerox Node.js package to convert a local PDF to Markdown using the OpenAI vision model.

Prompt 2

How do I use Zerox to extract structured fields (invoice number, date, and total) from a PDF invoice into JSON?

Prompt 3

Install the Zerox Python package and convert a Word document to Markdown using AWS Bedrock as the AI provider.

Prompt 4

Help me configure Zerox with concurrency=5 to process a folder of 50 PDFs in parallel and save each result as a Markdown file.

Verify against the repo before relying on details.