explaingit

adithya-s-k/omniparse

6,817 Python Audience · developer Complexity · 4/5 Setup · hard

TLDR

A self-hosted local server that converts almost any file type (PDFs, images, audio, video, presentations, and web pages) into clean markdown text that AI tools and language models can use.

Mindmap

mindmap
  root((repo))
    Input Types
      Documents PDF Word
      Images PNG JPG
      Audio MP3 WAV
      Video MP4 MKV
      Web Pages
    Output
      Structured Markdown
    AI Models Used
      Surya OCR
      Florence-2 Vision
      Whisper Audio
    Interfaces
      REST API
      Gradio UI
    Deployment
      Local Server
      Docker

Things people build with this

USE CASE 1

Convert a folder of PDFs into clean markdown to feed as context to a language model.

USE CASE 2

Transcribe audio or video files to text using a local Whisper model with no data leaving your machine.

USE CASE 3

Parse a web page by URL into structured markdown for use in an AI summarization pipeline.

USE CASE 4

Use the Gradio interface to test file parsing interactively without writing any code.
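
The batch use cases above start by sorting a folder's files by type before anything is sent to the server. A minimal sketch of that step follows. The `classify` and `group_folder` helpers and the category names are hypothetical and not part of OmniParse, and the exact extension spellings (`.docx`, `.pptx`, `.jpeg`) are assumptions based on the formats this page names.

```python
from pathlib import Path

# Extension groups based on the formats this page lists as supported.
# Category names and the .docx/.pptx/.jpeg spellings are assumptions.
CATEGORIES = {
    "document": {".pdf", ".docx", ".pptx"},
    "image": {".png", ".jpg", ".jpeg", ".tiff", ".heic"},
    "audio": {".mp3", ".wav", ".aac"},
    "video": {".mp4", ".mkv", ".avi", ".mov"},
}


def classify(path):
    """Return the category for a file path, or None if unsupported."""
    suffix = Path(path).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


def group_folder(folder):
    """Group the files directly inside `folder` by category."""
    groups = {name: [] for name in CATEGORIES}
    for entry in sorted(Path(folder).iterdir()):
        category = classify(entry) if entry.is_file() else None
        if category is not None:
            groups[category].append(entry)
    return groups
```

Calling `group_folder("lectures")` on a hypothetical folder would return a dictionary whose `"video"` list holds the MP4 files to transcribe, leaving unsupported files out.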

Tech stack

Python, Gradio, Docker, Whisper, Florence-2

Getting it running

Difficulty · hard Time to first run · 1h+

Runs on Linux only because of specific dependencies. Several AI models must be downloaded on first run, and a GPU is recommended.

License not specified in the explanation.

In plain English

OmniParse is a tool that takes files of almost any type and converts them into clean, structured text that AI systems can use. If you are building an application on top of a language model and need to feed it content from PDFs, presentations, images, audio recordings, videos, or websites, OmniParse handles the conversion step. The output is formatted markdown, which is a simple text format that AI tools and many other programs understand well.

The tool runs entirely on your own machine, with no calls to outside services. It uses several AI models internally to do the work: an OCR model called Surya and a vision model called Florence-2 handle documents and images, while a model called Whisper handles audio and video transcription. These models are downloaded when you set up the server. The server runs only on Linux because of specific dependencies.

You start OmniParse as a local server, then send files to it through API endpoints. For example, you can post a PDF to one endpoint and get back structured markdown, or post an audio file and get back a text transcript. There is also an endpoint for crawling and parsing a web page by URL. A simple graphical interface built with a library called Gradio is included for interactive use without writing any code.

Supported file types include Word documents, PDFs, PowerPoint files, common image formats (PNG, JPG, TIFF, HEIC), video formats (MP4, MKV, AVI, MOV), audio formats (MP3, WAV, AAC), and dynamic web pages. The README also mentions Docker as a deployment option for running the server inside a container.

The project is at an early stage, and the README notes that integrations with popular AI frameworks are coming soon. It uses a GPU if one is available, and the documentation notes that a modest GPU is sufficient.
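
The "post a file to an endpoint" step can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's documented client: the server address, the endpoint path, and the `file` form-field name are placeholders, and `build_upload_request` is a hypothetical helper. It builds the request without sending it.

```python
import uuid
import urllib.request

# Assumptions: the address and endpoint path below are placeholders.
# Check the OmniParse README for the real routes and form-field names.
BASE_URL = "http://localhost:8000"
PDF_ENDPOINT = "/parse_document/pdf"  # assumed path, verify before use


def build_upload_request(url, filename, data, field="file"):
    """Build (but do not send) a multipart/form-data POST for one file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(url, data=head + data + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

With a server running, passing the result of `build_upload_request(BASE_URL + PDF_ENDPOINT, "paper.pdf", pdf_bytes)` to `urllib.request.urlopen` would send the file. This page does not describe the response shape, so inspect the returned body before relying on any particular field.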

Copy-paste prompts

Prompt 1
I want to use OmniParse to convert a PDF into markdown so I can feed it to a language model. Show me how to start the server and send a POST request to parse the PDF.
Prompt 2
I have a folder of MP4 lecture videos I want to transcribe to text using OmniParse. What endpoint do I call, what parameters do I pass, and what does the response look like?
Prompt 3
How do I deploy OmniParse using Docker on a Linux machine? Show me the docker run command and any required GPU flags.
Prompt 4
I want to crawl a website URL and get back structured markdown using OmniParse. Show me the API call I need to make.

Verify against the repo before relying on details.