Convert a folder of PDFs into clean markdown to feed as context to a language model.
Transcribe audio or video files to text using a local Whisper model with no data leaving your machine.
Parse a web page by URL into structured markdown for use in an AI summarization pipeline.
Use the Gradio interface to test file parsing interactively without writing any code.
Linux only because of specific dependencies. Several AI models must be downloaded on first run, and a GPU is recommended.
OmniParse is a tool that takes files of almost any type and converts them into clean, structured text that AI systems can use. If you are building an application on top of a language model and need to feed it content from PDFs, presentations, images, audio recordings, videos, or websites, OmniParse handles the conversion step. The output is formatted markdown, a simple text format that AI tools and many other programs understand well.
The tool runs entirely on your own machine, with no calls to outside services. It uses several AI models internally: an OCR model called Surya and a vision model called Florence-2 handle documents and images, while a model called Whisper handles audio and video transcription. These models are downloaded when you set up the server. The server itself runs only on Linux, a requirement attributed to specific dependencies.
You start OmniParse as a local server, then send files to it through API endpoints. For example, you can post a PDF to one endpoint and get back structured markdown, or post an audio file and get back a text transcript. There is also an endpoint for crawling and parsing a web page by URL. A simple graphical interface built with a library called Gradio is included for interactive use without writing any code.
Supported file types include Word documents, PDFs, PowerPoint files, common image formats (PNG, JPG, TIFF, HEIC), video formats (MP4, MKV, AVI, MOV), audio formats (MP3, WAV, AAC), and dynamic web pages. The README also mentions Docker as a deployment option for running the server inside a container.
The project is at an early stage, and the README notes that integrations with popular AI frameworks are coming soon. It uses a GPU if one is available, and the documentation notes that a modest GPU is sufficient.
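The workflow described above (run a local server, post a file to an endpoint, read back markdown or a transcript) could be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the project's own client. The base URL, the route paths in `ROUTES`, the form field name `file`, and the helper names `route_for`, `build_multipart`, and `parse_file` are all assumptions made for the example; check the repository for the actual endpoints and response format.

```python
from __future__ import annotations

import mimetypes
import uuid
from pathlib import Path
from urllib import request

# Assumption: the server listens locally on this host and port.
BASE_URL = "http://localhost:8000"

# Assumption: hypothetical route paths, one per kind of input.
ROUTES = {
    "document": "/parse_document",
    "image": "/parse_media/image",
    "video": "/parse_media/video",
    "audio": "/parse_media/audio",
}

# File extensions grouped by the categories listed in the description.
EXTENSIONS = {
    "document": {".pdf", ".doc", ".docx", ".ppt", ".pptx"},
    "image": {".png", ".jpg", ".jpeg", ".tiff", ".heic"},
    "video": {".mp4", ".mkv", ".avi", ".mov"},
    "audio": {".mp3", ".wav", ".aac"},
}


def route_for(filename: str) -> str:
    """Pick the (assumed) route for a file based on its extension."""
    suffix = Path(filename).suffix.lower()
    for kind, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return ROUTES[kind]
    raise ValueError(f"unsupported file type: {suffix or filename}")


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_file(path: str, base_url: str = BASE_URL) -> str:
    """POST a local file to the server and return the raw response text."""
    file_path = Path(path)
    # Assumption: the server expects the upload in a form field named "file".
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    req = request.Request(
        base_url + route_for(file_path.name),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as response:
        return response.read().decode("utf-8")
```

With a server running, `parse_file("report.pdf")` would send the PDF to the assumed document route and return whatever the server responds with. Looping over `Path("docs").glob("*.pdf")` and calling `parse_file` on each path would cover the folder-of-PDFs case.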
Verify against the repo before relying on details.