Convert scanned academic papers into editable Markdown files with tables and math formulas accurately preserved.
Process a folder of document images in batch, outputting one Markdown file per image, and resume interrupted runs automatically.
Extract structured text from photographed forms or reports into a searchable, editable format.
Requires a GPU with roughly 4 GB of video memory (more for large images or bigger batches), plus model weights downloaded separately from Hugging Face before running.
ABot-OCR is an AI model that reads images of document pages and converts them into structured Markdown text. OCR stands for optical character recognition, the technology that turns images of text into machine-readable text. This model goes further than basic OCR by also recognizing mathematical formulas, tables, and the overall layout of the document, then outputting everything in a format that preserves that structure.

The practical use case is converting scanned PDFs or photographs of documents, academic papers, or forms into text that can be edited, searched, or processed further. Instead of outputting plain unformatted text, the model produces Markdown where tables are encoded as HTML, math formulas are written in LaTeX notation, and the document structure is retained as much as possible.

To use it, you download the model weights from Hugging Face (the files are not included in this repository due to their size) and run a Python inference script. The script uses a library called vLLM to load and run the model efficiently on a GPU. You point it at a folder of images, and it writes one Markdown file per image to an output directory. Images that already have a corresponding output file are skipped, so interrupted runs can be resumed. A GPU with around 4 GB of video memory is needed, though actual requirements depend on image size and how many images you process at once.

The README is relatively sparse and still contains placeholder notes where benchmark details and training background are intended to be filled in. The benchmark figure references a dataset called OmniDocBench. The project is from a computer vision lab and cites several earlier open-source OCR projects as influences.
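The resume behaviour described above (skipping images that already have an output file) can be sketched in a few lines of Python. This is an illustration only, not the repository's actual script: the function name `pending_images`, the set of accepted image extensions, and the `<image name>.md` output naming are all assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pending_images(image_dir, output_dir):
    """Return the images in image_dir that have no matching .md file in output_dir.

    Hypothetical helper: the output naming (page_01.png -> page_01.md) is an
    assumption, not something confirmed by the repository.
    """
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    todo = []
    for image in sorted(image_dir.iterdir()):
        # Ignore subdirectories and files that are not images.
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # An existing Markdown file means this image was handled in an earlier run.
        if (output_dir / f"{image.stem}.md").exists():
            continue
        todo.append(image)
    return todo
```

Calling `pending_images("scans", "markdown_out")` before each run would return only the images still to be converted, so a restarted batch does not redo finished pages. Checking for the output file rather than keeping a separate progress log keeps the resume logic stateless, at the cost of treating a partially written file as finished.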
Verify against the repo before relying on details.