lukas-blecher/latex-ocr

★ 16,374 | Python | Audience · researcher | Complexity · 2/5 | Setup · moderate

Mindmap

mindmap
  root((latex-ocr))
    What It Does
      Image to LaTeX
      Screenshot support
      Clipboard output
    How It Works
      Vision Transformer
      ResNet backbone
      Token-by-token output
    Interfaces
      CLI tool
      Desktop GUI
      HTTP API
      Python import
    Audience
      Researchers
      Students
      Document builders


Things people build with this

USE CASE 1

Screenshot a math formula from a paper or textbook and get LaTeX code ready to paste into your document

USE CASE 2

Batch-convert a folder of equation images into LaTeX strings for a document digitization pipeline

USE CASE 3

Add equation recognition to a note-taking app via the HTTP API or Docker image

USE CASE 4

Build a tool that ingests scanned scientific papers and extracts editable mathematical expressions

Tech stack

Python, PyTorch, Streamlit, Docker

Getting it running

Difficulty · moderate | Time to first run · 30 min

Install via pip. A GPU is optional but speeds up inference; CPU inference is slower on long equations.

In plain English

LaTeX-OCR (also called pix2tex) is a Python tool that looks at a picture of a mathematical formula and gives you back the LaTeX source code that would render it. LaTeX is the typesetting language scientists and mathematicians use to write equations cleanly. Converting a rendered equation back into that code by hand is tedious, so this project automates it with a machine-learning model.

Under the hood it is an image-to-text neural network: a Vision Transformer (ViT) encoder with a ResNet backbone reads the image, and a Transformer decoder writes out the LaTeX token by token. A small extra network first predicts the best resolution to resize the image to, because the main model works better on smaller images that match its training data.

Once it is installed via pip, there are several ways to use it: a command-line tool called pix2tex that can read images from disk or your clipboard; a desktop GUI called latexocr that lets you screenshot part of your screen, renders the predicted LaTeX with MathJax, and copies it to your clipboard; a Streamlit web demo plus an HTTP API (also available as a Docker image); and a Python import for use inside your own code. Training your own model is also supported, with scripts for building datasets from paired equation images and LaTeX source, drawing on arXiv and the im2latex-100k dataset.

You would reach for this if you are taking notes from a textbook or paper, copying formulas out of slides, or building a tool that ingests scanned scientific documents. Results are not perfect, so the author recommends always double-checking the output. The full README is longer than what was provided.
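As a rough illustration of the Python import route, here is a minimal sketch that converts every image in a folder and writes each result next to its source image. It assumes the `LatexOCR` class from `pix2tex.cli` shown in the project README, called on a PIL image; the helper names `load_pix2tex_predictor` and `convert_folder`, the suffix list, and the `.tex` output convention are hypothetical choices made for this example, not part of the library.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumption for this sketch: which file suffixes count as equation images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def load_pix2tex_predictor() -> Callable[[Path], str]:
    """Return a function mapping an image path to a predicted LaTeX string.

    pix2tex and Pillow are imported lazily so the rest of this module can be
    used (and tested) without the model installed. Constructing LatexOCR may
    download model weights on first use.
    """
    from PIL import Image
    from pix2tex.cli import LatexOCR  # entry point shown in the project README

    model = LatexOCR()

    def predict(path: Path) -> str:
        # The model is called while the image file is still open.
        with Image.open(path) as img:
            return model(img)

    return predict


def convert_folder(folder: Path, predict: Callable[[Path], str]) -> Dict[str, str]:
    """Run `predict` on each image in `folder` and save the result as <name>.tex.

    Hypothetical helper: returns a mapping of image file name to LaTeX string.
    """
    results: Dict[str, str] = {}
    # sorted() materializes the listing first, so the .tex files written
    # below are not picked up by this loop.
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        latex = predict(path)
        path.with_suffix(".tex").write_text(latex + "\n", encoding="utf-8")
        results[path.name] = latex
    return results
```

With pix2tex installed, a call such as `convert_folder(Path("equations"), load_pix2tex_predictor())` would process a folder named `equations` (a placeholder path). Passing the predictor in as an argument keeps the file handling separate from the model, so the loop can be exercised with a stub before committing to a slow CPU run, and every output should still be checked by eye.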

Copy-paste prompts

Prompt 1

How do I use pix2tex to convert a screenshot of an equation from a research paper into LaTeX code I can paste into Overleaf?

Prompt 2

Write a Python script using the latex-ocr library to process a folder of equation images and save each LaTeX string to a text file

Prompt 3

How do I run the LaTeX-OCR HTTP API with Docker and call it from a JavaScript app?

Prompt 4

I want to use the latexocr GUI to screenshot equations from my screen. Walk me through installing and using it on a Mac.

Prompt 5

How accurate is latex-ocr and what kinds of equations does it struggle with?


Verify against the repo before relying on details.