explaingit

camelot-dev/camelot

Analysis updated 2026-07-03

Stars · 3,691 | Language · Python | Audience · data | Complexity · 2/5 | License · MIT | Setup · easy

TLDR

A Python library that pulls tables out of PDF files and converts them into spreadsheet-ready data in just a few lines of code.

Mindmap

mindmap
  root((camelot))
    What it does
      PDF table extraction
      Text-based PDFs only
    Output formats
      CSV
      Excel
      JSON
      SQLite
    Usage
      Python library
      Command-line interface
    Quality metrics
      Accuracy score
      Whitespace score
    Audience
      Data analysts
      Researchers

What do people build with it?

USE CASE 1

Extract all tables from a government PDF report and export them to CSV for analysis in Excel or Google Sheets.

USE CASE 2

Convert financial statements stored as PDFs into pandas DataFrames so you can run calculations on the numbers.

USE CASE 3

Use the command-line interface to batch-extract tables from multiple PDF files without writing any Python code.

USE CASE 4

Filter out low-quality extractions automatically using Camelot's built-in accuracy and whitespace scores.
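
Use case 4 can be sketched in a few lines. This is a minimal sketch, assuming each table object exposes a `parsing_report` dictionary with `accuracy` and `whitespace` entries expressed as percentages, which is how Camelot's documentation describes its tables. The function name `keep_reliable` and both thresholds are hypothetical choices for illustration, not library defaults.

```python
def keep_reliable(tables, min_accuracy=90.0, max_whitespace=50.0):
    """Return the tables whose parsing report meets both thresholds.

    `tables` is any iterable of objects with a `parsing_report` dict,
    for example the result of `camelot.read_pdf(...)`.
    The thresholds are illustrative assumptions; tune them per document.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        accurate_enough = report["accuracy"] >= min_accuracy
        dense_enough = report["whitespace"] <= max_whitespace
        if accurate_enough and dense_enough:
            kept.append(table)
    return kept
```

With Camelot installed, `keep_reliable(camelot.read_pdf("report.pdf"))` would return only the tables that pass both checks; `report.pdf` is a placeholder file name.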

What is it built with?

Python, pandas

How does it compare?

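| Metric | camelot-dev/camelot | purpleailab/decepticon | openai/glide-text2im |
| --- | --- | --- | --- |
| Stars | 3,691 | 3,691 | 3,690 |
| Language | Python | Python | Python |
| Setup difficulty | easy | moderate | easy |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | data | ops devops | researcher |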

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Only works with text-based PDFs; scanned image PDFs are not supported.

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Camelot is a Python library for pulling tables out of PDF files and turning them into structured data you can work with. PDFs are notoriously hard to extract data from because the format is designed for display, not data exchange. Camelot addresses that problem for text-based PDFs: the kind where you can click and drag to select text in a PDF viewer.

A few lines of Python are enough to get started. You point the library at a PDF file, it finds the tables, and it returns them as pandas DataFrames (a standard Python format for tabular data). From there you can export the results to CSV, JSON, Excel, HTML, Markdown, or SQLite. Each extracted table also comes with quality metrics, including an accuracy score and a whitespace score, so you can filter out poorly extracted tables without checking each one by hand.

The library also includes a command-line interface, so you do not need to write any Python for a quick extraction. It can be installed with pip (the standard Python package installer) or with conda for Anaconda users.

One important limitation: Camelot only works with text-based PDFs. Scanned documents (PDFs that are essentially images of pages) are not supported. If you cannot select and copy text from a table in your PDF viewer, Camelot will not be able to extract it.

The project is released under the MIT license. Community wrappers are available for PHP, and there is a separate C# implementation.
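
The extract-then-export workflow described above might look like the following. It is a minimal sketch, assuming Camelot is installed and that `camelot.read_pdf`, `parsing_report`, and `to_csv` behave as described in Camelot's documentation. The helper names `csv_name` and `export_tables_to_csv` and the output file naming scheme are hypothetical choices made for this example.

```python
from pathlib import Path


def csv_name(pdf_path, page, order):
    """Build an output file name from the PDF name, page, and table order."""
    stem = Path(pdf_path).stem
    return f"{stem}-page-{page}-table-{order}.csv"


def export_tables_to_csv(pdf_path, out_dir, pages="all"):
    """Extract tables from a text-based PDF and write one CSV per table.

    Returns the list of paths written. Scanned (image-only) PDFs are
    not supported by Camelot, so they are expected to yield no tables.
    """
    # Imported here so the naming helper above works without Camelot.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for table in tables:
        report = table.parsing_report  # dict with accuracy, whitespace, order, page
        target = out_dir / csv_name(pdf_path, report["page"], report["order"])
        table.to_csv(str(target))
        written.append(target)
    return written
```

Calling `export_tables_to_csv("report.pdf", "out")` would write one CSV per detected table into an `out` directory; both names are placeholders. Each table's `df` attribute holds the same data as a pandas DataFrame if you prefer to keep working in Python.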

Copy-paste prompts

Prompt 1
Show me Python code using Camelot to extract all tables from a file called 'report.pdf' and save each one as a separate CSV file.
Prompt 2
I have a PDF with financial data. Write a Camelot script that extracts the tables, prints the accuracy score for each, and only exports tables with accuracy above 90%.
Prompt 3
Give me the Camelot CLI command to extract tables from 'invoice.pdf' and output them as an Excel file.
Prompt 4
I extracted a table from a PDF using Camelot but the columns are misaligned. What Camelot settings should I try to improve the extraction quality?
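
For the misaligned-columns situation in Prompt 4, one approach is to try several parser configurations and compare their reported accuracy. The sketch below is a rough illustration, assuming Camelot is installed; `flavor`, `line_scale`, and `row_tol` are parameters described in Camelot's documentation, while the candidate values, the helper names, and the idea of ranking by mean accuracy are assumptions made for this example. A high accuracy score does not guarantee correct column alignment, so the top-ranked result still deserves a visual check.

```python
# Candidate keyword arguments for camelot.read_pdf; the values are illustrative.
CANDIDATE_SETTINGS = [
    {"flavor": "lattice"},                    # tables with ruled lines
    {"flavor": "lattice", "line_scale": 40},  # detect shorter ruling lines
    {"flavor": "stream"},                     # tables separated by whitespace
    {"flavor": "stream", "row_tol": 10},      # merge text that is vertically close
]


def mean_accuracy(reports):
    """Average the 'accuracy' field of parsing reports; 0.0 if there are none."""
    if not reports:
        return 0.0
    return sum(report["accuracy"] for report in reports) / len(reports)


def rank_settings(pdf_path, pages="1", candidates=CANDIDATE_SETTINGS):
    """Run each candidate configuration and sort them by mean accuracy, best first."""
    # Imported here so mean_accuracy can be used without Camelot.
    import camelot

    scored = []
    for settings in candidates:
        tables = camelot.read_pdf(pdf_path, pages=pages, **settings)
        reports = [table.parsing_report for table in tables]
        scored.append((mean_accuracy(reports), settings))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```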

Frequently asked questions

What is camelot?

A Python library that pulls tables out of PDF files and converts them into spreadsheet-ready data in just a few lines of code.

What language is camelot written in?

Mainly Python. The stack also includes pandas.

What license does camelot use?

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

How hard is camelot to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is camelot for?

Mainly people who work with data, such as data analysts and researchers.


Verify against the repo before relying on details.