camelot-dev/camelot

Analysis updated 2026-07-03

★ 3,691 | Python | Audience · data | Complexity · 2/5 | License · MIT | Setup · easy

Mindmap

mindmap
  root((camelot))
    What it does
      PDF table extraction
      Text-based PDFs only
    Output formats
      CSV
      Excel
      JSON
      SQLite
    Usage
      Python library
      Command-line interface
    Quality metrics
      Accuracy score
      Whitespace score
    Audience
      Data analysts
      Researchers

What do people build with it?

USE CASE 1

Extract all tables from a government PDF report and export them to CSV for analysis in Excel or Google Sheets.

USE CASE 2

Convert financial statements stored as PDFs into pandas DataFrames so you can run calculations on the numbers.

USE CASE 3

Use the command-line interface to batch-extract tables from multiple PDF files without writing any Python code.

USE CASE 4

Filter out low-quality extractions automatically using Camelot's built-in accuracy and whitespace scores.
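The quality filter in use case 4 can be sketched as a small helper. This is a minimal sketch, not Camelot's own filtering API: it assumes each table exposes a `parsing_report` dictionary with `accuracy` and `whitespace` values expressed as percentages, and the function name `filter_tables` and both thresholds are hypothetical choices for illustration.

```python
def filter_tables(tables, min_accuracy=90.0, max_whitespace=50.0):
    """Keep only tables whose quality metrics pass both thresholds.

    `tables` is any iterable of objects with a `parsing_report` dict,
    such as the tables returned by camelot.read_pdf(). The threshold
    defaults are illustrative assumptions, not recommended values.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        accurate_enough = report["accuracy"] >= min_accuracy
        dense_enough = report["whitespace"] <= max_whitespace
        if accurate_enough and dense_enough:
            kept.append(table)
    return kept
```

A typical call would be `filter_tables(camelot.read_pdf("report.pdf"))`, followed by exporting only the tables that were kept. Suitable thresholds depend on the documents, so it is worth inspecting a few reports before fixing them.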

What is it built with?

Python, pandas

How does it compare?

	camelot-dev/camelot	purpleailab/decepticon	openai/glide-text2im
Stars	3,691	3,691	3,690
Language	Python	Python	Python
Setup difficulty	easy	moderate	easy
Complexity	2/5	4/5	3/5
Audience	data	ops devops	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Only works with text-based PDFs; scanned image PDFs are not supported.

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Camelot is a Python library for pulling tables out of PDF files and turning them into structured data you can actually work with. PDFs are notoriously difficult to extract data from because the format is designed for display, not data exchange. Camelot solves that problem for text-based PDFs: the kind where you can click and drag to select text in a PDF viewer.

A few lines of Python code are all you need to get started. You point the library at a PDF file, it finds the tables, and it returns them as pandas DataFrames (a standard Python format for tabular data). From there you can export the results to CSV, JSON, Excel, HTML, Markdown, or SQLite. Each extracted table also comes with quality metrics, including an accuracy score and a whitespace score, so you can filter out poorly extracted tables without checking each one by hand.

The library also includes a command-line interface, so you do not need to write any Python if you just want to run a quick extraction. It can be installed through pip (the standard Python package installer) or through conda for Anaconda users.

One important limitation: Camelot only works with text-based PDFs. Scanned documents (PDFs that are essentially images of pages) are not supported. If you cannot select and copy text from a table in your PDF viewer, Camelot will not be able to extract it.

The project is released under the MIT license. Community wrappers are available for PHP, and there is a separate C# implementation.
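The "few lines of Python" described above might look like the following sketch. It assumes Camelot is installed from PyPI as `camelot-py` and relies on `camelot.read_pdf`, the `df` attribute, and the `to_csv` method of each table. The helper names `csv_name` and `export_tables_to_csv`, and the output naming scheme, are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def csv_name(pdf_path, index):
    """Hypothetical naming scheme: report.pdf, table 2 -> report-table-2.csv."""
    return f"{Path(pdf_path).stem}-table-{index}.csv"


def export_tables_to_csv(pdf_path, out_dir="."):
    """Extract every detected table and write one CSV file per table."""
    # Imported here so csv_name() stays usable without Camelot installed.
    import camelot  # third-party; assumed installed via `pip install camelot-py`

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # pages="all" scans the whole document; by default only page 1 is read.
    tables = camelot.read_pdf(str(pdf_path), pages="all")

    written = []
    for index, table in enumerate(tables, start=1):
        # table.df is the pandas DataFrame holding the extracted cells.
        print(f"table {index}: shape {table.df.shape}")
        target = out_dir / csv_name(pdf_path, index)
        table.to_csv(str(target))
        written.append(target)
    return written
```

Calling `export_tables_to_csv("report.pdf", "out")` would write files such as `out/report-table-1.csv`. The same loop could call the other exporters (for example `to_json` or `to_excel`) instead of `to_csv`.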

Copy-paste prompts

Prompt 1

Show me Python code using Camelot to extract all tables from a file called 'report.pdf' and save each one as a separate CSV file.

Prompt 2

I have a PDF with financial data. Write a Camelot script that extracts the tables, prints the accuracy score for each, and only exports tables with accuracy above 90%.

Prompt 3

Give me the Camelot CLI command to extract tables from 'invoice.pdf' and output them as an Excel file.

Prompt 4

I extracted a table from a PDF using Camelot but the columns are misaligned. What Camelot settings should I try to improve the extraction quality?

Frequently asked questions

What is camelot?

A Python library that pulls tables out of PDF files and converts them into spreadsheet-ready data in just a few lines of code.

What language is camelot written in?

Mainly Python. The stack also includes pandas.

What license does camelot use?

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

How hard is camelot to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is camelot for?

Mainly people who work with data, such as data analysts and researchers.

Verify against the repo before relying on details.