explaingit

atlanhq/camelot

Analysis updated 2026-07-03

3,716 stars | Python | Audience · data | Complexity · 2/5 | License · MIT | Setup · easy

TLDR

Camelot is a Python library that extracts tables from text-based PDF files in a few lines of code, returning each table as a pandas DataFrame with accuracy scores and export options for CSV, JSON, Excel, HTML, and SQLite.

Mindmap

mindmap
  root((Camelot))
    What it does
      PDF table extraction
      Returns DataFrames
      Accuracy scoring
    Export formats
      CSV JSON Excel
      HTML SQLite
    Interfaces
      Python library
      Command line
      Excalibur web UI
    Limits
      Text PDFs only
      No scanned images
    Setup
      pip or conda
      Read the Docs

What do people build with it?

USE CASE 1

Extract all tables from a government report PDF and save them as CSV or Excel files without manual retyping.
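A minimal sketch of this use case, assuming Camelot is installed and that `report.pdf` is a hypothetical text-based PDF. `camelot.read_pdf` and `Table.to_csv` come from Camelot's documented API; the helper names `csv_name` and `export_all_tables` and the file-naming scheme are illustrative choices, not something the project prescribes.

```python
from pathlib import Path


def csv_name(pdf_path, page, order):
    """Build an output name such as 'report-page-2-table-1.csv' (illustrative scheme)."""
    return f"{Path(pdf_path).stem}-page-{page}-table-{order}.csv"


def export_all_tables(pdf_path, out_dir="."):
    """Write every table Camelot finds in pdf_path to its own CSV file."""
    import camelot  # imported lazily so csv_name works without Camelot installed

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pages="all" scans the whole document; by default only page 1 is read.
    tables = camelot.read_pdf(str(pdf_path), pages="all")
    written = []
    for table in tables:
        target = out_dir / csv_name(pdf_path, table.page, table.order)
        table.to_csv(str(target))
        written.append(target)
    return written


# Hypothetical usage:
# export_all_tables("report.pdf", out_dir="csv_out")
```

Swapping `to_csv` for `to_excel` should give Excel output instead, assuming an Excel writer backend for pandas is available.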

USE CASE 2

Convert PDF financial statements into pandas DataFrames for automated data analysis in Python.

USE CASE 3

Use the command-line interface to batch-extract tables from multiple PDFs on a server without writing Python code.

USE CASE 4

Filter out poorly parsed tables using Camelot's built-in accuracy scores before saving or processing results.
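One way this filtering might look, assuming each table exposes a `parsing_report` dictionary with an `accuracy` entry on a 0 to 100 scale, as Camelot's tables do. The function name `keep_accurate` and the threshold of 90 are illustrative assumptions; a sensible cutoff depends on your documents.

```python
def keep_accurate(tables, min_accuracy=90.0):
    """Return the tables whose reported accuracy is at least min_accuracy."""
    return [t for t in tables if t.parsing_report["accuracy"] >= min_accuracy]


# Hypothetical usage with Camelot installed:
# import camelot
# tables = camelot.read_pdf("annual_report.pdf", pages="all")
# good = keep_accurate(tables, min_accuracy=90.0)
```

The function only reads `parsing_report`, so it works on any iterable of table-like objects and can be checked without a real PDF.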

What is it built with?

Python, pandas, pip, conda

How does it compare?

                  atlanhq/camelot  wookai/paper-tips-and-tricks  bytedance/byteps
Stars             3,716            3,716                         3,717
Language          Python           Python                        Python
Setup difficulty  easy             easy                          hard
Complexity        2/5              1/5                           4/5
Audience          data             researcher                    researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Camelot only works on text-based PDFs, meaning files where you can highlight the text. Scanned image PDFs are not supported.

License: MIT. Use freely for any purpose, including commercial projects, as long as you keep the copyright notice.

In plain English

Camelot is a Python library for pulling tables out of PDF files. If you have ever opened a PDF report or government document and wished you could copy its tables into a spreadsheet without retyping everything by hand, this library handles that task in a few lines of code.

You point Camelot at a PDF file, call one function, and it returns a list of tables. Each table becomes a pandas DataFrame, which is the standard data structure used in Python for working with rows and columns of data. From there you can export to CSV, JSON, Excel, HTML, or SQLite, or continue working with the data directly in Python. Camelot also reports an accuracy score and a whitespace score for each extracted table, which lets you filter out poorly parsed tables without having to inspect each one manually.

One important limitation: Camelot only works with text-based PDFs, not scanned images. A text-based PDF is one where you can click and drag to highlight text in a standard PDF viewer. If your document was scanned with a camera or photocopier, the text is stored as an image and Camelot cannot read it.

Installation is available through conda or pip. There is also a command-line interface for users who prefer to work in a terminal without writing Python code. A separate companion project called Excalibur provides a web interface for the same extraction functionality, useful if you want to share access with colleagues who do not write code.

The project is open source under the MIT license, meaning you can use it freely in personal or commercial projects. Documentation is hosted on Read the Docs and covers installation, usage examples, and a comparison with other PDF table extraction tools.
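The read-then-inspect workflow described in this section could be sketched as follows. It assumes each table carries a `parsing_report` dictionary (with `accuracy`, `whitespace`, `order`, and `page` entries) and a `df` attribute holding the pandas DataFrame, as Camelot's tables do. The helper name `describe_tables` and the PDF file name in the usage comment are hypothetical.

```python
def describe_tables(tables):
    """Summarize each table's parsing report and DataFrame shape as one line of text."""
    lines = []
    for t in tables:
        report = t.parsing_report  # expected keys: accuracy, whitespace, order, page
        rows, cols = t.df.shape
        lines.append(
            f"page {report['page']} table {report['order']}: "
            f"{rows}x{cols}, accuracy {report['accuracy']:.1f}, "
            f"whitespace {report['whitespace']:.1f}"
        )
    return lines


# Hypothetical usage with Camelot installed:
# import camelot
# tables = camelot.read_pdf("statement.pdf")  # reads page 1 unless pages=... is given
# print("\n".join(describe_tables(tables)))
# first = tables[0].df  # a regular pandas DataFrame, ready for analysis
```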

Copy-paste prompts

Prompt 1
Write a Python script using Camelot that reads every table from a PDF at a given file path and exports each one to a separate CSV file.
Prompt 2
How do I use Camelot to keep only tables with an accuracy score above 90 from a multi-page PDF annual report?
Prompt 3
Show me how to install Camelot via conda, then run a quick table extraction on a sample text-based PDF and print the result as a DataFrame.
Prompt 4
Set up the Excalibur web interface for Camelot so a non-technical colleague can extract PDF tables through a browser without writing code.

Frequently asked questions

What is camelot?

Camelot is a Python library that extracts tables from text-based PDF files in a few lines of code, returning each table as a pandas DataFrame with accuracy scores and export options for CSV, JSON, Excel, HTML, and SQLite.

What language is camelot written in?

Mainly Python. The stack also includes pandas, pip, and conda.

What license does camelot use?

The MIT license. You can use it freely for any purpose, including commercial projects, as long as you keep the copyright notice.

How hard is camelot to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is camelot for?

Mainly people who work with data.

Verify against the repo before relying on details.