Analysis updated 2026-07-03
Extract all tables from a government report PDF and save them as CSV or Excel files without manual retyping.
Convert PDF financial statements into pandas DataFrames for automated data analysis in Python.
Use the command-line interface to batch-extract tables from multiple PDFs on a server without writing Python code.
Filter out poorly parsed tables using Camelot's built-in accuracy scores before saving or processing results.
| | atlanhq/camelot | wookai/paper-tips-and-tricks | bytedance/byteps |
|---|---|---|---|
| Stars | 3,716 | 3,716 | 3,717 |
| Language | Python | Python | Python |
| Setup difficulty | easy | easy | hard |
| Complexity | 2/5 | 1/5 | 4/5 |
| Audience | data | researcher | researcher |
Figures from each repo's GitHub metadata at analysis time.
Only works on text-based PDFs where you can highlight text; scanned image PDFs are not supported.
Camelot is a Python library for pulling tables out of PDF files. If you have ever opened a PDF report or government document and wished you could copy its tables into a spreadsheet without retyping everything by hand, this library handles that task in a few lines of code.
You point Camelot at a PDF file, call one function, and it returns a list of tables. Each table becomes a pandas DataFrame, which is the standard data structure used in Python for working with rows and columns of data. From there you can export to CSV, JSON, Excel, HTML, or SQLite, or continue working with the data directly in Python. Camelot also reports an accuracy score and a whitespace score for each extracted table, which lets you filter out poorly parsed tables without having to inspect each one manually.
One important limitation: Camelot only works with text-based PDFs, not scanned images. A text-based PDF is one where you can click and drag to highlight text in a standard PDF viewer. If your document was scanned with a camera or photocopier, the text is stored as an image and Camelot cannot read it.
Installation is available through conda or pip. There is also a command-line interface for users who prefer to work in a terminal without writing Python code. A separate companion project called Excalibur provides a web interface for the same extraction functionality, useful if you want to share access with colleagues who do not write code.
The project is open source under the MIT license, meaning you can use it freely in personal or commercial projects. Documentation is hosted on Read the Docs and covers installation, usage examples, and a comparison with other PDF table extraction tools.
Camelot is a Python library that extracts tables from text-based PDF files in a few lines of code, returning each table as a pandas DataFrame with accuracy scores and export options for CSV, JSON, Excel, HTML, and SQLite.
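The extract, filter, and export workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's recommended recipe: the helper names `keep_well_parsed` and `export_good_tables`, the threshold values, and the output file naming are all assumptions made for the example. The Camelot calls it relies on (`camelot.read_pdf`, each table's `parsing_report` dictionary, and `to_csv`) are the ones shown in the project's README, but check them against the version you install.

```python
from pathlib import Path


def keep_well_parsed(tables, min_accuracy=90.0, max_whitespace=50.0):
    """Return the tables whose parsing report passes both thresholds.

    The default thresholds are illustrative assumptions, not Camelot defaults.
    """
    kept = []
    for table in tables:
        # parsing_report is a dict that includes "accuracy" and "whitespace".
        report = table.parsing_report
        if (report["accuracy"] >= min_accuracy
                and report["whitespace"] <= max_whitespace):
            kept.append(table)
    return kept


def export_good_tables(pdf_path, out_dir, pages="1"):
    """Extract tables from a text-based PDF and write the well-parsed ones to CSV.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    # Imported lazily so keep_well_parsed can be used without Camelot installed.
    import camelot

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    written = []
    for index, table in enumerate(keep_well_parsed(tables)):
        target = out_dir / f"table_{index}.csv"
        # table.df holds the same data as a pandas DataFrame.
        table.to_csv(str(target))
        written.append(target)
    return written
```

A call such as `export_good_tables("report.pdf", "out", pages="1-3")` (file and directory names are placeholders) would write one CSV per table that passes the filter. Keeping the filter separate from the extraction step makes it easy to tune the thresholds on a sample of documents before running a larger batch.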
Mainly Python. The stack also includes pandas and pip.
Use freely for any purpose including commercial projects as long as you keep the copyright notice.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
The audience is mainly data practitioners.
Verify against the repo before relying on details.