Extract a table of numbers from a government or financial PDF report into a CSV file for analysis
Convert a multi-page PDF data export into a spreadsheet without copy-paste errors
Run Tabula via Docker to process PDFs in a repeatable automated workflow
Only works on text-based PDFs; scanned images need a separate OCR step first. Requires Java 7 or newer.
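Before running the plain JAR, it can help to confirm that the installed Java meets the Java 7 minimum stated above. The sketch below is one possible check, not part of Tabula: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and it assumes `java` is on the PATH and prints a quoted version string, as common JDK builds do.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Releases before Java 9 report themselves as 1.x, so "1.7.0_80" is Java 7.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # No `java` executable on the PATH.
        return None
    # `java -version` writes its report to stderr; stdout is included defensively.
    return parse_java_major(result.stderr + result.stdout)
```

Calling `installed_java_major()` returns an integer such as 8 or 17 when Java is found, and a result of 7 or higher satisfies the stated requirement. A return value of `None` means either that Java is missing or that the output format was not recognized.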
Tabula is a desktop application that extracts data tables from PDF files and converts them into spreadsheet-friendly formats like CSV. If you have ever received a PDF containing a table of numbers or a data report and needed that information in a spreadsheet but found copying it out was impossible or produced garbled results, Tabula addresses exactly that problem. You upload the PDF, draw a selection box around the table you want, and Tabula pulls out the rows and columns as structured data you can open in Excel or import into a database.

The application runs locally on your machine and works through a browser interface. After launching it, a web page opens at a local address (127.0.0.1:8080) where you do all the work. Your files never leave your computer, which matters when working with confidential documents. The README does note two small exceptions: the app makes a request to check for newer versions and sends a usage count to a statistics counter, both of which can be disabled with command-line flags if needed.

Tabula only works with text-based PDFs, not scanned images. A quick test is whether you can click and drag to select text in the PDF using a standard PDF viewer. If you can, Tabula should be able to read it. Scanned pages that contain pictures of text require a separate optical character recognition step before Tabula can help.

Installation is available as a packaged app for Windows and macOS, a snap package for Linux, a plain JAR file runnable with Java on any platform, or via Docker Compose. Java 7 or newer is required. A separate command-line library called tabula-java handles the underlying extraction logic and continues to receive occasional updates from the community.

The README opens with a note that Tabula is a volunteer project with no active paid development at this time, and the end-user application here is unlikely to see near-term updates.
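Once a table has been exported to CSV, the analysis step can start from Python's standard `csv` module. The sketch below is illustrative only: the sample data, the column names `Region` and `Total`, and the helpers `load_table` and `to_number` are all hypothetical, and it assumes numeric cells may carry thousands separators, as figures in financial reports often do.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical stand-in for a CSV file exported from Tabula.
SAMPLE_CSV = 'Region,Total\nNorth,"1,200"\nSouth,850\nWest,n/a\n'


def load_table(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def to_number(cell: str) -> Optional[float]:
    """Convert a cell such as '1,200' to a float; return None if it is not numeric."""
    cleaned = cell.replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


rows = load_table(SAMPLE_CSV)
values = [to_number(row["Total"]) for row in rows]
# Sum only the cells that parsed as numbers, skipping placeholders like "n/a".
total = sum(v for v in values if v is not None)
```

For a real export, the same functions would apply to the text read from the saved file. Returning `None` for unparseable cells keeps placeholders such as "n/a" visible to the caller instead of silently treating them as zero.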
Verify against the repo before relying on details.