Analysis updated 2026-07-03
Load a CSV dataset into Tablesaw, filter rows by a condition, group by a category column, and compute summary statistics without leaving Java.
Explore a dataset interactively in a Jupyter notebook using IJava and Tablesaw to produce histograms and scatter plots rendered in the browser.
Prepare and clean a dataset in Tablesaw and pass it directly to a Smile or DL4J machine learning library for model training.
Import data from a relational database, join it with a local CSV file, and produce a time series chart without any Python or R tooling.
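The first use case above (load, filter, group, summarize) can be sketched roughly as follows. This is a minimal illustration, not code from the repo: the class name `TablesawSketch`, the helper methods, the `region` and `amount` columns, and the sample values are all hypothetical, and the table is built in memory so the example does not depend on a file. With a real file, `Table.read().csv("sales.csv")` (file name assumed) would replace `sampleSales()`. It assumes `tablesaw-core` is on the classpath.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

import static tech.tablesaw.aggregate.AggregateFunctions.max;
import static tech.tablesaw.aggregate.AggregateFunctions.mean;

public class TablesawSketch {

    // Hypothetical sample data standing in for a CSV file.
    static Table sampleSales() {
        return Table.create("sales",
                StringColumn.create("region", "north", "south", "north", "south", "north"),
                DoubleColumn.create("amount", 120.0, 80.0, 45.0, 200.0, 310.0));
    }

    // Keep rows with amount >= minAmount, then compute mean and max of amount per region.
    static Table summarizeLargeSales(Table sales, double minAmount) {
        Table large = sales.where(
                sales.doubleColumn("amount").isGreaterThanOrEqualTo(minAmount));
        return large.summarize("amount", mean, max).by("region");
    }

    public static void main(String[] args) {
        Table summary = summarizeLargeSales(sampleSales(), 50.0);
        // print() renders the table as text: one row per region.
        System.out.println(summary.print());
    }
}
```

The filter runs before the grouping, so the 45.0 row is excluded from the per-region statistics. Other aggregate functions from `AggregateFunctions` can be passed to `summarize` in the same way.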
| | jtablesaw/tablesaw | undertow-io/undertow | termux/termux-api |
|---|---|---|---|
| Stars | 3,751 | 3,749 | 3,755 |
| Language | Java | Java | Java |
| Setup difficulty | easy | moderate | moderate |
| Complexity | 2/5 | 4/5 | 2/5 |
| Audience | data | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Tablesaw is a Java library that lets developers work with data tables directly in their code, much as Python programmers do with pandas. It handles the full lifecycle of a dataset: reading data in from files or databases, cleaning and reshaping it, running calculations, and producing charts for visual exploration.
On the data side, Tablesaw can import from CSV, TSV, JSON, HTML, Excel, fixed-width text files, and relational databases, whether the data is stored locally or fetched from the web or cloud storage such as S3. Once loaded, you can filter rows, sort and group data, add or remove columns, join multiple tables, and handle missing values. Data can be exported to CSV, JSON, HTML, or fixed-width formats.
For statistics, the library covers the standard descriptive measures: mean, median, min, max, sum, standard deviation, variance, percentiles, skewness, kurtosis, and geometric mean. These are built in, so no separate stats package is needed.
Visualization is handled through a wrapper around the Plot.ly JavaScript charting library. You can produce scatter plots, histograms, box plots, time series charts, heatmaps, pie charts, bubble charts, and more from within Java code, with the charts rendered in a browser or notebook environment.
Tablesaw also works inside Jupyter notebooks via integrations with BeakerX and IJava, and in Google Colab, which makes it usable for interactive data exploration. It connects with machine learning libraries like Smile, Tribuo, and DL4J for teams that want to use it as a data preparation step before model training.
Adding it to a Maven project requires a single dependency block, and optional companion packages handle Excel, JSON, HTML, and charting separately.
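The Maven dependency block might look like the sketch below. The coordinates `tech.tablesaw:tablesaw-core` and the charting companion `tablesaw-jsplot` are the library's published artifact names. The version number shown is an assumption and should be checked against the repo's current release.

```xml
<!-- Core library: tables, columns, filtering, statistics, CSV import/export -->
<dependency>
  <groupId>tech.tablesaw</groupId>
  <artifactId>tablesaw-core</artifactId>
  <version>0.43.1</version> <!-- assumed version; check the repo -->
</dependency>

<!-- Optional companion: Plot.ly-based charting -->
<dependency>
  <groupId>tech.tablesaw</groupId>
  <artifactId>tablesaw-jsplot</artifactId>
  <version>0.43.1</version> <!-- keep in step with tablesaw-core -->
</dependency>
```

The Excel, JSON, and HTML companions follow the same pattern with their own artifact IDs.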
Tablesaw is a Java library for loading, cleaning, analyzing, and charting tabular data, similar to what pandas does for Python, with built-in statistics and Plot.ly-powered visualizations that run in the browser or a notebook.
Mainly Java. The stack also includes Maven and Plot.ly.
No specific license terms were mentioned in the explanation.
Setup difficulty is rated easy, with roughly 30 minutes to a first successful run.
Mainly data.
Verify against the repo before relying on details.