explaingit

pair-code/facets

7,351 | Jupyter Notebook | Audience · data | Complexity · 2/5 | Setup · moderate

TLDR

Two browser-based tools for visually exploring machine learning datasets: one shows statistics per data column to catch quality issues, and the other lets you zoom into individual data points across tens of thousands of items.

Mindmap

mindmap
  root((facets))
    Facets Overview
      Feature statistics
      Missing value detection
      Train vs test compare
    Facets Dive
      Individual data points
      Zoom in and out
      Group by feature
    Tech
      TypeScript
      Python pip package
      Web Components
    Integration
      Jupyter notebooks
      Google Colab
      Standalone web page
    Audience
      Data scientists
      ML engineers

Things people build with this

USE CASE 1

Spot missing values, unexpected ranges, or distribution mismatches between your training and test datasets before you start model training.

USE CASE 2

Explore tens of thousands of image or text samples side-by-side in a grid, grouping them by feature values to find patterns in your data.

USE CASE 3

Embed an interactive dataset overview directly inside a Jupyter or Google Colab notebook to share data quality findings with teammates.
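
The check described in use case 1 can be illustrated with a minimal sketch in plain Python. This is not how Facets Overview computes its statistics; the function names `missing_fraction` and `compare_missing`, the `threshold` parameter, and the sample rows are all hypothetical and exist only to show the kind of train-versus-test comparison the tool surfaces visually.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], feature: str) -> float:
    """Fraction of rows where the feature is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(feature) is None)
    return missing / len(rows)


def compare_missing(
    train: List[Row],
    test: List[Row],
    features: List[str],
    threshold: float = 0.1,  # hypothetical cutoff, chosen for illustration
) -> Dict[str, Tuple[float, float]]:
    """Return features whose missing fraction differs between the two sets
    by more than the threshold, mapped to (train_fraction, test_fraction)."""
    flagged = {}
    for feature in features:
        train_frac = missing_fraction(train, feature)
        test_frac = missing_fraction(test, feature)
        if abs(train_frac - test_frac) > threshold:
            flagged[feature] = (train_frac, test_frac)
    return flagged


# Made-up sample data: "age" is missing far more often in the test set.
train = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 41, "income": 55000},
]
test = [
    {"age": None, "income": 50000},
    {"age": None, "income": 58000},
    {"age": 37, "income": 45000},
    {"age": None, "income": 47000},
]

flagged = compare_missing(train, test, ["age", "income"])
print(flagged)
```

Here only `age` is flagged, because it is missing in one of four training rows but three of four test rows, while `income` is fully populated in both sets.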

Tech stack

TypeScript · Python · Jupyter Notebook · pip · Web Components

Getting it running

Difficulty · moderate | Time to first run · 30 min

Visualizations currently work only in Chrome. Install the facets-overview pip package to generate statistics for the Overview tool.

License terms are not stated in the explanation; check the repository directly.

In plain English

Facets is a pair of browser-based visualization tools designed to help people explore and understand machine learning datasets without writing custom analysis code. Both tools can run inside Jupyter notebooks or on standalone web pages, and they are built as web components backed by TypeScript.

The first tool, Facets Overview, gives a summary of one or more datasets at the feature level. A feature is any column or attribute in your data, such as age, income, or a category label. Overview computes statistics for each feature and renders them visually so you can quickly spot problems: features with a large number of missing values, unexpected value ranges, or distributions that differ significantly between your training set and your test set. Suspicious features are highlighted in red, and you can sort columns by metrics like the proportion of missing data. A Python package called facets-overview, installable via pip, generates the statistics that the visualization reads.

The second tool, Facets Dive, is for hands-on exploration of individual data points rather than column-level statistics. It can display up to tens of thousands of items at once, each rendered as a small tile. You sort and group items by their feature values, creating a grid that reveals patterns across the dataset. Zooming in shows specific examples; zooming out shows the full distribution. The README describes the experience as switching between a high-level view and low-level details using smooth animation.

Both tools embed into Google Colab or Jupyter notebooks using HTML tags that load the visualization components. The repository includes example notebooks showing how to connect the tools to a dataset. One known limitation noted in the README is that the visualizations currently work only in Chrome. The disclaimer at the bottom of the README notes that this is not an official Google product.
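
As a rough sketch of the HTML-tag embedding described above, the snippet below builds the markup for a `<facets-dive>` element from a list of records. The two URLs and the `data` property assignment are assumptions recalled from the project's README and may be outdated, so check them against the repository. The helper name `build_dive_html` and the sample records are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumption: both URLs are recalled from the Facets README and may have
# changed; verify them against the repository before use.
WEBCOMPONENTS_JS = (
    "https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/"
    "webcomponents-lite.js"
)
FACETS_IMPORT = (
    "https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/"
    "facets-dist/facets-jupyter.html"
)


def build_dive_html(
    records: List[Dict[str, Any]],
    height: int = 600,
    element_id: str = "elem",
) -> str:
    """Return an HTML snippet that loads Facets Dive and hands it the records.

    Hypothetical helper for illustration; not part of any Facets package.
    """
    # Serialize the records as JSON and escape "</" so that a string value
    # cannot close the surrounding <script> tag early.
    data_json = json.dumps(records).replace("</", "<\\/")
    return f"""
<script src="{WEBCOMPONENTS_JS}"></script>
<link rel="import" href="{FACETS_IMPORT}">
<facets-dive id="{element_id}" height="{height}"></facets-dive>
<script>
  document.querySelector("#{element_id}").data = {data_json};
</script>
"""


# Made-up sample records standing in for rows of a real dataset.
records = [
    {"label": "cat", "width": 32},
    {"label": "dog", "width": 48},
]
html = build_dive_html(records)
print(html)
```

In a Jupyter or Colab notebook, the resulting string would typically be rendered with `IPython.display.display(HTML(html))`. With a pandas DataFrame, `df.to_dict(orient="records")` is one way to produce the list of records.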

Copy-paste prompts

Prompt 1
Using the pair-code/facets Facets Overview tool, write Python code that loads my CSV dataset, generates feature statistics, and renders the Facets Overview visualization inside a Jupyter notebook.
Prompt 2
I have a training set and a test set as pandas DataFrames. Use the facets-overview pip package to compare their feature distributions and flag any features where the distributions diverge significantly.
Prompt 3
Show me how to embed the Facets Dive visualization in a Colab notebook so I can interactively explore individual examples in my dataset grouped by their label column.
Prompt 4
I have an image classification dataset. Use pair-code/facets to create a Facets Dive grid that displays thumbnail images grouped by class label so I can spot mislabeled examples.

Verify against the repo before relying on details.