explaingit

meta-pytorch/captum

5,626 | Python | Audience · researcher | Complexity · 3/5 | License | Setup · moderate

TLDR

A Python library that explains why a PyTorch model made a prediction by scoring which inputs or neurons contributed most to the output.

Mindmap

mindmap
  root((Captum))
    What it does
      Explain model predictions
      Input attribution scores
      Neuron analysis
      Concept testing TCAV
    Attribution Methods
      Integrated Gradients
      DeepLift
      SHAP approaches
      Saliency maps
    Tech Stack
      Python
      PyTorch
      pip conda
    Audience
      ML researchers
      Engineers debugging models

Things people build with this

USE CASE 1

Find out which pixels in an image most influenced an image classifier's prediction to debug surprising results.

USE CASE 2

Score each word in a text input by how much it contributed to a language model's output.

USE CASE 3

Test whether your neural network has learned to associate specific human-defined concepts with its predictions using TCAV.

USE CASE 4

Debug a deployed model that is giving unexpected outputs by tracing which input features drive the bad predictions.

Tech stack

Python · PyTorch · pip · conda

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires Python 3.8+ and PyTorch 1.10+. A GPU is optional but recommended for large models.

Open-source library from Meta's PyTorch team, free to use and integrate into existing PyTorch projects.

In plain English

Captum is a Python library that helps you understand why a machine learning model made a particular prediction. When a model outputs a result, such as classifying an image or recommending a product, Captum lets you ask which parts of the input, which neurons inside the network, or which training examples contributed most to that output. This is called model interpretability, and it is important both for improving models and for explaining their behavior to others.

The library works with PyTorch, a widely used framework for building neural networks. You give Captum your trained model and an input, and it runs one of several attribution algorithms to produce scores indicating how much each input feature influenced the prediction. For example, on an image classification model, it might highlight which pixels most strongly pushed the model toward its chosen label. On a text model, it might score each word by its influence on the output.

Captum includes several established methods for computing these attributions, including Integrated Gradients, DeepLift, SHAP-based approaches, and saliency maps. It also supports concept-based explanations through an approach called TCAV, which tests whether a model has learned to associate specific human-defined concepts with its predictions. Beyond attribution, the library includes tools for studying individual layers and neurons inside a network, and for measuring which training examples had the most influence on a given prediction.

The library is intended for machine learning researchers working on interpretability methods and for engineers who have deployed models and want to debug unexpected outputs or explain predictions to end users. It integrates with domain-specific PyTorch libraries for vision and text without requiring major changes to existing model code.

Installation is via pip or conda. Python 3.8 or later and PyTorch 1.10 or later are required. Tutorials and example notebooks are available in the repository. Captum is developed by Meta's PyTorch team and is currently in beta.
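
To make the idea of an attribution score concrete, here is a minimal pure-Python sketch of the calculation behind Integrated Gradients. It is not Captum's implementation: the names `integrated_gradients`, `f`, and `grad_f` are hypothetical, the "model" is a two-input formula with a hand-written gradient, and the path integral is approximated with a simple midpoint sum.

```python
from typing import Callable, List


def integrated_gradients(
    grad_fn: Callable[[List[float]], List[float]],
    x: List[float],
    baseline: List[float],
    n_steps: int = 50,
) -> List[float]:
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output at a given point.
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sums = [0.0] * len(x)
    for k in range(n_steps):
        # Walk along the straight line from the baseline to the input.
        alpha = (k + 0.5) / n_steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_fn(point)):
            grad_sums[i] += g
    # Average gradient along the path, scaled by (input - baseline).
    return [d * s / n_steps for d, s in zip(diffs, grad_sums)]


# Hypothetical stand-in for a model: f(p) = p0^2 + 3 * p1.
def f(p: List[float]) -> float:
    return p[0] ** 2 + 3.0 * p[1]


# Its gradient, written out by hand instead of using autograd.
def grad_f(p: List[float]) -> List[float]:
    return [2.0 * p[0], 3.0]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)

print("attributions:", attributions)
print("sum of attributions:", sum(attributions))
print("f(x) - f(baseline):", f(x) - f(baseline))
```

The per-feature scores add up to the difference between the model output at the input and at the baseline, which is the property that makes these scores readable as each feature's share of the prediction.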

Copy-paste prompts

Prompt 1
I have a trained PyTorch image classifier and want to visualize which pixels influenced its prediction using Captum Integrated Gradients. Show me the code to compute and plot the attribution map.
Prompt 2
Help me use Captum to explain a text classification model's predictions by scoring each word's contribution to the output.
Prompt 3
I want to use Captum's TCAV method to test whether my image model has learned to associate 'stripes' with a certain class. Walk me through setting up the concept test.
Prompt 4
Show me how to use Captum's LayerActivation tool to inspect what neurons in a specific layer are activating on a given input.
Prompt 5
I have a recommendation model in PyTorch and want to measure which training examples had the most influence on a specific prediction using Captum's TracIn approach.

Verify against the repo before relying on details.