po13on/btom-transformerlens

★ 16 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((btom-transformerlens))
    What it does
      Theory of Mind probing
      Activation capture
      Attribution analysis
      Head clustering
    Tech stack
      Python
      TransformerLens
      PyTorch
      CUDA GPU
    Models studied
      Qwen2.5
      Qwen3
    Dataset
      Hi-ToM
      Nested belief states


Things people build with this

USE CASE 1

Probe which attention heads in a Qwen model contribute most to answering nested Theory of Mind questions.

USE CASE 2

Cluster attention heads that behave similarly across many Hi-ToM belief-state examples.

USE CASE 3

Build an attribution graph tracing which internal model components drove a specific answer.

USE CASE 4

Study differences in Theory of Mind reasoning between Qwen2.5 and Qwen3 model families.

Tech stack

Python, PyTorch, TransformerLens, HuggingFace, CUDA, Jupyter

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires a CUDA-capable GPU and pinned versions of PyTorch, TransformerLens, and HuggingFace Transformers. The project README is written in Chinese.
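Before installing anything heavy, it can help to confirm that the interpreter and library versions match what the README pins. The sketch below is a standard-library-only check and is not part of the repo. The function names are hypothetical, and the distribution names in `PACKAGES` are assumptions about how the three libraries are published on PyPI. The exact pinned versions are not reproduced here, so compare the printed values against the README yourself.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Assumed PyPI distribution names for the pinned libraries (not taken from the repo).
PACKAGES: Tuple[str, ...] = ("torch", "transformer-lens", "transformers")


def python_ok(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def installed_versions(packages: Tuple[str, ...] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This check does not probe the GPU. Once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device is visible.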

No license is mentioned in the project description.

In plain English

BTOM-TransformerLens is a research workspace for studying the internal behavior of large language models, specifically models from the Qwen2.5 and Qwen3 families. The goal is to understand how these models reason about situations that require tracking what different characters in a story believe or know, a type of reasoning called Theory of Mind. The dataset used for this analysis is Hi-ToM, which contains questions about nested belief states (what person A thinks person B thinks about something).

The analysis uses TransformerLens, a library designed to let researchers look inside transformer-based language models while they process text. Rather than only observing the answer a model produces, TransformerLens lets you capture the values flowing through each layer and attention head at every step. This project builds on that capability to do attribution analysis, which traces which internal components contributed most to a specific output, and clustering, which groups attention heads that behave similarly across many examples.

The workflow is centered on a Jupyter notebook (test.ipynb) that walks through loading a model, feeding it Hi-ToM questions, caching internal activations, building an attribution graph, and then visualizing clusters of attention heads. Supporting Python files handle the attribution logic, hook attachment for capturing intermediate values, clustering math, and visualization. A separate file handles quantized model weights for cases where GPU memory is limited.

The README is written in Chinese and notes that the project requires Python 3.10 or newer and a CUDA-capable GPU. It lists specific version pins for the main libraries, including PyTorch, TransformerLens, and the Transformers library from HuggingFace. If GPU memory is tight, the README suggests reducing the number of samples, limiting analysis to fewer layers, or running only one of the two supported model loading paths.
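To make the head-clustering idea concrete, here is a minimal sketch of grouping attention heads by how similarly they behave across examples. It is not the repo's implementation: the function names, the `(layer, head)` keying, the per-example score profiles, and the choice of k-means on unit-normalized vectors are all assumptions made for illustration, and the numbers are made up. In the real workflow the profiles would come from cached activations or attribution scores.

```python
import math
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer index, head index), a hypothetical keying scheme


def normalize(vec: List[float]) -> List[float]:
    """Scale to unit length so heads are compared by pattern, not magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_heads(profiles: Dict[Head, List[float]], k: int, iters: int = 20) -> Dict[Head, int]:
    """Plain k-means over normalized per-example profiles; returns head -> cluster id."""
    heads = sorted(profiles)
    points = [normalize(profiles[h]) for h in heads]

    # Deterministic farthest-point initialization: start from the first head,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each head joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(heads, labels))


# Made-up scores for four heads across four examples (illustration only).
profiles = {
    (0, 0): [1.0, 0.9, 0.0, 0.1],
    (0, 1): [0.0, 0.1, 1.0, 0.8],
    (1, 0): [2.0, 1.8, 0.1, 0.0],
    (1, 1): [0.1, 0.0, 0.5, 0.6],
}
labels = cluster_heads(profiles, k=2)
print(labels)
```

Normalizing first means two heads with the same pattern but different overall strength land in the same cluster, which is one reasonable reading of "behave similarly". A different distance or clustering method may suit the repo's actual data better.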

Copy-paste prompts

Prompt 1

I loaded Qwen2.5 in BTOM-TransformerLens and ran Hi-ToM questions. How do I read the attribution graph to find which attention heads matter most?

Prompt 2

The test.ipynb notebook is running out of GPU memory on my Qwen3 run. Which should I try first: reducing the number of samples, limiting the layer count, or switching to quantized weights?

Prompt 3

How do I add a new Theory of Mind dataset to this workspace to compare results against Hi-ToM?

Prompt 4

Which Python file in btom-transformerlens handles the attention head clustering math, and what format does it output the cluster assignments in?

Prompt 5

How do I attach a TransformerLens hook to capture activations at a specific layer during Hi-ToM inference?


Verify against the repo before relying on details.