norika1207-lab/mercury-mcp

★ 13 | Python | Audience · researcher | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((Mercury MCP))
    What it does
      LLM internals database
      Cross-model comparison
      Layer similarity queries
    Data Collected
      23 models tested
      13 architecture families
      Two-tier activation data
    Query Tools
      Layer similarity finder
      Active dimension lister
      Cross-model composer
    Findings
      Similar layers at 50-60 percent depth
      54 of 84 pairs above 0.7 similarity
    Setup
      MCP client required
      Large binary data files
      No GPU needed to query


Things people build with this

USE CASE 1

Ask an AI coding agent which internal layers across two different model families behave most similarly.

USE CASE 2

Identify which activation dimensions are consistently active across the majority of the 23 evaluated models.

USE CASE 3

Explore whether layers from different model families could be composed without interfering with each other.

USE CASE 4

Research cross-architecture LLM similarities without needing a GPU cluster or running models locally.
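As a rough illustration of use case 2, the sketch below counts how many models report each dimension as active and keeps those seen in more than half of them. The data layout (a mapping from model name to a list of active dimension indices) and the function name are assumptions made for this example, not Mercury's actual schema, and it assumes dimension indices are comparable across models.

```python
from collections import Counter

def majority_active_dims(active_by_model, min_fraction=0.5):
    """Return dimensions active in more than `min_fraction` of models, sorted."""
    counts = Counter()
    for dims in active_by_model.values():
        counts.update(set(dims))  # count each dimension once per model
    threshold = min_fraction * len(active_by_model)
    return sorted(d for d, n in counts.items() if n > threshold)

# Hypothetical toy data: model name -> indices of its most active dimensions.
toy = {
    "model_a": [3, 17, 42],
    "model_b": [17, 42, 99],
    "model_c": [5, 42, 17],
    "model_d": [42, 7, 8],
}
print(majority_active_dims(toy))
```

With the real database, the same question would go through the server's own query tool instead of local code.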

Tech stack

Python · MCP

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires an MCP-compatible client such as Claude Code or Cursor. Second-tier data files reach up to 840 MB per model.

No license information is mentioned in the explanation.

In plain English

Mercury MCP is a database of internal observations collected from 23 large language models across 13 architecture families, made accessible to AI coding agents through the Model Context Protocol. The project was built by a solo independent researcher using consumer hardware, with no institutional funding or GPU cluster.

The core idea is that AI agents using tools like Claude Code or Cursor currently have no way to inspect the internal structure of the models they are communicating with. Mercury provides that data, exposing seven query tools that an agent can call to ask questions like which layers are functionally similar across different model families, which internal dimensions are consistently active across architectures, or how to compose layers from different models for a specific capability.

Data was collected at two levels. The first tier hooks into the output layer of each model to capture which internal dimensions are most active during generation. The second tier is more precise: it runs each model with all intermediate layer outputs exposed, then records activation patterns at each layer in a compact binary format. The second tier took about four hours per model on a Mac mini and produces files ranging from 24 to 840 megabytes depending on model size.

The findings so far show that functionally similar layers can be found across different architectures at roughly the same relative depth (around 50 to 60 percent of total layers). Out of 84 pairwise cross-model comparisons using second-tier data, 54 show a similarity score above 0.7. Some architecture families occupy distinctly different internal geometry from others, which the author suggests could allow non-interfering composition of capabilities across model families.

The project is a work in progress. The initial claim about a particular internal dimension being universal across all families is being reframed as a candidate signal from the less reliable first-tier data, not a confirmed finding. The author is revising the analysis openly with the intention of publishing a paper.
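The cross-model layer comparison described above can be pictured with a minimal sketch. It assumes each layer is summarized by a fixed-length vector of the same length in both models and that similarity means cosine similarity; the source does not state which metric or representation Mercury uses, so both are assumptions, and all names here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_layer_match(layers_a, layers_b):
    """Return (similarity, relative_depth_a, relative_depth_b) for the most similar pair.

    Relative depth is the 1-based layer index divided by the number of layers.
    """
    best = None
    for i, u in enumerate(layers_a):
        for j, v in enumerate(layers_b):
            score = cosine(u, v)
            if best is None or score > best[0]:
                best = (score, (i + 1) / len(layers_a), (j + 1) / len(layers_b))
    return best

# Hypothetical toy summaries: one vector per layer, for two small "models".
layers_a = [[1, 0, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
layers_b = [[1, 1, 0], [0, 2, 2]]

score, depth_a, depth_b = best_layer_match(layers_a, layers_b)
print(f"similarity={score:.2f} at depth {depth_a:.0%} vs {depth_b:.0%}")
```

Expressing position as relative depth is what makes models with different layer counts comparable, which is how a statement like "around 50 to 60 percent of total layers" can apply across architectures.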

Copy-paste prompts

Prompt 1

Use Mercury MCP in Claude Code to find which layers in Qwen are most functionally similar to layers in a different architecture family and explain what that means for capability transfer.

Prompt 2

Query Mercury to list the top activation dimensions that score above 0.7 similarity across at least 10 of the 23 models.

Prompt 3

Use the Mercury MCP cross-model composition tool to suggest how to combine layers from two architecture families that occupy different internal geometry.
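Behind prompts like these, an MCP client sends the server a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such message; the envelope follows the MCP convention, but the tool name `find_similar_layers` and its arguments are hypothetical placeholders, since the real names of Mercury's seven tools are not given here.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and arguments; list the server's tools to find the real ones.
request = build_tool_call(
    1,
    "find_similar_layers",
    {"model_a": "qwen", "model_b": "another-model", "min_similarity": 0.7},
)
print(json.dumps(request, indent=2))
```

In practice the client (Claude Code, Cursor) constructs and sends this message itself; the sketch is only meant to show what a prompt turns into on the wire.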


Verify against the repo before relying on details.