Ask an AI coding agent which internal layers across two different model families behave most similarly.
Identify which activation dimensions are consistently active across the majority of the 23 evaluated models.
Explore whether layers from different model families could be composed without interfering with each other.
Research cross-architecture LLM similarities without needing a GPU cluster or running models locally.
Requires an MCP-compatible client such as Claude Code or Cursor. Second-tier data files reach up to 840 MB per model.
Mercury MCP is a database of internal observations collected from 23 large language models across 13 architecture families, made accessible to AI coding agents through the Model Context Protocol. The project was built by a solo independent researcher using consumer hardware, with no institutional funding or GPU cluster.

The core idea is that AI agents using tools like Claude Code or Cursor currently have no way to inspect the internal structure of the models they are communicating with. Mercury provides that data, exposing seven query tools that an agent can call to ask questions like which layers are functionally similar across different model families, which internal dimensions are consistently active across architectures, or how to compose layers from different models for a specific capability.

Data was collected at two levels. The first tier hooks into the output layer of each model to capture which internal dimensions are most active during generation. The second tier is more precise: it runs each model with all intermediate layer outputs exposed, then records activation patterns at each layer in a compact binary format. The second tier took about four hours per model on a Mac mini and produces files ranging from 24 to 840 megabytes depending on model size.

The findings so far show that functionally similar layers can be found across different architectures at roughly the same relative depth (around 50 to 60 percent of total layers). Out of 84 pairwise cross-model comparisons using second-tier data, 54 show a similarity score above 0.7. Some architecture families occupy distinctly different internal geometry from others, which the author suggests could allow non-interfering composition of capabilities across model families.

The project is a work in progress. The initial claim about a particular internal dimension being universal across all families is being reframed as a candidate signal from the less reliable first-tier data, not a confirmed finding. The author is revising the analysis openly with the intention of publishing a paper.
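
To make the depth and similarity figures more concrete, here is a minimal Python sketch of how a layer's relative depth and a thresholded similarity count might be computed. It is an illustration only, not the project's code. The summary does not say which similarity metric Mercury uses, so cosine similarity is an assumption, as is the idea that two layers are compared through equal-length activation profiles. The function names `relative_depth`, `cosine_similarity`, and `fraction_above` are hypothetical.

```python
import math


def relative_depth(layer_index: int, num_layers: int) -> float:
    """Position of a layer as a fraction of total depth (0.0 = first, 1.0 = last)."""
    if num_layers < 2:
        raise ValueError("need at least two layers")
    return layer_index / (num_layers - 1)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length profiles (assumed metric, not confirmed)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fraction_above(scores: list[float], threshold: float = 0.7) -> float:
    """Share of similarity scores strictly above the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Layers at different absolute indices can sit at a similar relative depth.
depth_a = relative_depth(16, 32)  # layer 16 of a 32-layer model
depth_b = relative_depth(13, 24)  # layer 13 of a 24-layer model

# The reported result, 54 of 84 comparisons above 0.7, as a share.
reported_share = 54 / 84
```

With these inputs both example layers fall inside the 50 to 60 percent band mentioned above, and the reported 54 of 84 works out to roughly 64 percent of the compared pairs.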
← norika1207-lab on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.