autotrustai/paperguru-benchmark

Analysis updated 2026-07-03 · repo last pushed 2026-06-08

★ 1,281 | TeX | Audience · researcher | Complexity · 4/5 | Active | Setup · hard

Mindmap

mindmap
  root((repo))
    What it does
      Long-term memory for AI
      Tracks outdated information
      Links connected evidence
      Traces claims to sources
    Tech stack
      TeX
      Compact memory index
      Query routing
    Use cases
      Reproduce research papers
      Write literature surveys
      Multi-day AI research
      Deep research agents
    Audience
      Researchers
      Software engineers
      Founders
    Notable results
      66 pct on PaperBench
      94 pct on survey benchmark
      10 peer-reviewed manuscripts


What do people build with it?

USE CASE 1

Build an AI agent that reads a research paper and writes runnable code to reproduce its results.

USE CASE 2

Have an AI automatically write academic literature surveys from multiple source papers.

USE CASE 3

Give a long-running AI agent structured memory so it never loses track of project details.

USE CASE 4

Inspect 23 reproduced code projects and 20 generated surveys included in the repo.

What is it built with?

TeX · Lifecycle-Aware Memory

How does it compare?

	autotrustai/paperguru-benchmark	chungyuandye/ntou_thesis	xiongqi123123/awesome-rebuttal
Stars	1,281	32	24
Language	TeX	TeX	TeX
Last pushed	2026-06-08	—	—
Maintenance	Active	—	—
Setup difficulty	hard	moderate	moderate
Complexity	4/5	2/5	2/5
Audience	researcher	writer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

The repository contains benchmark outputs and reproduced projects to inspect, but building an agent with this memory system requires understanding the architecture and integrating it into an AI workflow.

No license information is provided in the repository, so usage rights are unclear.

In plain English

PaperGuru is a memory system designed for AI agents that work on long, complex tasks like reproducing research papers or writing academic literature reviews. Instead of an AI forgetting what it just read or getting confused when a source document is updated, this system gives the agent a structured, long-term memory. The practical benefit is that AI can now handle multi-day research and software engineering projects without losing track of the details.

The project is built around a concept called Lifecycle-Aware Memory. The core idea is that a good memory system must handle four things: tracking when older information becomes outdated or replaced, understanding how different pieces of evidence connect to each other, keeping search costs manageable even as the archive grows, and ensuring every claim the AI makes can be traced back to a verified source. The system routes queries through a compact index before pulling the full text, so it can efficiently find the right evidence without getting bogged down by massive amounts of data.

Researchers, software engineers, and founders building AI agents for deep research would find this useful. For example, if you want an AI to read a dense research paper and automatically write the runnable code to replicate its results, the system needs to remember complex technical details across many steps. PaperGuru was tested on exactly this kind of task, scoring 66% on a benchmark called PaperBench, where the previous best was around 36%. It also scored 94.66% on a survey-writing benchmark, showing it can synthesize long academic reviews effectively.

What makes this project notable is its real-world track record beyond just benchmark scores. The system has helped produce ten manuscripts that were formally accepted at peer-reviewed academic conferences and journals. The repository itself contains the actual outputs from these tests, including 23 reproduced code projects and 20 generated academic surveys in multiple formats, allowing anyone to inspect the quality of the work directly.
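To make the four memory requirements above more concrete, here is a minimal Python sketch of how they might fit together. It is not PaperGuru's implementation: the class names (`MemoryRecord`, `LifecycleMemory`), the field names, and the word-overlap scoring are all hypothetical, chosen only to illustrate marking records as superseded, linking evidence, routing a query through short summaries before reading full text, and returning a source with every result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MemoryRecord:
    """One stored piece of evidence (hypothetical schema, not PaperGuru's)."""
    record_id: str
    source: str       # provenance: where the text came from
    summary: str      # short entry kept in the compact index
    full_text: str    # read only after routing selects this record
    superseded_by: Optional[str] = None           # id of the newer record, if any
    links: Set[str] = field(default_factory=set)  # ids of connected evidence


class LifecycleMemory:
    """Toy in-memory store illustrating the four ideas described above."""

    def __init__(self) -> None:
        self._records: Dict[str, MemoryRecord] = {}

    def add(self, record: MemoryRecord, replaces: Optional[str] = None) -> None:
        """Store a record and optionally mark an older record as outdated."""
        self._records[record.record_id] = record
        if replaces is not None:
            self._records[replaces].superseded_by = record.record_id

    def link(self, a: str, b: str) -> None:
        """Connect two pieces of evidence in both directions."""
        self._records[a].links.add(b)
        self._records[b].links.add(a)

    def route(self, query: str, top_k: int = 3) -> List[str]:
        """Rank current records by word overlap between query and summary."""
        terms = set(query.lower().split())
        scored = []
        for rec in self._records.values():
            if rec.superseded_by is not None:
                continue  # skip outdated information
            overlap = len(terms & set(rec.summary.lower().split()))
            if overlap > 0:
                scored.append((overlap, rec.record_id))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [record_id for _, record_id in scored[:top_k]]

    def retrieve(self, query: str, top_k: int = 3) -> List[Tuple[str, str]]:
        """Return (full_text, source) pairs so each result stays traceable."""
        return [
            (self._records[rid].full_text, self._records[rid].source)
            for rid in self.route(query, top_k)
        ]

    def connected(self, record_id: str) -> List[str]:
        """List ids of linked records that have not been superseded."""
        return sorted(
            rid for rid in self._records[record_id].links
            if self._records[rid].superseded_by is None
        )


# Usage with made-up records: a revised paper replaces an earlier version.
memory = LifecycleMemory()
memory.add(MemoryRecord("v1", "paper_v1.pdf", "learning rate 0.1 for training",
                        "The model is trained with learning rate 0.1."))
memory.add(MemoryRecord("v2", "paper_v2.pdf", "learning rate 0.01 for training",
                        "The model is trained with learning rate 0.01."),
           replaces="v1")
memory.add(MemoryRecord("d1", "appendix.pdf", "dataset split sizes",
                        "The split is 80/10/10."))
memory.link("v2", "d1")
hits = memory.retrieve("training learning rate")
```

In this sketch the query only ever touches the short summaries, and the superseded `v1` record is never returned. The routing step still scans every summary linearly, so a system that needs search cost to stay flat as the archive grows would presumably swap in an inverted index or an embedding index at that point.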

Copy-paste prompts

Prompt 1

Using PaperGuru's lifecycle-aware memory approach, design an AI agent workflow that reads a research paper, tracks which details become outdated as the paper is updated, and writes runnable reproduction code.

Prompt 2

Based on PaperGuru's benchmark results and reproduced projects, outline a plan to build an AI literature survey generator that synthesizes multiple papers into a long academic review with traceable citations.

Prompt 3

Implement a compact index query-routing system inspired by PaperGuru that lets an AI agent efficiently retrieve evidence from a growing document archive without searching full text every time.

Prompt 4

Using PaperGuru as a reference, create a memory architecture for a multi-day research agent that tracks information lifecycle, connects evidence, and ensures every claim links to a verified source.

Frequently asked questions

What is paperguru-benchmark?

A memory system for AI agents working on long research tasks like reproducing papers or writing literature reviews. It helps the agent remember details across multi-day projects without losing track or citing outdated info.

What language is paperguru-benchmark written in?

Mainly TeX. The listed stack is TeX and Lifecycle-Aware Memory.

Is paperguru-benchmark actively maintained?

Active: there was a commit within the last 30 days (last push 2026-06-08).

What license does paperguru-benchmark use?

No license information is provided in the repository, so usage rights are unclear.

How hard is paperguru-benchmark to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is paperguru-benchmark for?

Mainly researchers. Software engineers and founders building AI agents for deep research may also find it useful.


Verify against the repo before relying on details.