Analysis updated 2026-07-03 · repo last pushed 2026-06-08
Build an AI agent that reads a research paper and writes runnable code to reproduce its results.
Have an AI automatically write academic literature surveys from multiple source papers.
Give a long-running AI agent structured memory so it never loses track of project details.
Inspect 23 reproduced code projects and 20 generated surveys included in the repo.
| | autotrustai/paperguru-benchmark | chungyuandye/ntou_thesis | xiongqi123123/awesome-rebuttal |
|---|---|---|---|
| Stars | 1,281 | 32 | 24 |
| Language | TeX | TeX | TeX |
| Last pushed | 2026-06-08 | — | — |
| Maintenance | Active | — | — |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 2/5 | 2/5 |
| Audience | researcher | writer | researcher |
Figures from each repo's GitHub metadata at analysis time.
The repo contains benchmark outputs and reproduced projects to inspect, but building an agent with this memory system requires understanding the architecture and integrating it into an AI workflow.
PaperGuru is a memory system designed for AI agents that work on long, complex tasks like reproducing research papers or writing academic literature reviews. Instead of an AI forgetting what it just read or getting confused when a source document is updated, this system gives the agent a structured, long-term memory. The practical benefit is that AI can now handle multi-day research and software engineering projects without losing track of the details.
The project is built around a concept called Lifecycle-Aware Memory. The core idea is that a good memory system must handle four things: tracking when older information becomes outdated or replaced, understanding how different pieces of evidence connect to each other, keeping search costs manageable even as the archive grows, and ensuring every claim the AI makes can be traced back to a verified source. The system routes queries through a compact index before pulling the full text, so it can efficiently find the right evidence without getting bogged down by massive amounts of data.
Researchers, software engineers, and founders building AI agents for deep research would find this useful. For example, if you want an AI to read a dense research paper and automatically write the runnable code to replicate its results, the system needs to remember complex technical details across many steps. PaperGuru was tested on exactly this kind of task, scoring 66% on a benchmark called PaperBench, where the previous best was around 36%. It also scored 94.66% on a survey-writing benchmark, showing it can synthesize long academic reviews effectively.
What makes this project notable is its real-world track record beyond just benchmark scores. The system has helped produce ten manuscripts that were formally accepted at peer-reviewed academic conferences and journals.
The repository itself contains the actual outputs from these tests, including 23 reproduced code projects and 20 generated academic surveys in multiple formats, allowing anyone to inspect the quality of the work directly.
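The four requirements described above (supersession tracking, evidence links, index-first search, and provenance) can be illustrated with a small sketch. This is not PaperGuru's code or API: the class names `MemoryRecord` and `LifecycleMemory`, their fields, and the keyword-overlap matching are all hypothetical, chosen only to show the general shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemoryRecord:
    """One piece of evidence (hypothetical structure, for illustration only)."""
    record_id: str
    source: str                           # provenance: where the claim came from
    summary: str                          # short text kept in the compact index
    full_text: str                        # full evidence, fetched only after an index hit
    superseded_by: Optional[str] = None   # set once a newer record replaces this one
    related: List[str] = field(default_factory=list)  # ids of connected evidence


class LifecycleMemory:
    """Toy memory: queries scan a compact index, then pull full records."""

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}            # record_id -> lowercased summary
        self._store: Dict[str, MemoryRecord] = {}   # record_id -> full record

    def add(self, record: MemoryRecord, replaces: Optional[str] = None) -> None:
        self._store[record.record_id] = record
        self._index[record.record_id] = record.summary.lower()
        if replaces is not None:
            # Keep the old record for traceability, but mark it outdated
            # and drop it from the index so searches no longer return it.
            self._store[replaces].superseded_by = record.record_id
            self._index.pop(replaces, None)

    def search(self, query: str) -> List[MemoryRecord]:
        terms = set(query.lower().split())
        # Step 1: scan only the short summaries in the index.
        hit_ids = [rid for rid, summary in self._index.items()
                   if terms & set(summary.split())]
        # Step 2: pull full records for the hits only.
        return [self._store[rid] for rid in hit_ids]


memory = LifecycleMemory()
memory.add(MemoryRecord("r1", "paper-v1.pdf", "learning rate 0.1",
                        "Section 4: we train with learning rate 0.1."))
memory.add(MemoryRecord("r2", "paper-v2.pdf", "learning rate 0.01",
                        "Section 4: we train with learning rate 0.01.",
                        related=["r1"]),
           replaces="r1")

hits = memory.search("learning rate")
for hit in hits:
    print(hit.source, "->", hit.full_text)
```

In this sketch the outdated record stays in the store, so an earlier claim can still be traced to its source, while only the current record is reachable through the index. A real system would presumably use something stronger than keyword overlap for the index lookup.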
A memory system for AI agents working on long research tasks like reproducing papers or writing literature reviews. It helps the agent remember details across multi-day projects without losing track or citing outdated info.
Mainly TeX. The stack also includes Lifecycle-Aware Memory.
Active — commit in last 30 days (last push 2026-06-08).
No license information is provided in the repository, so usage rights are unclear.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.