autolearnmem/automem

Analysis updated 2026-05-18

★ 32PythonAudience · researcherComplexity · 5/5Setup · hard

Mindmap

mindmap
  root((AutoMem))
    What it does
      Memory as trainable skill
      File-system memory actions
      Long-horizon game agents
    Loop 1 Scaffold
      Meta-LLM reads traces
      Rewrites agent code
      Keeps if score improves
    Loop 2 Training
      Select memory examples
      LoRA fine-tune specialist
      Two-model inference
    Games Evaluated
      Crafter crafting tasks
      MiniHack dungeons
      NetHack roguelike
    Requirements
      vLLM base model server
      LLaMA-Factory for LoRA
      Claude Code meta-LLM

mindmap root((AutoMem)) What it does Memory as trainable skill File-system memory actions Long-horizon game agents Loop 1 Scaffold Meta-LLM reads traces Rewrites agent code Keeps if score improves Loop 2 Training Select memory examples LoRA fine-tune specialist Two-model inference Games Evaluated Crafter crafting tasks MiniHack dungeons NetHack roguelike Requirements vLLM base model server LLaMA-Factory for LoRA Claude Code meta-LLM

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Run AutoMem's pre-evolved scaffolds on Crafter or MiniHack to reproduce the paper's results using a locally served Qwen2.5-32B model.

USE CASE 2

Apply Loop 1 scaffold optimization to automatically improve an LLM agent's memory management code on a custom task environment.

USE CASE 3

Use Loop 2 to fine-tune a memory specialist model on your agent's own traces, then evaluate the two-model configuration against the baseline.

What is it built with?

PythonPyTorchLoRAvLLMLLaMA-FactoryQwen2.5

How does it compare?

	autolearnmem/automem	cortex-ai-network/crypto-arbitrage-bot-automated-trading	madguyevans-creator/resale-agent-skill-hub
Stars	32	32	32
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	5/5	2/5	3/5
Audience	researcher	general	vibe coder

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1day+

Requires three separate component installs (BALROG, LLaMA-Factory, Claude Code CLI) plus a GPU server running vLLM for the base model.

No license is stated in this repository.

In plain English

AutoMem is an AI research project that asks a specific question: can a language model agent learn to manage its own memory as a skill? Instead of storing information in a fixed, pre-designed memory system, the agent maintains a directory of text files and decides for itself what to record, when to look something up, and how to organize what it knows. These file operations (logging what just happened, consulting past notes before acting) are part of the agent's action space alongside the actual task actions. Two outer improvement loops run over time. The first loop (scaffold optimization) has a powerful meta-LLM read the agent's complete game traces, diagnose where memory use went wrong, and rewrite the agent's code, prompts, and memory schema to fix the problems. A revision is only kept if it improves average task performance on a fixed test set. The second loop (memory-proficiency training) uses the meta-LLM to select good examples of memory operations from the base model's own traces, then fine-tunes a separate smaller model (a memory specialist) on those examples using LoRA. At inference time, the memory specialist handles logging and consulting notes, while the original unmodified model handles the actual task actions. The system was evaluated on three challenging long-horizon games: Crafter (a 2D crafting game), MiniHack (procedurally generated dungeons), and NetHack (a complex roguelike). Using Qwen2.5-32B-Instruct as the base model, AutoMem achieved performance competitive with frontier systems by improving memory alone, without changing how the model handles gameplay decisions. Setting up AutoMem requires three components: the BALROG benchmark harness for running the game environments, LLaMA-Factory for LoRA fine-tuning in Loop 2, and the Claude Code CLI for the meta-LLM that drives both optimization loops. The base model is served via vLLM. This is academic research code. It is not a plug-in for general LLM applications, it is a research framework for studying how agents can learn to use memory more effectively in long-horizon sequential tasks.

Copy-paste prompts

Prompt 1

I want to reproduce AutoMem's Crafter results. Walk me through setting up BALROG, serving Qwen2.5-32B with vLLM, and running the crafter_v5 scaffold evaluation.

Prompt 2

How does AutoMem's Loop 1 scaffold optimization work? What does the meta-LLM change in each iteration and how does it decide whether to keep a revision?

Prompt 3

Explain AutoMem's Loop 2 training engine: how does it select training examples from base model traces, and how does two-model inference work at test time?

Prompt 4

I want to apply AutoMem's memory-as-filesystem approach to a new text-based task. What parts of the inner_agent_v0 scaffold do I need to adapt?

Prompt 5

What LoRA configuration does AutoMem's training engine use for the memory specialist and how does the meta-LLM choose the data-selection logic each iteration?

Frequently asked questions

What is automem?

AutoMem is an AI research framework that teaches LLM agents to manage memory as a trainable skill using two optimization loops: one that rewrites the agent scaffold and one that fine-tunes a dedicated memory specialist with LoRA.

What language is automem written in?

Mainly Python. The stack also includes Python, PyTorch, LoRA.

What license does automem use?

No license is stated in this repository.

How hard is automem to set up?

Setup difficulty is rated hard, with roughly 1day+ to a first successful run.

Who is automem for?

Mainly researcher.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub autolearnmem on gitmyhub

Verify against the repo before relying on details.