Add a reusable context cache to an LLM agent that answers many questions over the same big corpus
Compare PEEK against full context prompting on a long document QA workload
Plug a local vLLM or Ollama backend into PEEK by implementing the LMClient interface
Needs an API key for at least one of OpenAI, Anthropic, or Gemini, or a custom LMClient implementation for a local model.
PEEK is the code release for a research paper on helping AI agents work more efficiently with very long external contexts, such as large document collections or entire code repositories. The README links to the arXiv paper and to a blog post explaining the idea.

The core concept is a small summary, called a context map, that captures reusable orientation knowledge about the larger external context. The map sits inside the prompt as a kind of cache, in the same spirit as the small caches that operating systems and databases keep in front of much larger storage. The authors describe the system as agent-agnostic, model-agnostic, and unsupervised. It makes no assumptions about the agent's architecture, it works with both open- and closed-source language models, and it does not need labeled ground-truth answers. Instead, it uses signals available at inference time to decide what should go into the map, and it returns an updated map that can be prepended to the next call. The README says it works with most current frontier models.

Installation is via pip. The base package is peek-ai, with optional extras for the OpenAI, Anthropic, and Gemini providers, plus an all option that installs every extra.

The minimal usage example in the README shows how to wrap your own agent. You create an OpenAIClient with a chosen model, build a CachePolicy with a token budget and an evolve steps value, and then loop over your stream of questions. For each question, you build a system prompt that includes the current map, run your agent against the external context, and call policy.update with the resulting trajectory. The current map can be saved as JSON for reuse.

Other model providers can be plugged in as well. Any object that satisfies the peek.LMClient interface, with a completion method and a last_usage method, can act as the backbone model.
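The wrapping loop and the LMClient interface described above can be illustrated without the package installed. This is a minimal, self-contained sketch under stated assumptions, not PEEK's API or algorithm. LMClientLike, LocalStubClient, ToyMapPolicy, and run_agent are hypothetical names. The single-string completion(prompt) and dict-returning last_usage() signatures are guesses rather than documented signatures. The toy policy simply evicts the oldest notes to stay under a word budget, which is not how PEEK decides what goes into the map. It also omits the evolve steps setting, whose behavior is not described here.

```python
import json
from typing import Protocol, runtime_checkable


@runtime_checkable
class LMClientLike(Protocol):
    """Hypothetical mirror of the two methods attributed to peek.LMClient.
    The real signatures may differ; check the peek source before relying on this."""

    def completion(self, prompt: str) -> str: ...

    def last_usage(self) -> dict: ...


class LocalStubClient:
    """Hypothetical local stub. A real client would call a vLLM or Ollama server here."""

    def __init__(self) -> None:
        self._usage = {"prompt_tokens": 0, "completion_tokens": 0}

    def completion(self, prompt: str) -> str:
        reply = "ok"
        # Whitespace word counts stand in for a real tokenizer.
        self._usage = {"prompt_tokens": len(prompt.split()),
                       "completion_tokens": len(reply.split())}
        return reply

    def last_usage(self) -> dict:
        return dict(self._usage)


class ToyMapPolicy:
    """Toy stand-in for CachePolicy: keeps recent notes within a word budget.
    This is NOT PEEK's update rule; it only mimics the shape of the loop."""

    def __init__(self, token_budget: int) -> None:
        self.token_budget = token_budget
        self.notes: list[str] = []

    def current_map(self) -> str:
        return "\n".join(self.notes)

    def update(self, trajectory: list[str]) -> str:
        self.notes.extend(trajectory)
        # Evict the oldest notes until the map fits the budget (words as a token proxy).
        while self.notes and len(self.current_map().split()) > self.token_budget:
            self.notes.pop(0)
        return self.current_map()

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"map": self.current_map()}, f)


def run_agent(client: LMClientLike, system_prompt: str, question: str) -> list[str]:
    """Hypothetical one-call agent that returns its trajectory as a list of notes."""
    answer = client.completion(system_prompt + "\n\nQ: " + question)
    return [f"Q: {question} -> {answer}"]


client = LocalStubClient()
# Structural check only: it confirms the method names exist, not their signatures.
assert isinstance(client, LMClientLike)
policy = ToyMapPolicy(token_budget=12)

for question in ["Where is the config parsed?", "Which module owns retries?"]:
    # Prepend the current map to the system prompt, run the agent, then update the map.
    system_prompt = "Context map:\n" + policy.current_map()
    trajectory = run_agent(client, system_prompt, question)
    policy.update(trajectory)

policy.save("context_map.json")
```

To target a local backend, the body of completion would be replaced with a request to your vLLM or Ollama server, and last_usage would report the token counts that server returns. The eviction rule is kept deliberately simple so that the loop structure (map in the prompt, agent run, update from the trajectory, JSON save) stays visible.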
The package ships three reference clients, for OpenAI, Anthropic, and Gemini, and the README names vLLM, Together, Ollama, and local stubs as other backends that would fit the same interface.

The rest of the README covers a standard contribution flow (fork, branch, commit, push, and pull request), a BibTeX citation block for the paper, and contact information pointing to the paper authors, a feedback form, and GitHub issues.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.