explaingit

zhuohangu/peek

41 · Python · Audience · researcher · Complexity · 4/5 · Active · Setup · moderate

TLDR

Research code for PEEK, a method that learns a reusable context map cache to help LLM agents work efficiently over very long external contexts.

Mindmap

mindmap
  root((peek))
    Inputs
      External context
      Question stream
      Cache policy
    Outputs
      Updated context map
      Agent answers
      JSON saved map
    Use Cases
      Chat with a big repo
      Query a doc collection
      Cache agent orientation
    Tech Stack
      Python
      OpenAI client
      Anthropic client
      Gemini client

Things people build with this

USE CASE 1

Add a reusable context cache to an LLM agent that answers many questions over the same big corpus

USE CASE 2

Compare PEEK against full context prompting on a long document QA workload

USE CASE 3

Plug a local vLLM or Ollama backend into PEEK by implementing the LMClient interface

Tech stack

Python · OpenAI · Anthropic · Gemini

Getting it running

Difficulty · moderate · Time to first run · 30 min

Needs an API key for at least one of OpenAI, Anthropic, or Gemini, or a custom LMClient implementation for a local model.
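
A quick pre-flight check can show which providers are usable before the first run. This is a minimal sketch, not part of PEEK: `PROVIDER_KEYS` and `available_providers` are hypothetical names, and the environment variable names are the ones conventionally read by the official provider SDKs. Whether PEEK's reference clients read the same variables is an assumption to verify against the repo.

```python
import os

# Assumption: conventional variable names used by the official provider SDKs.
# PEEK itself may expect keys to be passed differently; check the repo.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def available_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    found = available_providers()
    if found:
        print("Providers with a key set:", ", ".join(found))
    else:
        print("No provider key found; a custom LMClient for a local model is needed.")
```

If no key is set, the remaining route is the custom LMClient implementation mentioned above.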

In plain English

PEEK is the code release for a research paper about a method for helping AI agents work more efficiently with very long external contexts, such as large document collections or whole code repositories. The README links to the arXiv paper and to a blog post explaining the idea.

The core concept is a small summary, called a context map, that captures reusable orientation knowledge about the larger external context. This map sits inside the prompt as a kind of cache, in the same spirit that operating systems and databases keep small caches of much larger storage.

The authors describe the system as agent-agnostic, model-agnostic, and unsupervised. It makes no assumption about the agent's architecture, it works with both open-source and closed-source language models, and it does not need labeled ground-truth answers. It uses signals available at inference time to decide what should go into the map, and it returns an updated version that can be prepended to the next call. The README says it works with most current frontier models.

Installation is via pip. The base package is peek-ai. There are optional extras for the OpenAI, Anthropic, and Gemini providers, plus an all option that installs every extra.

The minimal usage example in the README shows how to wrap your own agent. You create an OpenAIClient with a chosen model, build a CachePolicy with a token budget and an evolve steps value, then loop over your stream of questions. For each question you build a system prompt that includes the current map, run your agent against the external context, and call policy.update with the resulting trajectory. The current map can be saved as JSON for reuse.

The project lets you plug in other model providers. Any object that satisfies the peek.LMClient interface, with a completion method and a last_usage method, can act as the backbone. Three reference clients ship with the package for OpenAI, Anthropic, and Gemini, and the README mentions vLLM, Together, Ollama, and local stubs as other examples that would fit the same interface.

The rest of the README covers a standard contribution flow (fork, branch, commit, push, pull request), a paper citation block in BibTeX, and contact information that points to the paper authors, a feedback form, and GitHub issues.
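
The control flow described above (a client with a completion method and a last_usage method, a policy updated after each question, a map saved as JSON) can be sketched offline as follows. This is a rough illustration under stated assumptions, not the peek-ai API: `LMClientLike`, `StubClient`, `ToyCachePolicy`, and `run_session` are hypothetical names, the method signatures are guesses (the summary only gives the two method names), and the oldest-first eviction rule is a placeholder for whatever the real CachePolicy does with inference-time signals.

```python
import json
from typing import Protocol


class LMClientLike(Protocol):
    """Hypothetical interface shape: only the two method names come from the
    README summary; argument and return types are assumptions."""

    def completion(self, prompt: str) -> str: ...
    def last_usage(self) -> dict: ...


class StubClient:
    """Offline stand-in for a provider client; makes no network calls."""

    def __init__(self) -> None:
        self._usage = {"prompt_tokens": 0, "completion_tokens": 0}

    def completion(self, prompt: str) -> str:
        reply = f"stub answer ({len(prompt.split())} prompt words)"
        # Word counts stand in for real token counts.
        self._usage = {
            "prompt_tokens": len(prompt.split()),
            "completion_tokens": len(reply.split()),
        }
        return reply

    def last_usage(self) -> dict:
        return dict(self._usage)


class ToyCachePolicy:
    """Illustrative only: keeps the newest notes that fit a word budget.
    This is NOT how PEEK chooses what goes into the context map."""

    def __init__(self, budget_words: int) -> None:
        self.budget_words = budget_words
        self.notes = []

    def current_map(self) -> str:
        return "\n".join(self.notes)

    def update(self, trajectory: str) -> str:
        self.notes.append(trajectory)
        # Evict the oldest notes until the map fits the budget again.
        while len(self.notes) > 1 and len(self.current_map().split()) > self.budget_words:
            self.notes.pop(0)
        return self.current_map()


def run_session(client: LMClientLike, policy: ToyCachePolicy, questions: list) -> list:
    """Answer each question with the current map in the prompt, then update the map."""
    answers = []
    for question in questions:
        prompt = f"Context map:\n{policy.current_map()}\n\nQuestion: {question}"
        answer = client.completion(prompt)
        answers.append(answer)
        policy.update(f"Q: {question} -> {answer}")
    return answers


client = StubClient()
policy = ToyCachePolicy(budget_words=40)
answers = run_session(client, policy, ["Where is the parser?", "Which module loads configs?"])

# Round-trip the map through JSON so a later session can start from it.
saved = json.dumps({"notes": policy.notes})
restored = ToyCachePolicy(budget_words=40)
restored.notes = json.loads(saved)["notes"]
```

Typing the client as a Protocol means any object with the two methods is accepted without inheriting from a base class, which matches the summary's description of "any object that satisfies" the interface. A wrapper around vLLM or Ollama would replace `StubClient` here.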

Copy-paste prompts

Prompt 1
Install peek-ai with the all extras and run the README example against a small folder of markdown files
Prompt 2
Write a peek.LMClient adapter for Ollama and run the PEEK loop against a local Llama model
Prompt 3
Pick a token budget and evolve steps for PEEK that fit a 50-page PDF QA workload and explain the trade-off
Prompt 4
Save the PEEK context map as JSON after one session and reload it in a new run without reprocessing the corpus

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.