Infinite Context
Rekall's progressive disclosure pattern lets agents access unlimited memory without exhausting their context window.
LLM context windows are finite, but memory stores can be massive. A coding agent might accumulate thousands of memories across sessions — project decisions, file contents, debugging traces, user preferences. Traditional memory search returns the full content of every result, burning through tokens quickly and leaving little room for the actual task.
Infinite Context solves this with a two-step retrieval pattern called Progressive Disclosure. Instead of loading everything at once, you first retrieve a lightweight index of matching memories, then selectively load only the ones you need. The result: agents can preview dozens or even hundreds of memories for the token cost of just a few full results.
How It Works
Progressive disclosure splits memory retrieval into two explicit steps:
Step 1: memory_search_index / recallIndex()
Returns a lightweight index containing memory IDs, relevance scores, short snippets (~200 characters), token estimates, and memory type. Uses 5-10x fewer tokens than loading full content.
Step 2: memory_get_batch / recallFull()
Loads the full content only for the specific memories you selected from the index. You choose exactly which memories are worth the token cost.
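The index entry shape and the step-2 selection logic can be sketched as follows. The field names below are inferred from the fields listed above (IDs, scores, snippets, token estimates, type), not taken from the SDK's actual type definitions, and the selection helper is one possible policy, not the library's:

```typescript
// Hypothetical shape of one step-1 index entry, based on the fields described above.
interface IndexEntry {
  id: string;
  score: number;         // relevance score
  snippet: string;       // ~200-character preview
  tokenEstimate: number; // cost of loading the full memory
  type: string;          // memory type, e.g. "decision" or "observation"
}

// Pick the entries worth a step-2 load: highest-scoring first,
// skipping any whose full content would blow the token budget.
function selectForFullLoad(
  index: IndexEntry[],
  tokenBudget: number,
  minScore = 0.7,
): string[] {
  const selected: string[] = [];
  let spent = 0;
  for (const entry of [...index].sort((a, b) => b.score - a.score)) {
    if (entry.score < minScore) break; // remaining entries score even lower
    if (spent + entry.tokenEstimate > tokenBudget) continue;
    spent += entry.tokenEstimate;
    selected.push(entry.id);
  }
  return selected;
}
```

The returned IDs are exactly what you would pass to the step-2 batch load.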
Token Savings
The progressive disclosure pattern dramatically reduces token usage compared to loading full content for every search result.
| Approach | Tokens | Details |
|---|---|---|
| memory_search | ~8,000 | 20 results with full content |
| memory_search_index | ~800 | 20 results, index only (IDs + snippets + scores) |
| memory_get_batch | ~2,000 | Load 5 relevant memories at full content |
| Progressive Total | ~2,800 | 65% savings vs. full search |
At scale
With progressive disclosure you can preview 50+ memories in roughly 500 tokens. The savings grow as your memory store grows — exactly when it matters most.
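The arithmetic behind the table is straightforward. Assuming ~400 tokens per full result and ~40 per index entry (both derived from the 20-result figures above, not SDK constants), the savings can be computed directly:

```typescript
// Back-of-envelope costs for the table above. Per-result figures are
// assumptions derived from the documented totals (~8,000 / 20 and ~800 / 20).
const TOKENS_PER_FULL_RESULT = 400;
const TOKENS_PER_INDEX_ENTRY = 40;

// Traditional search: full content for every result.
function fullSearchCost(results: number): number {
  return results * TOKENS_PER_FULL_RESULT;
}

// Progressive disclosure: index for every result, full content only for loaded ones.
function progressiveCost(results: number, loaded: number): number {
  return results * TOKENS_PER_INDEX_ENTRY + loaded * TOKENS_PER_FULL_RESULT;
}

// 20 results, 5 loaded: 800 + 2,000 = 2,800 vs 8,000 full → 65% savings
const saved = 1 - progressiveCost(20, 5) / fullSearchCost(20);
```

Note that the full-search cost grows with every result returned, while the progressive cost grows mostly with the handful of memories you actually load.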
When to Use
Choose the right retrieval approach based on your scenario:
| Scenario | Approach | Why |
|---|---|---|
| Quick lookup (< 5 results) | memory_search / recall() | Simpler, one step |
| Large search / research | memory_search_index → memory_get_batch | Token efficient |
| Automatic budget management | smartRecall() | Handles both steps automatically |
| Store tool outputs | memory_observe / observe() | Auto-indexed for later retrieval |
| Session start | memory_session_context | Resume context |
Quick Example
```typescript
import { RekallAgent } from '@rekall/agent-sdk';

const agent = new RekallAgent({ apiKey: 'rk_...', agentId: 'agent_123' });

// Step 1: Get lightweight index
const index = await agent.recallIndex('project requirements');
console.log(`Found ${index.index.length} memories using ${index.indexTokens} tokens`);

// Step 2: Review snippets, select relevant
const relevant = index.index
  .filter(entry => entry.score > 0.7)
  .map(entry => entry.id);

// Step 3: Load full content for selected only
const memories = await agent.recallFull(relevant);

// Or use smartRecall for automatic handling
const auto = await agent.smartRecall('project requirements', {
  tokenBudget: 4000,
  progressive: true,
});
```
Observations
The memory_observe / observe() method captures tool outputs, file reads, API responses, and other external data your agent encounters during a session.
These observations are automatically indexed and become searchable via progressive disclosure in future sessions. This is the key mechanism for building up a searchable knowledge base over time — every piece of information your agent encounters can be stored cheaply, then retrieved efficiently later through the two-step index-then-load pattern.
Building knowledge over time
An agent that consistently observes tool outputs across sessions builds a rich, searchable memory store. Combined with progressive disclosure, this means the agent can always find what it needs without paying the token cost of loading everything into context.
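The observe-then-retrieve lifecycle can be illustrated with a minimal in-memory sketch. The real observe() persists memories across sessions and ranks by semantic relevance; this toy store only shows the flow (store cheaply, find via index, load selectively), and its names are illustrative rather than the SDK's:

```typescript
// Toy in-memory model of the observe → index → load lifecycle.
// Illustrative only: the real store persists across sessions.
interface Observation {
  id: string;
  content: string;
}

class ObservationStore {
  private observations: Observation[] = [];
  private nextId = 1;

  // Capture a tool output, file read, or API response; returns the memory ID.
  observe(content: string): string {
    const id = `obs_${this.nextId++}`;
    this.observations.push({ id, content });
    return id;
  }

  // Step-1 style lookup: IDs and short snippets only, never full content.
  searchIndex(query: string): { id: string; snippet: string }[] {
    return this.observations
      .filter(o => o.content.toLowerCase().includes(query.toLowerCase()))
      .map(o => ({ id: o.id, snippet: o.content.slice(0, 200) }));
  }

  // Step-2 style load: full content only for the selected IDs.
  getBatch(ids: string[]): Observation[] {
    return this.observations.filter(o => ids.includes(o.id));
  }
}
```

A later session would call searchIndex() first, inspect the snippets, and only then pay for getBatch() on the entries that matter.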
Token Budget
Token budgets control how much of the context window memory retrieval is allowed to consume.
- Default budget: 4,000 tokens
- Configurable via preferences: Set rekall_token_budget in user preferences to adjust globally
- Per-request control: The tokenBudget parameter on search controls how many index entries to return
- Automatic management: smartRecall() automatically manages the budget across both the index and full-load steps
Budget strategy
Start with the default 4,000-token budget. If your agent frequently needs deeper context, increase it. If you're running on smaller models with limited context windows, reduce it. The progressive disclosure pattern means you're always getting the most relevant memories first.
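One way to pick a non-default budget is to tie it to the model's context window. The heuristic below is a sketch, not SDK behavior: reserve roughly a tenth of the window for memory retrieval, clamped to sensible bounds, and pass the result as tokenBudget:

```typescript
// Heuristic only (an assumption, not an SDK rule): reserve a fraction of the
// model's context window for memory retrieval, clamped between 1k and 8k tokens.
function memoryTokenBudget(contextWindow: number, fraction = 0.1): number {
  const budget = Math.floor(contextWindow * fraction);
  return Math.min(Math.max(budget, 1_000), 8_000);
}
```

For example, a 128k-context model would get the 8,000-token ceiling, while an 8k-context model would be held to the 1,000-token floor; a 40k-context model lands at the 4,000-token default.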
MCP Prompts
MCP clients (Claude Desktop, Cursor, Windsurf, etc.) can request built-in prompts that are injected at session start. These give the agent context on how to use memory efficiently from the very first message.
rekall-guide
Complete guide for efficiently using Rekall memory tools. Covers all available tools, best practices, and common workflows.
infinite-context
How to use progressive disclosure for unlimited context. Explains the two-step retrieval pattern, token budgets, and when to use each approach.
memory-workflow
Optimal workflow for memory-intensive tasks. Covers session start, mid-session retrieval, observation patterns, and session end best practices.
Prompt injection
These prompts are designed to be injected into the system prompt or early in the conversation. They teach the agent how to use Rekall's memory tools effectively, including the progressive disclosure pattern, without requiring manual setup by the user.
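In the MCP protocol, a client fetches a named prompt with a standard prompts/get request. Requesting the infinite-context prompt would look roughly like this JSON-RPC message (whether the Rekall server accepts additional arguments is not specified here):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "prompts/get",
  "params": { "name": "infinite-context" }
}
```

The server responds with the prompt messages, which the client then injects into the conversation.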
Next Steps
Guide: Infinite Context
Practical tutorial for implementing progressive disclosure in your agent
MCP Server
Connect Rekall to Claude Desktop, Cursor, and other MCP clients
TypeScript SDK
Full reference for recallIndex, recallFull, smartRecall, and observe
Python SDK
Full reference for recall_index, recall_full, smart_recall, and observe
