Infinite Context

Rekall's progressive disclosure pattern lets agents access unlimited memory without exhausting their context window.

LLM context windows are finite, but memory stores can be massive. A coding agent might accumulate thousands of memories across sessions — project decisions, file contents, debugging traces, user preferences. Traditional memory search returns the full content of every result, burning through tokens quickly and leaving little room for the actual task.

Infinite Context solves this with a two-step retrieval pattern called Progressive Disclosure. Instead of loading everything at once, you first retrieve a lightweight index of matching memories, then selectively load only the ones you need. The result: agents can preview dozens or even hundreds of memories for the token cost of just a few full results.

How It Works

Progressive disclosure splits memory retrieval into two explicit steps:

Step 1: memory_search_index / recallIndex()

Returns a lightweight index containing memory IDs, relevance scores, short snippets (~200 characters), token estimates, and memory type. Uses 5-10x fewer tokens than loading full content.

Step 2: memory_get_batch / recallFull()

Loads the full content only for the specific memories you selected from the index. You choose exactly which memories are worth the token cost.

Search → Index (lightweight) → Select → Load (full content)
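The two-step flow can be sketched in plain TypeScript. The `IndexEntry` shape and the `selectForFullLoad` helper below are illustrative assumptions based on the fields listed under Step 1; they are not part of the SDK:

```typescript
// Hypothetical shape of one index entry, based on the fields Step 1 describes
interface IndexEntry {
  id: string;           // memory ID
  score: number;        // relevance score
  snippet: string;      // ~200-character preview
  tokenEstimate: number; // cost of loading the full memory
  type: string;         // memory type
}

// Pick which memories are worth the token cost of a full load:
// highest-scoring first, stop below a relevance floor, stay within budget.
function selectForFullLoad(
  entries: IndexEntry[],
  minScore: number,
  budget: number,
): string[] {
  const ids: string[] = [];
  let spent = 0;
  for (const e of [...entries].sort((a, b) => b.score - a.score)) {
    if (e.score < minScore) break;               // remaining entries score lower
    if (spent + e.tokenEstimate > budget) continue; // too expensive, try the next
    spent += e.tokenEstimate;
    ids.push(e.id);
  }
  return ids;
}
```

The selected IDs would then be passed to the Step 2 call (memory_get_batch / recallFull()) to load full content.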

Token Savings

The progressive disclosure pattern dramatically reduces token usage compared to loading full content for every search result.

| Approach | Tokens | Details |
| --- | --- | --- |
| memory_search | ~8,000 | 20 results with full content |
| memory_search_index | ~800 | 20 results, index only (IDs + snippets + scores) |
| memory_get_batch | ~2,000 | Load 5 relevant memories at full content |
| Progressive total | ~2,800 | 65% savings vs. full search |

At scale

With progressive disclosure you can preview 50+ memories in roughly 500 tokens. The savings grow as your memory store grows — exactly when it matters most.
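The arithmetic behind the savings table reduces to a few lines:

```typescript
// Token math from the savings table: full search vs. the two-step pattern
const fullSearch = 8000;                            // 20 results with full content
const indexStep = 800;                              // lightweight index pass
const batchStep = 2000;                             // full content for 5 selected memories
const progressiveTotal = indexStep + batchStep;     // 2,800 tokens
const savings = 1 - progressiveTotal / fullSearch;  // 0.65, i.e. 65% fewer tokens
console.log(progressiveTotal, Math.round(savings * 100));
```

Because the index step stays cheap while the full-load step only pays for what you select, the percentage saved grows with the size of the result set.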

When to Use

Choose the right retrieval approach based on your scenario:

| Scenario | Approach | Why |
| --- | --- | --- |
| Quick lookup (< 5 results) | memory_search / recall() | Simpler, one step |
| Large search / research | memory_search_index → memory_get_batch | Token efficient |
| Automatic budget management | smartRecall() | Handles both steps automatically |
| Store tool outputs | memory_observe / observe() | Auto-indexed for later retrieval |
| Session start | memory_session_context | Resume context |

Quick Example

Progressive disclosure with Rekall
```typescript
import { RekallAgent } from '@rekall/agent-sdk';

const agent = new RekallAgent({ apiKey: 'rk_...', agentId: 'agent_123' });

// Step 1: Get lightweight index
const index = await agent.recallIndex('project requirements');
console.log(`Found ${index.index.length} memories using ${index.indexTokens} tokens`);

// Step 2: Review snippets, select relevant
const relevant = index.index
  .filter(entry => entry.score > 0.7)
  .map(entry => entry.id);

// Step 3: Load full content for selected only
const memories = await agent.recallFull(relevant);

// Or use smartRecall for automatic handling
const auto = await agent.smartRecall('project requirements', {
  tokenBudget: 4000,
  progressive: true,
});
```

Observations

The memory_observe / observe() method captures tool outputs, file reads, API responses, and other external data your agent encounters during a session.

These observations are automatically indexed and become searchable via progressive disclosure in future sessions. This is the key mechanism for building up a searchable knowledge base over time — every piece of information your agent encounters can be stored cheaply, then retrieved efficiently later through the two-step index-then-load pattern.

Building knowledge over time

An agent that consistently observes tool outputs across sessions builds a rich, searchable memory store. Combined with progressive disclosure, this means the agent can always find what it needs without paying the token cost of loading everything into context.
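A minimal sketch of the observe-after-every-tool-call pattern, assuming an observe() method that takes a type, content, and tags. The real SDK signature may differ, and FakeAgent here only stands in for the client to show the call shape:

```typescript
// Assumed observation shape — the real SDK fields may differ
type Observation = { type: string; content: string; tags?: string[] };

// Stand-in for the Rekall client, used only to illustrate the call pattern
class FakeAgent {
  readonly stored: Observation[] = [];
  async observe(obs: Observation): Promise<void> {
    this.stored.push(obs); // the real SDK would index this for later recallIndex()
  }
}

// After any tool call, record its output so future sessions can search it
async function runToolAndRemember(
  agent: FakeAgent,
  toolName: string,
  output: string,
): Promise<string> {
  await agent.observe({ type: 'tool_output', content: output, tags: [toolName] });
  return output;
}
```

The payoff comes later: because every observation is auto-indexed, a future session can find this output through the cheap index step instead of reloading it wholesale.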

Token Budget

Token budgets control how much of the context window memory retrieval is allowed to consume.

  • Default budget: 4,000 tokens
  • Configurable via preferences: Set rekall_token_budget in user preferences to adjust globally
  • Per-request control: The tokenBudget parameter on search controls how many index entries to return
  • Automatic management: smartRecall() automatically manages the budget across both the index and full-load steps

Budget strategy

Start with the default 4,000-token budget. If your agent frequently needs deeper context, increase it. If you're running on smaller models with limited context windows, reduce it. The progressive disclosure pattern means you're always getting the most relevant memories first.
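One way to reason about a budget is to split it between the two steps. The allocation policy inside smartRecall() is not documented, so the 20/80 split below is purely an illustrative assumption:

```typescript
// Hypothetical split of a retrieval token budget between the two steps
// (smartRecall()'s real allocation policy is not documented; this is an assumption)
function splitBudget(total: number, indexShare = 0.2) {
  const index = Math.floor(total * indexShare); // lightweight index pass
  const full = total - index;                   // remainder for full-content loads
  return { index, full };
}
```

With the default 4,000-token budget, this would leave 800 tokens for the index — enough to preview dozens of memories per the figures above — and 3,200 for the full loads.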

MCP Prompts

MCP clients (Claude Desktop, Cursor, Windsurf, etc.) can request built-in prompts that are injected at session start. These give the agent context on how to use memory efficiently from the very first message.

rekall-guide

Complete guide for efficiently using Rekall memory tools. Covers all available tools, best practices, and common workflows.

infinite-context

How to use progressive disclosure for unlimited context. Explains the two-step retrieval pattern, token budgets, and when to use each approach.

memory-workflow

Optimal workflow for memory-intensive tasks. Covers session start, mid-session retrieval, observation patterns, and session end best practices.

Prompt injection

These prompts are designed to be injected into the system prompt or early in the conversation. They teach the agent how to use Rekall's memory tools effectively, including the progressive disclosure pattern, without requiring manual setup by the user.
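Under the MCP specification, a client fetches one of these prompts with the prompts/get method. A minimal sketch of such a JSON-RPC request, using the infinite-context prompt name listed above:

```typescript
// JSON-RPC request an MCP client would send to fetch a built-in prompt
// (prompts/get is defined by the MCP spec; the prompt name comes from this page)
const request = {
  jsonrpc: '2.0' as const,
  id: 1,
  method: 'prompts/get',
  params: { name: 'infinite-context' },
};
```

The server responds with the prompt's messages, which the client can then inject into the system prompt or early conversation turns.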
