Infinite Context
Rekall's progressive disclosure pattern lets agents access unlimited memory without exhausting their context window.
LLM context windows are finite, but memory stores can be massive. A coding agent might accumulate thousands of memories across sessions — project decisions, file contents, debugging traces, user preferences. Traditional memory search returns the full content of every result, burning through tokens quickly and leaving little room for the actual task.
Infinite Context solves this with a two-step retrieval pattern called Progressive Disclosure. Instead of loading everything at once, you first retrieve a lightweight index of matching memories, then selectively load only the ones you need. The result: agents can preview dozens or even hundreds of memories for the token cost of just a few full results.
How It Works
Progressive disclosure splits memory retrieval into two explicit steps:
Step 1: memory_search_index / recallIndex()
Returns a lightweight index containing memory IDs, relevance scores, short snippets (~200 characters), token estimates, and memory type. Uses 5-10x fewer tokens than loading full content.
Step 2: memory_get_batch / recallFull()
Loads the full content only for the specific memories you selected from the index. You choose exactly which memories are worth the token cost.
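The index entry shape and the step-2 selection logic can be sketched as follows. The field names below are inferred from the fields listed above (IDs, scores, snippets, token estimates, type), not taken from the SDK's actual type definitions, and the selection helper is one possible policy, not the library's:

```typescript
// Hypothetical shape of one step-1 index entry, based on the fields described above.
interface IndexEntry {
  id: string;
  score: number;         // relevance score
  snippet: string;       // ~200-character preview
  tokenEstimate: number; // cost of loading the full memory
  type: string;          // memory type, e.g. "decision" or "observation"
}

// Pick the entries worth a step-2 load: highest-scoring first,
// skipping any whose full content would blow the token budget.
function selectForFullLoad(
  index: IndexEntry[],
  tokenBudget: number,
  minScore = 0.7,
): string[] {
  const selected: string[] = [];
  let spent = 0;
  for (const entry of [...index].sort((a, b) => b.score - a.score)) {
    if (entry.score < minScore) break; // remaining entries score even lower
    if (spent + entry.tokenEstimate > tokenBudget) continue;
    spent += entry.tokenEstimate;
    selected.push(entry.id);
  }
  return selected;
}
```

The returned IDs are exactly what you would pass to the step-2 batch load.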
Token Savings
The progressive disclosure pattern dramatically reduces token usage compared to loading full content for every search result.
| Approach | Tokens | Details |
|---|---|---|
| memory_search | ~8,000 | 20 results with full content |
| memory_search_index | ~800 | 20 results, index only (IDs + snippets + scores) |
| memory_get_batch | ~2,000 | Load 5 relevant memories at full content |
| Progressive Total | ~2,800 | 65% savings vs. full search |
At scale
With progressive disclosure you can preview 50+ memories in roughly 500 tokens. The savings grow as your memory store grows — exactly when it matters most.
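The arithmetic behind the table is straightforward. Assuming ~400 tokens per full result and ~40 per index entry (both derived from the 20-result figures above, not SDK constants), the savings can be computed directly:

```typescript
// Back-of-envelope costs for the table above. Per-result figures are
// assumptions derived from the documented totals (~8,000 / 20 and ~800 / 20).
const TOKENS_PER_FULL_RESULT = 400;
const TOKENS_PER_INDEX_ENTRY = 40;

// Traditional search: full content for every result.
function fullSearchCost(results: number): number {
  return results * TOKENS_PER_FULL_RESULT;
}

// Progressive disclosure: index for every result, full content only for loaded ones.
function progressiveCost(results: number, loaded: number): number {
  return results * TOKENS_PER_INDEX_ENTRY + loaded * TOKENS_PER_FULL_RESULT;
}

// 20 results, 5 loaded: 800 + 2,000 = 2,800 vs 8,000 full → 65% savings
const saved = 1 - progressiveCost(20, 5) / fullSearchCost(20);
```

Note that the full-search cost grows with every result returned, while the progressive cost grows mostly with the handful of memories you actually load.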
When to Use
Choose the right retrieval approach based on your scenario:
| Scenario | Approach | Why |
|---|---|---|
| Quick lookup (< 5 results) | memory_search / recall() | Simpler, one step |
| Large search / research | memory_search_index → memory_get_batch | Token efficient |
| Automatic budget management | smartRecall() | Handles both steps automatically |
| Store tool outputs | memory_observe / observe() | Auto-indexed for later retrieval |
| Session start | memory_session_context | Resume context |
Quick Example
```typescript
import { RekallAgent } from '@rekall/agent-sdk';

const agent = new RekallAgent({ apiKey: 'rk_...', agentId: 'agent_123' });

// Step 1: Get lightweight index
const index = await agent.recallIndex('project requirements');
console.log(`Found ${index.index.length} memories using ${index.indexTokens} tokens`);

// Step 2: Review snippets, select relevant
const relevant = index.index
  .filter(entry => entry.score > 0.7)
  .map(entry => entry.id);

// Step 3: Load full content for selected only
const memories = await agent.recallFull(relevant);

// Or use smartRecall for automatic handling
const auto = await agent.smartRecall('project requirements', {
  tokenBudget: 4000,
  progressive: true,
});
```
Observations
The memory_observe / observe() method captures tool outputs, file reads, API responses, and other external data your agent encounters during a session.
These observations are automatically indexed and become searchable via progressive disclosure in future sessions. This is the key mechanism for building up a searchable knowledge base over time — every piece of information your agent encounters can be stored cheaply, then retrieved efficiently later through the two-step index-then-load pattern.
Building knowledge over time
An agent that consistently observes tool outputs across sessions builds a rich, searchable memory store. Combined with progressive disclosure, this means the agent can always find what it needs without paying the token cost of loading everything into context.
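The observe-then-retrieve lifecycle can be illustrated with a minimal in-memory sketch. The real observe() persists memories across sessions and ranks by semantic relevance; this toy store only shows the flow (store cheaply, find via index, load selectively), and its names are illustrative rather than the SDK's:

```typescript
// Toy in-memory model of the observe → index → load lifecycle.
// Illustrative only: the real store persists across sessions.
interface Observation {
  id: string;
  content: string;
}

class ObservationStore {
  private observations: Observation[] = [];
  private nextId = 1;

  // Capture a tool output, file read, or API response; returns the memory ID.
  observe(content: string): string {
    const id = `obs_${this.nextId++}`;
    this.observations.push({ id, content });
    return id;
  }

  // Step-1 style lookup: IDs and short snippets only, never full content.
  searchIndex(query: string): { id: string; snippet: string }[] {
    return this.observations
      .filter(o => o.content.toLowerCase().includes(query.toLowerCase()))
      .map(o => ({ id: o.id, snippet: o.content.slice(0, 200) }));
  }

  // Step-2 style load: full content only for the selected IDs.
  getBatch(ids: string[]): Observation[] {
    return this.observations.filter(o => ids.includes(o.id));
  }
}
```

A later session would call searchIndex() first, inspect the snippets, and only then pay for getBatch() on the entries that matter.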
Token Budget
Token budgets control how much of the context window memory retrieval is allowed to consume.
- Default budget: 4,000 tokens
- Configurable via preferences: Set rekall_token_budget in user preferences to adjust globally
- Per-request control: The tokenBudget parameter on search controls how many index entries to return
- Automatic management: smartRecall() automatically manages the budget across both the index and full-load steps
Budget strategy
Start with the default 4,000-token budget. If your agent frequently needs deeper context, increase it. If you're running on smaller models with limited context windows, reduce it. The progressive disclosure pattern means you're always getting the most relevant memories first.
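One way to pick a non-default budget is to tie it to the model's context window. The heuristic below is a sketch, not SDK behavior: reserve roughly a tenth of the window for memory retrieval, clamped to sensible bounds, and pass the result as tokenBudget:

```typescript
// Heuristic only (an assumption, not an SDK rule): reserve a fraction of the
// model's context window for memory retrieval, clamped between 1k and 8k tokens.
function memoryTokenBudget(contextWindow: number, fraction = 0.1): number {
  const budget = Math.floor(contextWindow * fraction);
  return Math.min(Math.max(budget, 1_000), 8_000);
}
```

For example, a 128k-context model would get the 8,000-token ceiling, while an 8k-context model would be held to the 1,000-token floor; a 40k-context model lands at the 4,000-token default.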
MCP Prompts
MCP clients (Claude Desktop, Cursor, Windsurf, etc.) can request built-in prompts that are injected at session start. These give the agent context on how to use memory efficiently from the very first message.
rekall-guide
Complete guide for efficiently using Rekall memory tools. Covers all available tools, best practices, and common workflows.
infinite-context
How to use progressive disclosure for unlimited context. Explains the two-step retrieval pattern, token budgets, and when to use each approach.
memory-workflow
Optimal workflow for memory-intensive tasks. Covers session start, mid-session retrieval, observation patterns, and session end best practices.
Prompt injection
These prompts are designed to be injected into the system prompt or early in the conversation. They teach the agent how to use Rekall's memory tools effectively, including the progressive disclosure pattern, without requiring manual setup by the user.
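In the MCP protocol, a client fetches a named prompt with a standard prompts/get request. Requesting the infinite-context prompt would look roughly like this JSON-RPC message (whether the Rekall server accepts additional arguments is not specified here):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "prompts/get",
  "params": { "name": "infinite-context" }
}
```

The server responds with the prompt messages, which the client then injects into the conversation.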
Next Steps
Guide: Infinite Context
Practical tutorial for implementing progressive disclosure in your agent
MCP Server
Connect Rekall to Claude Desktop, Cursor, and other MCP clients
TypeScript SDK
Full reference for recallIndex, recallFull, smartRecall, and observe
Python SDK
Full reference for recall_index, recall_full, smart_recall, and observe
