Using Infinite Context
A practical guide to progressive disclosure -- Rekall's pattern for accessing unlimited memory within finite context windows.
Prerequisites
This guide assumes you understand the concepts behind progressive disclosure. If you haven't already, read the Infinite Context concepts page first. You'll also need the Rekall SDK installed and an API key configured.
Basic Pattern
The core of infinite context is a two-step process: first retrieve a lightweight index of matching memories, then selectively load only the ones you need. This keeps token usage predictable and minimal.
```typescript
// Step 1: Get a lightweight index (uses ~800 tokens for 20 results)
const index = await agent.recallIndex('deployment config');

// Step 2: Load only what you need (uses ~400 tokens per memory)
const topIds = index.index.slice(0, 5).map(e => e.id);
const memories = await agent.recallFull(topIds);
```
The index response includes a snippet and relevance score for each result, so you can make informed decisions about which memories to load in full. This is the fundamental pattern that all other techniques in this guide build on.
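As a sketch, the index entries this guide works with can be modeled like this. The interface below is inferred from the fields the guide itself uses (id, snippet, score, type, fullTokens) and is an assumption, not the SDK's actual type definition; the selection helper is hypothetical:

```typescript
// Assumed shape of an index entry, inferred from the fields used in this guide.
// Check the Rekall SDK's exported types for the real definition.
interface IndexEntry {
  id: string;
  snippet: string;    // short preview of the memory's content
  score: number;      // relevance between 0 and 1
  type: string;       // e.g. 'procedural', 'ltm'
  fullTokens: number; // estimated token cost of loading the full memory
}

// Hypothetical helper: pick the IDs whose snippets look worth the token cost
function worthLoading(entries: IndexEntry[], minScore: number): string[] {
  return entries.filter(e => e.score >= minScore).map(e => e.id);
}

const sample: IndexEntry[] = [
  { id: 'mem_1', snippet: 'prod deploys use blue/green', score: 0.91, type: 'ltm', fullTokens: 380 },
  { id: 'mem_2', snippet: 'unrelated meeting note', score: 0.12, type: 'ltm', fullTokens: 150 },
];
console.log(worthLoading(sample, 0.7)); // keeps only mem_1
```

Working against a typed entry like this makes the select step an ordinary array operation, which is why the rest of this guide leans on filter and map.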
Filtering by Score
Each entry in the index includes a relevance score between 0 and 1. Use this to filter out low-relevance results before loading full content, saving tokens for the memories that actually matter.
```typescript
const index = await agent.recallIndex('user preferences');

// Only load memories with high relevance
const relevant = index.index.filter(e => e.score > 0.7);
console.log(`${relevant.length} of ${index.index.length} memories are highly relevant`);

const memories = await agent.recallFull(relevant.map(e => e.id));
```
Score thresholds
A score above 0.7 generally indicates a strong match. Lower the threshold to 0.5 for broader searches, or raise it to 0.85 for precision lookups.
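Those three thresholds can be captured in a small helper so the choice is explicit at each call site. This helper and its mode names are illustrative, not part of the SDK:

```typescript
// Hypothetical mapping from search intent to the thresholds suggested above
type SearchMode = 'broad' | 'standard' | 'precision';

function scoreThreshold(mode: SearchMode): number {
  switch (mode) {
    case 'broad': return 0.5;      // exploratory searches
    case 'standard': return 0.7;   // strong-match default
    case 'precision': return 0.85; // exact lookups
  }
}

console.log(scoreThreshold('precision')); // → 0.85
```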
Filtering by Type
You can scope your search to specific memory types at query time, or filter the index results after the fact. This is useful when you know you only need workflows (procedural) or long-term knowledge (ltm).
```typescript
const index = await agent.recallIndex('project setup', {
  types: ['procedural', 'ltm'], // Only workflows and long-term knowledge
});

// Can also filter after the fact
const procedures = index.index.filter(e => e.type === 'procedural');
```
Using smartRecall
If you want the SDK to handle the index-select-load cycle automatically, use smartRecall. It fetches the index, picks the best results that fit within your token budget, and returns full content -- all in one call.
```typescript
// Automatically handles index → select → load within budget
const memories = await agent.smartRecall('user preferences', {
  tokenBudget: 4000,
  progressive: true,
});
// Returns full content for as many memories as fit in 4000 tokens
```
smartRecall vs manual recall
Use smartRecall when you want convenience and don't need fine-grained control over which memories get loaded. Use the manual recallIndex + recallFull pattern when you need to inspect snippets, apply custom filtering logic, or manage token budgets across multiple queries.
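One thing the manual pattern gives you that smartRecall does not is a single budget shared across several queries. A minimal sketch of such a tracker, assuming you subtract each recallIndex and recallFull cost as you go (the class itself is hypothetical, not part of the SDK):

```typescript
// Sketch of a token budget shared across multiple manual recall queries
class TokenBudget {
  constructor(private remaining: number) {}

  // Reserve tokens if they fit; returns false once the budget is exhausted
  trySpend(tokens: number): boolean {
    if (tokens > this.remaining) return false;
    this.remaining -= tokens;
    return true;
  }

  get left(): number {
    return this.remaining;
  }
}

const budget = new TokenBudget(4000);
budget.trySpend(800);  // e.g. the index for query A
budget.trySpend(1200); // full content for query A's top results
console.log(budget.left); // → 2000, available for query B
```

Passing one tracker through a whole research task keeps the total spend predictable even when the number of queries isn't known up front.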
Capturing Observations
The observe() method captures tool outputs and other data as memories. These observations become searchable via recallIndex in future sessions, building up a knowledge base over time.
```typescript
import { promises as fs } from 'fs';

// Capture a file read
const content = await fs.readFile('config.json', 'utf8');
await agent.observe(content, {
  toolName: 'file_read',
  tags: ['config', 'json', 'deployment'],
  type: 'file_read',
});

// Capture an API response
const response = await fetch('https://api.example.com/status');
const data = await response.json();
await agent.observe(JSON.stringify(data), {
  toolName: 'api_call',
  tags: ['status', 'monitoring'],
  type: 'api_response',
});

// These are now searchable via recallIndex in future sessions
const index = await agent.recallIndex('deployment config');
```
Tag strategically
Good tags make observations easier to find later. Use consistent naming conventions across your codebase -- for example, always tag config files with config and API responses with api.
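One way to keep conventions consistent is to normalize tags before every observe() call. This helper is a sketch of one possible convention (lowercase, hyphenated, deduplicated), not an SDK feature:

```typescript
// Hypothetical tag normalizer enforcing one naming convention before observe()
function normalizeTags(tags: string[]): string[] {
  const seen = new Set<string>();
  for (const tag of tags) {
    // Lowercase, trim, and hyphenate so 'API Call' and 'api-call' collide
    const t = tag.trim().toLowerCase().replace(/\s+/g, '-');
    if (t) seen.add(t);
  }
  return [...seen].sort();
}

console.log(normalizeTags(['Config', ' API Call', 'config']));
// keeps 'api-call' and 'config', deduplicated and sorted
```

Routing every observation through a function like this means a search for the config tag finds everything, regardless of who wrote the capturing code.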
Token Budget Management
Each index entry includes a fullTokens estimate, telling you how many tokens the full content will use. Combine this with the index-level indexTokens count to stay within your budget.
```typescript
// Check token estimates before loading
const index = await agent.recallIndex('research notes', { tokenBudget: 2000 });

const budget = 4000;
let tokensUsed = index.indexTokens;

// Select memories that fit within the remaining budget
const toLoad: string[] = [];
for (const entry of index.index) {
  if (tokensUsed + entry.fullTokens <= budget) {
    toLoad.push(entry.id);
    tokensUsed += entry.fullTokens;
  }
}
const memories = await agent.recallFull(toLoad);
```
Budget overflows
Token estimates are approximate. Leave a 10-15% buffer in your budget to account for variance between estimated and actual token counts.
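A simple way to apply that buffer is to shrink the nominal budget before selecting memories, rather than checking against the full amount. A sketch (the helper is illustrative, not part of the SDK):

```typescript
// Sketch: reduce the nominal budget by a safety margin to absorb
// the variance between estimated and actual token counts
function effectiveBudget(nominal: number, bufferFraction = 0.15): number {
  return Math.floor(nominal * (1 - bufferFraction));
}

console.log(effectiveBudget(4000)); // → 3400
console.log(effectiveBudget(1000, 0.1)); // → 900
```

Selecting against the effective budget means an underestimate on one memory rarely pushes the real total past the hard limit.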
MCP Usage
If you're using Rekall through an MCP client like Claude Desktop or Cursor, the same two-step pattern applies -- but you call MCP tools directly instead of using the SDK.
```
// In Claude Desktop or Cursor, the agent calls MCP tools directly:

// Step 1: Search index
memory_search_index(query: "project architecture", tokenBudget: 2000)
// Returns: { index: [...], indexTokens: 423, totalFullTokens: 12500 }

// Step 2: Load selected memories
memory_get_batch(ids: ["mem_abc", "mem_def"])
// Returns: [{ id: "mem_abc", content: "...", ... }, ...]
```
The MCP server also exposes prompts that guide agents through the progressive disclosure workflow:
- rekall-guide -- General usage instructions for the Rekall MCP server
- infinite-context -- Step-by-step instructions for the progressive disclosure pattern
- memory-workflow -- End-to-end workflow for session memory management
Session Workflow
Here is a complete workflow showing how infinite context fits into a typical agent session -- from startup through research and learning.
```typescript
// At session start
const context = await agent.getSessionContext();

// During work -- capture important outputs
await agent.observe(fileContent, { toolName: 'file_read' });
await agent.observe(apiResponse, { toolName: 'api_call' });

// When you need to research
const index = await agent.recallIndex('relevant topic');
const memories = await agent.recallFull(
  index.index.filter(e => e.score > 0.6).map(e => e.id)
);

// Store key learnings
await agent.learn('Important finding: ...', {
  metadata: { source: 'research', topic: 'architecture' },
});
```
Session lifecycle
Start each session by loading context, capture observations as you work, recall when you need prior knowledge, and store learnings at the end. This cycle builds a growing, searchable knowledge base across sessions.
Best Practices
Follow these guidelines to get the most out of progressive disclosure and keep your token usage efficient.
- Always review snippets before loading full content -- the index includes enough context to decide whether a memory is worth the token cost
- Use token estimates to stay within budget -- check fullTokens on each entry and indexTokens on the response
- Load in batches rather than one-by-one -- use recallFull with an array of IDs instead of making separate calls for each memory
- Combine with memory_timeline for chronological context -- when you need to understand the order of events, use timeline queries alongside recall
- Use observations to build searchable knowledge over time -- every tool output you capture with observe() becomes a future search result
- Set appropriate token budgets per use case -- research tasks benefit from larger budgets (4000+), while quick lookups need only about 1000 tokens
Next Steps
- Infinite Context Concepts -- Deep dive into the theory behind progressive disclosure
- MCP Server -- Set up and configure the Rekall MCP server
- SDK Reference -- Full documentation for the Rekall agent SDK
- Memories API Reference -- Full endpoint documentation
