Using Infinite Context
A practical guide to progressive disclosure -- Rekall's pattern for accessing unlimited memory within finite context windows.
Prerequisites
This guide assumes you understand the concepts behind progressive disclosure. If you haven't already, read the Infinite Context concepts page first. You'll also need the Rekall SDK installed and an API key configured.
Basic Pattern
The core of infinite context is a two-step process: first retrieve a lightweight index of matching memories, then selectively load only the ones you need. This keeps token usage predictable and minimal.
```typescript
// Step 1: Get a lightweight index (uses ~800 tokens for 20 results)
const index = await agent.recallIndex('deployment config');

// Step 2: Load only what you need (uses ~400 tokens per memory)
const topIds = index.index.slice(0, 5).map(e => e.id);
const memories = await agent.recallFull(topIds);
```
The index response includes a snippet and relevance score for each result, so you can make informed decisions about which memories to load in full. This is the fundamental pattern that all other techniques in this guide build on.
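As a sketch, the index entries this guide works with can be modeled like this. The interface below is inferred from the fields the guide itself uses (id, snippet, score, type, fullTokens) and is an assumption, not the SDK's actual type definition; the selection helper is hypothetical:

```typescript
// Assumed shape of an index entry, inferred from the fields used in this guide.
// Check the Rekall SDK's exported types for the real definition.
interface IndexEntry {
  id: string;
  snippet: string;    // short preview of the memory's content
  score: number;      // relevance between 0 and 1
  type: string;       // e.g. 'procedural', 'ltm'
  fullTokens: number; // estimated token cost of loading the full memory
}

// Hypothetical helper: pick the IDs whose snippets look worth the token cost
function worthLoading(entries: IndexEntry[], minScore: number): string[] {
  return entries.filter(e => e.score >= minScore).map(e => e.id);
}

const sample: IndexEntry[] = [
  { id: 'mem_1', snippet: 'prod deploys use blue/green', score: 0.91, type: 'ltm', fullTokens: 380 },
  { id: 'mem_2', snippet: 'unrelated meeting note', score: 0.12, type: 'ltm', fullTokens: 150 },
];
console.log(worthLoading(sample, 0.7)); // keeps only mem_1
```

Working against a typed entry like this makes the select step an ordinary array operation, which is why the rest of this guide leans on filter and map.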
Filtering by Score
Each entry in the index includes a relevance score between 0 and 1. Use this to filter out low-relevance results before loading full content, saving tokens for the memories that actually matter.
```typescript
const index = await agent.recallIndex('user preferences');

// Only load memories with high relevance
const relevant = index.index.filter(e => e.score > 0.7);
console.log(`${relevant.length} of ${index.index.length} memories are highly relevant`);

const memories = await agent.recallFull(relevant.map(e => e.id));
```
Score thresholds
A score above 0.7 generally indicates a strong match. Lower the threshold to 0.5 for broader searches, or raise it to 0.85 for precision lookups.
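Those three thresholds can be captured in a small helper so the choice is explicit at each call site. This helper and its mode names are illustrative, not part of the SDK:

```typescript
// Hypothetical mapping from search intent to the thresholds suggested above
type SearchMode = 'broad' | 'standard' | 'precision';

function scoreThreshold(mode: SearchMode): number {
  switch (mode) {
    case 'broad': return 0.5;      // exploratory searches
    case 'standard': return 0.7;   // strong-match default
    case 'precision': return 0.85; // exact lookups
  }
}

console.log(scoreThreshold('precision')); // → 0.85
```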
Filtering by Type
You can scope your search to specific memory types at query time, or filter the index results after the fact. This is useful when you know you only need workflows (procedural) or long-term knowledge (ltm).
```typescript
const index = await agent.recallIndex('project setup', {
  types: ['procedural', 'ltm'], // Only workflows and long-term knowledge
});

// Can also filter after the fact
const procedures = index.index.filter(e => e.type === 'procedural');
```
Using smartRecall
If you want the SDK to handle the index-select-load cycle automatically, use smartRecall. It fetches the index, picks the best results that fit within your token budget, and returns full content -- all in one call.
```typescript
// Automatically handles index → select → load within budget
const memories = await agent.smartRecall('user preferences', {
  tokenBudget: 4000,
  progressive: true,
});
// Returns full content for as many memories as fit in 4000 tokens
```
smartRecall vs manual recall
Use smartRecall when you want convenience and don't need fine-grained control over which memories get loaded. Use the manual recallIndex + recallFull pattern when you need to inspect snippets, apply custom filtering logic, or manage token budgets across multiple queries.
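One thing the manual pattern gives you that smartRecall does not is a single budget shared across several queries. A minimal sketch of such a tracker, assuming you subtract each recallIndex and recallFull cost as you go (the class itself is hypothetical, not part of the SDK):

```typescript
// Sketch of a token budget shared across multiple manual recall queries
class TokenBudget {
  constructor(private remaining: number) {}

  // Reserve tokens if they fit; returns false once the budget is exhausted
  trySpend(tokens: number): boolean {
    if (tokens > this.remaining) return false;
    this.remaining -= tokens;
    return true;
  }

  get left(): number {
    return this.remaining;
  }
}

const budget = new TokenBudget(4000);
budget.trySpend(800);  // e.g. the index for query A
budget.trySpend(1200); // full content for query A's top results
console.log(budget.left); // → 2000, available for query B
```

Passing one tracker through a whole research task keeps the total spend predictable even when the number of queries isn't known up front.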
Capturing Observations
The observe() method captures tool outputs and other data as memories. These observations become searchable via recallIndex in future sessions, building up a knowledge base over time.
```typescript
import { promises as fs } from 'fs';

// Capture a file read
const content = await fs.readFile('config.json', 'utf8');
await agent.observe(content, {
  toolName: 'file_read',
  tags: ['config', 'json', 'deployment'],
  type: 'file_read',
});

// Capture an API response
const response = await fetch('https://api.example.com/status');
const data = await response.json();
await agent.observe(JSON.stringify(data), {
  toolName: 'api_call',
  tags: ['status', 'monitoring'],
  type: 'api_response',
});

// These are now searchable via recallIndex in future sessions
const index = await agent.recallIndex('deployment config');
```
Tag strategically
Good tags make observations easier to find later. Use consistent naming conventions across your codebase -- for example, always tag config files with config and API responses with api.
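One way to keep conventions consistent is to normalize tags before every observe() call. This helper is a sketch of one possible convention (lowercase, hyphenated, deduplicated), not an SDK feature:

```typescript
// Hypothetical tag normalizer enforcing one naming convention before observe()
function normalizeTags(tags: string[]): string[] {
  const seen = new Set<string>();
  for (const tag of tags) {
    // Lowercase, trim, and hyphenate so 'API Call' and 'api-call' collide
    const t = tag.trim().toLowerCase().replace(/\s+/g, '-');
    if (t) seen.add(t);
  }
  return [...seen].sort();
}

console.log(normalizeTags(['Config', ' API Call', 'config']));
// keeps 'api-call' and 'config', deduplicated and sorted
```

Routing every observation through a function like this means a search for the config tag finds everything, regardless of who wrote the capturing code.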
Token Budget Management
Each index entry includes a fullTokens estimate, telling you how many tokens the full content will use. Combine this with the index-level indexTokens count to stay within your budget.
```typescript
// Check token estimates before loading
const index = await agent.recallIndex('research notes', { tokenBudget: 2000 });

const budget = 4000;
let tokensUsed = index.indexTokens;

// Select memories that fit within the remaining budget
const toLoad: string[] = [];
for (const entry of index.index) {
  if (tokensUsed + entry.fullTokens <= budget) {
    toLoad.push(entry.id);
    tokensUsed += entry.fullTokens;
  }
}
const memories = await agent.recallFull(toLoad);
```
Budget overflows
Token estimates are approximate. Leave a 10-15% buffer in your budget to account for variance between estimated and actual token counts.
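A simple way to apply that buffer is to shrink the nominal budget before selecting memories, rather than checking against the full amount. A sketch (the helper is illustrative, not part of the SDK):

```typescript
// Sketch: reduce the nominal budget by a safety margin to absorb
// the variance between estimated and actual token counts
function effectiveBudget(nominal: number, bufferFraction = 0.15): number {
  return Math.floor(nominal * (1 - bufferFraction));
}

console.log(effectiveBudget(4000)); // → 3400
console.log(effectiveBudget(1000, 0.1)); // → 900
```

Selecting against the effective budget means an underestimate on one memory rarely pushes the real total past the hard limit.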
MCP Usage
If you're using Rekall through an MCP client like Claude Desktop or Cursor, the same two-step pattern applies -- but you call MCP tools directly instead of using the SDK.
```
// In Claude Desktop or Cursor, the agent calls MCP tools directly:

// Step 1: Search index
memory_search_index(query: "project architecture", tokenBudget: 2000)
// Returns: { index: [...], indexTokens: 423, totalFullTokens: 12500 }

// Step 2: Load selected memories
memory_get_batch(ids: ["mem_abc", "mem_def"])
// Returns: [{ id: "mem_abc", content: "...", ... }, ...]
```
The MCP server also exposes prompts that guide agents through the progressive disclosure workflow:
- rekall-guide -- General usage instructions for the Rekall MCP server
- infinite-context -- Step-by-step instructions for the progressive disclosure pattern
- memory-workflow -- End-to-end workflow for session memory management
Session Workflow
Here is a complete workflow showing how infinite context fits into a typical agent session -- from startup through research and learning.
```typescript
// At session start
const context = await agent.getSessionContext();

// During work -- capture important outputs
await agent.observe(fileContent, { toolName: 'file_read' });
await agent.observe(apiResponse, { toolName: 'api_call' });

// When you need to research
const index = await agent.recallIndex('relevant topic');
const memories = await agent.recallFull(
  index.index.filter(e => e.score > 0.6).map(e => e.id)
);

// Store key learnings
await agent.learn('Important finding: ...', {
  metadata: { source: 'research', topic: 'architecture' },
});
```
Session lifecycle
Start each session by loading context, capture observations as you work, recall when you need prior knowledge, and store learnings at the end. This cycle builds a growing, searchable knowledge base across sessions.
Best Practices
Follow these guidelines to get the most out of progressive disclosure and keep your token usage efficient.
- Always review snippets before loading full content -- the index includes enough context to decide whether a memory is worth the token cost
- Use token estimates to stay within budget -- check fullTokens on each entry and indexTokens on the response
- Load in batches rather than one-by-one -- use recallFull with an array of IDs instead of making separate calls for each memory
- Combine with memory_timeline for chronological context -- when you need to understand the order of events, use timeline queries alongside recall
- Use observations to build searchable knowledge over time -- every tool output you capture with observe() becomes a future search result
- Set appropriate token budgets per use case -- research tasks benefit from larger budgets (4000+), while quick lookups need only about 1000 tokens
Next Steps
- Infinite Context Concepts -- Deep dive into the theory behind progressive disclosure
- MCP Server -- Set up and configure the Rekall MCP server
- SDK Reference -- Full documentation for the Rekall agent SDK
- Memories API Reference -- Full endpoint documentation
