About

Lumetra builds memory infrastructure for AI agents. Our product, Engram, is a memory layer that lets agents remember what users told them, recall it accurately across sessions and clients, and show their work on every recall.

What Engram does: Engram ingests conversation, extracts atomic facts and relationships, and stores them where they can be retrieved by meaning, not just by keyword. Retrieval fuses three signals: BM25 keyword search, semantic vector search, and traversal of an automatically maintained knowledge graph. Because each signal covers gaps the others leave, recall holds up when a question is rephrased, when context shifts to a new session, or when the answer depends on an implicit connection between memories. Every recall response includes an explanation: the stored memories that informed the answer, the graph edges that connected them, and the canonical bucket profile (people, events, preferences, facts) that shaped the synthesis. If a recall is wrong, you can see why.
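
To make the idea of fusing three signals concrete, here is a minimal sketch using reciprocal rank fusion (RRF), a common way to merge ranked lists. This page does not state how Engram actually combines its signals, so RRF, the function name, the constant k=60, and the memory IDs below are all illustrative assumptions rather than Engram's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of memory IDs into one ranking.

    Each list contributes 1 / (k + rank) for every ID it contains,
    so an ID that ranks well in several lists rises to the top.
    Assumption: this is generic RRF, not Engram's documented method.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the order is stable.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical results from the three signals for a single query.
bm25_hits = ["m1", "m2", "m3"]    # keyword search
vector_hits = ["m2", "m4", "m1"]  # semantic vector search
graph_hits = ["m2", "m5"]         # knowledge-graph traversal

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

In this toy input, m2 appears in all three lists and ends up first, even though keyword search alone ranked it second. That is the behavior fusion is meant to provide: a memory that no single signal ranks highest can still win when the signals agree.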

Benchmarks: Engram scored 91.6% on LongMemEval-S (458/500), the public long-term memory benchmark from Wu et al. Subscores include 93.2% on temporal reasoning and 83.5% on multi-session reasoning, where most published baselines sit near 50%. Methodology and per-question results are published at lumetra.io/engram-on-longmemeval. The benchmark is fully reproducible from the writeup at lumetra.io/reproducing-the-91-percent.

Bring your own model: Engram is bring-your-own-model by default. Customers configure the LLM that handles extraction and synthesis (OpenAI, Anthropic, Groq, or any OpenAI-compatible endpoint) by adding a provider API key in their Lumetra portal. Inference is billed directly to the customer's provider account, so there are no token markups, no inference lock-in, and two clean invoices instead of one with markup baked in. This is not BYOK offered alongside vendor inference: there is no vendor inference. Lumetra meters only memories stored and retrievals served.

Integration paths: Engram offers three ways to plug in. The MCP server at mcp.lumetra.io/mcp/sse supports OAuth 2.1 Dynamic Client Registration for Claude.ai web custom connectors and bearer-token auth for every other major MCP client (Claude Desktop, Claude Code, Cursor, Windsurf, Codex, ChatGPT, OpenClaw). The REST API at api.lumetra.io/v1 provides standard endpoints for ingest, query, memory management, and usage stats. Official SDKs are available in TypeScript (@lumetra/engram on npm), Python (lumetra-engram on PyPI), and Go, along with a first-party Vercel AI SDK integration and a Claude Code plugin at github.com/lumetra-io/engram-claude-plugin. The same memory bucket follows users across every connected client. A fact stored from a Cursor session is recallable inside Claude.ai, ChatGPT, or any custom agent built on the SDKs.

Pricing: Engram is priced on memories stored and retrievals served, never on inference tokens. The Free tier includes 10K memories and 50K retrievals per month, no card required. Indie at $29/month raises that to 100K memories and 500K retrievals. Team at $99/month covers 1M memories and 5M retrievals. Enterprise is custom and includes SSO, on-prem options, and per-component model routing. Bring-your-own-model applies on every public tier, including Free.
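
As a worked example of the tiers above, here is a minimal sketch that picks the smallest public tier covering a given workload. The function name and table layout are illustrative and not part of any Lumetra SDK. It assumes the listed limits are inclusive and that the memory limit applies to the number of memories stored.

```python
# (name, price in USD per month, max memories, max retrievals per month)
# Figures are taken from the pricing paragraph; the structure is hypothetical.
PUBLIC_TIERS = [
    ("Free", 0, 10_000, 50_000),
    ("Indie", 29, 100_000, 500_000),
    ("Team", 99, 1_000_000, 5_000_000),
]


def smallest_tier(memories, retrievals_per_month):
    """Return (tier name, monthly price) for the cheapest tier that fits.

    Falls back to ("Enterprise", None) because Enterprise pricing is custom.
    Assumption: limits are inclusive.
    """
    for name, price, max_memories, max_retrievals in PUBLIC_TIERS:
        if memories <= max_memories and retrievals_per_month <= max_retrievals:
            return name, price
    return "Enterprise", None


print(smallest_tier(8_000, 40_000))       # fits within Free
print(smallest_tier(50_000, 2_000_000))   # retrievals push this to Team
print(smallest_tier(2_000_000, 100_000))  # memories exceed Team
```

Because neither dimension includes inference tokens, the tier depends only on how much is stored and how often it is retrieved, whichever limit is reached first.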

Trust and security: Engram never trains on customer data. Customer-provided model keys are encrypted at rest with AES-256-GCM, versioned per row, and accessed only through a single routing module that does not log request bodies. Lumetra publishes its v44 composer prompt under an MIT license and documents its retrieval pipeline openly.

Company: Lumetra was founded in 2025 by Ben Meyerson and Jacob Davis, previously on the AWS IoT team at Amazon Web Services. The company is headquartered in Seattle, WA. Engram is its first product. Lumetra ran a private design-partner program through 2025 and opened public signups in early 2026. Engram v3, the general-availability release, shipped in May 2026 with Claude.ai web support, official SDKs, BYOK on every public tier, and a free tier.