April 2026 · 10 min read · Fran Olivares, Founder of OlivaresAI
Every AI conversation starts from zero. Your assistant forgets your name, your project, your preferences — every single time. This is the fundamental limitation of stateless AI, and it is the single biggest reason AI feels like a tool instead of a collaborator. This guide walks you through three concrete approaches to solving this problem, from zero-code setup to full API integration.
When you use ChatGPT, Claude, or any AI chat, context disappears when the conversation ends. You explain the same things over and over: your tech stack, your coding style, your project architecture, your preferences. This wastes time and produces worse results because the AI never builds a deep understanding of who you are or what you are working on.
Platform-native memory features (ChatGPT Memory, Claude Projects) help, but they are limited in capacity, locked to a single platform, and offer no developer API. If you are building an AI-powered product, you need an independent memory layer.
The Model Context Protocol (MCP) is the fastest path. If your AI runs in Claude Desktop, Cursor, Windsurf, Claude Code, or any MCP-compatible client, you can add persistent memory in under 5 minutes.
Step 1: Sign up at alma.olivares.ai and generate an API key in Settings.
Step 2: Add @olivaresai/alma-mcp to your MCP client config with your API key. For Claude Desktop, edit claude_desktop_config.json. For Cursor, use the MCP settings panel.
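For Claude Desktop, an entry in claude_desktop_config.json typically looks like the sketch below. The package name and the need for an API key come from the steps above; the server label ("alma"), the use of npx, and the ALMA_API_KEY variable name are assumptions — check Alma's own documentation for the exact keys.

```json
{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": {
        "ALMA_API_KEY": "your-api-key-here"
      }
    }
  }
}
```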
Step 3: Restart your client. The server exposes 35 tools, including alma_remember (save a memory), alma_recall (search memories), alma_assemble (build context from all memory layers), and alma_extract (extract facts from text). Your AI can now read from and write to a persistent memory store that survives across every conversation.
MCP is ideal for personal workflows — Claude Desktop for general AI work, Cursor for coding, Claude Code for terminal-based development. One memory, everywhere.
The JavaScript SDK (@olivaresai/alma-sdk) gives you full programmatic control for custom applications. The core integration pattern has three steps:
Step 1: Call client.context.assemble({ query }) to get a system prompt enriched with relevant memories, episodes, procedures, and soul blocks.
Step 2: Send the assembled prompt to your model as the system message.
Step 3: Call client.memories.extract({ text }) to save new facts from the conversation, or create memories directly with client.memories.create().
The SDK wraps all 140+ API endpoints with full TypeScript types. Install it with npm install @olivaresai/alma-sdk. It is ESM-only and requires Node.js 18+.
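The assemble → generate → extract loop can be sketched as below. The method names (context.assemble, memories.extract, memories.create) come from the text; everything else — the MockAlma stand-in, its internals, and the chatTurn helper — is hypothetical scaffolding so the shape of the integration is visible without the real SDK or network access.

```typescript
type Memory = { content: string; category?: string };

// Hypothetical local stand-in for the Alma client, backed by an array.
class MockAlma {
  private store: Memory[] = [];

  context = {
    // Real SDK: returns a system prompt enriched with relevant memories,
    // episodes, procedures, and soul blocks. Here: naive substring match.
    assemble: async ({ query }: { query: string }): Promise<string> => {
      const relevant = this.store
        .filter((m) => m.content.toLowerCase().includes(query.toLowerCase()))
        .map((m) => `- ${m.content}`)
        .join("\n");
      return `You are a persistent assistant.\nRelevant memories:\n${relevant}`;
    },
  };

  memories = {
    // Real SDK: LLM-powered fact extraction. Here: one "fact" per line.
    extract: async ({ text }: { text: string }): Promise<Memory[]> => {
      const facts = text.split("\n").filter(Boolean).map((content) => ({ content }));
      this.store.push(...facts);
      return facts;
    },
    create: async (memory: Memory): Promise<Memory> => {
      this.store.push(memory);
      return memory;
    },
  };
}

// The three-step pattern, one conversation turn at a time.
async function chatTurn(client: MockAlma, userMessage: string): Promise<string> {
  const systemPrompt = await client.context.assemble({ query: userMessage }); // 1. build context
  const reply = `(model reply with ${systemPrompt.length} chars of context)`; // 2. call your LLM here
  await client.memories.extract({ text: userMessage });                       // 3. persist new facts
  return reply;
}
```

In a real integration, step 2 is a call to your model provider with the assembled prompt as the system message.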
The REST API provides direct HTTP access from any language or platform. Key endpoints:
POST /api/v1/context/assemble — Build a context prompt from memories, episodes, procedures, and soul blocks
POST /api/v1/memories — Create a memory with content, category, importance, and confidence
GET /api/v1/memories/search?q=query&mode=hybrid — Hybrid semantic + keyword search
POST /api/v1/memories/extract — LLM-powered extraction of facts from text
POST /api/v1/blocks — Configure Soul Engine blocks for AI identity
Authentication is via API key (X-API-Key header). Base URL: https://alma.olivares.ai/api/v1.
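A request to the memories endpoint can be sketched in any language; in TypeScript with the built-in fetch API (Node.js 18+), it might look like this. The path, the X-API-Key header, the base URL, and the field names (content, category, importance, confidence) come from the endpoint list above; the exact value types are assumptions.

```typescript
const BASE_URL = "https://alma.olivares.ai/api/v1";

// Build a POST /api/v1/memories request; field types are assumed.
function buildCreateMemoryRequest(
  apiKey: string,
  memory: { content: string; category?: string; importance?: number; confidence?: number }
): Request {
  return new Request(`${BASE_URL}/memories`, {
    method: "POST",
    headers: {
      "X-API-Key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(memory),
  });
}

// Sending it is one line once the request is built:
// const res = await fetch(buildCreateMemoryRequest(key, { content: "Prefers ESM" }));
```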
Alma's three-layer architecture separates knowledge into memories, episodes, and procedures.
When you start a conversation, context assembly searches all three layers using hybrid search, scores results by relevance (50%), importance (15%), confidence (15%), recency (10%), and frequency (10%), then injects the top-ranked context into the system prompt — all in under 100ms.
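The ranking step above is a weighted sum of five signals. The weights (50/15/15/10/10) are from the description; that each input arrives normalized to the 0–1 range is an assumption about the implementation.

```typescript
type Signals = {
  relevance: number;  // hybrid-search similarity, assumed 0–1
  importance: number; // assumed 0–1
  confidence: number; // assumed 0–1
  recency: number;    // assumed 0–1, 1 = just touched
  frequency: number;  // assumed 0–1, 1 = most frequently recalled
};

// Weighted sum matching the stated 50/15/15/10/10 split.
function contextScore(s: Signals): number {
  return (
    0.5 * s.relevance +
    0.15 * s.importance +
    0.15 * s.confidence +
    0.1 * s.recency +
    0.1 * s.frequency
  );
}
```

With all signals at 1 the score is 1, so results are directly comparable on a 0–1 scale.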
Memories are automatically extracted from conversations every 4 messages. The extractor identifies 0-30 facts per conversation using Claude Haiku. Duplicates are detected via Jaccard similarity (60% threshold) and merged. Stale memories with low importance expire after 120 days of inactivity.
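Jaccard similarity with a 60% threshold, as described for duplicate detection, looks like this. The comparison over lowercase word sets is an assumption; the text does not say what the sets contain.

```typescript
// Jaccard similarity: |intersection| / |union| over word sets (assumed tokenization).
function jaccard(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const intersection = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// The stated 60% threshold: at or above it, two memories merge.
const isDuplicate = (a: string, b: string): boolean => jaccard(a, b) >= 0.6;
```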
Memory alone gives your AI facts. The Soul Engine gives it identity. Configure structured blocks — personality, expertise, communication style, rules, and context — that persist across every conversation. Unlike a single system prompt that gets diluted in long conversations, Soul Engine blocks are versioned, organized, and always injected with priority.
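A set of Soul Engine blocks might be modeled like this. The five block types are from the paragraph above; the field names (type, content, priority) are assumptions about the payload shape for POST /api/v1/blocks.

```typescript
// Hypothetical block shape; verify field names against Alma's API reference.
type SoulBlock = {
  type: "personality" | "expertise" | "communication_style" | "rules" | "context";
  content: string;
  priority?: number; // assumed: higher = injected earlier in the assembled prompt
};

const blocks: SoulBlock[] = [
  { type: "personality", content: "Pragmatic, direct, skeptical of hype.", priority: 1 },
  { type: "rules", content: "Never suggest deprecated APIs.", priority: 2 },
];
```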
Environments let you isolate memory contexts. Keep work, personal, and client-specific memories completely separate. Each environment has its own memories, episodes, procedures, and soul blocks. The AI switches personality and knowledge when you switch environments.
Sign up free at alma.olivares.ai. The free plan includes 500 memories, 1 environment, and full chat access. All integration methods — MCP, SDK, API — work on every plan. No credit card required.
For more depth: AI Memory Management: Complete Guide 2026 · Building AI Assistants That Remember Everything · Persistent Memory vs RAG