April 2026 · 10 min read · Fran Olivares, Founder of OlivaresAI
Every AI conversation starts from zero. Your assistant forgets your name, your project, your preferences — every single time. This is the fundamental limitation of stateless AI, and it is the single biggest reason AI feels like a tool instead of a collaborator. This guide walks you through three concrete approaches to solving this problem, from zero-code setup to full API integration.
When you use ChatGPT, Claude, or any AI chat, context disappears when the conversation ends. You explain the same things over and over: your tech stack, your coding style, your project architecture, your preferences. This wastes time and produces worse results because the AI never builds a deep understanding of who you are or what you are working on.
Platform-native memory features (ChatGPT Memory, Claude Projects) help, but they are limited in capacity, locked to a single platform, and offer no developer API. If you are building an AI-powered product, you need an independent memory layer.
The Model Context Protocol (MCP) is the fastest path. If your AI runs in Claude Desktop, Cursor, Windsurf, Claude Code, or any MCP-compatible client, you can add persistent memory in under 5 minutes.
Step 1: Sign up at alma.olivares.ai and generate an API key in Settings.
Step 2: Add @olivaresai/alma-mcp to your MCP client config with your API key. For Claude Desktop, edit claude_desktop_config.json. For Cursor, use the MCP settings panel.
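For Claude Desktop, an entry in claude_desktop_config.json typically looks like the sketch below. The package name and the need for an API key come from the steps above; the server label ("alma"), the use of npx, and the ALMA_API_KEY variable name are assumptions — check Alma's own documentation for the exact keys.

```json
{
  "mcpServers": {
    "alma": {
      "command": "npx",
      "args": ["-y", "@olivaresai/alma-mcp"],
      "env": {
        "ALMA_API_KEY": "your-api-key-here"
      }
    }
  }
}
```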
Step 3: Restart your client. The server exposes 35 tools, including alma_remember (save a memory), alma_recall (search memories), alma_assemble (build context from all memory layers), and alma_extract (extract facts from text). Your AI can now read from and write to a persistent memory store that survives across every conversation.
MCP is ideal for personal workflows — Claude Desktop for general AI work, Cursor for coding, Claude Code for terminal-based development. One memory, everywhere.
The JavaScript SDK (@olivaresai/alma-sdk) gives you full programmatic control for custom applications. The core integration pattern has three steps:
Step 1: Call client.context.assemble({ query }) to get a system prompt enriched with relevant memories, episodes, procedures, and soul blocks.
Step 2: Send the assembled prompt to your model as the system message.
Step 3: Call client.memories.extract({ text }) to save new facts from the conversation, or create memories directly with client.memories.create().
The SDK wraps all 140+ API endpoints with full TypeScript types. Install it with npm install @olivaresai/alma-sdk. It is ESM-only and requires Node.js 18+.
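The assemble → generate → extract loop can be sketched as below. The method names (context.assemble, memories.extract, memories.create) come from the text; everything else — the MockAlma stand-in, its internals, and the chatTurn helper — is hypothetical scaffolding so the shape of the integration is visible without the real SDK or network access.

```typescript
type Memory = { content: string; category?: string };

// Hypothetical local stand-in for the Alma client, backed by an array.
class MockAlma {
  private store: Memory[] = [];

  context = {
    // Real SDK: returns a system prompt enriched with relevant memories,
    // episodes, procedures, and soul blocks. Here: naive substring match.
    assemble: async ({ query }: { query: string }): Promise<string> => {
      const relevant = this.store
        .filter((m) => m.content.toLowerCase().includes(query.toLowerCase()))
        .map((m) => `- ${m.content}`)
        .join("\n");
      return `You are a persistent assistant.\nRelevant memories:\n${relevant}`;
    },
  };

  memories = {
    // Real SDK: LLM-powered fact extraction. Here: one "fact" per line.
    extract: async ({ text }: { text: string }): Promise<Memory[]> => {
      const facts = text.split("\n").filter(Boolean).map((content) => ({ content }));
      this.store.push(...facts);
      return facts;
    },
    create: async (memory: Memory): Promise<Memory> => {
      this.store.push(memory);
      return memory;
    },
  };
}

// The three-step pattern, one conversation turn at a time.
async function chatTurn(client: MockAlma, userMessage: string): Promise<string> {
  const systemPrompt = await client.context.assemble({ query: userMessage }); // 1. build context
  const reply = `(model reply with ${systemPrompt.length} chars of context)`; // 2. call your LLM here
  await client.memories.extract({ text: userMessage });                       // 3. persist new facts
  return reply;
}
```

In a real integration, step 2 is a call to your model provider with the assembled prompt as the system message.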
The REST API provides direct HTTP access from any language or platform. Key endpoints:
POST /api/v1/context/assemble — Build a context prompt from memories, episodes, procedures, and soul blocks
POST /api/v1/memories — Create a memory with content, category, importance, and confidence
GET /api/v1/memories/search?q=query&mode=hybrid — Hybrid semantic + keyword search
POST /api/v1/memories/extract — LLM-powered extraction of facts from text
POST /api/v1/blocks — Configure Soul Engine blocks for AI identity
Authentication is via API key (X-API-Key header). Base URL: https://alma.olivares.ai/api/v1.
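A request to the memories endpoint can be sketched in any language; in TypeScript with the built-in fetch API (Node.js 18+), it might look like this. The path, the X-API-Key header, the base URL, and the field names (content, category, importance, confidence) come from the endpoint list above; the exact value types are assumptions.

```typescript
const BASE_URL = "https://alma.olivares.ai/api/v1";

// Build a POST /api/v1/memories request; field types are assumed.
function buildCreateMemoryRequest(
  apiKey: string,
  memory: { content: string; category?: string; importance?: number; confidence?: number }
): Request {
  return new Request(`${BASE_URL}/memories`, {
    method: "POST",
    headers: {
      "X-API-Key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(memory),
  });
}

// Sending it is one line once the request is built:
// const res = await fetch(buildCreateMemoryRequest(key, { content: "Prefers ESM" }));
```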
Alma's three-layer architecture separates knowledge into memories, episodes, and procedures.
When you start a conversation, context assembly searches all three layers using hybrid search, scores results by relevance (50%), importance (15%), confidence (15%), recency (10%), and frequency (10%), then injects the top-ranked context into the system prompt — all in under 100ms.
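The ranking step above is a weighted sum of five signals. The weights (50/15/15/10/10) are from the description; that each input arrives normalized to the 0–1 range is an assumption about the implementation.

```typescript
type Signals = {
  relevance: number;  // hybrid-search similarity, assumed 0–1
  importance: number; // assumed 0–1
  confidence: number; // assumed 0–1
  recency: number;    // assumed 0–1, 1 = just touched
  frequency: number;  // assumed 0–1, 1 = most frequently recalled
};

// Weighted sum matching the stated 50/15/15/10/10 split.
function contextScore(s: Signals): number {
  return (
    0.5 * s.relevance +
    0.15 * s.importance +
    0.15 * s.confidence +
    0.1 * s.recency +
    0.1 * s.frequency
  );
}
```

With all signals at 1 the score is 1, so results are directly comparable on a 0–1 scale.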
Memories are automatically extracted from conversations every 4 messages. The extractor identifies 0-30 facts per conversation using Claude Haiku. Duplicates are detected via Jaccard similarity (60% threshold) and merged. Stale memories with low importance expire after 120 days of inactivity.
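Jaccard similarity with a 60% threshold, as described for duplicate detection, looks like this. The comparison over lowercase word sets is an assumption; the text does not say what the sets contain.

```typescript
// Jaccard similarity: |intersection| / |union| over word sets (assumed tokenization).
function jaccard(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const intersection = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// The stated 60% threshold: at or above it, two memories merge.
const isDuplicate = (a: string, b: string): boolean => jaccard(a, b) >= 0.6;
```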
Memory alone gives your AI facts. The Soul Engine gives it identity. Configure structured blocks — personality, expertise, communication style, rules, and context — that persist across every conversation. Unlike a single system prompt that gets diluted in long conversations, Soul Engine blocks are versioned, organized, and always injected with priority.
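A set of Soul Engine blocks might be modeled like this. The five block types are from the paragraph above; the field names (type, content, priority) are assumptions about the payload shape for POST /api/v1/blocks.

```typescript
// Hypothetical block shape; verify field names against Alma's API reference.
type SoulBlock = {
  type: "personality" | "expertise" | "communication_style" | "rules" | "context";
  content: string;
  priority?: number; // assumed: higher = injected earlier in the assembled prompt
};

const blocks: SoulBlock[] = [
  { type: "personality", content: "Pragmatic, direct, skeptical of hype.", priority: 1 },
  { type: "rules", content: "Never suggest deprecated APIs.", priority: 2 },
];
```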
Environments let you isolate memory contexts. Keep work, personal, and client-specific memories completely separate. Each environment has its own memories, episodes, procedures, and soul blocks. The AI switches personality and knowledge when you switch environments.
Sign up free at alma.olivares.ai. The free plan includes 500 memories, 1 environment, and full chat access. All integration methods — MCP, SDK, API — work on every plan. No credit card required.
For more depth: AI Memory Management: Complete Guide 2026 · Building AI Assistants That Remember Everything · Persistent Memory vs RAG