audrey 0.16.0 → 0.17.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,68 +1,81 @@
  # Audrey

- Biological memory architecture for AI agents. Memory that decays, consolidates, feels, and learns — not just a database.
+ [![CI](https://github.com/Evilander/Audrey/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/Evilander/Audrey/actions/workflows/ci.yml)
+ [![npm version](https://img.shields.io/npm/v/audrey.svg)](https://www.npmjs.com/package/audrey)
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

- ## Why Audrey Exists
+ Persistent memory for Claude Code and AI agents. Two commands, every session remembers.

- Every AI memory tool today (Mem0, Zep, LangChain Memory) is a filing cabinet. Store stuff, retrieve stuff. None of them do what biological memory actually does:
+ ```bash
+ npx audrey install        # 13 MCP memory tools
+ npx audrey hooks install  # automatic memory in every session
+ ```
+
+ That's it. Claude Code now wakes up knowing what happened yesterday, recalls relevant context per-prompt, and consolidates learnings when the session ends. No cloud, no config files, no infrastructure — one SQLite file.
+
+ Audrey also works as a standalone SDK, MCP server, and REST API for any AI agent framework.
+
+ > **On `/dream`** — Anthropic recently shipped `/dream` for Claude Code memory maintenance. Audrey predates it and goes further: episodic-to-semantic consolidation, contradiction detection, confidence decay, emotional affect, causal reasoning, and source reliability weighting. `/dream` is a maintenance pass. Audrey is a cognitive memory architecture.
+
+ ## Why Audrey
+
+ Most AI memory tools are storage wrappers. They save facts, retrieve facts, and keep everything forever. That leaves real production problems unsolved:

- - Memories don't decay. A fact from 6 months ago has the same weight as one from today.
- - No consolidation. Raw events never become general principles.
- - No contradiction detection. Conflicting facts coexist silently.
- - No self-defense. If an agent hallucinates and encodes the hallucination, it becomes "truth."
+ - Old information stays weighted like new information.
+ - Raw events never become reusable operating knowledge.
+ - Conflicting facts quietly coexist.
+ - Model-generated mistakes can get reinforced into false "truth."

- Audrey fixes all of this by modeling memory the way the brain does:
+ Audrey models memory as a working system instead of a filing cabinet.

  | Brain Structure | Audrey Component | What It Does |
  |---|---|---|
  | Hippocampus | Episodic Memory | Fast capture of raw events and observations |
  | Neocortex | Semantic Memory | Consolidated principles and patterns |
  | Cerebellum | Procedural Memory | Learned workflows and conditional behaviors |
- | Sleep Replay | Dream Cycle | Consolidates episodes into principles, applies decay |
- | Prefrontal Cortex | Validation Engine | Truth-checking, contradiction detection |
- | Amygdala | Affect System | Emotional encoding, arousal-salience coupling, mood-congruent recall |
+ | Sleep Replay | Dream Cycle | Consolidates episodes into principles and applies decay |
+ | Prefrontal Cortex | Validation Engine | Truth-checking and contradiction detection |
+ | Amygdala | Affect System | Emotional encoding, arousal-salience coupling, and mood-congruent recall |
+
+ ## What You Get
+
+ - Local SQLite-backed memory with `sqlite-vec`
+ - MCP server for Claude Code with 13 memory tools
+ - **Claude Code hooks integration** — automatic memory in every session (`npx audrey hooks install`)
+ - JavaScript SDK for direct application use
+ - **Git-friendly versioning** via JSON snapshots (`npx audrey snapshot` / `restore`)
+ - **REST API server** — any language, any framework (`npx audrey serve`)
+ - Health checks via `npx audrey status --json`
+ - Benchmark harness with retrieval and lifecycle-operation tracks via `npm run bench:memory`
+ - Regression gate for benchmark quality via `npm run bench:memory:check`
+ - Optional local embeddings and optional hosted LLM providers
+ - Strongest production fit today in financial services ops and healthcare ops

  ## Install

- ### MCP Server for Claude Code (one command)
+ ### MCP Server for Claude Code

  ```bash
- npx audrey install
+ npx audrey install        # Register 13 MCP memory tools
+ npx audrey hooks install  # Wire automatic memory into session lifecycle
  ```

- That's it. Audrey auto-detects API keys from your environment:
-
- - `GOOGLE_API_KEY` or `GEMINI_API_KEY` set? Uses Gemini embeddings (3072d).
- - Neither? Runs with local embeddings (384d, MiniLM via @huggingface/transformers — zero API key, works offline).
- - `AUDREY_EMBEDDING_PROVIDER=openai` for explicit OpenAI embeddings (1536d).
- - `ANTHROPIC_API_KEY` set? Enables LLM-powered consolidation, contradiction detection, and reflection.
-
- ```bash
- # Check status
- npx audrey status
-
- # Uninstall
- npx audrey uninstall
- ```
+ Audrey auto-detects providers from your environment:

- Every Claude Code session now has 13 memory tools: `memory_encode`, `memory_recall`, `memory_consolidate`, `memory_dream`, `memory_introspect`, `memory_resolve_truth`, `memory_export`, `memory_import`, `memory_forget`, `memory_decay`, `memory_status`, `memory_reflect`, `memory_greeting`.
+ - `GOOGLE_API_KEY` or `GEMINI_API_KEY` -> Gemini embeddings (3072d)
+ - no embedding key -> local embeddings (384d, MiniLM, offline-capable)
+ - `AUDREY_EMBEDDING_PROVIDER=openai` -> explicit OpenAI embeddings (1536d)
+ - `ANTHROPIC_API_KEY` -> LLM-powered consolidation, contradiction detection, and reflection

- ### CLI Subcommands
+ Quick checks:

  ```bash
- npx audrey install        # Register MCP server with Claude Code
- npx audrey uninstall      # Remove MCP server registration
- npx audrey status         # Show memory store health and stats
- npx audrey greeting       # Output session briefing (mood, principles, recent memories)
- npx audrey greeting "auth" # Briefing + context-relevant memories for "auth"
- npx audrey reflect        # Reflect on conversation + dream cycle (reads turns from stdin)
- npx audrey dream          # Run consolidation + decay cycle
- npx audrey reembed        # Re-embed all memories with current provider
+ npx audrey status
+ npx audrey status --json
+ npx audrey status --json --fail-on-unhealthy
  ```

- `greeting` and `reflect` are designed for Claude Code hooks — wire them into SessionStart and Stop events for automatic memory lifecycle.
-
- ### SDK in Your Code
+ ### SDK

  ```bash
  npm install audrey
@@ -70,739 +83,393 @@ npm install audrey

  Zero external infrastructure. One SQLite file.

- ## Usage
+ ## Quick Start

  ```js
  import { Audrey } from 'audrey';

- // 1. Create a brain
  const brain = new Audrey({
    dataDir: './agent-memory',
-   agent: 'my-agent',
-   embedding: { provider: 'local', dimensions: 384 }, // or 'gemini', 'openai'
+   agent: 'support-agent',
+   embedding: { provider: 'local', dimensions: 384 },
  });

- // 2. Encode observations — with optional emotional context
  await brain.encode({
-   content: 'Stripe API returns 429 above 100 req/s',
+   content: 'Stripe API returned 429 above 100 req/s',
    source: 'direct-observation',
    tags: ['stripe', 'rate-limit'],
+   context: { task: 'debugging', domain: 'payments' },
    affect: { valence: -0.4, arousal: 0.7, label: 'frustration' },
  });

- // 3. Recall what you know — mood-congruent retrieval
  const memories = await brain.recall('stripe rate limits', {
    limit: 5,
-   mood: { valence: -0.3 }, // frustrated right now? memories encoded in frustration surface first
+   context: { task: 'debugging', domain: 'payments' },
  });

- // 4. Filtered recall — by tag, source, or date range
- const recent = await brain.recall('stripe', {
-   tags: ['rate-limit'],
-   sources: ['direct-observation'],
-   after: '2026-02-01T00:00:00Z',
-   context: { task: 'debugging', domain: 'payments' }, // context-dependent retrieval
- });
-
- // 5. Dream — the biological sleep cycle
  const dream = await brain.dream();
- // Consolidates episodes into principles, applies forgetting curves, reports health
-
- // 6. Reflect on a conversation — form lasting memories
- const result = await brain.reflect([
-   { role: 'user', content: 'How do I handle rate limits?' },
-   { role: 'assistant', content: 'Use exponential backoff with jitter...' },
- ]);
- // LLM extracts what matters, encodes it as lasting memories
-
- // 7. Session greeting — wake up with context
  const briefing = await brain.greeting({ context: 'debugging stripe' });
- // Returns mood, principles, recent memories, identity, unresolved threads
-
- // 8. Forget something
- brain.forget(memoryId); // soft-delete
- brain.forget(memoryId, { purge: true }); // hard-delete
- await brain.forgetByQuery('old API endpoint', { minSimilarity: 0.9 });
-
- // 9. Check brain health
- const stats = brain.introspect();
- // { episodic: 47, semantic: 12, procedural: 3, dormant: 8, ... }

- // 10. Clean up
+ await brain.waitForIdle();
  brain.close();
  ```
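The removed 0.16.0 API reference elsewhere in this diff describes recall results as ranked by `similarity * confidence`. A toy sketch of what that ordering means in practice — purely illustrative, not the library's internals:

```javascript
// Illustrative only: recall ranks by similarity * confidence, so a
// well-supported memory can outrank a slightly more similar but shaky one.
function rankMemories(memories) {
  return [...memories]
    .map((m) => ({ ...m, score: m.similarity * m.confidence }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankMemories([
  { content: 'old guess', similarity: 0.95, confidence: 0.4 },      // score 0.38
  { content: 'confirmed fact', similarity: 0.85, confidence: 0.9 }, // score 0.765
]);
```

The practical consequence: a hallucinated but textually close memory does not automatically win recall over a duller, better-evidenced one.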

- ### Configuration
-
- ```js
- const brain = new Audrey({
-   dataDir: './audrey-data',  // SQLite database directory
-   agent: 'my-agent',         // Agent identifier
-
-   // Embedding provider (required)
-   embedding: {
-     provider: 'local',  // 'mock' (test), 'local' (384d MiniLM), 'gemini' (3072d), 'openai' (1536d)
-     dimensions: 384,    // Must match provider
-     apiKey: '...',      // Required for gemini/openai
-     device: 'gpu',      // 'gpu' or 'cpu' — for local provider only
-   },
-
-   // LLM provider (optional — enables smart consolidation + contradiction detection + reflection)
-   llm: {
-     provider: 'anthropic',      // 'mock', 'anthropic', or 'openai'
-     apiKey: '...',              // Required for anthropic/openai
-     model: 'claude-sonnet-4-6', // Optional model override
-   },
-
-   // Consolidation settings
-   consolidation: {
-     minEpisodes: 3, // Minimum cluster size for principle extraction
-   },
-
-   // Context-dependent retrieval
-   context: {
-     enabled: true, // Enable encoding-specificity principle
-     weight: 0.3,   // Max 30% confidence boost on full context match
-   },
-
-   // Emotional memory
-   affect: {
-     enabled: true,      // Enable affect system
-     weight: 0.2,        // Max 20% mood-congruence boost
-     arousalWeight: 0.3, // Yerkes-Dodson arousal-salience coupling
-     resonance: {        // Detect emotional echoes across experiences
-       enabled: true,
-       k: 5,                 // How many past episodes to check
-       threshold: 0.5,       // Semantic similarity threshold
-       affectThreshold: 0.6, // Emotional similarity threshold
-     },
-   },
-
-   // Interference-based forgetting
-   interference: {
-     enabled: true, // New episodes suppress similar existing memories
-     weight: 0.15,  // Suppression strength
-   },
-
-   // Decay settings
-   decay: {
-     dormantThreshold: 0.1, // Below this confidence = dormant
-   },
- });
- ```
-
- **Without an LLM provider**, consolidation uses a default text-based extractor and contradiction detection is similarity-only. **With an LLM provider**, Audrey extracts real generalized principles (semantic and procedural), detects semantic contradictions, resolves context-dependent truths, and reflects on conversations to form lasting memories.
+ ## MCP Tools

- ### Environment Variables (MCP Server)
+ Every Claude Code session gets these tools after `npx audrey install`:

- | Variable | Default | Purpose |
- |---|---|---|
- | `AUDREY_DATA_DIR` | `~/.audrey/data` | SQLite database directory |
- | `AUDREY_AGENT` | `claude-code` | Agent identifier |
- | `AUDREY_EMBEDDING_PROVIDER` | auto-detect | `local`, `gemini`, `openai`, or `mock` |
- | `AUDREY_LLM_PROVIDER` | auto-detect | `anthropic`, `openai`, or `mock` |
- | `AUDREY_DEVICE` | `gpu` | Device for local embedding provider |
- | `GOOGLE_API_KEY` | — | Gemini embeddings (auto-selected when present) |
- | `ANTHROPIC_API_KEY` | — | Anthropic LLM (consolidation, reflection, contradiction detection) |
- | `OPENAI_API_KEY` | — | OpenAI embeddings/LLM (must be explicitly selected for embeddings) |
+ - `memory_encode`
+ - `memory_recall`
+ - `memory_consolidate`
+ - `memory_dream`
+ - `memory_introspect`
+ - `memory_resolve_truth`
+ - `memory_export`
+ - `memory_import`
+ - `memory_forget`
+ - `memory_decay`
+ - `memory_status`
+ - `memory_reflect`
+ - `memory_greeting`

- ## Core Concepts
+ ## CLI

- ### Four Memory Types
-
- **Episodic** (hot, fast decay) — Raw events. "Stripe returned 429 at 3pm." Immutable. Append-only. Never modified.
-
- **Semantic** (warm, slow decay) — Consolidated principles. "Stripe enforces 100 req/s rate limit." Extracted automatically from clusters of episodic memories.
-
- **Procedural** (cold, slowest decay) — Learned workflows. "When Stripe rate-limits, implement exponential backoff." Skills the agent has acquired. Routed automatically when the LLM identifies a principle as procedural.
-
- **Causal** — Why things happened. Not just "A then B" but "A caused B because of mechanism C." Prevents correlation-as-causation.
-
- ### Confidence Formula
-
- Every memory has a compositional confidence score:
-
- ```
- C(m, t) = w_s * S + w_e * E + w_r * R(t) + w_ret * Ret(t)
- ```
-
- | Component | What It Measures | Default Weight |
- |---|---|---|
- | **S** — Source reliability | How trustworthy is the origin? | 0.30 |
- | **E** — Evidence agreement | Do observations agree or contradict? | 0.35 |
- | **R(t)** — Recency decay | How old is the memory? (Ebbinghaus curve) | 0.20 |
- | **Ret(t)** — Retrieval reinforcement | How often is this memory accessed? | 0.15 |
-
- Source reliability hierarchy:
-
- | Source Type | Reliability |
- |---|---|
- | `direct-observation` | 0.95 |
- | `told-by-user` | 0.90 |
- | `tool-result` | 0.85 |
- | `inference` | 0.60 |
- | `model-generated` | 0.40 (capped at 0.6 confidence) |
-
- The `model-generated` cap prevents circular self-confirmation — an agent can't boost its own hallucinations into high-confidence "facts."
-
- ### Decay (Forgetting Curves)
-
- Unreinforced memories lose confidence over time following Ebbinghaus exponential decay:
-
- | Memory Type | Half-Life | Rationale |
- |---|---|---|
- | Episodic | 7 days | Raw events go stale fast |
- | Semantic | 30 days | Principles are hard-won |
- | Procedural | 90 days | Skills are slowest to forget |
-
- Retrieval resets the decay clock. Frequently accessed memories persist. Memories below the dormant threshold (0.1) become dormant — still searchable with `includeDormant: true`, but excluded from default recall.
-
- ### Dream Cycle (The "Sleep" Cycle)
-
- `brain.dream()` runs the full biological sleep analog:
-
- 1. **Consolidate** — Cluster similar episodic memories via KNN, extract principles via LLM, route to semantic or procedural tables
- 2. **Decay** — Apply forgetting curves, transition low-confidence memories to dormant
- 3. **Introspect** — Report memory system health
-
- The pipeline is fully transactional — if any cluster fails mid-run, all writes roll back. Consolidation is idempotent. Re-running on the same data produces no duplicates.
-
- ### Consolidation Routing
-
- When the LLM extracts a principle, it classifies it:
-
- - `type: 'semantic'` → goes to the `semantics` table (general knowledge)
- - `type: 'procedural'` → goes to the `procedures` table with `trigger_conditions` (actionable skills)
+ ```bash
+ # Setup
+ npx audrey install         # Register MCP server with Claude Code
+ npx audrey uninstall       # Remove MCP server registration
+ npx audrey hooks install   # Wire Audrey into Claude Code hooks (automatic memory)
+ npx audrey hooks uninstall # Remove Audrey hooks

- ### Contradiction Handling
+ # Health and monitoring
+ npx audrey status                            # Human-readable health report
+ npx audrey status --json                     # Machine-readable health output
+ npx audrey status --json --fail-on-unhealthy # CI gate

- When memories conflict, Audrey doesn't force a winner. Contradictions have a lifecycle:
+ # Session lifecycle (used by hooks automatically)
+ npx audrey greeting        # Load identity, principles, mood
+ npx audrey greeting "auth" # With context-aware recall
+ npx audrey recall "query"  # Semantic memory search (returns hook-compatible JSON)
+ npx audrey reflect         # Consolidate learnings from stdin conversation + dream

- ```
- open -> resolved | context_dependent | reopened
- ```
+ # Maintenance
+ npx audrey dream   # Full consolidation + decay cycle
+ npx audrey reembed # Re-embed all memories after provider/dimension change

- Context-dependent truths are modeled explicitly:
+ # Versioning
+ npx audrey snapshot                  # Export memories to timestamped JSON file
+ npx audrey snapshot backup.json      # Export to specific file
+ npx audrey restore backup.json       # Restore from snapshot (re-embeds with current provider)
+ npx audrey restore backup.json --force # Overwrite existing memories

- ```js
- // "Stripe rate limit is 100 req/s" (live keys)
- // "Stripe rate limit is 25 req/s" (test keys)
- // Both true — under different conditions
+ # REST API server
+ npx audrey serve      # Start HTTP server on port 3487
+ npx audrey serve 8080 # Custom port
  ```

- New high-confidence evidence can reopen resolved disputes.
-
- ### Forget and Purge
-
- Memories can be explicitly forgotten — by ID or by semantic query:
-
- **Soft-delete** (default) — Marks the memory as forgotten/superseded and removes its vector index. The record stays in the database but is excluded from recall. Reversible via direct database access.
-
- **Hard-delete** (`purge: true`) — Permanently removes the memory from both the main table and the vector index. Irreversible.
-
- **Bulk purge** — Removes all forgotten, dormant, superseded, and rolled-back memories in one operation. Useful for GDPR compliance or storage cleanup.
-
- ### Rollback
+ ## Hooks Integration

- Bad consolidation? Undo it:
+ Audrey integrates directly into Claude Code's hook lifecycle for automatic, zero-config memory in every session:

- ```js
- const history = brain.consolidationHistory();
- brain.rollback(history[0].id);
- // Semantic memories -> rolled_back state
- // Source episodes -> un-consolidated
- // Full audit trail preserved
+ ```bash
+ npx audrey hooks install
  ```

- ### Circular Self-Confirmation Defense
-
- The most dangerous exploit in AI memory: agent hallucinates X, encodes it, later retrieves it, "reinforcement" boosts confidence, X eventually consolidates as "established truth."
-
- Audrey's defenses:
-
- 1. **Source diversity requirement** — Consolidation requires evidence from 2+ distinct source types
- 2. **Model-generated cap** — Memories from `model-generated` sources are capped at 0.6 confidence
- 3. **Source lineage tracking** — Provenance chains detect when all evidence traces back to a single inference
- 4. **Source diversity score** — Every semantic memory tracks how many different source types contributed
-
- ## API Reference
+ This configures four hooks in `~/.claude/settings.json`:

- ### `new Audrey(config)`
+ | Hook Event | Command | What Happens |
+ |---|---|---|
+ | **SessionStart** | `npx audrey greeting` | Loads identity, learned principles, current mood, and recent memories |
+ | **UserPromptSubmit** | `npx audrey recall` | Semantic search on every prompt — injects relevant memories as context |
+ | **Stop** | `npx audrey reflect` | Extracts lasting learnings from the conversation, then runs a dream cycle |
+ | **PostCompact** | `npx audrey greeting` | Re-injects critical memories after context window compaction |

- See [Configuration](#configuration) above for all options.
+ With hooks installed, Claude Code sessions automatically wake up with context, recall relevant memories per-prompt, and consolidate learnings when the session ends. No manual tool calls needed.
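The file itself is managed by `npx audrey hooks install`, but for orientation, the entries it writes follow Claude Code's standard hooks layout — roughly the shape sketched below. This is an illustration of the schema, not a dump of the real generated file (only two of the four events are shown):

```json
{
  "hooks": {
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "npx audrey greeting" }] }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "npx audrey reflect" }] }
    ]
  }
}
```

Prefer `npx audrey hooks uninstall` over hand-editing to remove the entries cleanly.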

- ### `brain.encode(params)` -> `Promise<string>`
+ ## REST API Server

- Encode an episodic memory. Returns the memory ID.
+ Turn Audrey into an HTTP service that any language or framework can use:

- ```js
- const id = await brain.encode({
-   content: 'What happened',        // Required. Non-empty string, max 50000 chars.
-   source: 'direct-observation',    // Required. See source types above.
-   salience: 0.8,                   // Optional. 0-1. Default: 0.5
-   causal: {                        // Optional. What caused this / what it caused.
-     trigger: 'batch-processing',
-     consequence: 'queue-backed-up',
-   },
-   tags: ['stripe', 'production'],  // Optional. Array of strings.
-   supersedes: 'previous-id',       // Optional. ID of episode this corrects.
-   context: { task: 'debugging' },  // Optional. Situational context for retrieval.
-   affect: {                        // Optional. Emotional context.
-     valence: -0.5,                 // -1 (negative) to 1 (positive)
-     arousal: 0.7,                  // 0 (calm) to 1 (activated)
-     label: 'frustration',          // Human-readable emotion label
-   },
-   private: true,                   // Optional. If true, excluded from public recall.
- });
+ ```bash
+ npx audrey serve                      # Start on port 3487
+ npx audrey serve 8080                 # Custom port
+ AUDREY_API_KEY=secret npx audrey serve # With Bearer token auth
  ```

- Episodes are **immutable**. Corrections create new records with `supersedes` links. The original is preserved.
+ Endpoints:

- ### `brain.encodeBatch(paramsList)` -> `Promise<string[]>`
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `GET` | `/health` | Liveness probe |
+ | `GET` | `/status` | Memory stats (introspect) |
+ | `POST` | `/encode` | Store a memory (`{ content, source, tags?, context?, affect? }`) |
+ | `POST` | `/recall` | Semantic search (`{ query, limit?, context? }`) |
+ | `POST` | `/dream` | Full consolidation + decay cycle |
+ | `POST` | `/consolidate` | Run consolidation only |
+ | `POST` | `/forget` | Forget by `{ id }` or `{ query }` |
+ | `POST` | `/snapshot` | Export all memories as JSON |
+ | `POST` | `/restore` | Wipe and reimport from snapshot |

- Encode multiple episodes in one call. Same params as `encode()`, but as an array.
+ Example from any language:

- ```js
- const ids = await brain.encodeBatch([
-   { content: 'Stripe returned 429', source: 'direct-observation' },
-   { content: 'Redis timed out', source: 'tool-result' },
-   { content: 'User reports slow checkout', source: 'told-by-user' },
- ]);
- ```
-
- ### `brain.recall(query, options)` -> `Promise<Memory[]>`
-
- Retrieve memories ranked by `similarity * confidence`.
+ ```bash
+ # Store a memory
+ curl -X POST http://localhost:3487/encode \
+   -H "Content-Type: application/json" \
+   -d '{"content": "The deploy failed due to OOM", "source": "direct-observation"}'

- ```js
- const memories = await brain.recall('stripe rate limits', {
-   limit: 5,                        // Max results (default 10, max 50)
-   minConfidence: 0.5,              // Filter below this confidence
-   types: ['semantic'],             // Filter by memory type
-   includeProvenance: true,         // Include evidence chains
-   includeDormant: false,           // Include dormant memories
-   tags: ['rate-limit'],            // Only episodic memories with these tags
-   sources: ['direct-observation'], // Only episodic memories from these sources
-   after: '2026-02-01T00:00:00Z',   // Only memories created after this date
-   before: '2026-03-01T00:00:00Z',  // Only memories created before this date
-   context: { task: 'debugging' },  // Boost memories encoded in matching context
-   mood: { valence: -0.3, arousal: 0.5 }, // Mood-congruent retrieval
- });
+ # Search memories
+ curl -X POST http://localhost:3487/recall \
+   -H "Content-Type: application/json" \
+   -d '{"query": "deploy failures", "limit": 5}'
  ```
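Calling the same endpoints from JavaScript is just `fetch` with JSON. The helper below is hypothetical (not part of the SDK); the `Authorization` header is only needed when the server was started with `AUDREY_API_KEY`:

```javascript
// Hypothetical helper: builds fetch() options for the endpoints above.
function audreyRequest(body, apiKey) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return { method: 'POST', headers, body: JSON.stringify(body) };
}

// e.g. await fetch('http://localhost:3487/recall',
//        audreyRequest({ query: 'deploy failures', limit: 5 }, process.env.AUDREY_API_KEY));
const req = audreyRequest({ query: 'deploy failures', limit: 5 }, 'secret');
```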

- Tag and source filters only apply to episodic memories (semantic and procedural memories don't have tags or sources). Date filters apply to all memory types. Recall gracefully degrades — if one memory type's vector search fails, the others still return results.
-
- Each result:
-
- ```js
- {
-   id: '01ABC...',
-   content: 'Stripe enforces ~100 req/s rate limit',
-   type: 'semantic',
-   confidence: 0.87,
-   score: 0.74,          // similarity * confidence
-   source: 'consolidation',
-   state: 'active',
-   contextMatch: 0.8,    // When retrieval context provided
-   moodCongruence: 0.7,  // When mood provided
-   provenance: {         // When includeProvenance: true
-     evidenceEpisodeIds: ['01XYZ...', '01DEF...'],
-     evidenceCount: 3,
-     supportingCount: 3,
-     contradictingCount: 0,
-   },
- }
- ```
+ ## Versioning

- Retrieval automatically reinforces matched memories (boosts confidence, resets decay clock).
+ Audrey stores memories in SQLite with WAL mode, which isn't git-friendly. Instead, use JSON snapshots:

- ### `brain.recallStream(query, options)` -> `AsyncGenerator<Memory>`
+ ```bash
+ # Save a checkpoint
+ npx audrey snapshot

- Streaming version of `recall()`. Yields results one at a time. Supports early `break`. Same options as `recall()`.
+ # Commit it
+ git add audrey-snapshot-*.json && git commit -m "memory checkpoint"

- ```js
- for await (const memory of brain.recallStream('stripe issues', { limit: 10 })) {
-   console.log(memory.content, memory.score);
-   if (memory.score > 0.9) break;
- }
+ # Restore on another machine or after a reset
+ npx audrey restore audrey-snapshot-2026-03-24_15-30-00.json
  ```

- ### `brain.dream(options)` -> `Promise<DreamResult>`
+ Snapshots are human-readable, diffable, and provider-agnostic. Embeddings are re-generated on import, so you can switch providers (e.g., local to Gemini) and restore seamlessly.

- Run the full biological sleep cycle: consolidate + decay + introspect.
+ ## Production Fit

- ```js
- const result = await brain.dream({
-   minClusterSize: 3,         // Min episodes per cluster
-   similarityThreshold: 0.85, // KNN clustering threshold
-   dormantThreshold: 0.1,     // Below this = dormant
- });
- // {
- //   consolidation: { episodesEvaluated, clustersFound, principlesExtracted, semanticsCreated, proceduresCreated },
- //   decay: { totalEvaluated, transitionedToDormant },
- //   stats: { episodic, semantic, procedural, ... },
- // }
- ```
+ Audrey is strongest today in workflows where memory must stay local, reviewable, and durable:

- ### `brain.reflect(turns)` -> `Promise<ReflectResult>`
+ - **Financial services operations**: payments ops, fraud and dispute workflows, KYC/KYB review, internal policy assistants
+ - **Healthcare operations**: care coordination, prior-auth workflows, intake and referral routing, internal staff knowledge assistants

- Feed a conversation to the LLM and extract lasting memories. Requires an LLM provider.
+ Audrey is a memory layer, not a compliance boundary. For regulated environments, pair it with application-level access control, encryption, retention, audit logging, and data-minimization rules.

- ```js
- const result = await brain.reflect([
-   { role: 'user', content: 'How do I handle rate limits?' },
-   { role: 'assistant', content: 'Use exponential backoff...' },
- ]);
- // { encoded: 2, memories: [...] }
- ```
+ Production guide: [docs/production-readiness.md](docs/production-readiness.md)

- ### `brain.greeting(options)` -> `Promise<GreetingResult>`
+ Industry demos:

- Session-start briefing. Returns mood, principles, identity, recent memories, and unresolved threads.
+ - [examples/fintech-ops-demo.js](examples/fintech-ops-demo.js)
+ - [examples/healthcare-ops-demo.js](examples/healthcare-ops-demo.js)

- ```js
- const briefing = await brain.greeting({
-   context: 'debugging stripe', // Optional — also returns relevant memories
-   recentLimit: 10,
-   principleLimit: 5,
-   identityLimit: 5,
- });
- // { recent, principles, mood, unresolved, identity, contextual }
- ```
+ ## Core Concepts

- ### `brain.forget(id, options)` -> `ForgetResult`
+ ### Memory Types

- Forget a memory by ID. Works on any memory type (episodic, semantic, procedural).
+ - **Episodic**: raw events and observations
+ - **Semantic**: consolidated principles
+ - **Procedural**: reusable workflows and actions
+ - **Causal**: relationships that explain why something happened

- ```js
- brain.forget(memoryId);                 // soft-delete
- brain.forget(memoryId, { purge: true }); // hard-delete (permanent)
- // { id, type: 'episodic', purged: false }
- ```
-
- ### `brain.forgetByQuery(query, options)` -> `Promise<ForgetResult | null>`
+ ### Confidence

- Find the closest matching memory by semantic search and forget it. Searches all three memory types, picks the best match.
+ Audrey scores memories using source reliability, evidence agreement, recency decay, and retrieval reinforcement. That helps keep direct observations above guesses and keeps stale or weakly supported knowledge from dominating recall.
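The 0.16.0 README (removed in this diff) spelled this score out as a weighted sum, `C(m, t) = w_s*S + w_e*E + w_r*R(t) + w_ret*Ret(t)`, with default weights 0.30/0.35/0.20/0.15. A toy sketch of that arithmetic — the previously documented defaults, which may have changed since:

```javascript
// Toy version of the compositional score from the 0.16.0 README.
// All four inputs are in [0, 1], so the result is too.
function confidence({ source, evidence, recency, reinforcement }) {
  return 0.30 * source + 0.35 * evidence + 0.20 * recency + 0.15 * reinforcement;
}

// A fresh, well-evidenced direct observation vs. the same memory gone stale:
const fresh = confidence({ source: 0.95, evidence: 0.9, recency: 1.0, reinforcement: 0.5 }); // 0.875
const stale = confidence({ source: 0.95, evidence: 0.9, recency: 0.1, reinforcement: 0.0 }); // 0.62
```

Because the source and evidence terms dominate, a stale memory sinks but never collapses to zero purely from age — it fades rather than vanishes.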
486
272
 
487
- ```js
488
- const result = await brain.forgetByQuery('old API endpoint', {
489
- minSimilarity: 0.9, // Threshold for match (default 0.9)
490
- purge: false, // Hard-delete? (default false)
491
- });
492
- // null if no match above threshold
493
- ```
273
+ ### Dream Cycle
494
274
 
495
- ### `brain.purge()` -> `PurgeCounts`
275
+ `brain.dream()` runs the full maintenance path:
496
276
 
497
- Bulk hard-delete all dead memories: forgotten episodes, dormant/superseded/rolled-back semantics and procedures.
277
+ 1. Consolidate related episodes into principles.
278
+ 2. Apply decay so low-value memories lose weight over time.
279
+ 3. Report memory health and current stats.
498
280
 
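The sequencing of those three steps can be sketched with stubbed stages (illustrative; the real `dream()` drives Audrey's own consolidation and decay engines internally):

```javascript
// Stubbed sketch of the dream() sequencing; the stage implementations
// here are placeholders, not Audrey's engines.
async function dreamCycle({ consolidate, decay, introspect }) {
  const consolidation = await consolidate(); // 1. cluster episodes, extract principles
  const decayReport = decay();               // 2. down-weight low-value memories
  const stats = introspect();                // 3. report health and counts
  return { consolidation, decay: decayReport, stats };
}

dreamCycle({
  consolidate: async () => ({ principlesExtracted: 2 }),
  decay: () => ({ transitionedToDormant: 5 }),
  introspect: () => ({ episodic: 120, semantic: 14 }),
}).then((report) => console.log(report.consolidation.principlesExtracted)); // logs 2
```

The `{ consolidation, decay, stats }` shape mirrors the payload of Audrey's `dream` event.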
- ```js
- const counts = brain.purge();
- // { episodes: 12, semantics: 3, procedures: 0 }
- ```
+ ### Contradiction Handling
 
- ### `brain.consolidate(options)` -> `Promise<ConsolidationResult>`
+ When evidence conflicts, Audrey tracks the contradiction instead of silently picking a winner. Resolutions can stay open, be marked resolved, or become context-dependent.
 
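A minimal sketch of that lifecycle, assuming a hypothetical record shape (the state names match Audrey's contradiction tracking):

```javascript
// Hypothetical record shape; the state names mirror Audrey's contradiction lifecycle.
const STATES = ['open', 'resolved', 'context_dependent', 'reopened'];

function resolveContradiction(contradiction, resolution) {
  if (!STATES.includes(resolution.state)) {
    throw new Error(`unknown state: ${resolution.state}`);
  }
  // Both claims are preserved; only the resolution state and conditions change.
  return { ...contradiction, state: resolution.state, conditions: resolution.conditions ?? null };
}

const c = { a: 'use live Stripe keys', b: 'use test Stripe keys', state: 'open' };
const resolved = resolveContradiction(c, {
  state: 'context_dependent',
  conditions: { a: 'production', b: 'CI' },
});
console.log(resolved.state); // 'context_dependent'
```

A context-dependent resolution keeps both claims true under their own conditions instead of discarding one of them.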
- Run the consolidation engine manually. Fully transactional — if any cluster fails, all writes roll back.
+ ## Configuration
 
  ```js
- const result = await brain.consolidate({
-   minClusterSize: 3,
-   similarityThreshold: 0.80,
-   extractPrinciple: (episodes) => ({ // Optional LLM callback
-     content: 'Extracted principle text',
-     type: 'semantic', // or 'procedural'
-     conditions: ['trigger conditions'], // for procedural only
-   }),
+ const brain = new Audrey({
+   dataDir: './audrey-data',
+   agent: 'my-agent',
+   embedding: {
+     provider: 'local', // mock | local | gemini | openai
+     dimensions: 384,
+     device: 'gpu',
+   },
+   llm: {
+     provider: 'anthropic', // mock | anthropic | openai
+     apiKey: process.env.ANTHROPIC_API_KEY,
+   },
+   consolidation: {
+     minEpisodes: 3,
+   },
+   context: {
+     enabled: true,
+     weight: 0.3,
+   },
+   affect: {
+     enabled: true,
+     weight: 0.2,
+   },
+   decay: {
+     dormantThreshold: 0.1,
+   },
  });
- // { runId, status, episodesEvaluated, clustersFound, principlesExtracted, semanticsCreated, proceduresCreated }
- ```
-
- ### `brain.decay(options)` -> `DecayResult`
-
- Apply forgetting curves. Transitions low-confidence memories to dormant.
-
- ```js
- const result = brain.decay({ dormantThreshold: 0.1 });
- // { totalEvaluated, transitionedToDormant, timestamp }
- ```
  ```
 
- ### `brain.memoryStatus()` -> `HealthStatus`
+ ## Operations
 
- Check brain health: vector index sync, dimension consistency, re-embed recommendations.
+ Recommended production workflow:
 
- ```js
- brain.memoryStatus();
- // { healthy, vec_episodes, searchable_episodes, vec_semantics, ..., reembed_recommended }
- ```
-
- ### `brain.rollback(runId)` -> `RollbackResult`
+ ```bash
+ # Health checks
+ npx audrey status
+ npx audrey status --json --fail-on-unhealthy
 
- Undo a consolidation run.
+ # Scheduled maintenance
+ npx audrey dream
 
- ```js
- brain.rollback('01ABC...');
- // { rolledBackMemories: 3, restoredEpisodes: 9 }
- ```
+ # Repair vector/index drift after provider or dimension changes
+ npx audrey reembed
 
- ### `brain.resolveTruth(contradictionId)` -> `Promise<Resolution>`
+ # Version control your memories
+ npx audrey snapshot
+ npx audrey restore <file> --force
 
- Resolve an open contradiction using LLM reasoning. Requires an LLM provider configured.
+ # Run the benchmark harness
+ npm run bench:memory
 
- ```js
- const resolution = await brain.resolveTruth('contradiction-id');
- // { resolution: 'context_dependent', conditions: { a: 'live keys', b: 'test keys' }, explanation: '...' }
+ # Fail CI if Audrey drops below benchmark guardrails
+ npm run bench:memory:check
  ```
 
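To run the maintenance pass on a schedule, one option is a cron entry (illustrative; the project path and cadence are placeholders):

```bash
# Hypothetical crontab entry: run Audrey's dream cycle nightly at 03:00
0 3 * * * cd /path/to/your/project && npx audrey dream >> audrey-dream.log 2>&1
```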
- ### `brain.introspect()` -> `Stats`
+ ## Benchmarking
 
- Get memory system health stats.
+ Audrey ships with a memory benchmark harness built for three purposes:
 
- ```js
- brain.introspect();
- // {
- //   episodic: 247, semantic: 31, procedural: 8,
- //   causalLinks: 42, dormant: 15,
- //   contradictions: { open: 2, resolved: 7, context_dependent: 3, reopened: 0 },
- //   lastConsolidation: '2026-02-18T22:00:00Z',
- //   totalConsolidationRuns: 14,
- // }
- ```
+ - measure Audrey against naive local baselines on LongMemEval-style memory abilities plus privacy and abstention checks
+ - measure Audrey on lifecycle operations that other memory systems usually hand-wave: update, overwrite, delete, merge, and abstain
+ - keep Audrey grounded against published LoCoMo results from leading memory systems
 
- ### `brain.consolidationHistory()` -> `ConsolidationRun[]`
+ Run it with:
 
- Full audit trail of all consolidation runs.
-
- ### `brain.export()` / `brain.import(snapshot)`
-
- Export all memories as a JSON snapshot, or import from one. Full-fidelity: preserves consolidation metrics, run metadata, and config. Import re-embeds everything with the current provider in a single atomic transaction.
-
- ```js
- const snapshot = brain.export(); // { version, episodes, semantics, procedures, consolidationMetrics, ... }
- await brain.import(snapshot); // Re-embeds everything with current provider
- ```
-
- ### Events
-
- ```js
- brain.on('encode', ({ id, content, source }) => { ... });
- brain.on('reinforcement', ({ episodeId, targetId, similarity }) => { ... });
- brain.on('contradiction', ({ episodeId, contradictionId, semanticId, resolution }) => { ... });
- brain.on('consolidation', ({ runId, principlesExtracted }) => { ... });
- brain.on('decay', ({ totalEvaluated, transitionedToDormant }) => { ... });
- brain.on('dream', ({ consolidation, decay, stats }) => { ... });
- brain.on('rollback', ({ runId, rolledBackMemories }) => { ... });
- brain.on('forget', ({ id, type, purged }) => { ... });
- brain.on('purge', ({ episodes, semantics, procedures }) => { ... });
- brain.on('interference', ({ newEpisodeId, suppressedId, similarity }) => { ... });
- brain.on('resonance', ({ episodeId, resonances }) => { ... });
- brain.on('migration', ({ episodes, semantics, procedures }) => { ... });
- brain.on('error', (err) => { ... });
+ ```bash
+ npm run bench:memory
  ```
 
- ### `brain.close()`
+ Artifacts land in `benchmarks/output/` as JSON, SVG charts, and an HTML report.
 
- Close the database connection.
+ For CI and release gates:
 
- ## Architecture
-
- ```
- audrey-data/
-   audrey.db   <- Single SQLite file. WAL mode. That's your brain.
- ```
-
- ```
- src/
-   audrey.js        Main class. EventEmitter. Public API surface.
-   causal.js        Causal graph management. LLM-powered mechanism articulation.
-   confidence.js    Compositional confidence formula. Pure math.
-   consolidate.js   "Sleep" cycle. KNN clustering -> LLM extraction -> promote.
-   db.js            SQLite + sqlite-vec. Schema, vec0 tables, migrations.
-   decay.js         Ebbinghaus forgetting curves.
-   embedding.js     Pluggable providers (Mock, Local/MiniLM, Gemini, OpenAI). Batch embedding.
-   encode.js        Immutable episodic memory creation + vec0 writes.
-   affect.js        Emotional memory: arousal-salience coupling, mood-congruent recall, resonance.
-   context.js       Context-dependent retrieval modifier (encoding specificity).
-   interference.js  Competitive memory suppression (engram competition).
-   forget.js        Soft-delete, hard-delete, query-based forget, bulk purge.
-   introspect.js    Health dashboard queries.
-   llm.js           Pluggable LLM providers (Mock, Anthropic, OpenAI).
-   prompts.js       Structured prompt templates for LLM operations.
-   recall.js        KNN retrieval + confidence scoring + filtered recall + streaming.
-   rollback.js      Undo consolidation runs.
-   utils.js         Date math, safe JSON parse.
-   validate.js      KNN validation + LLM contradiction detection.
-   migrate.js       Dimension migration re-embedding.
-   adaptive.js      Adaptive consolidation parameter suggestions.
-   export.js        Memory export (JSON snapshots with consolidation metrics).
-   import.js        Memory import with batch re-embedding in atomic transactions.
-   index.js         SDK barrel export (all providers, database utilities).
-
- mcp-server/
-   index.js         MCP tool server (13 tools, stdio transport) + CLI subcommands.
-   config.js        Shared config (env var parsing, provider resolution, install arg builder).
+ ```bash
+ npm run bench:memory:check
  ```
 
- ### Database Schema
+ That command fails if Audrey drops below its minimum local score, local pass rate, or required margin over the strongest naive baseline.
 
- | Table | Purpose |
- |---|---|
- | `episodes` | Immutable raw events (content, source, salience, causal context, affect, private flag) |
- | `semantics` | Consolidated principles (content, state, evidence chain) |
- | `procedures` | Learned workflows (trigger conditions, success/failure counts) |
- | `causal_links` | Causal relationships (cause, effect, mechanism, link type) |
- | `contradictions` | Dispute tracking (claims, state, resolution) |
- | `consolidation_runs` | Audit trail (inputs, outputs, status, checkpoint cursor) |
- | `consolidation_metrics` | Per-run metrics and confidence deltas |
- | `vec_episodes` | sqlite-vec KNN index for episode embeddings |
- | `vec_semantics` | sqlite-vec KNN index for semantic embeddings |
- | `vec_procedures` | sqlite-vec KNN index for procedural embeddings |
- | `audrey_config` | Dimension configuration, embedding model info, metadata |
-
- All mutations use SQLite transactions. CHECK constraints enforce valid states and source types. Vector search uses sqlite-vec with cosine distance.
-
- ## Running Tests
+ For track-specific runs:
 
  ```bash
- npm test              # 463 tests across 29 files
- npm run test:watch
+ npm run bench:memory:retrieval
+ npm run bench:memory:operations
  ```
 
- ## Running the Demo
+ For committed GitHub-friendly charts:
 
  ```bash
- node examples/stripe-demo.js
+ npm run bench:memory:readme-assets
  ```
 
- Demonstrates the full pipeline: encode 3 rate-limit observations, consolidate into principle, recall proactively.
-
- ---
-
- ## Changelog
-
- ### v0.16.0 (current)
-
- - Version bump for npm publish with all v0.15.0 features included
- - 463 tests across 29 test files
-
- ### v0.15.0 — Production Hardening + Dream Cycle
-
- - `dream()` method: consolidation + decay + introspect (biological sleep analog)
- - `memory_dream` MCP tool with configurable thresholds
- - `greeting` and `reflect` CLI subcommands for hook integration
- - Consolidation routes procedural principles to `procedures` table (previously all went to semantics)
- - Fully transactional consolidation — mid-run failures roll back all writes
- - Recall gracefully degrades per memory type (independent try/catch per KNN search)
- - sqlite-vec crash guard for empty vector tables
- - LLM JSON parsing strips markdown code fences from any provider
- - Input validation: empty content rejected, 50K char limit, forget requires exactly one target
- - Full-fidelity export/import: preserves consolidation metrics, run metadata, config
- - Import uses batch embedding in a single atomic transaction
- - Expanded SDK exports: all embedding/LLM providers, database utilities
- - Shared `resolveLLMConfig()` for CLI commands
- - 463 tests across 29 test files
-
- ### v0.14.0 — Memory Intelligence
+ ### README Snapshot
 
- - `memory_reflect` MCP tool — form lasting memories from conversation turns
- - `memory_greeting` MCP tool — session-start context briefing
- - `greeting()` method: mood, principles, identity, recent memories, unresolved threads
- - `reflect()` method: LLM-powered conversation analysis and memory formation
- - Rewritten consolidation prompt for deeper principle extraction
- - Rewritten reflection prompt for relational and emotional depth
- - `npx audrey status` shows last consolidation time
+ Local Audrey-vs-baseline results:
 
- ### v0.13.0 — GPU-Accelerated Embeddings
+ ![Audrey local memory benchmark](docs/assets/benchmarks/local-benchmark.svg)
 
- - GPU device configuration for LocalEmbeddingProvider
- - True single-forward-pass batch embedding for LocalEmbeddingProvider
- - Gemini `batchEmbedContents` API for batch embedding
- - `reembedAll` uses `embedBatch` for performance
- - `AUDREY_DEVICE` env var, `memoryStatus` reports device
+ Lifecycle operations benchmark:
 
- ### v0.11.0 — Multi-Provider Embeddings + Privacy
+ ![Audrey memory operations benchmark](docs/assets/benchmarks/operations-benchmark.svg)
 
- - `LocalEmbeddingProvider` — 384d MiniLM via @huggingface/transformers (zero API key, works offline)
- - `GeminiEmbeddingProvider` — 3072d via Google text-embedding-004
- - `private: true` memory flag — memories visible to AI only, excluded from public recall
- - Auto-select embedding provider: local -> gemini (if API key present) -> explicit openai
- - `npx audrey reembed` CLI subcommand for provider migration
- - `reflect()` method for post-conversation memory formation
- - 409 tests across 29 test files
+ Published comparison anchors from current LLM memory systems:
 
- ### v0.9.0 — Emotional Memory
+ ![Published LLM memory benchmark comparison](docs/assets/benchmarks/published-memory-standards.svg)
 
- - Valence-arousal affect model (Russell's circumplex) on every episode
- - Arousal-salience coupling via Yerkes-Dodson inverted-U curve
- - Mood-congruent recall — matching emotional state boosts retrieval confidence
- - Emotional resonance detection — new experiences that echo past emotional patterns emit events
- - MCP server: `memory_encode` accepts `affect`, `memory_recall` accepts `mood`
+ **Current deterministic CI snapshot** (`node benchmarks/run.js --provider mock --dimensions 64`):
 
- ### v0.8.0 — Context-Dependent Retrieval
-
- - Encoding specificity principle: context stored with memory, matching context boosts recall
- - MCP server: `memory_encode` and `memory_recall` accept `context`
-
- ### v0.7.0 — Interference + Salience
-
- - Interference-based forgetting: new memories competitively suppress similar existing ones
- - Salience-weighted confidence: high-salience memories resist decay
- - Spaced-repetition reconsolidation: retrieval intervals affect reinforcement strength
-
- ### v0.6.0 — Filtered Recall + Forget
-
- - Filtered recall: tag, source, and date-range filters on `recall()` and `recallStream()`
- - `forget()`, `forgetByQuery()`, `purge()`
- - `memory_forget` and `memory_decay` MCP tools
-
- ### v0.5.0 — Feature Depth
-
- - Configurable confidence weights and decay rates per instance
- - Memory export/import (JSON snapshots with re-embedding)
- - `memory_export` and `memory_import` MCP tools
- - Auto-consolidation scheduling
- - Adaptive consolidation parameter suggestions
-
- ### v0.3.1 — MCP Server
-
- - MCP tool server via `@modelcontextprotocol/sdk` with stdio transport
- - One-command install: `npx audrey install` (auto-detects API keys)
- - CLI subcommands: `install`, `uninstall`, `status`
-
- ### v0.3.0 — Vector Performance
+ | Local track | Audrey | Best Baseline |
+ |---|---|---|
+ | Combined local benchmark | **100.0%** | 41.7% |
+ | Retrieval capabilities | **100.0%** | 56.3% |
+ | Memory operations | **100.0%** | 25.0% |
 
- - sqlite-vec native vector indexing (vec0 virtual tables with cosine distance)
- - KNN queries for recall, validation, and consolidation clustering
- - Batch encoding API and streaming recall with async generators
+ Retrieval-family breakdown:
 
- ### v0.2.0 — LLM Integration
+ | Category | Audrey | Vector Only | Best Baseline |
+ |---|---|---|---|
+ | Information Extraction | 100% | 100% | 100% |
+ | Knowledge Updates | 100% | 50% | 50% |
+ | Multi-Session Reasoning | 100% | 100% | 100% |
+ | Temporal Reasoning | 100% | 100% | 100% |
+ | Abstention | 100% | 50% | 50% |
+ | Conflict Resolution | 100% | 50% | 50% |
+ | Procedural Learning | 100% | 0% | 0% |
+ | Privacy | 100% | 0% | 0% |
 
- - LLM-powered principle extraction, contradiction detection, causal articulation
- - Context-dependent truth resolution
- - Configurable LLM providers (Mock, Anthropic, OpenAI)
+ Operation-family breakdown:
 
- ### v0.1.0 — Foundation
+ | Operation | Audrey | Vector Only | Best Baseline |
+ |---|---|---|---|
+ | Update / Overwrite | 100% | 50% | 50% |
+ | Delete + Abstain | 100% | 0% | 50% |
+ | Semantic Merge | 100% | 0% | 0% |
+ | Procedural Merge | 100% | 0% | 0% |
 
- - Immutable episodic memory, compositional confidence, Ebbinghaus forgetting curves
- - Consolidation engine, contradiction lifecycle, rollback
- - Circular self-confirmation defense, causal context, introspection
+ Published comparison anchors from the field (different benchmarks and conditions; included for field context, not a direct comparison):
 
- ## Design Decisions
+ | System | Benchmark | Score | What it represents |
+ |---|---|---|---|
+ | **Audrey** | Internal retrieval + operations benchmark | **100.0%** | Update, overwrite, delete, merge, abstention, consolidation, privacy |
+ | MIRIX | Published LoCoMo | 85.4% | Typed multimodal memory |
+ | Letta Filesystem | Published LoCoMo | 74.0% | Context engineering |
+ | Mem0 Graph Memory | Published LoCoMo | 68.5% | Graph memory |
+ | Mem0 | Published LoCoMo | 66.9% | Production baseline |
 
- **Why SQLite, not Postgres?** Zero infrastructure. `npm install` and you have a brain. The adapter pattern means you can migrate to pgvector when you need to scale.
+ Primary comparison sources:
 
- **Why append-only episodes?** Immutability creates a reliable audit trail. Corrections use `supersedes` links rather than mutations. You can always trace back to what actually happened.
+ - [MIRIX paper](https://arxiv.org/abs/2507.07957)
+ - [Mem0 paper](https://arxiv.org/abs/2504.19413)
+ - [Letta benchmark write-up](https://www.letta.com/blog/benchmarking-ai-agent-memory)
+ - [LongMemEval paper](https://arxiv.org/abs/2410.10813)
 
- **Why Ebbinghaus curves?** Biological forgetting is an adaptive feature, not a bug. It prevents cognitive overload, maintains relevance, and enables generalization. Audrey's forgetting works the same way.
+ Benchmark guide: [docs/benchmarking.md](docs/benchmarking.md)
 
- **Why model-generated cap at 0.6?** Prevents the most dangerous exploit in AI memory: circular self-confirmation where an agent's own inferences bootstrap themselves into high-confidence "facts" through repeated retrieval.
+ ## Repository
 
- **Why soft-delete by default?** Hard-deletes are irreversible. Soft-delete preserves data integrity and audit trails while excluding the memory from recall. Use `purge: true` or `brain.purge()` when you need permanent removal (GDPR, storage cleanup).
+ - Contributing guide: [CONTRIBUTING.md](CONTRIBUTING.md)
+ - Security policy: [SECURITY.md](SECURITY.md)
+ - CI workflow: [.github/workflows/ci.yml](.github/workflows/ci.yml)
+ - Benchmarking guide: [docs/benchmarking.md](docs/benchmarking.md)
- **Why emotional memory?** Every memory system stores facts. Biological memory stores facts with emotional context — and that context changes how memories are retrieved. Emotional arousal modulates encoding strength (amygdala-hippocampal interaction). Current mood biases which memories surface (Bower, 1981). This isn't a novelty feature — it's the foundation for AI that remembers like it cares.
+ ## Development
 
- **Why a dream cycle?** Biological sleep isn't downtime — it's when the brain consolidates episodic memories into long-term semantic knowledge, prunes weak connections, and strengthens important ones. Audrey's `dream()` does the same: cluster episodes, extract principles, apply decay, report health. Wire it into session hooks and your agent gets smarter every time it sleeps.
+ ```bash
+ npm ci
+ npm test
+ npm run pack:check
+ npm run bench:memory
+ npm run bench:memory:retrieval
+ npm run bench:memory:operations
+ npm run bench:memory:check
+ npm run bench:memory:readme-assets
+ ```
+
+ Current validated baseline: all of the commands above.
 
  ## License
 
- MIT
+ MIT. See [LICENSE](LICENSE).