audrey 0.16.1 → 0.17.0
- package/LICENSE +21 -21
- package/README.md +310 -643
- package/benchmarks/baselines.js +169 -0
- package/benchmarks/cases.js +421 -0
- package/benchmarks/reference-results.js +70 -0
- package/benchmarks/report.js +255 -0
- package/benchmarks/run.js +514 -0
- package/docs/assets/benchmarks/local-benchmark.svg +45 -0
- package/docs/assets/benchmarks/operations-benchmark.svg +45 -0
- package/docs/assets/benchmarks/published-memory-standards.svg +50 -0
- package/docs/benchmarking.md +151 -0
- package/docs/production-readiness.md +96 -0
- package/examples/fintech-ops-demo.js +67 -0
- package/examples/healthcare-ops-demo.js +67 -0
- package/examples/stripe-demo.js +105 -0
- package/mcp-server/config.js +80 -27
- package/mcp-server/index.js +611 -75
- package/mcp-server/serve.js +482 -0
- package/package.json +24 -5
- package/src/audrey.js +51 -13
- package/src/consolidate.js +70 -54
- package/src/db.js +22 -1
- package/src/embedding.js +16 -12
- package/src/encode.js +8 -2
- package/src/fts.js +134 -0
- package/src/import.js +28 -0
- package/src/llm.js +6 -3
- package/src/migrate.js +2 -2
- package/src/recall.js +253 -32
- package/src/utils.js +25 -0
- package/types/index.d.ts +434 -0
package/README.md
CHANGED
|
@@ -1,68 +1,81 @@
|
|
|
1
1
|
# Audrey
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
[CI](https://github.com/Evilander/Audrey/actions/workflows/ci.yml)
|
|
4
|
+
[npm](https://www.npmjs.com/package/audrey)
|
|
5
|
+
[License](LICENSE)
|
|
4
6
|
|
|
5
|
-
|
|
7
|
+
Persistent memory for Claude Code and AI agents. Two commands, every session remembers.
|
|
6
8
|
|
|
7
|
-
|
|
9
|
+
```bash
|
|
10
|
+
npx audrey install # 13 MCP memory tools
|
|
11
|
+
npx audrey hooks install # automatic memory in every session
|
|
12
|
+
```
|
|
13
|
+
|
|
14
|
+
That's it. Claude Code now wakes up knowing what happened yesterday, recalls relevant context per-prompt, and consolidates learnings when the session ends. No cloud, no config files, no infrastructure — one SQLite file.
|
|
15
|
+
|
|
16
|
+
Audrey also works as a standalone SDK, MCP server, and REST API for any AI agent framework.
|
|
17
|
+
|
|
18
|
+
> **On `/dream`** — Anthropic recently shipped `/dream` for Claude Code memory maintenance. Audrey predates it and goes further: episodic-to-semantic consolidation, contradiction detection, confidence decay, emotional affect, causal reasoning, and source reliability weighting. `/dream` is a maintenance pass. Audrey is a cognitive memory architecture.
|
|
19
|
+
|
|
20
|
+
## Why Audrey
|
|
21
|
+
|
|
22
|
+
Most AI memory tools are storage wrappers. They save facts, retrieve facts, and keep everything forever. That leaves real production problems unsolved:
|
|
8
23
|
|
|
9
|
-
-
|
|
10
|
-
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
24
|
+
- Old information stays weighted like new information.
|
|
25
|
+
- Raw events never become reusable operating knowledge.
|
|
26
|
+
- Conflicting facts quietly coexist.
|
|
27
|
+
- Model-generated mistakes can get reinforced into false "truth."
|
|
13
28
|
|
|
14
|
-
Audrey
|
|
29
|
+
Audrey models memory as a working system instead of a filing cabinet.
|
|
15
30
|
|
|
16
31
|
| Brain Structure | Audrey Component | What It Does |
|
|
17
32
|
|---|---|---|
|
|
18
33
|
| Hippocampus | Episodic Memory | Fast capture of raw events and observations |
|
|
19
34
|
| Neocortex | Semantic Memory | Consolidated principles and patterns |
|
|
20
35
|
| Cerebellum | Procedural Memory | Learned workflows and conditional behaviors |
|
|
21
|
-
| Sleep Replay | Dream Cycle | Consolidates episodes into principles
|
|
22
|
-
| Prefrontal Cortex | Validation Engine | Truth-checking
|
|
23
|
-
| Amygdala | Affect System | Emotional encoding, arousal-salience coupling, mood-congruent recall |
|
|
36
|
+
| Sleep Replay | Dream Cycle | Consolidates episodes into principles and applies decay |
|
|
37
|
+
| Prefrontal Cortex | Validation Engine | Truth-checking and contradiction detection |
|
|
38
|
+
| Amygdala | Affect System | Emotional encoding, arousal-salience coupling, and mood-congruent recall |
|
|
39
|
+
|
|
40
|
+
## What You Get
|
|
41
|
+
|
|
42
|
+
- Local SQLite-backed memory with `sqlite-vec`
|
|
43
|
+
- MCP server for Claude Code with 13 memory tools
|
|
44
|
+
- **Claude Code hooks integration** — automatic memory in every session (`npx audrey hooks install`)
|
|
45
|
+
- JavaScript SDK for direct application use
|
|
46
|
+
- **Git-friendly versioning** via JSON snapshots (`npx audrey snapshot` / `restore`)
|
|
47
|
+
- **REST API server** - any language, any framework (`npx audrey serve`)
|
|
48
|
+
- Health checks via `npx audrey status --json`
|
|
49
|
+
- Benchmark harness with retrieval and lifecycle-operation tracks via `npm run bench:memory`
|
|
50
|
+
- Regression gate for benchmark quality via `npm run bench:memory:check`
|
|
51
|
+
- Optional local embeddings and optional hosted LLM providers
|
|
52
|
+
- Strongest production fit today in financial services ops and healthcare ops
|
|
24
53
|
|
|
25
54
|
## Install
|
|
26
55
|
|
|
27
|
-
### MCP Server for Claude Code
|
|
56
|
+
### MCP Server for Claude Code
|
|
28
57
|
|
|
29
58
|
```bash
|
|
30
|
-
npx audrey install
|
|
59
|
+
npx audrey install # Register 13 MCP memory tools
|
|
60
|
+
npx audrey hooks install # Wire automatic memory into session lifecycle
|
|
31
61
|
```
|
|
32
62
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
- `GOOGLE_API_KEY` or `GEMINI_API_KEY` set? Uses Gemini embeddings (3072d).
|
|
36
|
-
- Neither? Runs with local embeddings (384d, MiniLM via @huggingface/transformers — zero API key, works offline).
|
|
37
|
-
- `AUDREY_EMBEDDING_PROVIDER=openai` for explicit OpenAI embeddings (1536d).
|
|
38
|
-
- `ANTHROPIC_API_KEY` set? Enables LLM-powered consolidation, contradiction detection, and reflection.
|
|
39
|
-
|
|
40
|
-
```bash
|
|
41
|
-
# Check status
|
|
42
|
-
npx audrey status
|
|
43
|
-
|
|
44
|
-
# Uninstall
|
|
45
|
-
npx audrey uninstall
|
|
46
|
-
```
|
|
63
|
+
Audrey auto-detects providers from your environment:
|
|
47
64
|
|
|
48
|
-
|
|
65
|
+
- `GOOGLE_API_KEY` or `GEMINI_API_KEY` -> Gemini embeddings (3072d)
|
|
66
|
+
- no embedding key -> local embeddings (384d, MiniLM, offline-capable)
|
|
67
|
+
- `AUDREY_EMBEDDING_PROVIDER=openai` -> explicit OpenAI embeddings (1536d)
|
|
68
|
+
- `ANTHROPIC_API_KEY` -> LLM-powered consolidation, contradiction detection, and reflection
|
|
49
69
|
|
|
50
|
-
|
|
70
|
+
Quick checks:
|
|
51
71
|
|
|
52
72
|
```bash
|
|
53
|
-
npx audrey
|
|
54
|
-
npx audrey
|
|
55
|
-
npx audrey status
|
|
56
|
-
npx audrey greeting # Output session briefing (mood, principles, recent memories)
|
|
57
|
-
npx audrey greeting "auth" # Briefing + context-relevant memories for "auth"
|
|
58
|
-
npx audrey reflect # Reflect on conversation + dream cycle (reads turns from stdin)
|
|
59
|
-
npx audrey dream # Run consolidation + decay cycle
|
|
60
|
-
npx audrey reembed # Re-embed all memories with current provider
|
|
73
|
+
npx audrey status
|
|
74
|
+
npx audrey status --json
|
|
75
|
+
npx audrey status --json --fail-on-unhealthy
|
|
61
76
|
```
|
|
62
77
|
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
### SDK in Your Code
|
|
78
|
+
### SDK
|
|
66
79
|
|
|
67
80
|
```bash
|
|
68
81
|
npm install audrey
|
|
@@ -70,739 +83,393 @@ npm install audrey
|
|
|
70
83
|
|
|
71
84
|
Zero external infrastructure. One SQLite file.
|
|
72
85
|
|
|
73
|
-
##
|
|
86
|
+
## Quick Start
|
|
74
87
|
|
|
75
88
|
```js
|
|
76
89
|
import { Audrey } from 'audrey';
|
|
77
90
|
|
|
78
|
-
// 1. Create a brain
|
|
79
91
|
const brain = new Audrey({
|
|
80
92
|
dataDir: './agent-memory',
|
|
81
|
-
agent: '
|
|
82
|
-
embedding: { provider: 'local', dimensions: 384 },
|
|
93
|
+
agent: 'support-agent',
|
|
94
|
+
embedding: { provider: 'local', dimensions: 384 },
|
|
83
95
|
});
|
|
84
96
|
|
|
85
|
-
// 2. Encode observations — with optional emotional context
|
|
86
97
|
await brain.encode({
|
|
87
|
-
content: 'Stripe API
|
|
98
|
+
content: 'Stripe API returned 429 above 100 req/s',
|
|
88
99
|
source: 'direct-observation',
|
|
89
100
|
tags: ['stripe', 'rate-limit'],
|
|
101
|
+
context: { task: 'debugging', domain: 'payments' },
|
|
90
102
|
affect: { valence: -0.4, arousal: 0.7, label: 'frustration' },
|
|
91
103
|
});
|
|
92
104
|
|
|
93
|
-
// 3. Recall what you know — mood-congruent retrieval
|
|
94
105
|
const memories = await brain.recall('stripe rate limits', {
|
|
95
106
|
limit: 5,
|
|
96
|
-
|
|
107
|
+
context: { task: 'debugging', domain: 'payments' },
|
|
97
108
|
});
|
|
98
109
|
|
|
99
|
-
// 4. Filtered recall — by tag, source, or date range
|
|
100
|
-
const recent = await brain.recall('stripe', {
|
|
101
|
-
tags: ['rate-limit'],
|
|
102
|
-
sources: ['direct-observation'],
|
|
103
|
-
after: '2026-02-01T00:00:00Z',
|
|
104
|
-
context: { task: 'debugging', domain: 'payments' }, // context-dependent retrieval
|
|
105
|
-
});
|
|
106
|
-
|
|
107
|
-
// 5. Dream — the biological sleep cycle
|
|
108
110
|
const dream = await brain.dream();
|
|
109
|
-
// Consolidates episodes into principles, applies forgetting curves, reports health
|
|
110
|
-
|
|
111
|
-
// 6. Reflect on a conversation — form lasting memories
|
|
112
|
-
const result = await brain.reflect([
|
|
113
|
-
{ role: 'user', content: 'How do I handle rate limits?' },
|
|
114
|
-
{ role: 'assistant', content: 'Use exponential backoff with jitter...' },
|
|
115
|
-
]);
|
|
116
|
-
// LLM extracts what matters, encodes it as lasting memories
|
|
117
|
-
|
|
118
|
-
// 7. Session greeting — wake up with context
|
|
119
111
|
const briefing = await brain.greeting({ context: 'debugging stripe' });
|
|
120
|
-
// Returns mood, principles, recent memories, identity, unresolved threads
|
|
121
|
-
|
|
122
|
-
// 8. Forget something
|
|
123
|
-
brain.forget(memoryId); // soft-delete
|
|
124
|
-
brain.forget(memoryId, { purge: true }); // hard-delete
|
|
125
|
-
await brain.forgetByQuery('old API endpoint', { minSimilarity: 0.9 });
|
|
126
|
-
|
|
127
|
-
// 9. Check brain health
|
|
128
|
-
const stats = brain.introspect();
|
|
129
|
-
// { episodic: 47, semantic: 12, procedural: 3, dormant: 8, ... }
|
|
130
112
|
|
|
131
|
-
|
|
113
|
+
await brain.waitForIdle();
|
|
132
114
|
brain.close();
|
|
133
115
|
```
|
|
134
116
|
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
```js
|
|
138
|
-
const brain = new Audrey({
|
|
139
|
-
dataDir: './audrey-data', // SQLite database directory
|
|
140
|
-
agent: 'my-agent', // Agent identifier
|
|
141
|
-
|
|
142
|
-
// Embedding provider (required)
|
|
143
|
-
embedding: {
|
|
144
|
-
provider: 'local', // 'mock' (test), 'local' (384d MiniLM), 'gemini' (3072d), 'openai' (1536d)
|
|
145
|
-
dimensions: 384, // Must match provider
|
|
146
|
-
apiKey: '...', // Required for gemini/openai
|
|
147
|
-
device: 'gpu', // 'gpu' or 'cpu' — for local provider only
|
|
148
|
-
},
|
|
149
|
-
|
|
150
|
-
// LLM provider (optional — enables smart consolidation + contradiction detection + reflection)
|
|
151
|
-
llm: {
|
|
152
|
-
provider: 'anthropic', // 'mock', 'anthropic', or 'openai'
|
|
153
|
-
apiKey: '...', // Required for anthropic/openai
|
|
154
|
-
model: 'claude-sonnet-4-6', // Optional model override
|
|
155
|
-
},
|
|
156
|
-
|
|
157
|
-
// Consolidation settings
|
|
158
|
-
consolidation: {
|
|
159
|
-
minEpisodes: 3, // Minimum cluster size for principle extraction
|
|
160
|
-
},
|
|
161
|
-
|
|
162
|
-
// Context-dependent retrieval
|
|
163
|
-
context: {
|
|
164
|
-
enabled: true, // Enable encoding-specificity principle
|
|
165
|
-
weight: 0.3, // Max 30% confidence boost on full context match
|
|
166
|
-
},
|
|
167
|
-
|
|
168
|
-
// Emotional memory
|
|
169
|
-
affect: {
|
|
170
|
-
enabled: true, // Enable affect system
|
|
171
|
-
weight: 0.2, // Max 20% mood-congruence boost
|
|
172
|
-
arousalWeight: 0.3, // Yerkes-Dodson arousal-salience coupling
|
|
173
|
-
resonance: { // Detect emotional echoes across experiences
|
|
174
|
-
enabled: true,
|
|
175
|
-
k: 5, // How many past episodes to check
|
|
176
|
-
threshold: 0.5, // Semantic similarity threshold
|
|
177
|
-
affectThreshold: 0.6, // Emotional similarity threshold
|
|
178
|
-
},
|
|
179
|
-
},
|
|
180
|
-
|
|
181
|
-
// Interference-based forgetting
|
|
182
|
-
interference: {
|
|
183
|
-
enabled: true, // New episodes suppress similar existing memories
|
|
184
|
-
weight: 0.15, // Suppression strength
|
|
185
|
-
},
|
|
186
|
-
|
|
187
|
-
// Decay settings
|
|
188
|
-
decay: {
|
|
189
|
-
dormantThreshold: 0.1, // Below this confidence = dormant
|
|
190
|
-
},
|
|
191
|
-
});
|
|
192
|
-
```
|
|
193
|
-
|
|
194
|
-
**Without an LLM provider**, consolidation uses a default text-based extractor and contradiction detection is similarity-only. **With an LLM provider**, Audrey extracts real generalized principles (semantic and procedural), detects semantic contradictions, resolves context-dependent truths, and reflects on conversations to form lasting memories.
|
|
117
|
+
## MCP Tools
|
|
195
118
|
|
|
196
|
-
|
|
119
|
+
Every Claude Code session gets these tools after `npx audrey install`:
|
|
197
120
|
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
121
|
+
- `memory_encode`
|
|
122
|
+
- `memory_recall`
|
|
123
|
+
- `memory_consolidate`
|
|
124
|
+
- `memory_dream`
|
|
125
|
+
- `memory_introspect`
|
|
126
|
+
- `memory_resolve_truth`
|
|
127
|
+
- `memory_export`
|
|
128
|
+
- `memory_import`
|
|
129
|
+
- `memory_forget`
|
|
130
|
+
- `memory_decay`
|
|
131
|
+
- `memory_status`
|
|
132
|
+
- `memory_reflect`
|
|
133
|
+
- `memory_greeting`
|
|
208
134
|
|
|
209
|
-
##
|
|
135
|
+
## CLI
|
|
210
136
|
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
**Procedural** (cold, slowest decay) — Learned workflows. "When Stripe rate-limits, implement exponential backoff." Skills the agent has acquired. Routed automatically when the LLM identifies a principle as procedural.
|
|
218
|
-
|
|
219
|
-
**Causal** — Why things happened. Not just "A then B" but "A caused B because of mechanism C." Prevents correlation-as-causation.
|
|
220
|
-
|
|
221
|
-
### Confidence Formula
|
|
222
|
-
|
|
223
|
-
Every memory has a compositional confidence score:
|
|
224
|
-
|
|
225
|
-
```
|
|
226
|
-
C(m, t) = w_s * S + w_e * E + w_r * R(t) + w_ret * Ret(t)
|
|
227
|
-
```
|
|
228
|
-
|
|
229
|
-
| Component | What It Measures | Default Weight |
|
|
230
|
-
|---|---|---|
|
|
231
|
-
| **S** — Source reliability | How trustworthy is the origin? | 0.30 |
|
|
232
|
-
| **E** — Evidence agreement | Do observations agree or contradict? | 0.35 |
|
|
233
|
-
| **R(t)** — Recency decay | How old is the memory? (Ebbinghaus curve) | 0.20 |
|
|
234
|
-
| **Ret(t)** — Retrieval reinforcement | How often is this memory accessed? | 0.15 |
|
|
235
|
-
|
|
236
|
-
Source reliability hierarchy:
|
|
237
|
-
|
|
238
|
-
| Source Type | Reliability |
|
|
239
|
-
|---|---|
|
|
240
|
-
| `direct-observation` | 0.95 |
|
|
241
|
-
| `told-by-user` | 0.90 |
|
|
242
|
-
| `tool-result` | 0.85 |
|
|
243
|
-
| `inference` | 0.60 |
|
|
244
|
-
| `model-generated` | 0.40 (capped at 0.6 confidence) |
|
|
245
|
-
|
|
246
|
-
The `model-generated` cap prevents circular self-confirmation — an agent can't boost its own hallucinations into high-confidence "facts."
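The compositional score described in this removed section can be sketched as follows. This is a minimal illustration, not the package's implementation: the name `compositeConfidence` is hypothetical, the weights and reliabilities are copied from the tables above, and the evidence, recency, and retrieval components are assumed to arrive as precomputed numbers in [0, 1].

```js
// Hypothetical sketch: default weights and source reliabilities from the tables above.
const WEIGHTS = { source: 0.30, evidence: 0.35, recency: 0.20, retrieval: 0.15 };
const SOURCE_RELIABILITY = {
  'direct-observation': 0.95,
  'told-by-user': 0.90,
  'tool-result': 0.85,
  'inference': 0.60,
  'model-generated': 0.40,
};
const MODEL_GENERATED_CAP = 0.6;

// evidence, recency, and retrieval are assumed to be precomputed values in [0, 1].
function compositeConfidence({ source, evidence, recency, retrieval }) {
  const s = SOURCE_RELIABILITY[source];
  if (s === undefined) throw new Error(`unknown source type: ${source}`);
  const score =
    WEIGHTS.source * s +
    WEIGHTS.evidence * evidence +
    WEIGHTS.recency * recency +
    WEIGHTS.retrieval * retrieval;
  // Model-generated memories are capped so they cannot reinforce themselves into "facts".
  return source === 'model-generated' ? Math.min(score, MODEL_GENERATED_CAP) : score;
}
```

Under these assumptions, a `model-generated` memory with perfect evidence, recency, and retrieval components still returns 0.6, because the weighted sum exceeds the cap.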
|
|
247
|
-
|
|
248
|
-
### Decay (Forgetting Curves)
|
|
249
|
-
|
|
250
|
-
Unreinforced memories lose confidence over time following Ebbinghaus exponential decay:
|
|
251
|
-
|
|
252
|
-
| Memory Type | Half-Life | Rationale |
|
|
253
|
-
|---|---|---|
|
|
254
|
-
| Episodic | 7 days | Raw events go stale fast |
|
|
255
|
-
| Semantic | 30 days | Principles are hard-won |
|
|
256
|
-
| Procedural | 90 days | Skills are slowest to forget |
|
|
257
|
-
|
|
258
|
-
Retrieval resets the decay clock. Frequently accessed memories persist. Memories below the dormant threshold (0.1) become dormant — still searchable with `includeDormant: true`, but excluded from default recall.
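A rough sketch of how the half-lives above translate into numbers. The function names are hypothetical, and the sketch assumes decay multiplies a starting value by 0.5 for every half-life elapsed since the memory was last reinforced; the package may apply decay to a recency component rather than to the whole score.

```js
// Half-lives in days, copied from the table above.
const HALF_LIFE_DAYS = { episodic: 7, semantic: 30, procedural: 90 };
const DORMANT_THRESHOLD = 0.1;

// Assumption: exponential decay, halving once per half-life since last reinforcement.
function decayedConfidence(initial, type, daysSinceReinforced) {
  const halfLife = HALF_LIFE_DAYS[type];
  if (halfLife === undefined) throw new Error(`unknown memory type: ${type}`);
  return initial * Math.pow(0.5, daysSinceReinforced / halfLife);
}

// Memories below the threshold are treated as dormant (excluded from default recall).
function isDormant(confidence) {
  return confidence < DORMANT_THRESHOLD;
}
```

With these assumptions, an episodic memory that starts at 0.8 falls to 0.4 after 7 days and to 0.05 after 28 days, at which point it would be dormant. A procedural memory with the same starting value is still well above the threshold after 28 days.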
|
|
259
|
-
|
|
260
|
-
### Dream Cycle (The "Sleep" Cycle)
|
|
261
|
-
|
|
262
|
-
`brain.dream()` runs the full biological sleep analog:
|
|
263
|
-
|
|
264
|
-
1. **Consolidate** — Cluster similar episodic memories via KNN, extract principles via LLM, route to semantic or procedural tables
|
|
265
|
-
2. **Decay** — Apply forgetting curves, transition low-confidence memories to dormant
|
|
266
|
-
3. **Introspect** — Report memory system health
|
|
267
|
-
|
|
268
|
-
The pipeline is fully transactional — if any cluster fails mid-run, all writes roll back. Consolidation is idempotent. Re-running on the same data produces no duplicates.
|
|
269
|
-
|
|
270
|
-
### Consolidation Routing
|
|
271
|
-
|
|
272
|
-
When the LLM extracts a principle, it classifies it:
|
|
273
|
-
|
|
274
|
-
- `type: 'semantic'` → goes to the `semantics` table (general knowledge)
|
|
275
|
-
- `type: 'procedural'` → goes to the `procedures` table with `trigger_conditions` (actionable skills)
|
|
137
|
+
```bash
|
|
138
|
+
# Setup
|
|
139
|
+
npx audrey install # Register MCP server with Claude Code
|
|
140
|
+
npx audrey uninstall # Remove MCP server registration
|
|
141
|
+
npx audrey hooks install # Wire Audrey into Claude Code hooks (automatic memory)
|
|
142
|
+
npx audrey hooks uninstall # Remove Audrey hooks
|
|
276
143
|
|
|
277
|
-
|
|
144
|
+
# Health and monitoring
|
|
145
|
+
npx audrey status # Human-readable health report
|
|
146
|
+
npx audrey status --json # Machine-readable health output
|
|
147
|
+
npx audrey status --json --fail-on-unhealthy # CI gate
|
|
278
148
|
|
|
279
|
-
|
|
149
|
+
# Session lifecycle (used by hooks automatically)
|
|
150
|
+
npx audrey greeting # Load identity, principles, mood
|
|
151
|
+
npx audrey greeting "auth" # With context-aware recall
|
|
152
|
+
npx audrey recall "query" # Semantic memory search (returns hook-compatible JSON)
|
|
153
|
+
npx audrey reflect # Consolidate learnings from stdin conversation + dream
|
|
280
154
|
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
155
|
+
# Maintenance
|
|
156
|
+
npx audrey dream # Full consolidation + decay cycle
|
|
157
|
+
npx audrey reembed # Re-embed all memories after provider/dimension change
|
|
284
158
|
|
|
285
|
-
|
|
159
|
+
# Versioning
|
|
160
|
+
npx audrey snapshot # Export memories to timestamped JSON file
|
|
161
|
+
npx audrey snapshot backup.json # Export to specific file
|
|
162
|
+
npx audrey restore backup.json # Restore from snapshot (re-embeds with current provider)
|
|
163
|
+
npx audrey restore backup.json --force # Overwrite existing memories
|
|
286
164
|
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
// Both true — under different conditions
|
|
165
|
+
# REST API server
|
|
166
|
+
npx audrey serve # Start HTTP server on port 3487
|
|
167
|
+
npx audrey serve 8080 # Custom port
|
|
291
168
|
```
|
|
292
169
|
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
### Forget and Purge
|
|
296
|
-
|
|
297
|
-
Memories can be explicitly forgotten — by ID or by semantic query:
|
|
298
|
-
|
|
299
|
-
**Soft-delete** (default) — Marks the memory as forgotten/superseded and removes its vector index. The record stays in the database but is excluded from recall. Reversible via direct database access.
|
|
300
|
-
|
|
301
|
-
**Hard-delete** (`purge: true`) — Permanently removes the memory from both the main table and the vector index. Irreversible.
|
|
302
|
-
|
|
303
|
-
**Bulk purge** — Removes all forgotten, dormant, superseded, and rolled-back memories in one operation. Useful for GDPR compliance or storage cleanup.
|
|
304
|
-
|
|
305
|
-
### Rollback
|
|
170
|
+
## Hooks Integration
|
|
306
171
|
|
|
307
|
-
|
|
172
|
+
Audrey integrates directly into Claude Code's hook lifecycle for automatic, zero-config memory in every session:
|
|
308
173
|
|
|
309
|
-
```
|
|
310
|
-
|
|
311
|
-
brain.rollback(history[0].id);
|
|
312
|
-
// Semantic memories -> rolled_back state
|
|
313
|
-
// Source episodes -> un-consolidated
|
|
314
|
-
// Full audit trail preserved
|
|
174
|
+
```bash
|
|
175
|
+
npx audrey hooks install
|
|
315
176
|
```
|
|
316
177
|
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
The most dangerous exploit in AI memory: agent hallucinates X, encodes it, later retrieves it, "reinforcement" boosts confidence, X eventually consolidates as "established truth."
|
|
320
|
-
|
|
321
|
-
Audrey's defenses:
|
|
322
|
-
|
|
323
|
-
1. **Source diversity requirement** — Consolidation requires evidence from 2+ distinct source types
|
|
324
|
-
2. **Model-generated cap** — Memories from `model-generated` sources are capped at 0.6 confidence
|
|
325
|
-
3. **Source lineage tracking** — Provenance chains detect when all evidence traces back to a single inference
|
|
326
|
-
4. **Source diversity score** — Every semantic memory tracks how many different source types contributed
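The source diversity requirement in the list above can be illustrated with a short check. This is a sketch only: the helper names are hypothetical, and each episode is assumed to be an object with a `source` string such as `'direct-observation'` or `'model-generated'`.

```js
// Counts the distinct source types among a cluster of episodes.
// Assumption: each episode carries a `source` string.
function sourceDiversity(episodes) {
  return new Set(episodes.map((e) => e.source)).size;
}

// A cluster qualifies for consolidation only with evidence from 2+ distinct source types.
function meetsDiversityRequirement(episodes, minSourceTypes = 2) {
  return sourceDiversity(episodes) >= minSourceTypes;
}
```

A cluster made up entirely of `model-generated` episodes has a diversity of 1 and would be rejected, which is the self-confirmation loop the defenses are meant to block.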
|
|
327
|
-
|
|
328
|
-
## API Reference
|
|
178
|
+
This configures four hooks in `~/.claude/settings.json`:
|
|
329
179
|
|
|
330
|
-
|
|
180
|
+
| Hook Event | Command | What Happens |
|
|
181
|
+
|---|---|---|
|
|
182
|
+
| **SessionStart** | `npx audrey greeting` | Loads identity, learned principles, current mood, and recent memories |
|
|
183
|
+
| **UserPromptSubmit** | `npx audrey recall` | Semantic search on every prompt — injects relevant memories as context |
|
|
184
|
+
| **Stop** | `npx audrey reflect` | Extracts lasting learnings from the conversation, then runs a dream cycle |
|
|
185
|
+
| **PostCompact** | `npx audrey greeting` | Re-injects critical memories after context window compaction |
|
|
331
186
|
|
|
332
|
-
|
|
187
|
+
With hooks installed, Claude Code sessions automatically wake up with context, recall relevant memories per-prompt, and consolidate learnings when the session ends. No manual tool calls needed.
|
|
333
188
|
|
|
334
|
-
|
|
189
|
+
## REST API Server
|
|
335
190
|
|
|
336
|
-
|
|
191
|
+
Turn Audrey into an HTTP service that any language or framework can use:
|
|
337
192
|
|
|
338
|
-
```
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
salience: 0.8, // Optional. 0-1. Default: 0.5
|
|
343
|
-
causal: { // Optional. What caused this / what it caused.
|
|
344
|
-
trigger: 'batch-processing',
|
|
345
|
-
consequence: 'queue-backed-up',
|
|
346
|
-
},
|
|
347
|
-
tags: ['stripe', 'production'], // Optional. Array of strings.
|
|
348
|
-
supersedes: 'previous-id', // Optional. ID of episode this corrects.
|
|
349
|
-
context: { task: 'debugging' }, // Optional. Situational context for retrieval.
|
|
350
|
-
affect: { // Optional. Emotional context.
|
|
351
|
-
valence: -0.5, // -1 (negative) to 1 (positive)
|
|
352
|
-
arousal: 0.7, // 0 (calm) to 1 (activated)
|
|
353
|
-
label: 'frustration', // Human-readable emotion label
|
|
354
|
-
},
|
|
355
|
-
private: true, // Optional. If true, excluded from public recall.
|
|
356
|
-
});
|
|
193
|
+
```bash
|
|
194
|
+
npx audrey serve # Start on port 3487
|
|
195
|
+
npx audrey serve 8080 # Custom port
|
|
196
|
+
AUDREY_API_KEY=secret npx audrey serve # With Bearer token auth
|
|
357
197
|
```
|
|
358
198
|
|
|
359
|
-
|
|
199
|
+
Endpoints:
|
|
360
200
|
|
|
361
|
-
|
|
201
|
+
| Method | Path | Description |
|
|
202
|
+
|--------|------|-------------|
|
|
203
|
+
| `GET` | `/health` | Liveness probe |
|
|
204
|
+
| `GET` | `/status` | Memory stats (introspect) |
|
|
205
|
+
| `POST` | `/encode` | Store a memory (`{ content, source, tags?, context?, affect? }`) |
|
|
206
|
+
| `POST` | `/recall` | Semantic search (`{ query, limit?, context? }`) |
|
|
207
|
+
| `POST` | `/dream` | Full consolidation + decay cycle |
|
|
208
|
+
| `POST` | `/consolidate` | Run consolidation only |
|
|
209
|
+
| `POST` | `/forget` | Forget by `{ id }` or `{ query }` |
|
|
210
|
+
| `POST` | `/snapshot` | Export all memories as JSON |
|
|
211
|
+
| `POST` | `/restore` | Wipe and reimport from snapshot |
|
|
362
212
|
|
|
363
|
-
|
|
213
|
+
Example from any language:
|
|
364
214
|
|
|
365
|
-
```
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
{
|
|
370
|
-
]);
|
|
371
|
-
```
|
|
372
|
-
|
|
373
|
-
### `brain.recall(query, options)` -> `Promise<Memory[]>`
|
|
374
|
-
|
|
375
|
-
Retrieve memories ranked by `similarity * confidence`.
|
|
215
|
+
```bash
|
|
216
|
+
# Store a memory
|
|
217
|
+
curl -X POST http://localhost:3487/encode \
|
|
218
|
+
-H "Content-Type: application/json" \
|
|
219
|
+
-d '{"content": "The deploy failed due to OOM", "source": "direct-observation"}'
|
|
376
220
|
|
|
377
|
-
|
|
378
|
-
|
|
379
|
-
|
|
380
|
-
|
|
381
|
-
types: ['semantic'], // Filter by memory type
|
|
382
|
-
includeProvenance: true, // Include evidence chains
|
|
383
|
-
includeDormant: false, // Include dormant memories
|
|
384
|
-
tags: ['rate-limit'], // Only episodic memories with these tags
|
|
385
|
-
sources: ['direct-observation'], // Only episodic memories from these sources
|
|
386
|
-
after: '2026-02-01T00:00:00Z', // Only memories created after this date
|
|
387
|
-
before: '2026-03-01T00:00:00Z', // Only memories created before this date
|
|
388
|
-
context: { task: 'debugging' }, // Boost memories encoded in matching context
|
|
389
|
-
mood: { valence: -0.3, arousal: 0.5 }, // Mood-congruent retrieval
|
|
390
|
-
});
|
|
221
|
+
# Search memories
|
|
222
|
+
curl -X POST http://localhost:3487/recall \
|
|
223
|
+
-H "Content-Type: application/json" \
|
|
224
|
+
-d '{"query": "deploy failures", "limit": 5}'
|
|
391
225
|
```
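The same calls can be made from JavaScript. The helper below is a hedged sketch: `buildRequest` is a hypothetical name, the default base URL mirrors the curl examples, and the `Authorization: Bearer <key>` header format is an assumption based on the README's mention of Bearer token auth.

```js
// Builds fetch() arguments for the documented POST endpoints.
// The base URL default and the Authorization header format are assumptions.
function buildRequest(path, body, { baseUrl = 'http://localhost:3487', apiKey } = {}) {
  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl}${path}`,
    init: { method: 'POST', headers, body: JSON.stringify(body) },
  };
}

// Usage (needs `npx audrey serve` running and a runtime with global fetch, e.g. Node 18+):
// const { url, init } = buildRequest('/recall', { query: 'deploy failures', limit: 5 });
// const response = await fetch(url, init);
// const result = await response.json(); // response shape is not documented here
```

Keeping request construction separate from the network call makes the helper easy to test without a running server.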
|
|
392
226
|
|
|
393
|
-
|
|
394
|
-
|
|
395
|
-
Each result:
|
|
396
|
-
|
|
397
|
-
```js
|
|
398
|
-
{
|
|
399
|
-
id: '01ABC...',
|
|
400
|
-
content: 'Stripe enforces ~100 req/s rate limit',
|
|
401
|
-
type: 'semantic',
|
|
402
|
-
confidence: 0.87,
|
|
403
|
-
score: 0.74, // similarity * confidence
|
|
404
|
-
source: 'consolidation',
|
|
405
|
-
state: 'active',
|
|
406
|
-
contextMatch: 0.8, // When retrieval context provided
|
|
407
|
-
moodCongruence: 0.7, // When mood provided
|
|
408
|
-
provenance: { // When includeProvenance: true
|
|
409
|
-
evidenceEpisodeIds: ['01XYZ...', '01DEF...'],
|
|
410
|
-
evidenceCount: 3,
|
|
411
|
-
supportingCount: 3,
|
|
412
|
-
contradictingCount: 0,
|
|
413
|
-
},
|
|
414
|
-
}
|
|
415
|
-
```
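The ranking rule stated in this removed section (`similarity * confidence`) can be sketched as below. The function name and the candidate shape (`{ id, similarity, confidence }`) are assumptions for illustration; context and mood boosts are left out.

```js
// Ranks candidates by similarity * confidence, highest score first.
// Assumption: each candidate has numeric `similarity` and `confidence` fields.
function rankMemories(candidates, limit = 5) {
  return candidates
    .map((m) => ({ ...m, score: m.similarity * m.confidence }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Under this rule, a close vector match with low confidence can rank below a slightly weaker match that is well supported, which fits the README's stated goal of keeping weakly supported knowledge from dominating recall.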
|
|
227
|
+
## Versioning
|
|
416
228
|
|
|
417
|
-
|
|
229
|
+
Audrey stores memories in SQLite with WAL mode, which isn't git-friendly. Instead, use JSON snapshots:
|
|
418
230
|
|
|
419
|
-
|
|
231
|
+
```bash
|
|
232
|
+
# Save a checkpoint
|
|
233
|
+
npx audrey snapshot
|
|
420
234
|
|
|
421
|
-
|
|
235
|
+
# Commit it
|
|
236
|
+
git add audrey-snapshot-*.json && git commit -m "memory checkpoint"
|
|
422
237
|
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
console.log(memory.content, memory.score);
|
|
426
|
-
if (memory.score > 0.9) break;
|
|
427
|
-
}
|
|
238
|
+
# Restore on another machine or after a reset
|
|
239
|
+
npx audrey restore audrey-snapshot-2026-03-24_15-30-00.json
|
|
428
240
|
```
|
|
429
241
|
|
|
430
|
-
|
|
242
|
+
Snapshots are human-readable, diffable, and provider-agnostic. Embeddings are re-generated on import, so you can switch providers (e.g., local to Gemini) and restore seamlessly.
|
|
431
243
|
|
|
432
|
-
|
|
244
|
+
## Production Fit
|
|
433
245
|
|
|
434
|
-
|
|
435
|
-
const result = await brain.dream({
|
|
436
|
-
minClusterSize: 3, // Min episodes per cluster
|
|
437
|
-
similarityThreshold: 0.85, // KNN clustering threshold
|
|
438
|
-
dormantThreshold: 0.1, // Below this = dormant
|
|
439
|
-
});
|
|
440
|
-
// {
|
|
441
|
-
// consolidation: { episodesEvaluated, clustersFound, principlesExtracted, semanticsCreated, proceduresCreated },
|
|
442
|
-
// decay: { totalEvaluated, transitionedToDormant },
|
|
443
|
-
// stats: { episodic, semantic, procedural, ... },
|
|
444
|
-
// }
|
|
445
|
-
```
|
|
246
|
+
Audrey is strongest today in workflows where memory must stay local, reviewable, and durable:
|
|
446
247
|
|
|
447
|
-
|
|
248
|
+
- **Financial services operations**: payments ops, fraud and dispute workflows, KYC/KYB review, internal policy assistants
|
|
249
|
+
- **Healthcare operations**: care coordination, prior-auth workflows, intake and referral routing, internal staff knowledge assistants
|
|
448
250
|
|
|
449
|
-
|
|
251
|
+
Audrey is a memory layer, not a compliance boundary. For regulated environments, pair it with application-level access control, encryption, retention, audit logging, and data-minimization rules.
|
|
450
252
|
|
|
451
|
-
|
|
452
|
-
const result = await brain.reflect([
|
|
453
|
-
{ role: 'user', content: 'How do I handle rate limits?' },
|
|
454
|
-
{ role: 'assistant', content: 'Use exponential backoff...' },
|
|
455
|
-
]);
|
|
456
|
-
// { encoded: 2, memories: [...] }
|
|
457
|
-
```
|
|
253
|
+
Production guide: [docs/production-readiness.md](docs/production-readiness.md)
|
|
458
254
|
|
|
459
|
-
|
|
255
|
+
Industry demos:
|
|
460
256
|
|
|
461
|
-
|
|
257
|
+
- [examples/fintech-ops-demo.js](examples/fintech-ops-demo.js)
|
|
258
|
+
- [examples/healthcare-ops-demo.js](examples/healthcare-ops-demo.js)
|
|
462
259
|
|
|
463
|
-
|
|
464
|
-
const briefing = await brain.greeting({
|
|
465
|
-
context: 'debugging stripe', // Optional — also returns relevant memories
|
|
466
|
-
recentLimit: 10,
|
|
467
|
-
principleLimit: 5,
|
|
468
|
-
identityLimit: 5,
|
|
469
|
-
});
|
|
470
|
-
// { recent, principles, mood, unresolved, identity, contextual }
|
|
471
|
-
```
|
|
260
|
+
## Core Concepts
|
|
472
261
|
|
|
473
|
-
###
|
|
262
|
+
### Memory Types
|
|
474
263
|
|
|
475
|
-
|
|
264
|
+
- **Episodic**: raw events and observations
|
|
265
|
+
- **Semantic**: consolidated principles
|
|
266
|
+
- **Procedural**: reusable workflows and actions
|
|
267
|
+
- **Causal**: relationships that explain why something happened
|
|
476
268
|
|
|
477
|
-
|
|
478
|
-
brain.forget(memoryId); // soft-delete
|
|
479
|
-
brain.forget(memoryId, { purge: true }); // hard-delete (permanent)
|
|
480
|
-
// { id, type: 'episodic', purged: false }
|
|
481
|
-
```
|
|
482
|
-
|
|
483
|
-
### `brain.forgetByQuery(query, options)` -> `Promise<ForgetResult | null>`
|
|
269
|
+
### Confidence
|
|
484
270
|
|
|
485
|
-
|
|
271
|
+
Audrey scores memories using source reliability, evidence agreement, recency decay, and retrieval reinforcement. That helps keep direct observations above guesses and keeps stale or weakly supported knowledge from dominating recall.
|
|
486
272
|
|
|
487
|
-
|
|
488
|
-
const result = await brain.forgetByQuery('old API endpoint', {
|
|
489
|
-
minSimilarity: 0.9, // Threshold for match (default 0.9)
|
|
490
|
-
purge: false, // Hard-delete? (default false)
|
|
491
|
-
});
|
|
492
|
-
// null if no match above threshold
|
|
493
|
-
```
|
|
273
|
+
### Dream Cycle
|
|
494
274
|
|
|
495
|
-
|
|
275
|
+
`brain.dream()` runs the full maintenance path:
|
|
496
276
|
|
|
497
|
-
|
|
277
|
+
1. Consolidate related episodes into principles.
|
|
278
|
+
2. Apply decay so low-value memories lose weight over time.
|
|
279
|
+
3. Report memory health and current stats.
|
|
498
280
|
|
|
499
|
-
|
|
500
|
-
const counts = brain.purge();
|
|
501
|
-
// { episodes: 12, semantics: 3, procedures: 0 }
|
|
502
|
-
```
|
|
281
|
+
### Contradiction Handling
|
|
503
282
|
|
|
504
|
-
|
|
283
|
+
When evidence conflicts, Audrey tracks the contradiction instead of silently picking a winner. Resolutions can stay open, be marked resolved, or become context-dependent.
|
|
505
284
|
|
|
506
|
-
|
|
285
|
+
## Configuration
|
|
507
286
|
|
|
508
287
|
```js
|
|
509
|
-
const
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
}
|
|
288
|
+
const brain = new Audrey({
|
|
289
|
+
dataDir: './audrey-data',
|
|
290
|
+
agent: 'my-agent',
|
|
291
|
+
embedding: {
|
|
292
|
+
provider: 'local', // mock | local | gemini | openai
|
|
293
|
+
dimensions: 384,
|
|
294
|
+
device: 'gpu',
|
|
295
|
+
},
|
|
296
|
+
llm: {
|
|
297
|
+
provider: 'anthropic', // mock | anthropic | openai
|
|
298
|
+
apiKey: process.env.ANTHROPIC_API_KEY,
|
|
299
|
+
},
|
|
300
|
+
consolidation: {
|
|
301
|
+
minEpisodes: 3,
|
|
302
|
+
},
|
|
303
|
+
context: {
|
|
304
|
+
enabled: true,
|
|
305
|
+
weight: 0.3,
|
|
306
|
+
},
|
|
307
|
+
affect: {
|
|
308
|
+
enabled: true,
|
|
309
|
+
weight: 0.2,
|
|
310
|
+
},
|
|
311
|
+
decay: {
|
|
312
|
+
dormantThreshold: 0.1,
|
|
313
|
+
},
|
|
517
314
|
});
|
|
518
|
-
// { runId, status, episodesEvaluated, clustersFound, principlesExtracted, semanticsCreated, proceduresCreated }
|
|
519
|
-
```
|
|
520
|
-
|
|
521
|
-
### `brain.decay(options)` -> `DecayResult`
|
|
522
|
-
|
|
523
|
-
Apply forgetting curves. Transitions low-confidence memories to dormant.
|
|
524
|
-
|
|
525
|
-
```js
|
|
526
|
-
const result = brain.decay({ dormantThreshold: 0.1 });
|
|
527
|
-
// { totalEvaluated, transitionedToDormant, timestamp }
|
|
528
315
|
```
|
|
529
316
|
|
|
530
|
-
|
|
317
|
+
## Operations
|
|
531
318
|
|
|
532
|
-
|
|
319
|
+
Recommended production workflow:
|
|
533
320
|
|
|
534
|
-
```
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
### `brain.rollback(runId)` -> `RollbackResult`
|
|
321
|
+
```bash
|
|
322
|
+
# Health checks
|
|
323
|
+
npx audrey status
|
|
324
|
+
npx audrey status --json --fail-on-unhealthy
|
|
540
325
|
|
|
541
|
-
|
|
326
|
+
# Scheduled maintenance
|
|
327
|
+
npx audrey dream
|
|
542
328
|
|
|
543
|
-
|
|
544
|
-
|
|
545
|
-
// { rolledBackMemories: 3, restoredEpisodes: 9 }
|
|
546
|
-
```
|
|
329
|
+
# Repair vector/index drift after provider or dimension changes
|
|
330
|
+
npx audrey reembed
|
|
547
331
|
|
|
548
|
-
|
|
332
|
+
# Version control your memories
|
|
333
|
+
npx audrey snapshot
|
|
334
|
+
npx audrey restore <file> --force
|
|
549
335
|
|
|
550
|
-
|
|
336
|
+
# Run the benchmark harness
|
|
337
|
+
npm run bench:memory
|
|
551
338
|
|
|
552
|
-
|
|
553
|
-
|
|
554
|
-
// { resolution: 'context_dependent', conditions: { a: 'live keys', b: 'test keys' }, explanation: '...' }
|
|
339
|
+
# Fail CI if Audrey drops below benchmark guardrails
|
|
340
|
+
npm run bench:memory:check
|
|
555
341
|
```
|
|
556
342
|
|
|
557
|
-
|
|
343
|
+
## Benchmarking
|
|
558
344
|
|
|
559
|
-
|
|
345
|
+
Audrey now ships with a memory benchmark harness built for three purposes:
|
|
560
346
|
|
|
561
|
-
|
|
562
|
-
|
|
563
|
-
|
|
564
|
-
// episodic: 247, semantic: 31, procedural: 8,
|
|
565
|
-
// causalLinks: 42, dormant: 15,
|
|
566
|
-
// contradictions: { open: 2, resolved: 7, context_dependent: 3, reopened: 0 },
|
|
567
|
-
// lastConsolidation: '2026-02-18T22:00:00Z',
|
|
568
|
-
// totalConsolidationRuns: 14,
|
|
569
|
-
// }
|
|
570
|
-
```
|
|
347
|
+
- measure Audrey against naive local baselines on LongMemEval-style memory abilities plus privacy and abstention checks
|
|
348
|
+
- measure Audrey on lifecycle operations that other memory systems usually hand-wave: update, overwrite, delete, merge, and abstain
|
|
349
|
+
- keep Audrey grounded against published LoCoMo results from leading memory systems
|
|
571
350
|
|
|
572
|
-
|
|
351
|
+
Run it with:
|
|
573
352
|
|
|
574
|
-
|
|
575
|
-
|
|
576
|
-
### `brain.export()` / `brain.import(snapshot)`
|
|
577
|
-
|
|
578
|
-
Export all memories as a JSON snapshot, or import from one. Full-fidelity: preserves consolidation metrics, run metadata, and config. Import re-embeds everything with the current provider in a single atomic transaction.
|
|
579
|
-
|
|
580
|
-
```js
|
|
581
|
-
const snapshot = brain.export(); // { version, episodes, semantics, procedures, consolidationMetrics, ... }
|
|
582
|
-
await brain.import(snapshot); // Re-embeds everything with current provider
|
|
583
|
-
```
|
|
584
|
-
|
|
585
|
-
### Events
|
|
586
|
-
|
|
587
|
-
```js
|
|
588
|
-
brain.on('encode', ({ id, content, source }) => { ... });
|
|
589
|
-
brain.on('reinforcement', ({ episodeId, targetId, similarity }) => { ... });
|
|
590
|
-
brain.on('contradiction', ({ episodeId, contradictionId, semanticId, resolution }) => { ... });
|
|
591
|
-
brain.on('consolidation', ({ runId, principlesExtracted }) => { ... });
|
|
592
|
-
brain.on('decay', ({ totalEvaluated, transitionedToDormant }) => { ... });
|
|
593
|
-
brain.on('dream', ({ consolidation, decay, stats }) => { ... });
|
|
594
|
-
brain.on('rollback', ({ runId, rolledBackMemories }) => { ... });
|
|
595
|
-
brain.on('forget', ({ id, type, purged }) => { ... });
|
|
596
|
-
brain.on('purge', ({ episodes, semantics, procedures }) => { ... });
|
|
597
|
-
brain.on('interference', ({ newEpisodeId, suppressedId, similarity }) => { ... });
|
|
598
|
-
brain.on('resonance', ({ episodeId, resonances }) => { ... });
|
|
599
|
-
brain.on('migration', ({ episodes, semantics, procedures }) => { ... });
|
|
600
|
-
brain.on('error', (err) => { ... });
|
|
353
|
+
```bash
|
|
354
|
+
npm run bench:memory
|
|
601
355
|
```
|
|
602
356
|
|
|
603
|
-
|
|
357
|
+
Artifacts land in `benchmarks/output/` as JSON, SVG charts, and an HTML report.
|
|
604
358
|
|
|
605
|
-
|
|
359
|
+
For CI and release gates:
|
|
606
360
|
|
|
607
|
-
|
|
608
|
-
|
|
609
|
-
```
|
|
610
|
-
audrey-data/
|
|
611
|
-
audrey.db <- Single SQLite file. WAL mode. That's your brain.
|
|
612
|
-
```
|
|
613
|
-
|
|
614
|
-
```
|
|
615
|
-
src/
|
|
616
|
-
audrey.js Main class. EventEmitter. Public API surface.
|
|
617
|
-
causal.js Causal graph management. LLM-powered mechanism articulation.
|
|
618
|
-
confidence.js Compositional confidence formula. Pure math.
|
|
619
|
-
consolidate.js "Sleep" cycle. KNN clustering -> LLM extraction -> promote.
|
|
620
|
-
db.js SQLite + sqlite-vec. Schema, vec0 tables, migrations.
|
|
621
|
-
decay.js Ebbinghaus forgetting curves.
|
|
622
|
-
embedding.js Pluggable providers (Mock, Local/MiniLM, Gemini, OpenAI). Batch embedding.
|
|
623
|
-
encode.js Immutable episodic memory creation + vec0 writes.
|
|
624
|
-
affect.js Emotional memory: arousal-salience coupling, mood-congruent recall, resonance.
|
|
625
|
-
context.js Context-dependent retrieval modifier (encoding specificity).
|
|
626
|
-
interference.js Competitive memory suppression (engram competition).
|
|
627
|
-
forget.js Soft-delete, hard-delete, query-based forget, bulk purge.
|
|
628
|
-
introspect.js Health dashboard queries.
|
|
629
|
-
llm.js Pluggable LLM providers (Mock, Anthropic, OpenAI).
|
|
630
|
-
prompts.js Structured prompt templates for LLM operations.
|
|
631
|
-
recall.js KNN retrieval + confidence scoring + filtered recall + streaming.
|
|
632
|
-
rollback.js Undo consolidation runs.
|
|
633
|
-
utils.js Date math, safe JSON parse.
|
|
634
|
-
validate.js KNN validation + LLM contradiction detection.
|
|
635
|
-
migrate.js Dimension migration re-embedding.
|
|
636
|
-
adaptive.js Adaptive consolidation parameter suggestions.
|
|
637
|
-
export.js Memory export (JSON snapshots with consolidation metrics).
|
|
638
|
-
import.js Memory import with batch re-embedding in atomic transactions.
|
|
639
|
-
index.js SDK barrel export (all providers, database utilities).
|
|
640
|
-
|
|
641
|
-
mcp-server/
|
|
642
|
-
index.js MCP tool server (13 tools, stdio transport) + CLI subcommands.
|
|
643
|
-
config.js Shared config (env var parsing, provider resolution, install arg builder).
|
|
361
|
+
```bash
|
|
362
|
+
npm run bench:memory:check
|
|
644
363
|
```
|
|
645
364
|
|
|
646
|
-
|
|
365
|
+
That command fails if Audrey drops below its minimum local score, local pass rate, or required margin over the strongest naive baseline.
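
The three guardrails can be sketched as one check. The threshold values and the `checkGuardrails` helper are hypothetical; the real limits live in the benchmark harness and are not shown in this README.

```js
// Hypothetical thresholds (percentage points); not the harness's real limits.
const GUARDRAILS = { minScore: 90, minPassRate: 90, minMarginOverBaseline: 20 };

// Returns which guardrails, if any, a benchmark run violates.
function checkGuardrails({ score, passRate, baselineScores }, limits = GUARDRAILS) {
  const strongestBaseline = Math.max(...baselineScores);
  const margin = score - strongestBaseline;
  const failures = [];
  if (score < limits.minScore) failures.push('score below minimum');
  if (passRate < limits.minPassRate) failures.push('pass rate below minimum');
  if (margin < limits.minMarginOverBaseline) failures.push('margin over strongest baseline too small');
  return { ok: failures.length === 0, failures };
}

// Score and baseline come from the combined local benchmark row in this
// README's table; the pass rate is a placeholder.
const report = checkGuardrails({ score: 100.0, passRate: 100.0, baselineScores: [41.7] });
console.log(report.ok); // true

// A CI wrapper could turn a failed check into a non-zero exit status.
if (!report.ok) process.exitCode = 1;
```

Comparing against the strongest baseline rather than the average keeps the gate from passing when a single naive baseline catches up.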
|
|
647
366
|
|
|
648
|
-
|
|
649
|
-
|---|---|
|
|
650
|
-
| `episodes` | Immutable raw events (content, source, salience, causal context, affect, private flag) |
|
|
651
|
-
| `semantics` | Consolidated principles (content, state, evidence chain) |
|
|
652
|
-
| `procedures` | Learned workflows (trigger conditions, success/failure counts) |
|
|
653
|
-
| `causal_links` | Causal relationships (cause, effect, mechanism, link type) |
|
|
654
|
-
| `contradictions` | Dispute tracking (claims, state, resolution) |
|
|
655
|
-
| `consolidation_runs` | Audit trail (inputs, outputs, status, checkpoint cursor) |
|
|
656
|
-
| `consolidation_metrics` | Per-run metrics and confidence deltas |
|
|
657
|
-
| `vec_episodes` | sqlite-vec KNN index for episode embeddings |
|
|
658
|
-
| `vec_semantics` | sqlite-vec KNN index for semantic embeddings |
|
|
659
|
-
| `vec_procedures` | sqlite-vec KNN index for procedural embeddings |
|
|
660
|
-
| `audrey_config` | Dimension configuration, embedding model info, metadata |
|
|
661
|
-
|
|
662
|
-
All mutations use SQLite transactions. CHECK constraints enforce valid states and source types. Vector search uses sqlite-vec with cosine distance.
|
|
663
|
-
|
|
664
|
-
## Running Tests
|
|
367
|
+
For track-specific runs:
|
|
665
368
|
|
|
666
369
|
```bash
|
|
667
|
-
npm
|
|
668
|
-
npm run
|
|
370
|
+
npm run bench:memory:retrieval
|
|
371
|
+
npm run bench:memory:operations
|
|
669
372
|
```
|
|
670
373
|
|
|
671
|
-
|
|
374
|
+
For committed GitHub-friendly charts:
|
|
672
375
|
|
|
673
376
|
```bash
|
|
674
|
-
|
|
377
|
+
npm run bench:memory:readme-assets
|
|
675
378
|
```
|
|
676
379
|
|
|
677
|
-
|
|
678
|
-
|
|
679
|
-
---
|
|
680
|
-
|
|
681
|
-
## Changelog
|
|
682
|
-
|
|
683
|
-
### v0.16.0 (current)
|
|
684
|
-
|
|
685
|
-
- Version bump for npm publish with all v0.15.0 features included
|
|
686
|
-
- 463 tests across 29 test files
|
|
687
|
-
|
|
688
|
-
### v0.15.0 — Production Hardening + Dream Cycle
|
|
689
|
-
|
|
690
|
-
- `dream()` method: consolidation + decay + introspect (biological sleep analog)
|
|
691
|
-
- `memory_dream` MCP tool with configurable thresholds
|
|
692
|
-
- `greeting` and `reflect` CLI subcommands for hook integration
|
|
693
|
-
- Consolidation routes procedural principles to `procedures` table (previously all went to semantics)
|
|
694
|
-
- Fully transactional consolidation — mid-run failures roll back all writes
|
|
695
|
-
- Recall gracefully degrades per memory type (independent try/catch per KNN search)
|
|
696
|
-
- sqlite-vec crash guard for empty vector tables
|
|
697
|
-
- LLM JSON parsing strips markdown code fences from any provider
|
|
698
|
-
- Input validation: empty content rejected, 50K char limit, forget requires exactly one target
|
|
699
|
-
- Full-fidelity export/import: preserves consolidation metrics, run metadata, config
|
|
700
|
-
- Import uses batch embedding in a single atomic transaction
|
|
701
|
-
- Expanded SDK exports: all embedding/LLM providers, database utilities
|
|
702
|
-
- Shared `resolveLLMConfig()` for CLI commands
|
|
703
|
-
- 463 tests across 29 test files
|
|
704
|
-
|
|
705
|
-
### v0.14.0 — Memory Intelligence
|
|
380
|
+
### README Snapshot
|
|
706
381
|
|
|
707
|
-
-
|
|
708
|
-
- `memory_greeting` MCP tool — session-start context briefing
|
|
709
|
-
- `greeting()` method: mood, principles, identity, recent memories, unresolved threads
|
|
710
|
-
- `reflect()` method: LLM-powered conversation analysis and memory formation
|
|
711
|
-
- Rewritten consolidation prompt for deeper principle extraction
|
|
712
|
-
- Rewritten reflection prompt for relational and emotional depth
|
|
713
|
-
- `npx audrey status` shows last consolidation time
|
|
382
|
+
Local Audrey-vs-baseline results:
|
|
714
383
|
|
|
715
|
-
|
|
384
|
+

|
|
716
385
|
|
|
717
|
-
|
|
718
|
-
- True single-forward-pass batch embedding for LocalEmbeddingProvider
|
|
719
|
-
- Gemini `batchEmbedContents` API for batch embedding
|
|
720
|
-
- `reembedAll` uses `embedBatch` for performance
|
|
721
|
-
- `AUDREY_DEVICE` env var, `memoryStatus` reports device
|
|
386
|
+
Lifecycle operations benchmark:
|
|
722
387
|
|
|
723
|
-
|
|
388
|
+

|
|
724
389
|
|
|
725
|
-
|
|
726
|
-
- `GeminiEmbeddingProvider` — 3072d via Google text-embedding-004
|
|
727
|
-
- `private: true` memory flag — memories visible to AI only, excluded from public recall
|
|
728
|
-
- Auto-select embedding provider: local -> gemini (if API key present) -> explicit openai
|
|
729
|
-
- `npx audrey reembed` CLI subcommand for provider migration
|
|
730
|
-
- `reflect()` method for post-conversation memory formation
|
|
731
|
-
- 409 tests across 29 test files
|
|
390
|
+
Published comparison anchors from current LLM memory systems:
|
|
732
391
|
|
|
733
|
-
|
|
392
|
+

|
|
734
393
|
|
|
735
|
-
|
|
736
|
-
- Arousal-salience coupling via Yerkes-Dodson inverted-U curve
|
|
737
|
-
- Mood-congruent recall — matching emotional state boosts retrieval confidence
|
|
738
|
-
- Emotional resonance detection — new experiences that echo past emotional patterns emit events
|
|
739
|
-
- MCP server: `memory_encode` accepts `affect`, `memory_recall` accepts `mood`
|
|
394
|
+
**Current deterministic CI snapshot** (`node benchmarks/run.js --provider mock --dimensions 64`):
|
|
740
395
|
|
|
741
|
-
|
|
742
|
-
|
|
743
|
-
|
|
744
|
-
|
|
745
|
-
|
|
746
|
-
### v0.7.0 — Interference + Salience
|
|
747
|
-
|
|
748
|
-
- Interference-based forgetting: new memories competitively suppress similar existing ones
|
|
749
|
-
- Salience-weighted confidence: high-salience memories resist decay
|
|
750
|
-
- Spaced-repetition reconsolidation: retrieval intervals affect reinforcement strength
|
|
751
|
-
|
|
752
|
-
### v0.6.0 — Filtered Recall + Forget
|
|
753
|
-
|
|
754
|
-
- Filtered recall: tag, source, and date-range filters on `recall()` and `recallStream()`
|
|
755
|
-
- `forget()`, `forgetByQuery()`, `purge()`
|
|
756
|
-
- `memory_forget` and `memory_decay` MCP tools
|
|
757
|
-
|
|
758
|
-
### v0.5.0 — Feature Depth
|
|
759
|
-
|
|
760
|
-
- Configurable confidence weights and decay rates per instance
|
|
761
|
-
- Memory export/import (JSON snapshots with re-embedding)
|
|
762
|
-
- `memory_export` and `memory_import` MCP tools
|
|
763
|
-
- Auto-consolidation scheduling
|
|
764
|
-
- Adaptive consolidation parameter suggestions
|
|
765
|
-
|
|
766
|
-
### v0.3.1 — MCP Server
|
|
767
|
-
|
|
768
|
-
- MCP tool server via `@modelcontextprotocol/sdk` with stdio transport
|
|
769
|
-
- One-command install: `npx audrey install` (auto-detects API keys)
|
|
770
|
-
- CLI subcommands: `install`, `uninstall`, `status`
|
|
771
|
-
|
|
772
|
-
### v0.3.0 — Vector Performance
|
|
396
|
+
| Local track | Audrey | Best Baseline |
|
|
397
|
+
|---|---|---|
|
|
398
|
+
| Combined local benchmark | **100.0%** | 41.7% |
|
|
399
|
+
| Retrieval capabilities | **100.0%** | 56.3% |
|
|
400
|
+
| Memory operations | **100.0%** | 25.0% |
|
|
773
401
|
|
|
774
|
-
-
|
|
775
|
-
- KNN queries for recall, validation, and consolidation clustering
|
|
776
|
-
- Batch encoding API and streaming recall with async generators
|
|
402
|
+
Retrieval-family breakdown:
|
|
777
403
|
|
|
778
|
-
|
|
404
|
+
| Category | Audrey | Vector Only | Best Baseline |
|
|
405
|
+
|---|---|---|---|
|
|
406
|
+
| Information Extraction | 100% | 100% | 100% |
|
|
407
|
+
| Knowledge Updates | 100% | 50% | 50% |
|
|
408
|
+
| Multi-Session Reasoning | 100% | 100% | 100% |
|
|
409
|
+
| Temporal Reasoning | 100% | 100% | 100% |
|
|
410
|
+
| Abstention | 100% | 50% | 50% |
|
|
411
|
+
| Conflict Resolution | 100% | 50% | 50% |
|
|
412
|
+
| Procedural Learning | 100% | 0% | 0% |
|
|
413
|
+
| Privacy | 100% | 0% | 0% |
|
|
779
414
|
|
|
780
|
-
-
|
|
781
|
-
- Context-dependent truth resolution
|
|
782
|
-
- Configurable LLM providers (Mock, Anthropic, OpenAI)
|
|
415
|
+
Operation-family breakdown:
|
|
783
416
|
|
|
784
|
-
|
|
417
|
+
| Operation | Audrey | Vector Only | Best Baseline |
|
|
418
|
+
|---|---|---|---|
|
|
419
|
+
| Update / Overwrite | 100% | 50% | 50% |
|
|
420
|
+
| Delete + Abstain | 100% | 0% | 50% |
|
|
421
|
+
| Semantic Merge | 100% | 0% | 0% |
|
|
422
|
+
| Procedural Merge | 100% | 0% | 0% |
|
|
785
423
|
|
|
786
|
-
-
|
|
787
|
-
- Consolidation engine, contradiction lifecycle, rollback
|
|
788
|
-
- Circular self-confirmation defense, causal context, introspection
|
|
424
|
+
Published comparison anchors from the field (different benchmarks and conditions - included for field context, not direct comparison):
|
|
789
425
|
|
|
790
|
-
|
|
426
|
+
| System | Benchmark | Score | What it represents |
|
|
427
|
+
|---|---|---|---|
|
|
428
|
+
| **Audrey** | Internal retrieval + operations benchmark | **100.0%** | Update, overwrite, delete, merge, abstention, consolidation, privacy |
|
|
429
|
+
| MIRIX | Published LoCoMo | 85.4% | Typed multimodal memory |
|
|
430
|
+
| Letta Filesystem | Published LoCoMo | 74.0% | Context-engineering |
|
|
431
|
+
| Mem0 Graph Memory | Published LoCoMo | 68.5% | Graph memory |
|
|
432
|
+
| Mem0 | Published LoCoMo | 66.9% | Production baseline |
|
|
791
433
|
|
|
792
|
-
|
|
434
|
+
Primary comparison sources:
|
|
793
435
|
|
|
794
|
-
|
|
436
|
+
- [MIRIX paper](https://arxiv.org/abs/2507.07957)
|
|
437
|
+
- [Mem0 paper](https://arxiv.org/abs/2504.19413)
|
|
438
|
+
- [Letta benchmark write-up](https://www.letta.com/blog/benchmarking-ai-agent-memory)
|
|
439
|
+
- [LongMemEval paper](https://arxiv.org/abs/2410.10813)
|
|
795
440
|
|
|
796
|
-
|
|
441
|
+
Benchmark guide: [docs/benchmarking.md](docs/benchmarking.md)
|
|
797
442
|
|
|
798
|
-
|
|
443
|
+
## Repository
|
|
799
444
|
|
|
800
|
-
|
|
445
|
+
- Contributing guide: [CONTRIBUTING.md](CONTRIBUTING.md)
|
|
446
|
+
- Security policy: [SECURITY.md](SECURITY.md)
|
|
447
|
+
- CI workflow: [.github/workflows/ci.yml](.github/workflows/ci.yml)
|
|
448
|
+
- Benchmarking guide: [docs/benchmarking.md](docs/benchmarking.md)
|
|
801
449
|
|
|
802
|
-
|
|
450
|
+
## Development
|
|
803
451
|
|
|
804
|
-
|
|
452
|
+
```bash
|
|
453
|
+
npm ci
|
|
454
|
+
npm test
|
|
455
|
+
npm run pack:check
|
|
456
|
+
npm run bench:memory
|
|
457
|
+
npm run bench:memory:retrieval
|
|
458
|
+
npm run bench:memory:operations
|
|
459
|
+
npm run bench:memory:check
|
|
460
|
+
npm run bench:memory:readme-assets
|
|
461
|
+
```
|
|
462
|
+
|
|
463
|
+
Current validated baseline:
|
|
464
|
+
|
|
465
|
+
- `npm test`
|
|
466
|
+
- `npm run pack:check`
|
|
467
|
+
- `npm run bench:memory`
|
|
468
|
+
- `npm run bench:memory:retrieval`
|
|
469
|
+
- `npm run bench:memory:operations`
|
|
470
|
+
- `npm run bench:memory:check`
|
|
471
|
+
- `npm run bench:memory:readme-assets`
|
|
805
472
|
|
|
806
473
|
## License
|
|
807
474
|
|
|
808
|
-
MIT
|
|
475
|
+
MIT. See [LICENSE](LICENSE).
|