hippo-memory 1.15.0 → 1.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (97)
  1. package/README.md +862 -861
  2. package/dist/audit.d.ts +1 -1
  3. package/dist/audit.d.ts.map +1 -1
  4. package/dist/audit.js.map +1 -1
  5. package/dist/cli.d.ts.map +1 -1
  6. package/dist/cli.js +1243 -3
  7. package/dist/cli.js.map +1 -1
  8. package/dist/customer-notes.d.ts +95 -0
  9. package/dist/customer-notes.d.ts.map +1 -0
  10. package/dist/customer-notes.js +296 -0
  11. package/dist/customer-notes.js.map +1 -0
  12. package/dist/db.d.ts.map +1 -1
  13. package/dist/db.js +731 -1
  14. package/dist/db.js.map +1 -1
  15. package/dist/graph-extract.d.ts +39 -0
  16. package/dist/graph-extract.d.ts.map +1 -0
  17. package/dist/graph-extract.js +141 -0
  18. package/dist/graph-extract.js.map +1 -0
  19. package/dist/graph-recall.d.ts +41 -0
  20. package/dist/graph-recall.d.ts.map +1 -0
  21. package/dist/graph-recall.js +246 -0
  22. package/dist/graph-recall.js.map +1 -0
  23. package/dist/graph.d.ts +137 -0
  24. package/dist/graph.d.ts.map +1 -0
  25. package/dist/graph.js +433 -0
  26. package/dist/graph.js.map +1 -0
  27. package/dist/incidents.d.ts +100 -0
  28. package/dist/incidents.d.ts.map +1 -0
  29. package/dist/incidents.js +322 -0
  30. package/dist/incidents.js.map +1 -0
  31. package/dist/index.d.ts +1 -0
  32. package/dist/index.d.ts.map +1 -1
  33. package/dist/index.js +1 -0
  34. package/dist/index.js.map +1 -1
  35. package/dist/memory.d.ts +6 -0
  36. package/dist/memory.d.ts.map +1 -1
  37. package/dist/memory.js +6 -0
  38. package/dist/memory.js.map +1 -1
  39. package/dist/policies.d.ts +149 -0
  40. package/dist/policies.d.ts.map +1 -0
  41. package/dist/policies.js +380 -0
  42. package/dist/policies.js.map +1 -0
  43. package/dist/processes.d.ts +104 -0
  44. package/dist/processes.d.ts.map +1 -0
  45. package/dist/processes.js +330 -0
  46. package/dist/processes.js.map +1 -0
  47. package/dist/project-briefs.d.ts +126 -0
  48. package/dist/project-briefs.d.ts.map +1 -0
  49. package/dist/project-briefs.js +453 -0
  50. package/dist/project-briefs.js.map +1 -0
  51. package/dist/search.d.ts +7 -0
  52. package/dist/search.d.ts.map +1 -1
  53. package/dist/search.js.map +1 -1
  54. package/dist/server.d.ts.map +1 -1
  55. package/dist/server.js +1028 -16
  56. package/dist/server.js.map +1 -1
  57. package/dist/skills.d.ts +98 -0
  58. package/dist/skills.d.ts.map +1 -0
  59. package/dist/skills.js +339 -0
  60. package/dist/skills.js.map +1 -0
  61. package/dist/src/audit.js.map +1 -1
  62. package/dist/src/cli.js +1243 -3
  63. package/dist/src/cli.js.map +1 -1
  64. package/dist/src/customer-notes.js +296 -0
  65. package/dist/src/customer-notes.js.map +1 -0
  66. package/dist/src/db.js +731 -1
  67. package/dist/src/db.js.map +1 -1
  68. package/dist/src/graph-extract.js +141 -0
  69. package/dist/src/graph-extract.js.map +1 -0
  70. package/dist/src/graph-recall.js +246 -0
  71. package/dist/src/graph-recall.js.map +1 -0
  72. package/dist/src/graph.js +433 -0
  73. package/dist/src/graph.js.map +1 -0
  74. package/dist/src/incidents.js +322 -0
  75. package/dist/src/incidents.js.map +1 -0
  76. package/dist/src/index.js +1 -0
  77. package/dist/src/index.js.map +1 -1
  78. package/dist/src/memory.js +6 -0
  79. package/dist/src/memory.js.map +1 -1
  80. package/dist/src/policies.js +380 -0
  81. package/dist/src/policies.js.map +1 -0
  82. package/dist/src/processes.js +330 -0
  83. package/dist/src/processes.js.map +1 -0
  84. package/dist/src/project-briefs.js +453 -0
  85. package/dist/src/project-briefs.js.map +1 -0
  86. package/dist/src/search.js.map +1 -1
  87. package/dist/src/server.js +1028 -16
  88. package/dist/src/server.js.map +1 -1
  89. package/dist/src/skills.js +339 -0
  90. package/dist/src/skills.js.map +1 -0
  91. package/dist/src/version.js +1 -1
  92. package/dist/version.d.ts +1 -1
  93. package/dist/version.js +1 -1
  94. package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
  95. package/extensions/openclaw-plugin/package.json +1 -1
  96. package/openclaw.plugin.json +1 -1
  97. package/package.json +2 -2
package/README.md CHANGED
@@ -1,861 +1,862 @@
# 🦛 Hippo

**The secret to good memory isn't remembering more. It's knowing what to forget.**

[![npm](https://img.shields.io/npm/v/hippo-memory)](https://npmjs.com/package/hippo-memory)
[![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)

<p align="center">
  <img src="./assets/hippo-init.svg" alt="hippo init --scan ~ — initializing memory across all repos" width="720">
</p>

A memory layer for AI agents. Modeled on the hippocampus. Decay by default, strength through use, provenance on every memory. SQLite under the hood, zero runtime deps, works with every CLI agent you have.

```bash
npm install -g hippo-memory && hippo init --scan ~
```

One command. Every git repo on your machine gets memory.

```
Works with: Claude Code, Codex, Cursor, OpenClaw, OpenCode, Pi, any MCP client
Imports from: ChatGPT, Claude (CLAUDE.md), Cursor (.cursorrules), Slack, markdown
Storage: SQLite backbone with markdown mirrors. Git-trackable, human-readable.
Dependencies: Zero runtime deps. Node.js 22.5+. Optional embeddings via @xenova/transformers.
```

---

## Why this exists

Most "AI memory" systems save everything and search later. That's storage with semantic search bolted on. It's why your agent kept hitting the same deploy bug last week. And the week before. The system saw the failure four times. It had no way to know it should remember.

Hippo applies the thing brains have been getting right for 500 million years. Memories decay over time. Retrieval makes them stronger. Three biological layers (buffer, episodic, semantic) consolidate during sleep. Hard lessons stick because you used them. Trivia fades because you didn't.

It also fixes the portability problem. Your ChatGPT memories don't travel to Claude. Your `.cursorrules` don't travel to Codex. Hippo is one process behind every agent. CLAUDE.md, Cursor rules, ChatGPT exports, Slack history, all in one SQLite store, all queryable from any tool that speaks MCP or HTTP.

---

## Receipts

Numbers, not adjectives. Every claim links to the benchmark or the test that proves it.

- **Sequential Learning Benchmark.** [benchmarks/sequential-learning/](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents learn from past mistakes, not just retrieve text. v0.11.0 informal magnitude RETRACTED v1.7.9; mechanism remains shipped. See [CHANGELOG.md](./CHANGELOG.md) v1.7.9 entry.
- **R@5 = 74.0%** on [LongMemEval](benchmarks/longmemeval/). 500-question industry retrieval benchmark, BM25 only, no embeddings.
- **10 of 10 incident scenarios beat transcript replay** on a staged Slack corpus ([benchmarks/e1.3/](benchmarks/e1.3/)). Recall surfaces the cause faster than scrolling the last N messages.
- **0 outbound HTTP** on the 1000-event ingestion smoke. Proven by a `globalThis.fetch` spy that throws on call, not a hardcoded zero.
- **926 tests, real DB, zero mocks.** Project rule. The one mocks-vs-prod divergence that bit us early is now the constraint that kept the next ten releases honest.
- **dlPFC goal-conditioned cluster discrimination, 3/3 queries pass** — full goal stack with policy weighting and lifespan-windowed outcome propagation. Per-goal lift on a 3-cluster fixture where BM25 alone cannot discriminate; deterministic test in [`benchmarks/micro/results/b3-depth.json`](benchmarks/micro/results/b3-depth.json).

---

## What it does for your agent

- **Stops repeating mistakes.** Tag a failure with `--tag error` once, the lesson surfaces every time the agent walks back into that part of the code. Errors decay slower than ordinary observations.
- **Survives tool switches.** Use Claude Code on Monday, Cursor on Tuesday, Codex on Wednesday. Same `.hippo/` store. Same memories. Pick up exactly where you left off.
- **Ingests systems of record.** Slack today (`POST /v1/connectors/slack/events`). GitHub, Jira, Notion next. Webhooks land as `kind='raw'` memories with full provenance and GDPR-correct deletion.
- **Knows where every memory came from.** Every row carries `kind`, `scope`, `owner`, and `artifact_ref`. Right-to-be-forgotten is a single API call, not an audit nightmare.
- **Plays nice with multi-tenant.** API keys, scrypt-hashed. Audit log on every mutation. Tenant A literally cannot see tenant B's memories. Proven by negative test.

---

## Quick start

```bash
npm install -g hippo-memory

# Single project
hippo init

# All your projects at once (recommended)
hippo init --scan ~
```

`--scan` finds every git repo under your home directory, creates a `.hippo/` store in each one, and seeds it with lessons from the last 30 days of commit history. One command, instant memory across all your projects.

After setup, `hippo sleep` runs at session end (via auto-installed agent hooks) and does five things:

1. **Learns** from today's git commits
2. **Imports** new entries from Claude Code MEMORY.md files
3. **Consolidates** memories (decay, merge, prune)
4. **Deduplicates** near-identical memories, keeping the stronger copy
5. **Shares** high-value lessons to a global store so they surface in every project

```bash
# Manual usage
hippo remember "FRED cache silently dropped the tips_10y series" --tag error
hippo recall "data pipeline issues" --budget 2000
```

---

Full release history: **[CHANGELOG.md](./CHANGELOG.md)** · [GitHub Releases](https://github.com/kitfunso/hippo-memory/releases)

### Zero-config agent integration

`hippo init` auto-detects your agent framework and wires itself in:

```bash
cd my-project
hippo init

# Initialized Hippo at /my-project
# Directories: buffer/ episodic/ semantic/ conflicts/
# Auto-installed claude-code hook in CLAUDE.md
```

If you have a `CLAUDE.md`, it patches it. `AGENTS.md` for Codex/OpenClaw/OpenCode. `.cursorrules` for Cursor. For Codex, Hippo also wraps the detected launcher in place so `/exit` can consolidate memory without a manual PATH step. No manual `hook install` needed. Your agent starts using Hippo on its next session.

It also registers the current project in Hippo's workspace registry and installs one machine-level daily runner (6:15am). That runner sweeps every registered workspace, runs `hippo learn --git --days 1`, then `hippo sleep`. You get strict daily consolidation without creating one OS task per project.

To skip: `hippo init --no-hooks --no-schedule`

---

## Cross-Tool Import

Your memories shouldn't be locked inside one tool. Hippo pulls them in from anywhere.

```bash
# ChatGPT memory export
hippo import --chatgpt memories.json

# Claude's CLAUDE.md (skips existing hippo hook blocks)
hippo import --claude CLAUDE.md

# Cursor rules
hippo import --cursor .cursorrules

# Any markdown file (headings become tags)
hippo import --markdown MEMORY.md

# Any text file
hippo import --file notes.txt
```

All import commands support `--dry-run` (preview without writing), `--global` (write to `~/.hippo/`), and `--tag` (add extra tags). Duplicates are detected and skipped automatically.

### Conversation Capture

Extract memories from raw conversation text. No LLM needed: pattern-based heuristics find decisions, rules, errors, and preferences.

```bash
# Pipe a conversation in
cat session.log | hippo capture --stdin

# Or point at a file
hippo capture --file conversation.md

# Preview first
hippo capture --file conversation.md --dry-run
```

### Slack ingestion (E1.3)

Hippo accepts Slack Events API webhooks at `POST /v1/connectors/slack/events`. Configure `SLACK_SIGNING_SECRET` (validated on every request) and point Slack at `https://<your-host>/v1/connectors/slack/events`. Messages land as `kind='raw'` memories with `slack://team/channel/ts` provenance and a `slack:public:Cxxx` or `slack:private:Cxxx` scope. Source deletions are honored (GDPR).

Backfill an existing channel: `SLACK_BOT_TOKEN=xoxb-... hippo slack backfill --channel C0000`. Inspect malformed events: `hippo slack dlq list`.

Multi-workspace deployments populate `slack_workspaces (team_id, tenant_id)` to route events per tenant; single-workspace falls back to `HIPPO_TENANT`.
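For reference, Slack's documented request-signing scheme is what `SLACK_SIGNING_SECRET` is for. It computes an HMAC-SHA256 over `v0:<X-Slack-Request-Timestamp>:<raw body>` and compares the result with the `X-Slack-Signature` header. Requests older than about five minutes are rejected. The TypeScript sketch below shows that check, the provenance/scope strings and the tenant routing described above. `verifySlackSignature`, `slackProvenance`, `resolveTenant` and the `Map` standing in for the `slack_workspaces` table are hypothetical names for illustration, not Hippo's internal API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Slack Events API request using Slack's "v0" signing scheme.
// rawBody must be the exact request body Slack sent, before any JSON parsing.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // X-Slack-Request-Timestamp header
  rawBody: string,
  signature: string, // X-Slack-Signature header, e.g. "v0=9a3f..."
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  // Reject malformed or stale timestamps (more than 5 minutes off) to limit replays.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 5 * 60) return false;
  const expected =
    "v0=" + createHmac("sha256", signingSecret).update(`v0:${timestamp}:${rawBody}`).digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical helper building the provenance and scope strings described above.
function slackProvenance(team: string, channel: string, ts: string, isPrivate: boolean) {
  return {
    artifactRef: `slack://${team}/${channel}/${ts}`,
    scope: `slack:${isPrivate ? "private" : "public"}:${channel}`,
  };
}

// Route a workspace to its tenant; single-workspace setups fall back to HIPPO_TENANT.
function resolveTenant(
  teamId: string,
  workspaces: Map<string, string>, // stand-in for slack_workspaces (team_id -> tenant_id)
  fallback: string | undefined = process.env.HIPPO_TENANT,
): string | undefined {
  return workspaces.get(teamId) ?? fallback;
}
```

Verifying against the raw body matters because re-serializing parsed JSON can change whitespace or key order and break the HMAC.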

### Active task snapshots

Long-running work needs short-term continuity, not just long-term memory. Hippo can persist the current in-flight task so a later `continue` has something concrete to recover.

```bash
hippo snapshot save \
  --task "Ship SQLite backbone" \
  --summary "Tests/build/smoke are green, next slice is active-session recovery" \
  --next-step "Implement active snapshot retrieval in context output"

hippo snapshot show
hippo context --auto --budget 1500
hippo snapshot clear
```

`hippo context --auto` includes the active task snapshot before long-term memories, so agents get both the immediate thread and the deeper lessons.

### Session event trails

Manual snapshots are useful, but real work also needs a breadcrumb trail. Hippo can now store short session events and link them to the active snapshot so context output shows the latest steps, not just the last summary.

```bash
hippo session log \
  --id sess_20260326 \
  --task "Ship continuity" \
  --type progress \
  --content "Schema migration is done, next step is CLI wiring"

hippo snapshot save \
  --task "Ship continuity" \
  --summary "Structured session events are flowing" \
  --next-step "Surface them in framework hooks" \
  --session sess_20260326

hippo session show --id sess_20260326
hippo context --auto --budget 1500
```

Hippo mirrors the latest trail to `.hippo/buffer/recent-session.md` so you can inspect the short-term thread without opening SQLite.

### Session handoffs

When you're done for the day (or switching to another agent), create a handoff so the next session knows exactly where to pick up:

```bash
hippo handoff create \
  --summary "Finished schema migration, tests green" \
  --next "Wire handoff injection into context output" \
  --session sess_20260403 \
  --artifact src/db.ts

hippo handoff latest    # show the most recent handoff
hippo handoff show 3    # show a specific handoff by ID
hippo session resume    # re-inject latest handoff as context
```

### Working memory

Working memory is a bounded scratchpad for current-state notes. It's separate from long-term memory and gets cleared between sessions.

```bash
hippo wm push --scope repo \
  --content "Investigating flaky test in store.test.ts, line 42" \
  --importance 0.9

hippo wm read --scope repo     # show current working notes
hippo wm clear --scope repo    # wipe the scratchpad
hippo wm flush --scope repo    # flush on session end
```

The buffer holds a maximum of 20 entries per scope. When full, the lowest-importance entry is evicted.
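Here is a minimal sketch of a bounded per-scope buffer with lowest-importance eviction. It assumes each entry carries a numeric `importance` and that ties go to the oldest entry. `WorkingMemory` and `WmEntry` are hypothetical names. Hippo keeps this state in its own store, and `flush` is not modeled here.

```typescript
// Hypothetical shape of a working-memory note; Hippo's real schema may differ.
interface WmEntry {
  content: string;
  importance: number; // 0..1, higher survives longer
}

const MAX_ENTRIES_PER_SCOPE = 20;

class WorkingMemory {
  // Scope name -> entries in insertion order (oldest first).
  private scopes = new Map<string, WmEntry[]>();

  push(scope: string, entry: WmEntry, cap: number = MAX_ENTRIES_PER_SCOPE): void {
    const entries = this.scopes.get(scope) ?? [];
    entries.push(entry);
    if (entries.length > cap) {
      // Evict the lowest-importance entry; scanning from the front means the
      // oldest entry loses a tie. The new entry itself can be the one evicted.
      let victim = 0;
      for (let i = 1; i < entries.length; i++) {
        if (entries[i].importance < entries[victim].importance) victim = i;
      }
      entries.splice(victim, 1);
    }
    this.scopes.set(scope, entries);
  }

  read(scope: string): readonly WmEntry[] {
    return this.scopes.get(scope) ?? [];
  }

  clear(scope: string): void {
    this.scopes.delete(scope);
  }
}
```

Evicting by importance rather than age keeps a high-priority note like the `--importance 0.9` example above alive even after many routine notes arrive.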

### Explainable recall

See why a memory was returned:

```bash
hippo recall "data pipeline" --why --limit 5

# --- mem_a1b2c3 [episodic] [observed] [local] score=0.847
# BM25: matched [data, pipeline]; cosine: 0.82
# ...memory content...
```

---

## How It Works

Input enters the buffer. Important things get encoded into episodic memory. During "sleep," repeated episodes compress into semantic patterns. Weak memories decay and disappear.

```mermaid
flowchart TD
    I[New information] --> B[Buffer<br/>session-only, no decay]
    B -->|encode: tags, strength, half-life| E[Episodic Store<br/>timestamped, decay by default<br/>retrieval strengthens, errors stick]
    E -->|hippo sleep<br/>replay + merge| S[Semantic Store<br/>compressed patterns, stable<br/>schema-aware]
    E -.->|decay| X[forgotten]
    S -.->|recall| E
    classDef bio fill:#fff4dc,stroke:#a8742d,color:#2b1b00
    classDef forgotten fill:#f5f5f5,stroke:#999,color:#666,stroke-dasharray:5 5
    class B,E,S bio
    class X forgotten
```

---

## Key Features

A memory's life across a typical session, before walking each feature in turn:

```mermaid
sequenceDiagram
    autonumber
    actor Agent
    participant B as Buffer
    participant E as Episodic
    participant S as Semantic
    Agent->>B: hippo remember "cache dropped tips_10y" --error
    B->>E: encode (half_life=14d, valence=neg)
    Note over E: strength=1.0
    Agent->>E: hippo recall "data pipeline"
    E-->>Agent: returns memory (rank 1)
    Note over E: half_life 14d → 16d, retrieval_count++
    Agent->>E: hippo outcome --good
    Note over E: reward_factor 1.0 → 1.15
    Agent->>S: hippo sleep
    S->>E: merge 3 related episodic → 1 semantic
    Note over E,S: original episodic decays, pattern survives
```

### Decay by default

Every memory has a half-life. 7 days by default. Persistence is earned.

```bash
hippo remember "always check cache contents after refresh"
# stored with half_life: 7d, strength: 1.0

# 14 days later with no retrieval:
hippo inspect mem_a1b2c3
# strength: 0.25 (decayed by 2 half-lives)
# at risk of removal on next sleep
```
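The numbers above match plain exponential half-life decay, strength = initial × 0.5^(age / half_life). Here is a minimal sketch, assuming a memory starts at strength 1.0 and decays continuously. `strengthAt` and `strengthSince` are hypothetical helpers, and Hippo's real strength formula may apply further multipliers (errors, outcomes) described below.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Exponential half-life decay: strength halves every `halfLifeDays` without retrieval.
function strengthAt(initial: number, halfLifeDays: number, ageDays: number): number {
  if (halfLifeDays <= 0) throw new RangeError("half-life must be positive");
  return initial * Math.pow(0.5, ageDays / halfLifeDays);
}

// Same calculation from timestamps in milliseconds since the epoch.
function strengthSince(initial: number, halfLifeDays: number, storedAtMs: number, nowMs: number): number {
  return strengthAt(initial, halfLifeDays, (nowMs - storedAtMs) / DAY_MS);
}
```

`strengthAt(1.0, 7, 14)` returns `0.25`, the value `hippo inspect` reports after two unretrieved half-lives.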

---

### Retrieval strengthens

Use it or lose it. Each recall boosts the half-life by 2 days.

```bash
hippo recall "cache issues"
# finds mem_a1b2c3, retrieval_count: 1 -> 2
# half_life extended: 7d -> 9d
# strength recalculated from retrieval timestamp

hippo recall "cache issues"    # again next week
# retrieval_count: 2 -> 3
# half_life: 9d -> 11d
# this memory is learning to survive
```
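This sketch models the recall rule as a pure update. It assumes "strength recalculated from retrieval timestamp" means the decay clock restarts at strength 1.0 on each recall. That reading is an assumption, and `RecallState`, `onRecall` and `currentStrength` are hypothetical names.

```typescript
interface RecallState {
  halfLifeDays: number;
  retrievalCount: number;
  lastRetrievedAtMs: number; // assumed: decay is measured from the latest recall
}

const RECALL_BONUS_DAYS = 2; // "Each recall boosts the half-life by 2 days."
const DAY_MS = 24 * 60 * 60 * 1000;

// Return an updated copy rather than mutating, so before/after can be compared.
function onRecall(m: RecallState, nowMs: number): RecallState {
  return {
    halfLifeDays: m.halfLifeDays + RECALL_BONUS_DAYS,
    retrievalCount: m.retrievalCount + 1,
    lastRetrievedAtMs: nowMs,
  };
}

// Strength under the assumption that each recall restarts decay from 1.0.
function currentStrength(m: RecallState, nowMs: number): number {
  return Math.pow(0.5, (nowMs - m.lastRetrievedAtMs) / DAY_MS / m.halfLifeDays);
}
```

Starting from `{ halfLifeDays: 7, retrievalCount: 1 }`, two calls to `onRecall` reproduce the 7d → 9d → 11d progression shown above.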

---

### Active invalidation

When you migrate from one tool to another, old memories about the replaced tool should die immediately. Hippo detects migration and breaking-change commits during `hippo learn --git` and actively weakens matching memories.

```bash
hippo learn --git
# feat: migrate from webpack to vite
# Invalidated 3 memories referencing "webpack"
# Learned: migrate from webpack to vite
```

You can also invalidate manually:

```bash
hippo invalidate "REST API" --reason "migrated to GraphQL"
# Invalidated 5 memories referencing "REST API".
```
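Here is a rough sketch of the invalidation flow. It pulls the replaced tool out of a "migrate from X to Y" commit subject, then weakens, rather than deletes, every memory that mentions it. The regex, the case-insensitive substring match and the 0.1 weakening factor are all assumptions. `detectReplacedTool` and `invalidate` are hypothetical names, not Hippo's detector.

```typescript
interface StoredMemory {
  id: string;
  content: string;
  strength: number;
}

// Hypothetical pattern for migration commits: "migrate(d) from X to Y".
const MIGRATION_RE = /\bmigrat(?:e|ed|ing)\s+from\s+(.+?)\s+to\s+(.+)$/i;

// Return the replaced tool ("X"), or null if the subject isn't a migration.
function detectReplacedTool(commitSubject: string): string | null {
  const match = MIGRATION_RE.exec(commitSubject);
  return match ? match[1].trim() : null;
}

// Weaken every memory that mentions the old pattern; returns how many were hit.
function invalidate(memories: StoredMemory[], oldPattern: string, factor = 0.1): number {
  const needle = oldPattern.toLowerCase();
  let count = 0;
  for (const m of memories) {
    if (m.content.toLowerCase().includes(needle)) {
      m.strength *= factor; // weakened, so the next sleep can prune it
      count++;
    }
  }
  return count;
}
```

Weakening instead of deleting keeps the memory inspectable and lets decay plus the next `hippo sleep` finish the job.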

---

### Architectural decisions

One-off decisions don't repeat, so they can't earn their keep through retrieval alone. `hippo decide` stores them with a 90-day half-life and verified confidence so they survive long enough to matter.

```bash
hippo decide "Use PostgreSQL for all new services" --context "JSONB support"
# Decision recorded: mem_a1b2c3

# Later, when the decision changes:
hippo decide "Use CockroachDB for global services" \
  --context "Need multi-region" \
  --supersedes mem_a1b2c3
# Superseded mem_a1b2c3 (half-life halved, marked stale)
# Decision recorded: mem_d4e5f6
```
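This is a small sketch of the decision lifecycle as the output above describes it. A new decision gets a 90-day half-life and `verified` confidence. Superseding halves the old decision's half-life, marks it `stale` and links it to its replacement. The `Decision` shape and the `decide`/`supersede` helpers are hypothetical.

```typescript
type Confidence = "verified" | "observed" | "inferred" | "stale";

interface Decision {
  id: string;
  text: string;
  halfLifeDays: number;
  confidence: Confidence;
  supersededBy?: string;
}

const DECISION_HALF_LIFE_DAYS = 90;

function decide(id: string, text: string): Decision {
  return { id, text, halfLifeDays: DECISION_HALF_LIFE_DAYS, confidence: "verified" };
}

// Mirrors "(half-life halved, marked stale)"; returns a new object, leaving the old one intact.
function supersede(old: Decision, replacement: Decision): Decision {
  return {
    ...old,
    halfLifeDays: old.halfLifeDays / 2,
    confidence: "stale",
    supersededBy: replacement.id,
  };
}
```

Under exponential decay, a 90-day half-life leaves a decision at about 0.63 strength after 60 days untouched (0.5^(60/90)). A default 7-day memory would be near 0.003 by then (0.5^(60/7)).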

---

### Error memories stick

Tag a memory as an error and it gets 2x the half-life automatically.

```bash
hippo remember "deployment failed: forgot to run migrations" --error
# half_life: 14d instead of 7d
# emotional_valence: negative
# strength formula applies 2.0x multiplier (HIPPO_LOSS_AVERSION_RATIO=0.75 to keep v1.13.4 1.5x)

# production incidents don't fade quietly
```
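Here is the doubled half-life on its own, in a minimal sketch assuming the 7-day default and plain exponential decay. The separate loss-aversion strength multiplier mentioned in the output above is deliberately not modeled, and `initialHalfLife` and `strengthAfter` are hypothetical helpers.

```typescript
const DEFAULT_HALF_LIFE_DAYS = 7;
const ERROR_HALF_LIFE_MULTIPLIER = 2; // "2x the half-life"

function initialHalfLife(isError: boolean, base: number = DEFAULT_HALF_LIFE_DAYS): number {
  return isError ? base * ERROR_HALF_LIFE_MULTIPLIER : base;
}

// Strength after `days` without retrieval, starting from 1.0.
function strengthAfter(days: number, isError: boolean): number {
  return Math.pow(0.5, days / initialHalfLife(isError));
}
```

After 14 days an ordinary memory is down to 0.25 while an error memory is still at 0.5.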

---

### Confidence tiers

Every memory carries a confidence level: `verified`, `observed`, `inferred`, or `stale`. This tells agents how much to trust what they're reading.

```bash
hippo remember "API rate limit is 100/min" --verified
hippo remember "deploy usually takes ~3 min" --observed
hippo remember "the flaky test might be a race condition" --inferred
```

When context is generated, confidence is shown inline:

```
[verified] API rate limit is 100/min per the docs
[observed] Deploy usually takes ~3 min
[inferred] The flaky test might be a race condition
```

Agents can see at a glance what's established fact vs. a pattern worth questioning.

Memories unretrieved for 30+ days are automatically marked `stale` during the next `hippo sleep`. If one gets recalled again, Hippo wakes it back up to `observed` so it can earn trust again instead of staying permanently stale.
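Here are the two transitions in that paragraph as a sketch: the 30-day staleness sweep at sleep time, and the wake-up to `observed` on recall. It assumes "30+ days" means at least 30 full days since the last retrieval. `TrackedMemory`, `markStale` and `onRecall` are hypothetical names.

```typescript
type Confidence = "verified" | "observed" | "inferred" | "stale";

interface TrackedMemory {
  confidence: Confidence;
  lastRetrievedAtMs: number; // ms since epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_AFTER_DAYS = 30;

// Sleep-time sweep: anything unretrieved for 30 or more days becomes stale.
function markStale(memories: TrackedMemory[], nowMs: number): void {
  for (const m of memories) {
    if (nowMs - m.lastRetrievedAtMs >= STALE_AFTER_DAYS * DAY_MS) m.confidence = "stale";
  }
}

// Recall: a stale memory comes back as "observed", not its old tier, so it must re-earn trust.
function onRecall(m: TrackedMemory, nowMs: number): void {
  m.lastRetrievedAtMs = nowMs;
  if (m.confidence === "stale") m.confidence = "observed";
}
```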

### Conflict tracking

Hippo detects obvious contradictions between overlapping memories and keeps them visible instead of silently letting both masquerade as truth. Shared tags alone do not count; the statements themselves need to overlap in content.

```bash
hippo sleep        # refreshes open conflicts
hippo conflicts    # inspect them
```

Open conflicts are stored in SQLite, mirrored under `.hippo/conflicts/`, and linked back into each memory's `conflicts_with` field.

---

### Observation framing

Memories aren't presented as bare assertions. By default, Hippo frames them as observations with dates, so agents treat them as context rather than commands.

```bash
hippo context --framing observe    # default
# Output: "Previously observed (2026-03-10): deploy takes ~3 min"

hippo context --framing suggest
# Output: "Consider: deploy takes ~3 min"

hippo context --framing assert
# Output: "Deploy takes ~3 min"
```

Three modes: `observe` (default), `suggest`, `assert`. Choose based on how directive you want the memory to be.
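Here is a minimal sketch of a formatter that reproduces the three outputs shown above. The function name `frame` and the ISO date argument are assumptions; only the output strings come from the examples.

```typescript
type Framing = "observe" | "suggest" | "assert";

// Render one memory for context output; only "observe" shows the date.
function frame(content: string, observedOn: string, mode: Framing = "observe"): string {
  if (mode === "suggest") return `Consider: ${content}`;
  // "assert" states the memory as plain fact, capitalized like a sentence.
  if (mode === "assert") return content.charAt(0).toUpperCase() + content.slice(1);
  return `Previously observed (${observedOn}): ${content}`;
}
```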

---

### Sleep consolidation

Run `hippo sleep` and episodes compress into patterns.

```bash
hippo sleep

# Running consolidation...
#
# Results:
#   Active memories: 23
#   Removed (decayed): 4
#   Merged episodic: 6
#   New semantic: 2
```

Three or more related episodes get merged into a single semantic memory. The originals decay. The pattern survives.
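Here is a simplified sketch of the merge step. "Related" is approximated by a shared `topic` label, a stand-in for whatever relatedness test Hippo actually uses. Each group of three or more becomes one semantic memory, and its source episodes are only flagged, not deleted. All names here are hypothetical.

```typescript
interface Episode { id: string; topic: string; content: string; }
interface SemanticMemory { id: string; topic: string; content: string; sourceIds: string[]; }

const MIN_CLUSTER_SIZE = 3; // "Three or more related episodes"

function consolidate(episodes: Episode[]): { semantic: SemanticMemory[]; mergedIds: Set<string> } {
  // Group by topic; a shared label stands in for a real relatedness test.
  const byTopic = new Map<string, Episode[]>();
  for (const e of episodes) {
    const group = byTopic.get(e.topic) ?? [];
    group.push(e);
    byTopic.set(e.topic, group);
  }
  const semantic: SemanticMemory[] = [];
  const mergedIds = new Set<string>();
  byTopic.forEach((group, topic) => {
    if (group.length < MIN_CLUSTER_SIZE) return; // too few to form a pattern
    semantic.push({
      id: `sem_${topic}`,
      topic,
      // Naive compression: each distinct statement once, joined into one memory.
      content: Array.from(new Set(group.map((e) => e.content))).join("; "),
      sourceIds: group.map((e) => e.id),
    });
    // Originals are only flagged here; decay is left to handle them.
    group.forEach((e) => mergedIds.add(e.id));
  });
  return { semantic, mergedIds };
}
```

Two groups of three would yield "Merged episodic: 6, New semantic: 2", the shape of the sample output above.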

---

### Outcome feedback

Did the recalled memories actually help? Tell Hippo. It tightens the feedback loop.

```bash
hippo recall "why is the gold model broken"
# ... you read the memories and fix the bug ...

hippo outcome --good
# Applied positive outcome to 3 memories
# reward factor increases, decay slows

hippo outcome --bad
# Applied negative outcome to 3 memories
# reward factor decreases, decay accelerates
```

Outcomes are cumulative. A memory with 5 positive outcomes and 0 negative has a reward factor of ~1.42, making its effective half-life 42% longer. A memory with 0 positive and 3 negative has a factor of ~0.63, decaying nearly twice as fast. Mixed outcomes converge toward neutral (1.0).
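One hypothetical reward-factor function that matches the figures above is 1 + 0.5 × (good − bad) / (good + bad + 1). It was chosen only because it reproduces ~1.42 for 5/0, ~0.63 for 0/3 and 1.0 for balanced outcomes. Hippo's actual formula may differ, for example in the single-outcome step. The sketch applies the factor as a half-life multiplier.

```typescript
// Hypothetical: chosen to reproduce the quoted figures, not taken from Hippo's source.
function rewardFactor(good: number, bad: number): number {
  return 1 + (0.5 * (good - bad)) / (good + bad + 1);
}

// The factor scales the half-life, so it also scales how fast a memory fades.
function effectiveHalfLife(baseDays: number, good: number, bad: number): number {
  return baseDays * rewardFactor(good, bad);
}
```

With this form, 0 good and 3 bad gives 0.625. A 7-day memory then has an effective half-life of about 4.4 days and fades 1.6 times as fast as a neutral one.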

---

### Token budgets

Recall only what fits. No context stuffing.

```bash
# fits within Claude's 2K token window for task context
hippo recall "deployment checklist" --budget 2000

# need more for a big task
hippo recall "full project history" --budget 8000

# machine-readable for programmatic use
hippo recall "api errors" --budget 1000 --json
```

Results are ranked by `relevance * strength * recency`. The highest-signal memories fill the budget first.
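Here is a greedy sketch of budget filling with that ranking. It assumes all three factors are normalized to 0..1 and estimates tokens at roughly four characters each. That estimate is an assumption, not Hippo's tokenizer. The sketch skips, rather than stops at, a memory that would overflow, which is a design choice of this sketch.

```typescript
interface Candidate {
  id: string;
  text: string;
  relevance: number; // e.g. a normalized BM25 or cosine score, 0..1
  strength: number;  // current decayed strength, 0..1
  recency: number;   // 0..1, higher means more recent
}

// Rough token estimate (~4 characters per token); an assumption for illustration.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

function fillBudget(candidates: Candidate[], budgetTokens: number): Candidate[] {
  const ranked = candidates
    .map((c) => ({ c, score: c.relevance * c.strength * c.recency }))
    .sort((a, b) => b.score - a.score);
  const picked: Candidate[] = [];
  let used = 0;
  for (const { c } of ranked) {
    const cost = estimateTokens(c.text);
    // Skip anything that would overflow; a smaller, lower-ranked memory may still fit.
    if (used + cost > budgetTokens) continue;
    picked.push(c);
    used += cost;
  }
  return picked;
}
```

Skipping instead of stopping uses leftover budget on short, lower-ranked memories. Stopping at the first overflow would keep the output strictly in rank order.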

---

### Auto-learn from git

Hippo can scan your commit history and extract lessons from fix/revert/bug commits automatically.

```bash
# Learn from the last 7 days of commits
hippo learn --git

# Learn from the last 30 days
hippo learn --git --days 30

# Scan multiple repos in one pass
hippo learn --git --repos "~/project-a,~/project-b,~/project-c"
```

The `--repos` flag accepts comma-separated paths. Hippo scans each repo's git log, extracts fix/revert/bug lessons, deduplicates against existing memories, and stores new ones. Pair with `hippo sleep` afterwards to consolidate.

Ideal for a weekly cron:

```bash
hippo learn --git --repos "~/repo1,~/repo2" --days 7
hippo sleep
```
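Here is a rough sketch of that pipeline: split the `--repos` argument, read commit subjects with `git log`, keep fix/revert/bug subjects and drop ones already stored. The keyword regex, the case-insensitive exact-match dedup and the `~` expansion are assumptions. `parseRepos`, `recentSubjects` and `extractLessons` are hypothetical names, not Hippo's implementation.

```typescript
import { execFileSync } from "node:child_process";
import { homedir } from "node:os";

// Keywords that usually mark a lesson-bearing commit (assumed, not Hippo's list).
const LESSON_RE = /\b(?:bug ?fix(?:es)?|fix(?:es|ed)?|revert(?:s|ed)?|bugs?)\b/i;

// Split a comma-separated --repos value and expand a leading "~".
function parseRepos(arg: string): string[] {
  return arg
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => (p.startsWith("~") ? homedir() + p.slice(1) : p));
}

// Read commit subjects from the last `days` days of one repository.
function recentSubjects(repo: string, days: number): string[] {
  const out = execFileSync("git", ["log", `--since=${days} days ago`, "--pretty=format:%s"], {
    cwd: repo,
    encoding: "utf8",
  });
  return out.split("\n").filter((s) => s.trim().length > 0);
}

// Keep lesson-like subjects not already stored (case-insensitive exact match).
function extractLessons(subjects: string[], existing: string[]): string[] {
  const seen = new Set(existing.map((s) => s.toLowerCase()));
  const lessons: string[] = [];
  for (const s of subjects) {
    const key = s.toLowerCase();
    if (LESSON_RE.test(s) && !seen.has(key)) {
      lessons.push(s);
      seen.add(key); // also dedupes repeats within this batch
    }
  }
  return lessons;
}
```

Passing `execFileSync` an argument array, rather than a shell string, keeps repo paths with spaces from being split or interpreted by a shell.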
515
-
516
- ---
517
-
518
- ### Watch mode
519
-
520
- Wrap any command with `hippo watch` to auto-learn from failures:
521
-
522
- ```bash
523
- hippo watch "npm run build"
524
- # if it fails, Hippo captures the error automatically
525
- # next time an agent asks about build issues, the memory is there
526
- ```
527
-
528
- ---
529
-
530
- ## CLI Reference
531
-
532
- | Command | What it does |
533
- |---------|-------------|
534
- | `hippo init` | Create `.hippo/` + auto-install agent hooks |
535
- | `hippo init --global` | Create global store at `~/.hippo/` |
536
- | `hippo init --no-hooks` | Create `.hippo/` without auto-installing hooks |
537
- | `hippo remember "<text>"` | Store a memory |
538
- | `hippo remember "<text>" --tag <t>` | Store with tag (repeatable) |
539
- | `hippo remember "<text>" --error` | Store as error (2x half-life) |
540
- | `hippo remember "<text>" --pin` | Store with no decay |
541
- | `hippo remember "<text>" --verified` | Set confidence: verified (default) |
542
- | `hippo remember "<text>" --observed` | Set confidence: observed |
543
- | `hippo remember "<text>" --inferred` | Set confidence: inferred |
544
- | `hippo remember "<text>" --global` | Store in global `~/.hippo/` store |
545
- | `hippo recall "<query>"` | Retrieve relevant memories (local + global) |
546
- | `hippo recall "<query>" --budget <n>` | Recall within token limit (default: 4000) |
547
- | `hippo recall "<query>" --limit <n>` | Cap result count |
548
- | `hippo recall "<query>" --why` | Show match reasons and source buckets |
549
- | `hippo recall "<query>" --json` | Output as JSON |
550
- | `hippo context --auto` | Smart context injection (auto-detects task from git) |
551
- | `hippo context "<query>" --budget <n>` | Context injection with explicit query (default: 1500) |
552
- | `hippo context --limit <n>` | Cap memory count in context |
553
- | `hippo context --budget 0` | Skip entirely (zero token cost) |
554
- | `hippo context --framing <mode>` | Framing: observe (default), suggest, assert |
555
- | `hippo context --format <fmt>` | Output format: markdown (default) or json |
556
- | `hippo import --chatgpt <path>` | Import from ChatGPT memory export (JSON or txt) |
557
- | `hippo import --claude <path>` | Import from CLAUDE.md or Claude memory.json |
558
- | `hippo import --cursor <path>` | Import from .cursorrules or .cursor/rules |
559
- | `hippo import --markdown <path>` | Import from structured markdown (headings -> tags) |
560
- | `hippo import --file <path>` | Import from any text file |
561
- | `hippo import --dry-run` | Preview import without writing |
562
- | `hippo import --global` | Write imported memories to `~/.hippo/` |
563
- | `hippo capture --stdin` | Extract memories from piped conversation text |
564
- | `hippo capture --file <path>` | Extract memories from a file |
565
- | `hippo capture --dry-run` | Preview extraction without writing |
566
- | `hippo sleep` | Run consolidation (decay + merge + compress) |
567
- | `hippo sleep --dry-run` | Preview consolidation without writing |
568
- | `hippo status` | Memory health: counts, strengths, last sleep |
569
- | `hippo outcome --good` | Strengthen last recalled memories |
570
- | `hippo outcome --bad` | Weaken last recalled memories |
571
- | `hippo outcome --id <id> --good` | Target a specific memory |
572
- | `hippo inspect <id>` | Full detail on one memory |
573
- | `hippo forget <id>` | Force remove a memory |
574
- | `hippo embed` | Embed all memories for semantic search |
575
- | `hippo embed --status` | Show embedding coverage |
576
- | `hippo watch "<command>"` | Run command, auto-learn from failures |
577
- | `hippo learn --git` | Scan recent git commits for lessons |
578
- | `hippo learn --git --days <n>` | Scan N days back (default: 7) |
579
- | `hippo learn --git --repos <paths>` | Scan multiple repos (comma-separated) |
580
- | `hippo daily-runner` | Sweep registered workspaces and run daily learn+sleep |
581
- | `hippo conflicts` | List detected open memory conflicts |
582
- | `hippo conflicts --json` | Output conflicts as JSON |
583
- | `hippo resolve <id>` | Show both conflicting memories for comparison |
- | `hippo resolve <id> --keep <mem_id>` | Resolve: keep winner, weaken loser |
- | `hippo resolve <id> --keep <mem_id> --forget` | Resolve: keep winner, delete loser |
- | `hippo promote <id>` | Copy a local memory to the global store |
- | `hippo share <id>` | Share with attribution + transfer scoring |
- | `hippo share <id> --force` | Share even if transfer score is low |
- | `hippo share --auto` | Auto-share all high-scoring memories |
- | `hippo share --auto --dry-run` | Preview what would be shared |
- | `hippo peers` | List projects contributing to global store |
- | `hippo sync` | Pull global memories into local project |
- | `hippo invalidate "<pattern>"` | Actively weaken memories matching an old pattern |
- | `hippo invalidate "<pattern>" --reason "<why>"` | Include what replaced it |
- | `hippo decide "<decision>"` | Record architectural decision (90-day half-life) |
- | `hippo decide "<decision>" --context "<why>"` | Include reasoning |
- | `hippo decide "<decision>" --supersedes <id>` | Supersede a previous decision |
- | `hippo hook list` | Show available framework hooks |
- | `hippo hook install <target>` | Install hook (claude-code also adds a SessionEnd hook for auto-sleep) |
- | `hippo hook uninstall <target>` | Remove hook |
- | `hippo handoff create --summary "..."` | Create a session handoff |
- | `hippo handoff latest` | Show the most recent handoff |
- | `hippo handoff show <id>` | Show a specific handoff by ID |
- | `hippo session latest` | Show latest task snapshot + events |
- | `hippo session resume` | Re-inject latest handoff as context |
- | `hippo current show` | Compact current state (task + session events) |
- | `hippo wm push --scope <s> --content "..."` | Push to working memory |
- | `hippo wm read --scope <s>` | Read working memory entries |
- | `hippo wm clear --scope <s>` | Clear working memory |
- | `hippo wm flush --scope <s>` | Flush working memory (session end) |
- | `hippo dashboard` | Open web dashboard at localhost:3333 |
- | `hippo dashboard --port <n>` | Use custom port |
- | `hippo mcp` | Start MCP server (stdio transport) |
-
- ---
-
- ## Framework Integrations
-
- ### Auto-install (recommended)
-
- `hippo init` detects your agent framework and patches the right config file automatically:
-
- | Framework | Detected by | Patches |
- |-----------|------------|---------|
- | Claude Code | `CLAUDE.md` or `.claude/settings.json` | `CLAUDE.md` + `SessionStart`/`SessionEnd` hooks in `settings.json` |
- | Codex | `AGENTS.md` or `.codex` | `AGENTS.md` + automatic in-place Codex launcher wrapper |
- | Cursor | `.cursorrules` or `.cursor/rules` | `.cursorrules` |
- | OpenClaw | `.openclaw` or `AGENTS.md` | native OpenClaw plugin or `AGENTS.md` |
- | OpenCode | `.opencode/` or `opencode.json` | `AGENTS.md` + TS plugin at `~/.config/opencode/plugins/hippo.ts` (subscribes to `session.idle` + `session.created`) |
-
- No extra commands needed. Just `hippo init` and your agent knows about Hippo.
-
- ### Manual install
-
- If you prefer explicit control:
-
- ```bash
- hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd + UserPromptSubmit hooks
- hippo hook install codex # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
- hippo hook install cursor # patches .cursorrules
- hippo hook install openclaw # patches AGENTS.md
- hippo hook install opencode # patches AGENTS.md + installs the opencode TS plugin
- ```
-
- This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the agent to:
- 1. Run `hippo context --auto --budget 1500` at session start
- 2. Run `hippo remember "<lesson>" --error` on errors
- 3. Run `hippo outcome --good` on completion
-
- For Claude Code, it also adds:
- - a `SessionEnd` hook so `hippo sleep` runs automatically when the session exits
- - a `SessionStart` hook that prints the previous session's consolidation output
- - a `UserPromptSubmit` hook that runs `hippo context --pinned-only --include-recent 5 --format additional-context` every turn. It re-injects pinned memories (`hippo remember <text> --pin`) plus the last 5 writes, so fresh same-session lessons appear on the next prompt before you pin them. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
-
- To remove: `hippo hook uninstall claude-code`
-
- ### What the hook adds (Claude Code example)
-
- ```markdown
- ## Project Memory (Hippo)
-
- Before starting work, load relevant context:
- hippo context --auto --budget 1500
-
- When you hit an error or discover a gotcha:
- hippo remember "<what went wrong and why>" --error
-
- After completing work successfully:
- hippo outcome --good
- ```
-
- ### MCP Server
-
- For any MCP-compatible client (Cursor, Windsurf, Cline, Claude Desktop):
-
- ```bash
- hippo mcp # starts MCP server over stdio
- ```
-
- Add to your MCP config (e.g. `.cursor/mcp.json` or `claude_desktop_config.json`):
-
- ```json
- {
-   "mcpServers": {
-     "hippo-memory": {
-       "command": "hippo",
-       "args": ["mcp"]
-     }
-   }
- }
- ```
-
- Exposes tools: `hippo_recall`, `hippo_remember`, `hippo_outcome`, `hippo_context`, `hippo_status`, `hippo_learn`, `hippo_wm_push`.
-
- ### OpenClaw Plugin
-
- Native plugin with auto-context injection, workspace-aware memory lookup, and
- tool hooks for auto-learn / auto-sleep. When `autoSleep` is enabled, the
- OpenClaw plugin now launches `hippo sleep` in a detached background worker at
- session end so the live session can exit immediately.
-
- Query-time retrieval still uses the active workspace store plus the shared
- global store. Daily consolidation comes from the machine-level runner that
- `hippo init` / `hippo setup` installs.
-
- ```bash
- openclaw plugins install hippo-memory
- openclaw plugins enable hippo-memory
- ```
-
- Plugin docs: [extensions/openclaw-plugin/](extensions/openclaw-plugin/). Integration guide: [integrations/openclaw.md](integrations/openclaw.md).
-
- ### Claude Code Plugin
-
- Plugin with SessionStart/Stop hooks and error auto-capture. See [extensions/claude-code-plugin/](extensions/claude-code-plugin/).
-
- Full integration details: [integrations/](integrations/)
-
- ---
-
- ## The Neuroscience
-
- Hippo is modeled on seven properties of the human hippocampus. Not metaphorically. Literally.
-
- **Why two stores?** The brain uses a fast hippocampal buffer + a slow neocortical store (Complementary Learning Systems theory, McClelland et al. 1995). If the neocortex learned fast, new information would overwrite old knowledge. The buffer absorbs new episodes; the neocortex extracts patterns over time.
-
- **Why does decay help?** New neurons born in the dentate gyrus actively disrupt old memory traces (Frankland et al. 2013). This is adaptive: it reduces interference from outdated information. Forgetting isn't failure. It's maintenance.
-
- **Why do errors stick?** The amygdala modulates hippocampal consolidation based on emotional significance. Fear and error signals boost encoding. Your first production incident is burned into memory. Your 200th uneventful deploy isn't.
-
- **Why does retrieval strengthen?** Recalled memories undergo "reconsolidation" (Nader et al. 2000). The act of retrieval destabilizes the trace, then re-encodes it stronger. This is the testing effect. Hippo implements it mechanically via the half-life extension on recall.
-
- **Why does sleep consolidate?** During sleep, the hippocampus replays compressed versions of recent episodes and "teaches" the neocortex by repeatedly activating the same patterns. Hippo's `sleep` command runs this as a deliberate consolidation pass.
-
- The 7 mechanisms in full: [PLAN.md#core-principles](PLAN.md#core-principles)
-
- For how these mechanisms connect to LLM training, continual learning, and open research problems: **[RESEARCH.md](RESEARCH.md)**
-
- **Why does reward modulate decay?** In spiking neural networks, reward-modulated STDP strengthens synapses that contribute to positive outcomes and weakens those that don't. Hippo's reward-proportional decay (v0.11.0) implements this: memories with consistent positive outcomes decay slower, negatives decay faster, with no fixed deltas. Inspired by [MH-FLOCKE](https://github.com/MarcHesse/mhflocke)'s R-STDP architecture for quadruped locomotion, where the same mechanism produces stable learning with 11.6x lower variance than PPO.
-
- **Prior art in agent memory simulation.** The idea that human-like memory produces human-like behavior as an emergent property was explored in IEEE research from 2010-2011 ([5952114](https://ieeexplore.ieee.org/document/5952114), [5548405](https://ieeexplore.ieee.org/document/5548405), [5953964](https://ieeexplore.ieee.org/document/5953964)). Walking between rooms and forgetting why you went there doesn't need direct simulation; it emerges naturally from a memory system with capacity limits and decay. Hippo's design follows the same principle: implement the mechanisms, and the behavior follows.
-
- **Related work:** [HippoRAG](https://arxiv.org/abs/2405.14831) (Gutierrez et al., 2024) applies hippocampal indexing to RAG via knowledge graphs. [MemPalace](https://github.com/milla-jovovich/mempalace) (Sigman & Jovovich, 2026) organizes memory spatially (wings/halls/rooms) with AAAK compression, achieving 100% on [LongMemEval](https://arxiv.org/abs/2410.10813). [MH-FLOCKE](https://github.com/MarcHesse/mhflocke) (Hesse, 2026) uses spiking neurons with R-STDP for embodied cognition. Each system tackles a different facet: HippoRAG optimizes retrieval quality, MemPalace optimizes retrieval organization, MH-FLOCKE optimizes embodied learning, and Hippo optimizes memory lifecycle.
-
- ---
-
- ## Comparison
-
- The AI-memory category matured fast in 2026. Hippo's specific take — bio-decay, strengthen-on-use, outcome-weighted half-lives — is one stance among several. The table below is a feature snapshot, not a verdict: graph-first systems ([gbrain](https://hermesatlas.com/projects/garrytan/gbrain), [Zep](https://www.getzep.com/), [Cognee](https://www.cognee.ai/)), agent-managed systems ([Letta](https://github.com/letta-ai/letta)), and version-control / skill-distillation takes ([Memoria](https://github.com/matrixorigin/Memoria), [EverMind](https://evermind.ai/)) all solve adjacent problems with different mechanics.
-
- | Feature | Hippo | [MemPalace](https://github.com/milla-jovovich/mempalace) | [Mem0](https://github.com/mem0ai/mem0) | [Basic Memory](https://github.com/basicmachines-co/basic-memory) | [gbrain](https://hermesatlas.com/projects/garrytan/gbrain) | [Zep](https://www.getzep.com/) | [Letta](https://github.com/letta-ai/letta) | [Cognee](https://www.cognee.ai/) | [Memoria](https://github.com/matrixorigin/Memoria) | [EverMind](https://evermind.ai/) |
- |---------|-------|-----------|------|-------------|--------|-----|-------|--------|---------|----------|
- | Decay by default | Yes | No | No | No | No | No | No | No | No | No |
- | Retrieval strengthening | Yes | No | No | No | No | No | No | Partial (recall tuning) | No | Partial (Skill Memory distills patterns) |
- | Reward-proportional decay | Yes | No | No | No | No | No | No | No | No | No |
- | Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No | Yes (vec + rerank + graph) | Yes (graph + vec) | ? | Yes (GraphRAG) | Yes (vector + full-text) | Yes (mRAG, multi-modal) |
- | Schema acceleration / knowledge graph | Yes (schema) | No | No | No | Yes (typed KG, self-wiring) | Yes (temporal KG) | No | Yes (auto-ontologies) | No (typed claims) | Yes (hierarchical: user/group/agent) |
- | Conflict detection + resolution | Yes | No | No | No | Yes (eval-surfaced) | Yes (auto-invalidate stale facts) | No | No | Yes (auto-detect + quarantine) | Partial (temporal tracking) |
- | Multi-agent shared memory | Yes | No | No | No | Yes (brain repo, team mounts) | Yes | No (single-agent state) | Yes | Yes (branch/merge across sessions) | Yes (multi-agent coordination) |
- | Transfer scoring | Yes | No | No | No | No | No | No | No | No | No |
- | Outcome tracking | Yes | No | No | No | No | No | No | No | No | Partial (Cases: agent trajectories) |
- | Confidence tiers | Yes | No | No | No | No (typed facts) | No | No | No | No | No |
- | Spatial organization | No | Yes (wings/halls/rooms) | No | No | No | No | No | No | No | No |
- | Lossless compression | No | Yes (AAAK, 30x) | No | No | No | No | No | No | No | No |
- | Cross-tool import (ChatGPT/Claude/Cursor) | Yes | No | No | No | Partial (data sources) | ? | No | Partial (28 data sources) | No (Git ops) | Partial (mRAG: PDFs/images/URLs) |
- | Auto-hook install | Yes | No | No | No | No | No | No | No | No | No |
- | MCP server | Yes | Yes | No | No | Yes (stdio + HTTP/OAuth) | Partial (managed) | Yes (via Letta Code) | Yes (first-party Claude/LangGraph) | Yes | ? |
- | Zero runtime deps | Yes | No (ChromaDB) | No | No | No (PGLite or PG+pgvector) | No (managed service) | No (Python deps) | No (Python deps) | Yes (single Rust binary) | No (managed + OSS) |
- | LongMemEval (best published) | 86.8% R@5 (F13+F9, oracle\*) | 96.6% raw / 100% reranked R@5 | ~49-85% R@5 | N/A | 97.6-97.9% R@5 (s_cleaned\*) | N/A (LoCoMo 80.3%) | N/A | N/A | 88.78% overall accuracy w/ reader\*\* | 83.00% overall\*\* (LoCoMo 93.05%, HaluMem 93.04%) |
- | Git-friendly | Yes | No | No | Yes | Yes | No | No | No | Yes (Git is the model) | ? |
- | Framework agnostic | Yes | Yes | Partial | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
- | License | MIT | (open) | Apache-2.0 | (open) | MIT | Apache-2.0 (community) | Apache-2.0 | MIT (core) | Apache-2.0 | Apache-2.0 (OSS) + cloud |
-
- \* Split-mismatched: Hippo's 86.8% is on `longmemeval_oracle` (3 sessions per haystack); gbrain's 97.6% is on `longmemeval_s_cleaned` (~40 sessions per haystack). Different splits, different difficulty. Not directly comparable.
-
- \*\* Different metric: Memoria's 88.78% and EverMind's 83% are reported as overall accuracy with a reader LLM, not retrieval R@5. Higher denominator + LLM helps. Not directly comparable to retrieval-only R@5 numbers above.
-
- Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." gbrain, Zep, and Cognee implement "extract typed entities and relationships into a knowledge graph." Letta implements "the agent edits its own memory blocks." Memoria implements "Git-style version control over the memory state itself." EverMind implements "self-evolving Skill Memory + multi-modal retrieval over hierarchical scopes." Hippo implements "forget by default, earn persistence through use." These are complementary takes, not a single-axis ranking: bio-lifecycle (Hippo) + GraphRAG (gbrain/Cognee/Zep) + agent-self-edit (Letta) + memory-VCS (Memoria) + skill-distillation (EverMind) cover different parts of the same problem.
-
- ---
-
- ## Benchmarks
-
- Two benchmarks testing two different things. Full details in [`benchmarks/`](benchmarks/).
-
- ### LongMemEval (retrieval accuracy)
-
- [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
-
- **Hippo v0.28.0 results (hybrid BM25 + cosine, full 500 questions):**
-
- | Metric | v0.28 | v0.11 (BM25 only) |
- |--------|-------|-------------------|
- | Recall@1 | 46.6% | 50.4% |
- | Recall@3 | **67.0%** | 66.6% |
- | Recall@5 | 73.8% | 74.0% |
- | Recall@10 | 81.0% | 82.6% |
- | Answer in content@5 | **49.6%** | 46.6% |
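
Recall@k here is the share of questions for which an answer-bearing session appears among the top k retrieved items. A minimal TypeScript sketch of that computation follows; the `RetrievalResult` shape and the rule of counting a hit when *any* gold ID lands in the top k are assumptions, and the repo's `evaluate_retrieval.py` remains the authoritative scorer.

```typescript
// Hypothetical shape of one retrieval record; the benchmark's files may differ.
interface RetrievalResult {
  questionId: string;
  goldIds: string[];   // IDs of sessions that contain the answer
  rankedIds: string[]; // retrieved IDs, best first
}

// Recall@k, counting a hit when any gold ID is in the top k (assumed rule).
function recallAtK(results: RetrievalResult[], k: number): number {
  if (results.length === 0) return 0;
  let hits = 0;
  for (const r of results) {
    const topK = new Set(r.rankedIds.slice(0, k));
    if (r.goldIds.some((id) => topK.has(id))) hits++;
  }
  return hits / results.length;
}

const demo: RetrievalResult[] = [
  { questionId: "q1", goldIds: ["s3"], rankedIds: ["s3", "s1", "s2"] },
  { questionId: "q2", goldIds: ["s9"], rankedIds: ["s1", "s2", "s9"] },
  { questionId: "q3", goldIds: ["s7"], rankedIds: ["s1", "s2", "s4"] },
  { questionId: "q4", goldIds: ["s5", "s6"], rankedIds: ["s6", "s8", "s2"] },
];

console.log(recallAtK(demo, 1)); // 0.5
console.log(recallAtK(demo, 3)); // 0.75
```

Because the top-k window only grows with k, Recall@k for a fixed ranking can never fall as k increases, which is why each column of the metric table above rises from Recall@1 to Recall@10.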
-
- | Question Type | Count | R@5 | R@10 |
- |---------------|-------|-----|------|
- | single-session-assistant | 56 | 100.0% | 100.0% |
- | knowledge-update | 78 | 89.7% | 96.2% |
- | multi-session | 133 | 72.2% | 82.0% |
- | temporal-reasoning | 133 | 72.9% | 78.9% |
- | single-session-user | 70 | 62.9% | 71.4% |
- | single-session-preference | 30 | 20.0% | 33.3% |
-
- For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo v0.28 achieves 73.8% R@5 with hybrid BM25 + cosine. Hybrid scoring trades a little R@1 accuracy for better top-5 content relevance (answer_in_content@5 +3pp vs v0.11).
-
- Hippo's strongest categories (single-session-assistant 100% R@5, knowledge-update 89.7%) are where keyword overlap between question and stored content is highest. The weakest (preference 20%) involves indirect references that need deeper semantic understanding.
-
- > Note: v0.28 R@10 is 1.6pp below v0.11's BM25-only result. The earlier v0.27 benchmark showed an apparent 35pp regression — that was a methodology bug (budget-limited retrieval vs unlimited), fixed in v0.28 with the `minResults` option. See [`evals/README.md`](evals/README.md) for the full investigation and per-type breakdown.
-
- ```bash
- cd benchmarks/longmemeval
- python ingest_direct.py --data data/longmemeval_oracle.json --store-dir ./store
- python retrieve_fast.py --data data/longmemeval_oracle.json --store-dir ./store --output results/retrieval.jsonl
- python evaluate_retrieval.py --retrieval results/retrieval.jsonl --data data/longmemeval_oracle.json
- ```
-
- ### Sequential Learning Benchmark (agent improvement over time)
-
- No other public benchmark tests whether memory systems produce learning curves. LongMemEval tests retrieval on a fixed corpus. This benchmark tests whether an agent with memory *performs better on task 40 than task 5*.
-
- 50 tasks, 10 trap categories, each appearing 2-3 times across the sequence.
-
- > **v0.11.0 informal results — RETRACTED v1.7.9.** The 78% → 14% magnitude does NOT reproduce on the formal sequential-learning benchmark. Three pre-registered workload variants (v1.7.5 full-late, v1.7.6 budget sweep, v1.7.7 `--restrict-late-to 4`) all returned C2 hippo-base late mean = 0.0% across every seed (the workload's late phase saturates structurally). The mechanism (dlPFC goal-stack: `pushGoal`/`completeGoal` hooks, `--use-goal-stack`) is shipped and exercisable. **The magnitude is RETRACTED. The mechanism is shipped; no magnitude is currently claimed.** v1.8.0 (queued) explores adversarial trap categories as mechanism characterisation under the magnitude-smuggling guard in `docs/RETRACTION.md`. Pre-registration trail: `docs/evals/2026-05-07-v1.7.5-goal-stack-eval-prereg.md`, `docs/evals/2026-05-09-v1.7.6-calibration-result.md`, `docs/evals/2026-05-09-v1.7.7-goal-stack-eval-result.md`. CHANGELOG: see v1.7.9 entry.
-
- <details>
- <summary>Original v0.11.0 informal numbers (RETRACTED — preserved as audit trail in git, not reproduced here)</summary>
-
- v0.11.0 reported a single-run informal headline citing late-phase trap-rate decline on the sequential-learning benchmark. The specific numbers are archived at git tag `v0.11.0` and the corresponding `CHANGELOG.md` historical entry. Retained in version control, not reproduced here, since reproduction risks accidental re-citation. See `git show v0.11.0 -- README.md` for the original wording.
-
- </details>
-
- The benchmark, harness, and adapter contract remain shipped. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).
-
- ```bash
- cd benchmarks/sequential-learning
- node run.mjs --adapter all
- ```
-
- ---
-
- ## Contributing
-
- Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
-
- The interesting problems:
- - **Improve LongMemEval score.** Current R@5 is 73.8% with hybrid BM25 + cosine (v0.28). Gap to MemPalace's 96.6% likely needs better chunking, reranking, or semantic compression — not just more of the same retrieval.
- - Better consolidation heuristics (LLM-powered merge vs current text overlap)
- - Web UI / dashboard for visualizing decay curves and memory health
- - Optimal decay parameter tuning from real usage data
- - Cross-agent transfer learning evaluation
- - **MemPalace-style spatial organization.** Could spatial structure (wings/halls/rooms) improve hippo's semantic layer?
- - **AAAK-style compression for semantic memories.** Lossless token compression for context injection.
-
- ## License
-
- MIT
+ # 🦛 Hippo
+
+ **The secret to good memory isn't remembering more. It's knowing what to forget.**
+
+ [![npm](https://img.shields.io/npm/v/hippo-memory)](https://npmjs.com/package/hippo-memory)
+ [![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
+
+ <p align="center">
+ <img src="./assets/hippo-init.svg" alt="hippo init --scan ~ — initializing memory across all repos" width="720">
+ </p>
+
+ A memory layer for AI agents. Modeled on the hippocampus. Decay by default, strength through use, provenance on every memory. SQLite under the hood, zero runtime deps, works with every CLI agent you have.
+
+ ```bash
+ npm install -g hippo-memory && hippo init --scan ~
+ ```
+
+ One command. Every git repo on your machine gets memory.
+
+ ```
+ Works with: Claude Code, Codex, Cursor, OpenClaw, OpenCode, Pi, any MCP client
+ Imports from: ChatGPT, Claude (CLAUDE.md), Cursor (.cursorrules), Slack, markdown
+ Storage: SQLite backbone with markdown mirrors. Git-trackable, human-readable.
+ Dependencies: Zero runtime deps. Node.js 22.5+. Optional embeddings via @xenova/transformers.
+ ```
+
+ ---
+
+ ## Why this exists
+
+ Most "AI memory" systems save everything and search later. That's storage with semantic search bolted on. It's why your agent kept hitting the same deploy bug last week. And the week before. The system saw the failure four times. It had no way to know it should remember.
+
+ Hippo applies the thing brains have been getting right for 500 million years. Memories decay over time. Retrieval makes them stronger. Three biological layers (buffer, episodic, semantic) consolidate during sleep. Hard lessons stick because you used them. Trivia fades because you didn't.
+
+ It also fixes the portability problem. Your ChatGPT memories don't travel to Claude. Your `.cursorrules` don't travel to Codex. Hippo is one process behind every agent. CLAUDE.md, Cursor rules, ChatGPT exports, Slack history, all in one SQLite store, all queryable from any tool that speaks MCP or HTTP.
+
+ ---
+
+ ## Receipts
+
+ Numbers, not adjectives. Every claim links to the benchmark or the test that proves it.
+
+ - **Sequential Learning Benchmark.** [benchmarks/sequential-learning/](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents learn from past mistakes, not just retrieve text. v0.11.0 informal magnitude RETRACTED v1.7.9; mechanism remains shipped. See [CHANGELOG.md](./CHANGELOG.md) v1.7.9 entry.
+ - **R@5 = 74.0%** on [LongMemEval](benchmarks/longmemeval/). 500-question industry retrieval benchmark, BM25 only, no embeddings.
+ - **10 of 10 incident scenarios beat transcript replay** on a staged Slack corpus ([benchmarks/e1.3/](benchmarks/e1.3/)). Recall surfaces the cause faster than scrolling the last N messages.
+ - **0 outbound HTTP** on the 1000-event ingestion smoke. Proven by a `globalThis.fetch` spy that throws on call, not a hardcoded zero.
+ - **926 tests, real DB, zero mocks.** Project rule. One mocks-vs-prod divergence bit us early; the no-mocks rule it prompted kept the next ten releases honest.
+ - **dlPFC goal-conditioned cluster discrimination, 3/3 queries pass** — full goal stack with policy weighting and lifespan-windowed outcome propagation. Per-goal lift on a 3-cluster fixture where BM25 alone cannot discriminate; deterministic test in [`benchmarks/micro/results/b3-depth.json`](benchmarks/micro/results/b3-depth.json).
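
The "0 outbound HTTP" receipt rests on a spy that turns any network call into a hard failure. A minimal sketch of that pattern, assuming Node 18+ (where `fetch` is a global); `withNoNetwork` and `ingestEvents` are hypothetical names, and `ingestEvents` merely stands in for whatever code is under test:

```typescript
// Swap globalThis.fetch for a spy that counts calls and always throws,
// run the code under test, then restore the real fetch.
async function withNoNetwork<T>(fn: () => Promise<T>): Promise<{ result: T; calls: number }> {
  const original = globalThis.fetch;
  let calls = 0;
  globalThis.fetch = (async (..._args: unknown[]) => {
    calls++;
    throw new Error("outbound HTTP attempted during a no-network test");
  }) as typeof fetch;
  try {
    const result = await fn();
    return { result, calls };
  } finally {
    globalThis.fetch = original;
  }
}

// Hypothetical stand-in for a purely local ingestion pass.
async function ingestEvents(count: number): Promise<number> {
  const stored: string[] = [];
  for (let i = 0; i < count; i++) stored.push(`event-${i}`);
  return stored.length;
}

const demo = withNoNetwork(() => ingestEvents(1000));
demo.then(({ result, calls }) => console.log(result, calls)); // 1000 0
```

Counting calls as well as throwing matters: if the code under test swallows the error, the non-zero count still exposes the attempted request.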
+
+ ---
+
+ ## What it does for your agent
+
+ - **Stops repeating mistakes.** Tag a failure with `--tag error` once and the lesson surfaces every time the agent walks back into that part of the code. Errors decay slower than ordinary observations.
+ - **Survives tool switches.** Use Claude Code on Monday, Cursor on Tuesday, Codex on Wednesday. Same `.hippo/` store. Same memories. Pick up exactly where you left off.
+ - **Ingests systems of record.** Slack today (`POST /v1/connectors/slack/events`). GitHub, Jira, Notion next. Webhooks land as `kind='raw'` memories with full provenance and GDPR-correct deletion.
+ - **Knows where every memory came from.** Every row carries `kind`, `scope`, `owner`, and `artifact_ref`. Right-to-be-forgotten is a single API call, not an audit nightmare.
+ - **Plays nice with multi-tenant.** API keys, scrypt-hashed. Audit log on every mutation. Tenant A literally cannot see tenant B's memories. Proven by negative test.
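
For anyone reproducing the "scrypt-hashed" key storage, here is the generic pattern with Node's built-in `node:crypto`. It is a sketch, not Hippo's code: the `salt:hash` record format, the 16-byte salt, and the 32-byte derived key are assumptions.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LEN = 32; // derived hash length in bytes (assumed)

// Hash a plaintext API key; persist only the returned "salt:hash" string.
function hashApiKey(apiKey: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(apiKey, salt, KEY_LEN);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Re-derive with the stored salt and compare in constant time.
function verifyApiKey(apiKey: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false; // malformed record
  const expected = Buffer.from(hashHex, "hex");
  const actual = scryptSync(apiKey, Buffer.from(saltHex, "hex"), expected.length);
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

const record = hashApiKey("hk_example_key");
console.log(verifyApiKey("hk_example_key", record)); // true
console.log(verifyApiKey("hk_wrong_key", record));   // false
```

Only the salted hash is stored, so a leaked table does not reveal usable keys, and `timingSafeEqual` keeps the comparison from leaking how many bytes matched.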
+
+ ---
+
+ ## Quick start
+
+ ```bash
+ npm install -g hippo-memory
+
+ # Single project
+ hippo init
+
+ # All your projects at once (recommended)
+ hippo init --scan ~
+ ```
+
+ `--scan` finds every git repo under your home directory, creates a `.hippo/` store in each one, and seeds it with lessons from the last 30 days of commit history. One command, instant memory across all your projects.
+
+ After setup, `hippo sleep` runs at session end (via auto-installed agent hooks) and does five things:
+
+ 1. **Learns** from today's git commits
+ 2. **Imports** new entries from Claude Code MEMORY.md files
+ 3. **Consolidates** memories (decay, merge, prune)
+ 4. **Deduplicates** near-identical memories, keeping the stronger copy
+ 5. **Shares** high-value lessons to a global store so they surface in every project
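
Step 4 can be pictured as a single greedy pass. Hippo's actual similarity test isn't described here, so this sketch substitutes token-set Jaccard similarity with an assumed 0.8 cutoff; the `Memory` shape and `dedupe` function are hypothetical.

```typescript
interface Memory {
  id: string;
  content: string;
  strength: number; // current decayed strength, 0..1
}

function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 1 : shared / union;
}

// Visit memories strongest first; drop any that is too similar to one already kept.
function dedupe(memories: Memory[], threshold = 0.8): Memory[] {
  const sorted = [...memories].sort((x, y) => y.strength - x.strength);
  const kept: { mem: Memory; toks: Set<string> }[] = [];
  for (const mem of sorted) {
    const toks = tokens(mem.content);
    if (!kept.some((k) => jaccard(k.toks, toks) >= threshold)) {
      kept.push({ mem, toks });
    }
  }
  return kept.map((k) => k.mem);
}
```

Sorting by strength first is what makes "keep the stronger copy" fall out of one pass: every dropped memory is near-identical to a kept memory that is at least as strong.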
+
+ ```bash
+ # Manual usage
+ hippo remember "FRED cache silently dropped the tips_10y series" --tag error
+ hippo recall "data pipeline issues" --budget 2000
+ ```
+
+ ---
+
+ Full release history: **[CHANGELOG.md](./CHANGELOG.md)** · [GitHub Releases](https://github.com/kitfunso/hippo-memory/releases)
+
+
+ ### Zero-config agent integration
+
+ `hippo init` auto-detects your agent framework and wires itself in:
+
+ ```bash
+ cd my-project
+ hippo init
+
+ # Initialized Hippo at /my-project
+ # Directories: buffer/ episodic/ semantic/ conflicts/
+ # Auto-installed claude-code hook in CLAUDE.md
+ ```
+
+ If you have a `CLAUDE.md`, it patches it. `AGENTS.md` for Codex/OpenClaw/OpenCode. `.cursorrules` for Cursor. For Codex, Hippo also wraps the detected launcher in place so `/exit` can consolidate memory without a manual PATH step. No manual `hook install` needed. Your agent starts using Hippo on its next session.
+
+ It also registers the current project in Hippo's workspace registry and installs one machine-level daily runner (6:15am). That runner sweeps every registered workspace, runs `hippo learn --git --days 1`, then `hippo sleep`. You get strict daily consolidation without creating one OS task per project.
+
+ To skip: `hippo init --no-hooks --no-schedule`
+
+ ---
+
+ ## Cross-Tool Import
+
+ Your memories shouldn't be locked inside one tool. Hippo pulls them in from anywhere.
+
+ ```bash
+ # ChatGPT memory export
+ hippo import --chatgpt memories.json
+
+ # Claude's CLAUDE.md (skips existing hippo hook blocks)
+ hippo import --claude CLAUDE.md
+
+ # Cursor rules
+ hippo import --cursor .cursorrules
+
+ # Any markdown file (headings become tags)
+ hippo import --markdown MEMORY.md
+
+ # Any text file
+ hippo import --file notes.txt
+ ```
+
+ All import commands support `--dry-run` (preview without writing), `--global` (write to `~/.hippo/`), and `--tag` (add extra tags). Duplicates are detected and skipped automatically.
+
+ ### Conversation Capture
+
+ Extract memories from raw conversation text. No LLM needed: pattern-based heuristics find decisions, rules, errors, and preferences.
+
+ ```bash
+ # Pipe a conversation in
+ cat session.log | hippo capture --stdin
+
+ # Or point at a file
+ hippo capture --file conversation.md
+
+ # Preview first
+ hippo capture --file conversation.md --dry-run
+ ```
+
+ ### Slack ingestion (E1.3)
+
+ Hippo accepts Slack Events API webhooks at `POST /v1/connectors/slack/events`. Configure `SLACK_SIGNING_SECRET` (validated on every request) and point Slack at `https://<your-host>/v1/connectors/slack/events`. Messages land as `kind='raw'` memories with `slack://team/channel/ts` provenance and a `slack:public:Cxxx` or `slack:private:Cxxx` scope. Source deletions are honored (GDPR).
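
Validating `SLACK_SIGNING_SECRET` on every request means checking Slack's request signature. The sketch below follows Slack's published v0 signing scheme (HMAC-SHA256 over `v0:<timestamp>:<raw body>`, compared with the `X-Slack-Signature` header, plus a 5-minute freshness check on `X-Slack-Request-Timestamp`). It is written against Slack's documentation, not taken from Hippo's source; the function name and parameter list are this sketch's own.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Slack request per Slack's v0 signing scheme.
// rawBody must be the exact bytes Slack sent, before any JSON parsing.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // X-Slack-Request-Timestamp header
  rawBody: string,
  signature: string, // X-Slack-Signature header, e.g. "v0=ab12..."
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  // Reject stale or malformed timestamps to limit replay attacks.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 60 * 5) return false;

  const expected =
    "v0=" + createHmac("sha256", signingSecret).update(`v0:${timestamp}:${rawBody}`).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Verify against the raw request body: re-serializing parsed JSON can change whitespace or key order and break the HMAC.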
+
+ Backfill an existing channel: `SLACK_BOT_TOKEN=xoxb-... hippo slack backfill --channel C0000`. Inspect malformed events: `hippo slack dlq list`.
+
+ Multi-workspace deployments populate `slack_workspaces (team_id, tenant_id)` to route events per tenant; single-workspace deployments fall back to `HIPPO_TENANT`.
+
+ ### Active task snapshots
+
+ Long-running work needs short-term continuity, not just long-term memory. Hippo can persist the current in-flight task so a later `continue` has something concrete to recover.
+
+ ```bash
+ hippo snapshot save \
+ --task "Ship SQLite backbone" \
+ --summary "Tests/build/smoke are green, next slice is active-session recovery" \
+ --next-step "Implement active snapshot retrieval in context output"
+
+ hippo snapshot show
+ hippo context --auto --budget 1500
+ hippo snapshot clear
+ ```
+
+ `hippo context --auto` includes the active task snapshot before long-term memories, so agents get both the immediate thread and the deeper lessons.
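
One way to picture that ordering under a budget such as `--budget 1500`: put the snapshot first, then fill what remains with the highest-scoring memories that still fit. This is a hypothetical sketch; the ~4-characters-per-token estimate, the greedy fill, and the `ContextItem`/`buildContext` names are assumptions, not Hippo's packing logic.

```typescript
interface ContextItem {
  text: string;
  score: number; // retrieval score
}

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Snapshot goes first; the remaining budget is filled greedily by score.
function buildContext(snapshot: string | null, memories: ContextItem[], budget: number): string[] {
  const out: string[] = [];
  let used = 0;
  if (snapshot) {
    out.push(snapshot);
    used += estimateTokens(snapshot);
  }
  const ranked = [...memories].sort((a, b) => b.score - a.score);
  for (const m of ranked) {
    const cost = estimateTokens(m.text);
    if (used + cost > budget) continue; // skip items that would overflow
    out.push(m.text);
    used += cost;
  }
  return out;
}
```

In this sketch the snapshot is always included even if it alone exceeds the budget, and memories that would overflow are skipped rather than truncated, so a smaller lower-ranked memory can still fit.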
+
+ ### Session event trails
+
+ Manual snapshots are useful, but real work also needs a breadcrumb trail. Hippo can now store short session events and link them to the active snapshot so context output shows the latest steps, not just the last summary.
+
+ ```bash
+ hippo session log \
+ --id sess_20260326 \
+ --task "Ship continuity" \
+ --type progress \
+ --content "Schema migration is done, next step is CLI wiring"
+
+ hippo snapshot save \
+ --task "Ship continuity" \
+ --summary "Structured session events are flowing" \
+ --next-step "Surface them in framework hooks" \
+ --session sess_20260326
+
+ hippo session show --id sess_20260326
+ hippo context --auto --budget 1500
+ ```
+
+ Hippo mirrors the latest trail to `.hippo/buffer/recent-session.md` so you can inspect the short-term thread without opening SQLite.
+
+ ### Session handoffs
+
+ When you're done for the day (or switching to another agent), create a handoff so the next session knows exactly where to pick up:
+
+ ```bash
+ hippo handoff create \
+ --summary "Finished schema migration, tests green" \
+ --next "Wire handoff injection into context output" \
+ --session sess_20260403 \
+ --artifact src/db.ts
+
+ hippo handoff latest # show the most recent handoff
+ hippo handoff show 3 # show a specific handoff by ID
+ hippo session resume # re-inject latest handoff as context
+ ```
+
+ ### Working memory
+
+ Working memory is a bounded scratchpad for current-state notes. It's separate from long-term memory and gets cleared between sessions.
+
+ ```bash
+ hippo wm push --scope repo \
+ --content "Investigating flaky test in store.test.ts, line 42" \
+ --importance 0.9
+
+ hippo wm read --scope repo # show current working notes
+ hippo wm clear --scope repo # wipe the scratchpad
+ hippo wm flush --scope repo # flush on session end
+ ```
+
+ The buffer holds a maximum of 20 entries per scope. When full, the lowest-importance entry is evicted.
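
The eviction rule is small enough to sketch. This TypeScript `WorkingMemory` class is hypothetical, not Hippo's implementation; it assumes the incoming entry competes for a slot too (so a low-importance push into a full scope can be dropped at once) and that ties evict the oldest entry.

```typescript
interface WmEntry {
  content: string;
  importance: number; // 0..1, as passed via --importance
}

// Bounded scratchpad: each scope holds at most `capacity` entries.
class WorkingMemory {
  private readonly capacity: number;
  private readonly scopes = new Map<string, WmEntry[]>();

  constructor(capacity = 20) {
    this.capacity = capacity;
  }

  push(scope: string, entry: WmEntry): void {
    const entries = this.scopes.get(scope) ?? [];
    entries.push(entry);
    if (entries.length > this.capacity) {
      // Evict the lowest-importance entry; on ties the oldest one goes.
      let lowest = 0;
      for (let i = 1; i < entries.length; i++) {
        if (entries[i].importance < entries[lowest].importance) lowest = i;
      }
      entries.splice(lowest, 1);
    }
    this.scopes.set(scope, entries);
  }

  read(scope: string): WmEntry[] {
    return [...(this.scopes.get(scope) ?? [])];
  }

  clear(scope: string): void {
    this.scopes.delete(scope);
  }
}
```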
+
+ ### Explainable recall
+
+ See why a memory was returned:
+
+ ```bash
+ hippo recall "data pipeline" --why --limit 5
+
+ # --- mem_a1b2c3 [episodic] [observed] [local] score=0.847
+ # BM25: matched [data, pipeline]; cosine: 0.82
+ # ...memory content...
+ ```
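
The `--why` output shows the two signals behind a hybrid score: BM25 keyword matches and embedding cosine similarity. A common way to merge them is to scale BM25 into [0, 1] against the best BM25 score among the candidates and take a weighted sum. The sketch below does that; the normalization, the clamp on negative cosine, and the 0.5 default weight are assumptions, so it will not reproduce the `score=0.847` above.

```typescript
interface Candidate {
  id: string;
  bm25: number;   // raw BM25 score (unbounded, >= 0)
  cosine: number; // embedding cosine similarity in [-1, 1]
}

// Weighted blend of normalized BM25 and cosine; alpha weights the keyword side.
function hybridRank(cands: Candidate[], alpha = 0.5): { id: string; score: number }[] {
  const maxBm25 = Math.max(0, ...cands.map((c) => c.bm25));
  return cands
    .map((c) => {
      const kw = maxBm25 > 0 ? c.bm25 / maxBm25 : 0; // scale into [0, 1]
      const sem = Math.max(0, c.cosine);             // ignore negative similarity
      return { id: c.id, score: alpha * kw + (1 - alpha) * sem };
    })
    .sort((a, b) => b.score - a.score);
}
```

Setting `alpha` to 1 collapses the blend to pure BM25 ranking, which makes it easy to compare against a keyword-only baseline.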
+
+ ---
+
+ ## How It Works
+
+ Input enters the buffer. Important things get encoded into episodic memory. During "sleep," repeated episodes compress into semantic patterns. Weak memories decay and disappear.
+
+ ```mermaid
+ flowchart TD
+ I[New information] --> B[Buffer<br/>session-only, no decay]
+ B -->|encode: tags, strength, half-life| E[Episodic Store<br/>timestamped, decay by default<br/>retrieval strengthens, errors stick]
+ E -->|hippo sleep<br/>replay + merge| S[Semantic Store<br/>compressed patterns, stable<br/>schema-aware]
+ E -.->|decay| X[forgotten]
+ S -.->|recall| E
+ classDef bio fill:#fff4dc,stroke:#a8742d,color:#2b1b00
+ classDef forgotten fill:#f5f5f5,stroke:#999,color:#666,stroke-dasharray:5 5
+ class B,E,S bio
+ class X forgotten
+ ```
+
+ ---
+
+ ## Key Features
+
+ Here is a memory's life across a typical session, before we walk through each feature in turn:
+
+ ```mermaid
+ sequenceDiagram
+ autonumber
+ actor Agent
+ participant B as Buffer
+ participant E as Episodic
+ participant S as Semantic
+ Agent->>B: hippo remember "cache dropped tips_10y" --error
+ B->>E: encode (half_life=14d, valence=neg)
+ Note over E: strength=1.0
+ Agent->>E: hippo recall "data pipeline"
+ E-->>Agent: returns memory (rank 1)
+ Note over E: half_life 14d → 16d, retrieval_count++
+ Agent->>E: hippo outcome --good
+ Note over E: reward_factor 1.0 → 1.15
+ Agent->>S: hippo sleep
+ S->>E: merge 3 related episodic → 1 semantic
+ Note over E,S: original episodic decays, pattern survives
+ ```
290
+
291
+ ### Decay by default
292
+
293
+ Every memory has a half-life. 7 days by default. Persistence is earned.
294
+
295
+ ```bash
296
+ hippo remember "always check cache contents after refresh"
297
+ # stored with half_life: 7d, strength: 1.0
298
+
299
+ # 14 days later with no retrieval:
300
+ hippo inspect mem_a1b2c3
301
+ # strength: 0.25 (decayed by 2 half-lives)
302
+ # at risk of removal on next sleep
303
+ ```
304
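The numbers above follow plain exponential decay. Strength halves once per half-life, so 14 days at a 7-day half-life leaves 0.5² = 0.25. The sketch below assumes the clock runs from the last retrieval (or from creation). It ignores the error and outcome multipliers described in later sections. `currentStrength` and `DecayState` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface DecayState {
  halfLifeDays: number;    // 7 by default
  lastRetrievedAt: number; // epoch ms; creation time if never recalled
  pinned?: boolean;        // --pin memories do not decay
}

// strength = 0.5 ^ (elapsed days / half-life)
function currentStrength(m: DecayState, now: number): number {
  if (m.pinned) return 1.0;
  const elapsedDays = Math.max(0, (now - m.lastRetrievedAt) / DAY_MS);
  return Math.pow(0.5, elapsedDays / m.halfLifeDays);
}

const created = Date.UTC(2026, 2, 1);
const mem: DecayState = { halfLifeDays: 7, lastRetrievedAt: created };
console.log(currentStrength(mem, created + 14 * DAY_MS)); // 0.25
console.log(currentStrength({ ...mem, pinned: true }, created + 14 * DAY_MS)); // 1
```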

---

### Retrieval strengthens

Use it or lose it. Each recall boosts the half-life by 2 days.

```bash
hippo recall "cache issues"
# finds mem_a1b2c3, retrieval_count: 1 -> 2
# half_life extended: 7d -> 9d
# strength recalculated from retrieval timestamp

hippo recall "cache issues" # again next week
# retrieval_count: 2 -> 3
# half_life: 9d -> 11d
# this memory is learning to survive
```
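The bookkeeping in that example reduces to a small pure update. This sketch assumes a flat +2 days per recall, as stated above. It also assumes the decay clock restarts at the retrieval timestamp. `onRecall` and `RecallState` are hypothetical names, not Hippo's API.

```typescript
interface RecallState {
  halfLifeDays: number;
  retrievalCount: number;
  lastRetrievedAt: number; // epoch ms
}

const RECALL_BONUS_DAYS = 2;

// Each recall lengthens the half-life and restarts the decay clock.
function onRecall(m: RecallState, now: number): RecallState {
  return {
    halfLifeDays: m.halfLifeDays + RECALL_BONUS_DAYS,
    retrievalCount: m.retrievalCount + 1,
    lastRetrievedAt: now,
  };
}

let state: RecallState = { halfLifeDays: 7, retrievalCount: 1, lastRetrievedAt: 0 };
state = onRecall(state, 1000); // 7d -> 9d, count 1 -> 2
state = onRecall(state, 2000); // 9d -> 11d, count 2 -> 3
console.log(state.halfLifeDays, state.retrievalCount); // 11 3
```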

---

### Active invalidation

When you migrate from one tool to another, old memories about the replaced tool should die immediately. Hippo detects migration and breaking-change commits during `hippo learn --git` and actively weakens matching memories.

```bash
hippo learn --git
# feat: migrate from webpack to vite
# Invalidated 3 memories referencing "webpack"
# Learned: migrate from webpack to vite
```

You can also invalidate manually:

```bash
hippo invalidate "REST API" --reason "migrated to GraphQL"
# Invalidated 5 memories referencing "REST API".
```
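Here is a rough sketch of the detection step. It assumes migration commits use a "migrate(d) from X to Y" subject, and that matching is a case-insensitive substring search over memory content. The regex, `detectMigration`, and `findInvalidated` are hypothetical. Hippo also reacts to breaking-change commits and decides how strongly to weaken each match. This sketch leaves both out and only returns the IDs that would be affected.

```typescript
interface StoredMemory {
  id: string;
  content: string;
}

// Hypothetical pattern for subjects like "feat: migrate from webpack to vite".
const MIGRATION_RE = /\bmigrat(?:e|ed|ing)\s+from\s+([\w.@\/-]+)\s+to\s+([\w.@\/-]+)/i;

function detectMigration(subject: string): { from: string; to: string } | null {
  const m = MIGRATION_RE.exec(subject);
  return m ? { from: m[1], to: m[2] } : null;
}

// IDs of memories that mention the replaced tool and would be weakened.
function findInvalidated(subject: string, memories: StoredMemory[]): string[] {
  const migration = detectMigration(subject);
  if (!migration) return [];
  const needle = migration.from.toLowerCase();
  return memories
    .filter((mem) => mem.content.toLowerCase().includes(needle))
    .map((mem) => mem.id);
}

const memories: StoredMemory[] = [
  { id: "mem_1", content: "webpack config lives in build/" },
  { id: "mem_2", content: "Webpack 5 needs explicit polyfills" },
  { id: "mem_3", content: "dev server runs on port 3000" },
];
console.log(findInvalidated("feat: migrate from webpack to vite", memories)); // mem_1 and mem_2
```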

---

### Architectural decisions

One-off decisions don't repeat, so they can't earn their keep through retrieval alone. `hippo decide` stores them with a 90-day half-life and verified confidence so they survive long enough to matter.

```bash
hippo decide "Use PostgreSQL for all new services" --context "JSONB support"
# Decision recorded: mem_a1b2c3

# Later, when the decision changes:
hippo decide "Use CockroachDB for global services" \
  --context "Need multi-region" \
  --supersedes mem_a1b2c3
# Superseded mem_a1b2c3 (half-life halved, marked stale)
# Decision recorded: mem_d4e5f6
```

---

### Error memories stick

Tag a memory as an error and it gets 2x the half-life automatically.

```bash
hippo remember "deployment failed: forgot to run migrations" --error
# half_life: 14d instead of 7d
# emotional_valence: negative
# strength formula applies a 2.0x multiplier
# (set HIPPO_LOSS_AVERSION_RATIO=0.75 to keep the v1.13.4 1.5x multiplier)

# production incidents don't fade quietly
```

---

### Confidence tiers

Every memory carries a confidence level: `verified`, `observed`, `inferred`, or `stale`. This tells agents how much to trust what they're reading.

```bash
hippo remember "API rate limit is 100/min" --verified
hippo remember "deploy usually takes ~3 min" --observed
hippo remember "the flaky test might be a race condition" --inferred
```

When context is generated, confidence is shown inline:

```
[verified] API rate limit is 100/min
[observed] Deploy usually takes ~3 min
[inferred] The flaky test might be a race condition
```

Agents can see at a glance what's established fact vs. a pattern worth questioning.

Memories unretrieved for 30+ days are automatically marked `stale` during the next `hippo sleep`. If one gets recalled again, Hippo wakes it back up to `observed` so it can earn trust again instead of staying permanently stale.
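The stale/revive cycle is a small state transition. The sketch assumes "30+ days" means at least 30 days since the last retrieval. It also assumes a recall only changes the tier of `stale` memories. `confidenceAfterSleep` and `confidenceAfterRecall` are hypothetical names.

```typescript
type Confidence = "verified" | "observed" | "inferred" | "stale";

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_AFTER_DAYS = 30;

// Applied during `hippo sleep`: long-unretrieved memories drop to stale.
function confidenceAfterSleep(c: Confidence, lastRetrievedAt: number, now: number): Confidence {
  const idleDays = (now - lastRetrievedAt) / DAY_MS;
  return idleDays >= STALE_AFTER_DAYS ? "stale" : c;
}

// Applied on recall: a stale memory wakes up as observed and must re-earn trust.
function confidenceAfterRecall(c: Confidence): Confidence {
  return c === "stale" ? "observed" : c;
}

const t0 = 0;
const afterSleep = confidenceAfterSleep("verified", t0, t0 + 31 * DAY_MS);
console.log(afterSleep, confidenceAfterRecall(afterSleep)); // stale observed
```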

---

### Conflict tracking

Hippo detects obvious contradictions between overlapping memories and keeps them visible instead of silently letting both masquerade as truth. Shared tags alone do not count; the statements themselves need to overlap in content.

```bash
hippo sleep # refreshes open conflicts
hippo conflicts # inspect them
```

Open conflicts are stored in SQLite, mirrored under `.hippo/conflicts/`, and linked back into each memory's `conflicts_with` field.
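One way to honour the "content must overlap, tags don't count" rule is to compare the statement text directly. The sketch below is a hypothetical heuristic, not Hippo's detector. It flags a pair when the Jaccard overlap of their non-negation words is at least 0.5 (an assumed threshold). It also requires that exactly one of the two contains a negation marker from a small hand-picked list. Tags are not an input at all.

```typescript
// Hypothetical polarity markers; a real detector would need far more than this.
const NEGATIONS = new Set(["not", "never", "no", "don't", "doesn't", "isn't", "avoid"]);

function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Jaccard overlap of content words, ignoring negation markers.
function contentOverlap(a: string[], b: string[]): number {
  const sa = new Set(a.filter((w) => !NEGATIONS.has(w)));
  const sb = new Set(b.filter((w) => !NEGATIONS.has(w)));
  let shared = 0;
  sa.forEach((w) => {
    if (sb.has(w)) shared++;
  });
  const union = sa.size + sb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Only the statement text can trigger a conflict; tags are never consulted.
function looksContradictory(a: string, b: string, minOverlap = 0.5): boolean {
  const wa = words(a);
  const wb = words(b);
  const negA = wa.some((w) => NEGATIONS.has(w));
  const negB = wb.some((w) => NEGATIONS.has(w));
  return contentOverlap(wa, wb) >= minOverlap && negA !== negB;
}

console.log(looksContradictory("always run migrations before deploy", "never run migrations before deploy")); // true
console.log(looksContradictory("deploy takes ~3 min", "never commit secrets")); // false
```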

---

### Observation framing

Memories aren't presented as bare assertions. By default, Hippo frames them as observations with dates, so agents treat them as context rather than commands.

```bash
hippo context --framing observe # default
# Output: "Previously observed (2026-03-10): deploy takes ~3 min"

hippo context --framing suggest
# Output: "Consider: deploy takes ~3 min"

hippo context --framing assert
# Output: "Deploy takes ~3 min"
```

Three modes: `observe` (default), `suggest`, `assert`. Choose based on how directive you want the memory to be.

---

### Sleep consolidation

Run `hippo sleep` and episodes compress into patterns.

```bash
hippo sleep

# Running consolidation...
#
# Results:
# Active memories: 23
# Removed (decayed): 4
# Merged episodic: 6
# New semantic: 2
```

Three or more related episodes get merged into a single semantic memory. The originals decay. The pattern survives.

---

### Outcome feedback

Did the recalled memories actually help? Tell Hippo. It tightens the feedback loop.

```bash
hippo recall "why is the gold model broken"
# ... you read the memories and fix the bug ...

hippo outcome --good
# Applied positive outcome to 3 memories
# reward factor increases, decay slows

hippo outcome --bad
# Applied negative outcome to 3 memories
# reward factor decreases, decay accelerates
```

Outcomes are cumulative. A memory with 5 positive outcomes and 0 negative has a reward factor of ~1.42, making its effective half-life 42% longer. A memory with 0 positive and 3 negative has a factor of ~0.63, so its effective half-life shrinks to about 63% of the base and it decays roughly 1.6x as fast. Mixed outcomes converge toward neutral (1.0).

---

### Token budgets

Recall only what fits. No context stuffing.

```bash
# keep task context to roughly 2,000 tokens
hippo recall "deployment checklist" --budget 2000

# need more for a big task
hippo recall "full project history" --budget 8000

# machine-readable for programmatic use
hippo recall "api errors" --budget 1000 --json
```

Results are ranked by `relevance * strength * recency`. The highest-signal memories fill the budget first.

---

### Auto-learn from git

Hippo can scan your commit history and extract lessons from fix/revert/bug commits automatically.

```bash
# Learn from the last 7 days of commits
hippo learn --git

# Learn from the last 30 days
hippo learn --git --days 30

# Scan multiple repos in one pass
hippo learn --git --repos "~/project-a,~/project-b,~/project-c"
```

The `--repos` flag accepts comma-separated paths. Hippo scans each repo's git log, extracts fix/revert/bug lessons, deduplicates against existing memories, and stores new ones. Pair with `hippo sleep` afterwards to consolidate.

Ideal for a weekly cron:

```bash
hippo learn --git --repos "~/repo1,~/repo2" --days 7
hippo sleep
```
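The extract-and-dedupe step described above can be sketched as a keyword filter over commit subjects, followed by a normalised duplicate check. The keyword regex, `extractLessons`, and the whitespace/case normalisation are all assumptions. Reading the log itself (for example via `git log --format=%s`) is left out so the sketch stays self-contained.

```typescript
// Hypothetical keyword filter for lesson-bearing commit subjects.
const LESSON_RE = /\b(?:fix|fixes|fixed|bugfix|bug|revert|reverted)\b/i;

// Case- and whitespace-insensitive key used for duplicate detection.
function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

function extractLessons(subjects: string[], existing: string[]): string[] {
  const seen = new Set(existing.map(normalize));
  const lessons: string[] = [];
  for (const subject of subjects) {
    if (!LESSON_RE.test(subject)) continue; // not a fix/revert/bug commit
    const key = normalize(subject);
    if (seen.has(key)) continue; // already stored, or seen earlier in this batch
    seen.add(key);
    lessons.push(subject);
  }
  return lessons;
}

const learned = extractLessons(
  [
    "fix: handle null cache entries",
    "feat: add dashboard",
    'Revert "bump deps"',
    "fix: handle null cache entries", // same subject on another branch
  ],
  ["FIX: handle  null cache entries"], // already stored
);
console.log(learned); // only the revert survives
```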

---

### Watch mode

Wrap any command with `hippo watch` to auto-learn from failures:

```bash
hippo watch "npm run build"
# if it fails, Hippo captures the error automatically
# next time an agent asks about build issues, the memory is there
```

---

## CLI Reference

| Command | What it does |
|---------|-------------|
| `hippo init` | Create `.hippo/` + auto-install agent hooks |
| `hippo init --global` | Create global store at `~/.hippo/` |
| `hippo init --no-hooks` | Create `.hippo/` without auto-installing hooks |
| `hippo remember "<text>"` | Store a memory |
| `hippo remember "<text>" --tag <t>` | Store with tag (repeatable) |
| `hippo remember "<text>" --error` | Store as error (2x half-life) |
| `hippo remember "<text>" --pin` | Store with no decay |
| `hippo remember "<text>" --verified` | Set confidence: verified (default) |
| `hippo remember "<text>" --observed` | Set confidence: observed |
| `hippo remember "<text>" --inferred` | Set confidence: inferred |
| `hippo remember "<text>" --global` | Store in global `~/.hippo/` store |
| `hippo recall "<query>"` | Retrieve relevant memories (local + global) |
| `hippo recall "<query>" --budget <n>` | Recall within token limit (default: 4000) |
| `hippo recall "<query>" --limit <n>` | Cap result count |
| `hippo recall "<query>" --why` | Show match reasons and source buckets |
| `hippo recall "<query>" --hops <n>` | Also surface memories N hops away in the entity/relation graph (0..3, default off) |
| `hippo recall "<query>" --json` | Output as JSON |
| `hippo context --auto` | Smart context injection (auto-detects task from git) |
| `hippo context "<query>" --budget <n>` | Context injection with explicit query (default: 1500) |
| `hippo context --limit <n>` | Cap memory count in context |
| `hippo context --budget 0` | Skip entirely (zero token cost) |
| `hippo context --framing <mode>` | Framing: observe (default), suggest, assert |
| `hippo context --format <fmt>` | Output format: markdown (default) or json |
| `hippo import --chatgpt <path>` | Import from ChatGPT memory export (JSON or txt) |
| `hippo import --claude <path>` | Import from CLAUDE.md or Claude memory.json |
| `hippo import --cursor <path>` | Import from .cursorrules or .cursor/rules |
| `hippo import --markdown <path>` | Import from structured markdown (headings -> tags) |
| `hippo import --file <path>` | Import from any text file |
| `hippo import --dry-run` | Preview import without writing |
| `hippo import --global` | Write imported memories to `~/.hippo/` |
| `hippo capture --stdin` | Extract memories from piped conversation text |
| `hippo capture --file <path>` | Extract memories from a file |
| `hippo capture --dry-run` | Preview extraction without writing |
| `hippo sleep` | Run consolidation (decay + merge + compress) |
| `hippo sleep --dry-run` | Preview consolidation without writing |
| `hippo status` | Memory health: counts, strengths, last sleep |
| `hippo outcome --good` | Strengthen last recalled memories |
| `hippo outcome --bad` | Weaken last recalled memories |
| `hippo outcome --id <id> --good` | Target a specific memory |
| `hippo inspect <id>` | Full detail on one memory |
| `hippo forget <id>` | Force remove a memory |
| `hippo embed` | Embed all memories for semantic search |
| `hippo embed --status` | Show embedding coverage |
| `hippo watch "<command>"` | Run command, auto-learn from failures |
| `hippo learn --git` | Scan recent git commits for lessons |
| `hippo learn --git --days <n>` | Scan N days back (default: 7) |
| `hippo learn --git --repos <paths>` | Scan multiple repos (comma-separated) |
| `hippo daily-runner` | Sweep registered workspaces and run daily learn+sleep |
| `hippo conflicts` | List detected open memory conflicts |
| `hippo conflicts --json` | Output conflicts as JSON |
| `hippo resolve <id>` | Show both conflicting memories for comparison |
| `hippo resolve <id> --keep <mem_id>` | Resolve: keep winner, weaken loser |
| `hippo resolve <id> --keep <mem_id> --forget` | Resolve: keep winner, delete loser |
| `hippo promote <id>` | Copy a local memory to the global store |
| `hippo share <id>` | Share with attribution + transfer scoring |
| `hippo share <id> --force` | Share even if transfer score is low |
| `hippo share --auto` | Auto-share all high-scoring memories |
| `hippo share --auto --dry-run` | Preview what would be shared |
| `hippo peers` | List projects contributing to global store |
| `hippo sync` | Pull global memories into local project |
| `hippo invalidate "<pattern>"` | Actively weaken memories matching an old pattern |
| `hippo invalidate "<pattern>" --reason "<why>"` | Include what replaced it |
| `hippo decide "<decision>"` | Record architectural decision (90-day half-life) |
| `hippo decide "<decision>" --context "<why>"` | Include reasoning |
| `hippo decide "<decision>" --supersedes <id>` | Supersede a previous decision |
| `hippo hook list` | Show available framework hooks |
| `hippo hook install <target>` | Install hook (claude-code also adds a SessionEnd hook for auto-sleep) |
| `hippo hook uninstall <target>` | Remove hook |
| `hippo handoff create --summary "..."` | Create a session handoff |
| `hippo handoff latest` | Show the most recent handoff |
| `hippo handoff show <id>` | Show a specific handoff by ID |
| `hippo session latest` | Show latest task snapshot + events |
| `hippo session resume` | Re-inject latest handoff as context |
| `hippo current show` | Compact current state (task + session events) |
| `hippo wm push --scope <s> --content "..."` | Push to working memory |
| `hippo wm read --scope <s>` | Read working memory entries |
| `hippo wm clear --scope <s>` | Clear working memory |
| `hippo wm flush --scope <s>` | Flush working memory (session end) |
| `hippo dashboard` | Open web dashboard at localhost:3333 |
| `hippo dashboard --port <n>` | Use custom port |
| `hippo mcp` | Start MCP server (stdio transport) |

---

## Framework Integrations

### Auto-install (recommended)

`hippo init` detects your agent framework and patches the right config file automatically:

| Framework | Detected by | Patches |
|-----------|------------|---------|
| Claude Code | `CLAUDE.md` or `.claude/settings.json` | `CLAUDE.md` + `SessionStart`/`SessionEnd` hooks in `settings.json` |
| Codex | `AGENTS.md` or `.codex` | `AGENTS.md` + automatic in-place Codex launcher wrapper |
| Cursor | `.cursorrules` or `.cursor/rules` | `.cursorrules` |
| OpenClaw | `.openclaw` or `AGENTS.md` | native OpenClaw plugin or `AGENTS.md` |
| OpenCode | `.opencode/` or `opencode.json` | `AGENTS.md` + TS plugin at `~/.config/opencode/plugins/hippo.ts` (subscribes to `session.idle` + `session.created`) |

No extra commands needed. Just `hippo init` and your agent knows about Hippo.

### Manual install

If you prefer explicit control:

```bash
hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd + UserPromptSubmit hooks
hippo hook install codex # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
hippo hook install cursor # patches .cursorrules
hippo hook install openclaw # patches AGENTS.md
hippo hook install opencode # patches AGENTS.md + installs the opencode TS plugin
```

This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the agent to:

1. Run `hippo context --auto --budget 1500` at session start
2. Run `hippo remember "<lesson>" --error` on errors
3. Run `hippo outcome --good` on completion

For Claude Code, it also adds:

- a `SessionEnd` hook so `hippo sleep` runs automatically when the session exits
- a `SessionStart` hook that prints the previous session's consolidation output
- a `UserPromptSubmit` hook that runs `hippo context --pinned-only --include-recent 5 --format additional-context` every turn. It re-injects pinned memories (`hippo remember <text> --pin`) plus the last 5 writes, so fresh same-session lessons appear on the next prompt before you pin them. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.

To remove: `hippo hook uninstall claude-code`

### What the hook adds (Claude Code example)

```markdown
## Project Memory (Hippo)

Before starting work, load relevant context:
hippo context --auto --budget 1500

When you hit an error or discover a gotcha:
hippo remember "<what went wrong and why>" --error

After completing work successfully:
hippo outcome --good
```

### MCP Server

For any MCP-compatible client (Cursor, Windsurf, Cline, Claude Desktop):

```bash
hippo mcp # starts MCP server over stdio
```

Add to your MCP config (e.g. `.cursor/mcp.json` or `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "hippo-memory": {
      "command": "hippo",
      "args": ["mcp"]
    }
  }
}
```

Exposes tools: `hippo_recall`, `hippo_remember`, `hippo_outcome`, `hippo_context`, `hippo_status`, `hippo_learn`, `hippo_wm_push`.

### OpenClaw Plugin

Native plugin with auto-context injection, workspace-aware memory lookup, and
tool hooks for auto-learn / auto-sleep. When `autoSleep` is enabled, the
OpenClaw plugin now launches `hippo sleep` in a detached background worker at
session end so the live session can exit immediately.

Query-time retrieval still uses the active workspace store plus the shared
global store. Daily consolidation comes from the machine-level runner that
`hippo init` / `hippo setup` installs.

```bash
openclaw plugins install hippo-memory
openclaw plugins enable hippo-memory
```

Plugin docs: [extensions/openclaw-plugin/](extensions/openclaw-plugin/). Integration guide: [integrations/openclaw.md](integrations/openclaw.md).

### Claude Code Plugin

Plugin with SessionStart/Stop hooks and error auto-capture. See [extensions/claude-code-plugin/](extensions/claude-code-plugin/).

Full integration details: [integrations/](integrations/)

---

## The Neuroscience

Hippo is modeled on seven properties of the human hippocampus. Not metaphorically. Literally.

**Why two stores?** The brain uses a fast hippocampal buffer + a slow neocortical store (Complementary Learning Systems theory, McClelland et al. 1995). If the neocortex learned fast, new information would overwrite old knowledge. The buffer absorbs new episodes; the neocortex extracts patterns over time.

**Why does decay help?** New neurons born in the dentate gyrus actively disrupt old memory traces (Frankland et al. 2013). This is adaptive: it reduces interference from outdated information. Forgetting isn't failure. It's maintenance.

**Why do errors stick?** The amygdala modulates hippocampal consolidation based on emotional significance. Fear and error signals boost encoding. Your first production incident is burned into memory. Your 200th uneventful deploy isn't.

**Why does retrieval strengthen?** Recalled memories undergo "reconsolidation" (Nader et al. 2000). The act of retrieval destabilizes the trace, then re-encodes it stronger. This is the testing effect. Hippo implements it mechanically via the half-life extension on recall.

**Why does sleep consolidate?** During sleep, the hippocampus replays compressed versions of recent episodes and "teaches" the neocortex by repeatedly activating the same patterns. Hippo's `sleep` command runs this as a deliberate consolidation pass.

**Why does reward modulate decay?** In spiking neural networks, reward-modulated STDP strengthens synapses that contribute to positive outcomes and weakens those that don't. Hippo's reward-proportional decay (v0.11.0) implements this: memories with consistent positive outcomes decay slower, negatives decay faster, with no fixed deltas. Inspired by [MH-FLOCKE](https://github.com/MarcHesse/mhflocke)'s R-STDP architecture for quadruped locomotion, where the same mechanism produces stable learning with 11.6x lower variance than PPO.

The 7 mechanisms in full: [PLAN.md#core-principles](PLAN.md#core-principles)

For how these mechanisms connect to LLM training, continual learning, and open research problems: **[RESEARCH.md](RESEARCH.md)**

**Prior art in agent memory simulation.** The idea that human-like memory produces human-like behavior as an emergent property was explored in IEEE research from 2010-2011 ([5952114](https://ieeexplore.ieee.org/document/5952114), [5548405](https://ieeexplore.ieee.org/document/5548405), [5953964](https://ieeexplore.ieee.org/document/5953964)). Walking between rooms and forgetting why you went there doesn't need direct simulation; it emerges naturally from a memory system with capacity limits and decay. Hippo's design follows the same principle: implement the mechanisms, and the behavior follows.

**Related work:** [HippoRAG](https://arxiv.org/abs/2405.14831) (Gutierrez et al., 2024) applies hippocampal indexing to RAG via knowledge graphs. [MemPalace](https://github.com/milla-jovovich/mempalace) (Sigman & Jovovich, 2026) organizes memory spatially (wings/halls/rooms) with AAAK compression, achieving 100% on [LongMemEval](https://arxiv.org/abs/2410.10813). [MH-FLOCKE](https://github.com/MarcHesse/mhflocke) (Hesse, 2026) uses spiking neurons with R-STDP for embodied cognition. Each system tackles a different facet: HippoRAG optimizes retrieval quality, MemPalace optimizes retrieval organization, MH-FLOCKE optimizes embodied learning, and Hippo optimizes memory lifecycle.

---

## Comparison

The AI-memory category matured fast in 2026. Hippo's specific take (bio-decay, strengthen-on-use, outcome-weighted half-lives) is one stance among several. The table below is a feature snapshot, not a verdict: graph-first systems ([gbrain](https://hermesatlas.com/projects/garrytan/gbrain), [Zep](https://www.getzep.com/), [Cognee](https://www.cognee.ai/)), agent-managed systems ([Letta](https://github.com/letta-ai/letta)), and version-control / skill-distillation takes ([Memoria](https://github.com/matrixorigin/Memoria), [EverMind](https://evermind.ai/)) all solve adjacent problems with different mechanics.

| Feature | Hippo | [MemPalace](https://github.com/milla-jovovich/mempalace) | [Mem0](https://github.com/mem0ai/mem0) | [Basic Memory](https://github.com/basicmachines-co/basic-memory) | [gbrain](https://hermesatlas.com/projects/garrytan/gbrain) | [Zep](https://www.getzep.com/) | [Letta](https://github.com/letta-ai/letta) | [Cognee](https://www.cognee.ai/) | [Memoria](https://github.com/matrixorigin/Memoria) | [EverMind](https://evermind.ai/) |
|---------|-------|-----------|------|-------------|--------|-----|-------|--------|---------|----------|
| Decay by default | Yes | No | No | No | No | No | No | No | No | No |
| Retrieval strengthening | Yes | No | No | No | No | No | No | Partial (recall tuning) | No | Partial (Skill Memory distills patterns) |
| Reward-proportional decay | Yes | No | No | No | No | No | No | No | No | No |
| Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No | Yes (vec + rerank + graph) | Yes (graph + vec) | ? | Yes (GraphRAG) | Yes (vector + full-text) | Yes (mRAG, multi-modal) |
| Schema acceleration / knowledge graph | Yes (schema) | No | No | No | Yes (typed KG, self-wiring) | Yes (temporal KG) | No | Yes (auto-ontologies) | No (typed claims) | Yes (hierarchical: user/group/agent) |
| Conflict detection + resolution | Yes | No | No | No | Yes (eval-surfaced) | Yes (auto-invalidate stale facts) | No | No | Yes (auto-detect + quarantine) | Partial (temporal tracking) |
| Multi-agent shared memory | Yes | No | No | No | Yes (brain repo, team mounts) | Yes | No (single-agent state) | Yes | Yes (branch/merge across sessions) | Yes (multi-agent coordination) |
| Transfer scoring | Yes | No | No | No | No | No | No | No | No | No |
| Outcome tracking | Yes | No | No | No | No | No | No | No | No | Partial (Cases: agent trajectories) |
| Confidence tiers | Yes | No | No | No | No (typed facts) | No | No | No | No | No |
| Spatial organization | No | Yes (wings/halls/rooms) | No | No | No | No | No | No | No | No |
| Lossless compression | No | Yes (AAAK, 30x) | No | No | No | No | No | No | No | No |
| Cross-tool import (ChatGPT/Claude/Cursor) | Yes | No | No | No | Partial (data sources) | ? | No | Partial (28 data sources) | No (Git ops) | Partial (mRAG: PDFs/images/URLs) |
| Auto-hook install | Yes | No | No | No | No | No | No | No | No | No |
| MCP server | Yes | Yes | No | No | Yes (stdio + HTTP/OAuth) | Partial (managed) | Yes (via Letta Code) | Yes (first-party Claude/LangGraph) | Yes | ? |
| Zero runtime deps | Yes | No (ChromaDB) | No | No | No (PGLite or PG+pgvector) | No (managed service) | No (Python deps) | No (Python deps) | Yes (single Rust binary) | No (managed + OSS) |
| LongMemEval (best published) | 86.8% R@5 (F13+F9, oracle\*) | 96.6% raw / 100% reranked R@5 | ~49-85% R@5 | N/A | 97.6-97.9% R@5 (s_cleaned\*) | N/A (LoCoMo 80.3%) | N/A | N/A | 88.78% overall accuracy w/ reader\*\* | 83.00% overall\*\* (LoCoMo 93.05%, HaluMem 93.04%) |
| Git-friendly | Yes | No | No | Yes | Yes | No | No | No | Yes (Git is the model) | ? |
| Framework agnostic | Yes | Yes | Partial | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| License | MIT | (open) | Apache-2.0 | (open) | MIT | Apache-2.0 (community) | Apache-2.0 | MIT (core) | Apache-2.0 | Apache-2.0 (OSS) + cloud |

\* Split-mismatched: Hippo's 86.8% is on `longmemeval_oracle` (3 sessions per haystack); gbrain's 97.6% is on `longmemeval_s_cleaned` (~40 sessions per haystack). Different splits, different difficulty. Not directly comparable.

\*\* Different metric: Memoria's 88.78% and EverMind's 83% are reported as overall accuracy with a reader LLM, not retrieval R@5. Higher denominator + LLM helps. Not directly comparable to retrieval-only R@5 numbers above.

Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." gbrain, Zep, and Cognee implement "extract typed entities and relationships into a knowledge graph." Letta implements "the agent edits its own memory blocks." Memoria implements "Git-style version control over the memory state itself." EverMind implements "self-evolving Skill Memory + multi-modal retrieval over hierarchical scopes." Hippo implements "forget by default, earn persistence through use." These are complementary takes, not a single-axis ranking: bio-lifecycle (Hippo) + GraphRAG (gbrain/Cognee/Zep) + agent-self-edit (Letta) + memory-VCS (Memoria) + skill-distillation (EverMind) cover different parts of the same problem.

---

## Benchmarks

Two benchmarks testing two different things. Full details in [`benchmarks/`](benchmarks/).

### LongMemEval (retrieval accuracy)

[LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.

**Hippo v0.28.0 results (hybrid BM25 + cosine, full 500 questions):**

| Metric | v0.28 | v0.11 (BM25 only) |
|--------|-------|-------------------|
| Recall@1 | 46.6% | 50.4% |
| Recall@3 | **67.0%** | 66.6% |
| Recall@5 | 73.8% | 74.0% |
| Recall@10 | 81.0% | 82.6% |
| Answer in content@5 | **49.6%** | 46.6% |

| Question Type | Count | R@5 | R@10 |
|---------------|-------|-----|------|
| single-session-assistant | 56 | 100.0% | 100.0% |
| knowledge-update | 78 | 89.7% | 96.2% |
| multi-session | 133 | 72.2% | 82.0% |
| temporal-reasoning | 133 | 72.9% | 78.9% |
| single-session-user | 70 | 62.9% | 71.4% |
| single-session-preference | 30 | 20.0% | 33.3% |

For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo v0.28 achieves 73.8% R@5 with hybrid BM25 + cosine. Hybrid scoring trades a little R@1 accuracy for better top-5 content relevance (answer_in_content@5 +3pp vs v0.11).

Hippo's strongest categories (single-session-assistant 100% R@5, knowledge-update 89.7%) are where keyword overlap between question and stored content is highest. The weakest (preference 20%) involves indirect references that need deeper semantic understanding.

> Note: v0.28 R@10 is 1.6pp below v0.11's BM25-only result. The earlier v0.27 benchmark showed an apparent 35pp regression; that was a methodology bug (budget-limited retrieval vs unlimited), fixed in v0.28 with the `minResults` option. See [`evals/README.md`](evals/README.md) for the full investigation and per-type breakdown.

```bash
cd benchmarks/longmemeval
python ingest_direct.py --data data/longmemeval_oracle.json --store-dir ./store
python retrieve_fast.py --data data/longmemeval_oracle.json --store-dir ./store --output results/retrieval.jsonl
python evaluate_retrieval.py --retrieval results/retrieval.jsonl --data data/longmemeval_oracle.json
```
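For readers comparing the R@k rows above, here is a sketch of one common definition. A question counts as a hit when any of its gold evidence sessions appears in the top k retrieved IDs. This "any" variant is an assumption: `evaluate_retrieval.py` may score differently, for example by requiring all evidence sessions. Use the sketch to read the tables, not to reproduce them. `recallAtK` and `RetrievalResult` are hypothetical names.

```typescript
interface RetrievalResult {
  goldIds: string[];   // evidence sessions for the question
  rankedIds: string[]; // retriever output, best first
}

// Fraction of questions with at least one gold ID in the top k ("any" variant).
function recallAtK(results: RetrievalResult[], k: number): number {
  if (results.length === 0) return 0;
  let hits = 0;
  for (const r of results) {
    const topK = r.rankedIds.slice(0, k);
    if (r.goldIds.some((g) => topK.includes(g))) hits++;
  }
  return hits / results.length;
}

const results: RetrievalResult[] = [
  { goldIds: ["s1"], rankedIds: ["s3", "s1"] },
  { goldIds: ["s2"], rankedIds: ["s4", "s5", "s2"] },
];
for (const k of [1, 2, 3]) console.log(`R@${k}`, recallAtK(results, k)); // R@1 0, R@2 0.5, R@3 1
```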

### Sequential Learning Benchmark (agent improvement over time)

No other public benchmark tests whether memory systems produce learning curves. LongMemEval tests retrieval on a fixed corpus. This benchmark tests whether an agent with memory *performs better on task 40 than task 5*.

50 tasks, 10 trap categories, each appearing 2-3 times across the sequence.

> **v0.11.0 informal results: RETRACTED in v1.7.9.** The 78% → 14% magnitude does NOT reproduce on the formal sequential-learning benchmark. Three pre-registered workload variants (v1.7.5 full-late, v1.7.6 budget sweep, v1.7.7 `--restrict-late-to 4`) all returned C2 hippo-base late mean = 0.0% across every seed (the workload's late phase saturates structurally). The mechanism (dlPFC goal-stack: `pushGoal`/`completeGoal` hooks, `--use-goal-stack`) is shipped and exercisable. **The magnitude is RETRACTED. The mechanism is shipped; no magnitude is currently claimed.** v1.8.0 (queued) explores adversarial trap categories as mechanism characterisation under the magnitude-smuggling guard in `docs/RETRACTION.md`. Pre-registration trail: `docs/evals/2026-05-07-v1.7.5-goal-stack-eval-prereg.md`, `docs/evals/2026-05-09-v1.7.6-calibration-result.md`, `docs/evals/2026-05-09-v1.7.7-goal-stack-eval-result.md`. CHANGELOG: see v1.7.9 entry.

<details>
<summary>Original v0.11.0 informal numbers (RETRACTED; preserved as audit trail in git, not reproduced here)</summary>

v0.11.0 reported a single-run informal headline citing late-phase trap-rate decline on the sequential-learning benchmark. The specific numbers are archived at git tag `v0.11.0` and the corresponding `CHANGELOG.md` historical entry. Retained in version control, not reproduced here, since reproduction risks accidental re-citation. See `git show v0.11.0 -- README.md` for the original wording.

</details>

The benchmark, harness, and adapter contract remain shipped. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).

```bash
cd benchmarks/sequential-learning
node run.mjs --adapter all
```

---

## Contributing

Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.

The interesting problems:

- **Improve LongMemEval score.** Current R@5 is 73.8% with hybrid BM25 + cosine (v0.28). Closing the gap to MemPalace's 96.6% likely needs better chunking, reranking, or semantic compression, not just more of the same retrieval.
- Better consolidation heuristics (LLM-powered merge vs current text overlap)
- Web UI / dashboard for visualizing decay curves and memory health
- Optimal decay parameter tuning from real usage data
- Cross-agent transfer learning evaluation
- **MemPalace-style spatial organization.** Could spatial structure (wings/halls/rooms) improve hippo's semantic layer?
- **AAAK-style compression for semantic memories.** Lossless token compression for context injection.

---

## License

MIT