hippo-memory 1.11.1 → 1.11.2

package/README.md CHANGED
@@ -5,6 +5,10 @@
5
5
  [![npm](https://img.shields.io/npm/v/hippo-memory)](https://npmjs.com/package/hippo-memory)
6
6
  [![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
7
7
 
8
+ <p align="center">
9
+ <img src="./assets/hippo-init.svg" alt="hippo init --scan ~ — initializing memory across all repos" width="720">
10
+ </p>
11
+
8
12
  A memory layer for AI agents. Modeled on the hippocampus. Decay by default, strength through use, provenance on every memory. SQLite under the hood, zero runtime deps, works with every CLI agent you have.
9
13
 
10
14
  ```bash
@@ -36,7 +40,7 @@ It also fixes the portability problem. Your ChatGPT memories don't travel to Cla
36
40
 
37
41
  Numbers, not adjectives. Every claim links to the benchmark or the test that proves it.
38
42
 
39
- - **Sequential Learning Benchmark.** [benchmarks/sequential-learning/](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents learn from past mistakes, not just retrieve text. v0.11.0 informal magnitude RETRACTED v1.7.9; mechanism remains shipped. See "What's new in v1.7.9".
43
+ - **Sequential Learning Benchmark.** [benchmarks/sequential-learning/](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents learn from past mistakes, not just retrieve text. v0.11.0 informal magnitude RETRACTED v1.7.9; mechanism remains shipped. See [CHANGELOG.md](./CHANGELOG.md) v1.7.9 entry.
40
44
  - **R@5 = 74.0%** on [LongMemEval](benchmarks/longmemeval/). 500-question industry retrieval benchmark, BM25 only, no embeddings.
41
45
  - **10 of 10 incident scenarios beat transcript replay** on a staged Slack corpus ([benchmarks/e1.3/](benchmarks/e1.3/)). Recall surfaces the cause faster than scrolling the last N messages.
42
46
  - **0 outbound HTTP** on the 1000-event ingestion smoke. Proven by a `globalThis.fetch` spy that throws on call, not a hardcoded zero.
@@ -85,264 +89,7 @@ hippo recall "data pipeline issues" --budget 2000
85
89
 
86
90
  ---
87
91
 
88
- ### What's new in v1.10.1
89
-
90
- - **`stop()` pidfile-ownership guard.** Closes the last open item in the v0.37 server-hardening cluster. `serve()`'s `stop()` and the `cli.ts` stale-pidfile self-heal removed `server.pid` unconditionally, so a shutting-down server could delete a newer live server's pidfile and orphan it. The new `removePidfileIfOwned` unlinks the pidfile only on a `(pid, started_at)` identity match; both call sites are rewired to it. Built via the `/dev-framework-rl` pipeline (plan-eng 91, code-review 88, independent-review 88, ship-readiness 92, canary 96).
91
- - **Version-field sync.** `package.json`, the lockfile, `openclaw.plugin.json`, `src/version.ts`, and both `extensions/openclaw-plugin` manifests are now all `1.10.1`, correcting drift left by v1.9.x/v1.10.0 that left `/health` and the MCP `serverInfo` under-reporting the version.
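The `(pid, started_at)` identity match described for `removePidfileIfOwned` in the v1.10.1 notes can be sketched as a pure check. This is a minimal illustration, not the shipped code: the `ServerIdentity` shape, the JSON pidfile encoding, and the helper names `parsePidfile` and `ownsPidfile` are all assumptions, since the README does not show the pidfile schema.

```typescript
// Hypothetical pidfile payload; the real schema is not shown in the README.
interface ServerIdentity {
  pid: number;
  started_at: string;
}

// Parse defensively so a corrupt or foreign pidfile never counts as "owned".
function parsePidfile(raw: string): ServerIdentity | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const { pid, started_at } = value as Record<string, unknown>;
    if (typeof pid !== "number" || typeof started_at !== "string") return null;
    return { pid, started_at };
  } catch {
    return null;
  }
}

// True only when the recorded identity matches the caller on BOTH fields.
// A matching pid with a different start time means the pid was reused or a
// newer server wrote the file, so the caller must leave the file alone.
function ownsPidfile(recorded: ServerIdentity | null, self: ServerIdentity): boolean {
  if (recorded === null) return false;
  return recorded.pid === self.pid && recorded.started_at === self.started_at;
}
```

A caller would unlink the pidfile only when `ownsPidfile` returns `true`, which is what keeps a shutting-down server from deleting a newer server's file.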
92
-
93
- ### What's new in v1.10.0
94
-
95
- - **Server and lifecycle hardening.** Closes the `TODOS.md` "server / lifecycle hardening" cluster (deferred follow-ups from the v0.37 server-mode work, the v0.40 security pass, and the A3 envelope review), six items in all. `detectServer` is now async and confirms a recorded server is genuinely this hippo process by matching a `/health` `started_at` before the CLI routes to it (H1). The pidfile carries a `schema` version (L3). `hippo serve` refuses to start when a live peer already serves the hippoRoot (H3). The 413 over-cap-body path closes the socket instead of draining the rest (M3). A `HIPPO_REQUIRE_SERVER` env knob turns a missing server into a loud error instead of a silent direct-mode fallback that discards `HIPPO_API_KEY` (H2). And `hippo forget --archive --reason` gives raw, append-only memories a real removal path via `archiveRaw` instead of a misleading "not found" (A3).
96
- - **Reviewed via the dev-framework chain.** self-review, an independent code review, a cross-model codex (gpt-5.5) pass, and a security pass. Four findings were fixed before ship: the `forget --archive` server-routing bypass, an unbounded `/health` response parse, a timeout that wrongly unlinked a busy server's pidfile, and pidfile-url validation against off-box redirection. Full suite green: 216 files, 1557 tests.
97
-
98
- ### What's new in v1.9.3
99
-
100
- - **Reranker review-tail patch.** Closes the three follow-ups raised on PR #25: `src/rerankers/llm.ts` now wires `AbortController` + `setTimeout` around the fetch (default 30 s, overridable via `HIPPO_LLM_RERANKER_TIMEOUT_MS`) so recall never hangs on a wedged endpoint; `src/rerankers/cross-encoder.ts` emits a single `console.warn` on first identity-fallback per process so silent fallback no longer masquerades as a working reranker; the orphan `RerankSignals` type (sole consumer retracted in v1.9.1) is removed at both the re-export and the definition.
101
- - **Version alignment.** `package.json` bumped 1.8.1 → 1.9.3. v1.9.0 / v1.9.1 / v1.9.2 were on-master research milestones never published to npm; v1.9.3 is the first published `1.9.x` release and carries the cumulative scope from F6 (rerankers) through F13 (chunk-per-turn) plus the F10 HARD RETRACTION.
102
- - **Mechanism cumulative-null status unaffected.** Per `docs/RETRACTION.md:94-113`. No `src/` change in this patch touches the dlPFC goal-stack mechanism. **This release does not re-assert the retracted −10pp magnitude.**
103
-
104
- ### What's new in v1.9.2
105
-
106
- - **F13 chunk-per-turn LongMemEval R@5 = 86.8 on oracle (Gate-B PASS).** Plan F13 (`docs/evals/2026-05-12-r5-track6-chunk-per-turn-prereg.md`) addresses the structural pathology that limited every prior LongMemEval track (F8–F12): sessions in `data/longmemeval_oracle.json` are ~14k chars median (~3,500 tokens), but the embedders we can reach (MiniLM, BGE-base, multilingual-e5-large) cap at 512–514 tokens. Every prior track embedded only the first ~2 turns of each 12-turn session and truncated the rest. F13 replaces session-level embedding with turn-level embedding (10,866 turns over the 940 oracle sessions, max-pool by `session_id` at retrieval). Gate-A PASS (10,866 turns, all 940 sessions covered, 768-dim normalized). **Gate-B PASS:** F13 + F9 sub-agent rerank stack R@5 = 86.8 on `data/longmemeval_oracle.json` (threshold ≥ 83.2 = F11+F9 deployable best 78.2 + 5pp; margin 3.6). R@1 = 70.8, R@10 = 90.2, R@20 = 93.4.
107
- - **Roadmap target met (oracle split).** R@5 ≥ 85% was NON-binding per every prior prereg; observed 86.8 on `data/longmemeval_oracle.json` as of this release. Descriptive characterisation; not a re-assertion of any retracted magnitude.
108
- - **Split-mismatch with gbrain (unchanged).** `longmemeval_oracle` carries 3 sessions per haystack; gbrain v0.28.8's 97.60 figure is on `longmemeval_s_cleaned` (~40 sessions per haystack) with OpenAI `text-embedding-3-large@1536`. Both HF Hub and OpenAI API are host-blocked from this sandbox (verified 2026-05-12). F13's 86.8 is NOT directly comparable to gbrain's 97.60.
109
- - **F12 retracted.** Plan F12 (`docs/evals/2026-05-11-r5-track5-e5-large-top100-prereg.md`) vendored `intfloat/multilingual-e5-large` and widened the candidate pool to top-100. Gate-A PASS; Gate-B FAIL with best variant R@5 = 78.8 (threshold 83.2). HARD RETRACTION executed: `hippo_store2/` reverted to BGE-base; the `prefixFor` / `preferredBackend` dispatch helpers stay in `src/embeddings.ts` per the dispatch-shape carve-out (they return the legacy behaviour for non-e5 models).
110
- - **No `src/` changes in v1.9.2.** F13 is implemented as `benchmarks/longmemeval/chunk_per_turn_{embed,retrieve}.mjs` and reuses F11/F12's existing dispatch helpers. The cumulative-null status of the dlPFC goal-stack mechanism (`docs/RETRACTION.md:94-113`) is unaffected. **This release does not re-assert the retracted −10pp magnitude.**
111
-
112
- ### What's new in v1.9.1
113
-
114
- - **F10 features-reranker retraction.** Plan F10 (`docs/plans/2026-05-11-r5-track3-richer-ingest.md`) tested whether populating entry-level signals via 19 Claude-sub-agent invocations would let the features reranker move R@5 above features-default + 5pp on LongMemEval. Observed: features-enriched R@5 = 59.2 vs features-default R@5 = 75.8 (same bge-base embedding model), a 21.6pp shortfall against the binding gate. Per the prereg's HARD RETRACTION clause, `src/rerankers/features.ts` + its test + its micro-fixture + its dispatcher case are removed in v1.9.1. The Track 2 cross-encoder and Track 3 LLM-rerank skeletons are preserved. **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`.
115
- - **F11 embedding upgrade tested and documented (not shipped as default).** Plan F11 (`docs/plans/2026-05-11-r5-track4-embedding-upgrade.md`) swapped `Xenova/all-MiniLM-L6-v2` for `BAAI/bge-base-en-v1.5` (768-dim, CLS pooling). Gate-A PASS; Gate-B FAIL (R@5 = 77.0% vs threshold 81.8%). The `poolingFor` per-model dispatch in `src/embeddings.ts` and the `--model` flag in `scripts/fetch_embedding_model.mjs` ship; MiniLM remains the project default.
116
- - **Cross-track R@5 status (as of v1.9.1):** F8 hybrid tuning (MiniLM) 76.8, F9 v2 sub-agent LLM rerank (MiniLM) 78.0, F11 bge-base baseline 77.0, F11+F9 stack (BGE-base + sub-agent rerank) 78.2 — cross-track best at v1.9.1 — F10 features-enriched (retracted) 59.2. Roadmap target R@5 ≥ 85% was NOT MET at v1.9.1. NON-binding per each prereg. *(Superseded in v1.9.2 by F13 + F9 stack R@5 = 86.8 on oracle.)*
117
-
118
- ### What's new in v1.9.0
119
-
120
- - **F6 reranker hardening shipped.** New `RerankerFn` seam in `hybridSearch` with three reranker tracks: Track 1 features (`MemoryEntry`-level signals, no external deps), Track 2 cross-encoder (MS-MARCO MiniLM via optional `@xenova/transformers`, identity-fallback on load failure), Track 3 LLM (env-gated skeleton against an OpenAI-compatible endpoint). Opt in via `hippo recall --reranker <name>`.
121
- - **Workload-validity verdicts on the LongMemEval sweep** (`docs/evals/2026-05-10-f6-reranker-result.md`, prereg `docs/evals/2026-05-10-f6-reranker-prereg.md`): Gate-A (firing rate, binding) PASS for the features track, PASS-with-caveat for cross-encoder (500/500 invocations all took the identity-fallback branch — HF model download was blocked in the test environment, so this is NOT a real cross-encoder evaluation). Gate-B (hyperparameter discrimination, binding) FAIL — features_topk{20,50,100} produced byte-identical R@K, so no per-hyperparameter R@5 effect is claimed.
122
- - **Roadmap R@5 ≥ 85% target NOT met on the workload tested.** Observed R@5 = 75.4% (features, all three top-K settings) and 75.6% (baseline). Per the prereg this is descriptive characterisation, not a binding gate; the mechanism ships, and a real attempt at the target requires either a real cross-encoder evaluation (HF access) or a richer ingest path that populates entry-level reranker signals.
123
- - **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`. The dlPFC goal-stack cumulative-null status (`docs/RETRACTION.md:94-113`) is independent of this release.
124
-
125
- ### What's new in v1.8.1
126
-
127
- - **v1.8 prereg's v1.9 LongMemEval cross-validation pre-commitment RETRACTED.** Outside-voice review on two iterations of the v1.9 plan found six structural barriers (canonical harness bypasses the boost path; ingest tag namespace excludes content-derived stems; pushGoal API field mismatch; depth-cap suspension; trigger AND clause unreachable; workload-validity gate ceremonial). Per Root Cause Over Patches, public retraction over re-architecture. **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`.
128
- - **Pre-registration discipline rule pinned in `docs/RETRACTION.md`:** no future eval pre-commitment is binding without (a) source-read of the code paths the design depends on, AND (b) a 1-question dry-run confirming the mechanism FIRES before pre-reg locks.
129
- - **Mechanism-effect status (cumulative null escalation)** appended to `docs/RETRACTION.md`. Across every workload pre-registered and tested to date (v1.7.5/6/7 SANITY_FAILs, v1.8 SAME=20/20 sign-only, v1.9 untestable), the dlPFC goal-stack mechanism has not produced a detectable behavioural effect at the metric level. The mechanism's CODE is preserved; the THEORY is preserved; what is acknowledged is that its EFFECT on the workloads we have been able to test is undetectable.
130
- - **No new eval pre-commitment in v1.8.1.** Future eval directions drafted under the new discipline rule.
131
-
132
- ### What's new in v1.8.0
133
-
134
- - **Adversarial-categories release** for the sequential-learning benchmark. 10 → 13 categories (3 new: `timezone_naive`, `idempotency_retry`, `float_accumulation`). Lesson vocabulary verified <0.30 Jaccard overlap vs existing 10 (`tools/jaccard-overlap.mjs`; max=0.033). Workload 50 → 62 tasks; late-phase metric (`--restrict-late-to 4`) preserved.
135
- - **Workload-validity verdict: PASS.** C2 hippo-base lateMean = 0.25 (lattice rate), 20 of 20 seeds non-zero — first non-saturated workload across v1.7.5/6/7/8. Framed as workload-validity / non-saturation check per `docs/RETRACTION.md`, NOT a magnitude criterion.
136
- - **Mechanism characterisation: C3 = C2 on all 20 seeds.** Sign-only seed-pair direction count (vs C2): 0 STRICTLY_LOWER / 0 STRICTLY_HIGHER / 20 TIED. The goal-stack mechanism does not detectably change per-seed late-4 lattice rate on this workload. Hook failures: 0/0. **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`, mechanism remains shipped; no magnitude is currently claimed.
137
- - **Pre-committed v1.9 direction:** LongMemEval R@5 cross-validation. Named BEFORE v1.8 ran; the v1.8 PASS verdict does not change the pre-commitment.
138
-
139
- ### What's new in v1.7.9
140
-
141
- - **−10pp goal-stack lift magnitude RETRACTED.** Three pre-registered workload variants (v1.7.5 full-late SANITY_FAIL, v1.7.6 budget sweep B*=NULL, v1.7.7 `--restrict-late-to 4` SANITY_FAIL) all returned C2 hippo-base late mean = 0.0% across every seed. The 78% → 14% headline does not reproduce on the formal harness. Mechanism (dlPFC goal-stack) remains shipped; **no magnitude is currently claimed.**
142
- - **Pre-emptive retraction (deliberate departure from v1.7.7 prereg).** The prereg explicitly distinguished SANITY_FAIL (no retraction) from NOT_SUPPORTED (retraction). v1.7.9 deviates on cumulative-evidence grounds; the deviation is declared, not silent. v1.8 still runs as planned; retraction is independent of v1.8 outcome.
143
- - **`docs/RETRACTION.md`** pinned this release as a magnitude-smuggling guard for v1.8 and beyond.
144
- - **3 P2 polish items folded in** from the v1.7.8 audit (README/result.md rounding consistency with raw-data disclosure, `pairedPermutationCI` docstring, `BAND_LOW`/`BAND_HIGH` provenance comment). The 4th (Float64Array micro-opt in `analyze-v1.7.7.mjs`) is **deferred to v1.7.10** to keep this release doc-only and audit-clean.
145
-
146
- ### What's new in v1.7.8
147
-
148
- - **Audit-fix patch.** Retroactive `/review` on v1.7.5/v1.7.6/v1.7.7 found 9 P0+P1 items (the review chain was partially skipped on those releases). All 9 fixed surgically across 3 atomic commits. No behavior change for end users; integrity fixes for the eval audit trail.
149
- - **(P0)** Analyzer sanity gate now matches the v1.7.7 pre-reg (N=4 lattice rule: mean ∈ [5%, 50%] AND ≥3 distinct seeds non-zero, not the inherited [4%, 24%] band). v1.7.6 calibration result doc replaces overstated "pre-registration discipline" framing with explicit citation of the plan v2 commit + calibrate.mjs commit as the actual pre-registration anchors.
150
- - **(P1)** Hippo benchmark adapter instance state hoisted from module-level to per-instance fields (race-condition-free for future parallel benchmarks). `selectBStar` reason string honesty fix. v1.7.7 prereg SUPPORTED template band corrected. ROADMAP-RESEARCH:156 status update on the −10pp claim. Defensive throw in `runOneBudget`. Verdict-precedence and selectBStar defensive tests added.
151
- - **Tests:** 1480 passing (+4 from v1.7.7), 0 regressions.
152
-
153
- > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
154
-
155
- ### What's new in v1.7.7
156
-
157
- - **`--restrict-late-to <int>` flag** on the sequential-learning runner. Narrows the late-phase metric to the last N trap encounters; early/mid re-split (Option A) so the three slices stay disjoint. Default null preserves chronological-third behavior.
158
- - **C2 sanity preflight at N=4 lattice — FAILED.** 20 seeds at `--restrict-late-to 4`. Late mean = 0.00% across all seeds; floor effect persists at last-4 just as it did at last-7. **C3 (goal-stack ON) was NOT collected** — no goal-stack data leak under SANITY_FAIL. Adapter not starved (early=77.3%, mid=4.5%); the workload is structurally easy in late phase regardless of window size.
159
- - **Cumulative evidence:** three pre-registered workload variants tested (v1.7.5 full-late, v1.7.6 budget sweep, v1.7.7 window restriction); none discriminating. The −10pp goal-stack lift claim remains untested. Hard-stop retraction fires on NOT_SUPPORTED, not SANITY_FAIL — magnitude is not auto-retracted yet. v1.8 (adversarial trap categories) is the last pre-registered escalation.
160
- - **`run.mjs` + `calibrate.mjs` now import-safe.** Stripped leading shebangs that broke vitest's importer; `run.mjs` wraps `main()` in an `invokedAsScript` guard. Latent fix from v1.7.6.
161
- - **17 new tests** (11 slice-math + 6 verdict). 1476 total passing.
162
-
163
- > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
164
-
165
- ### What's new in v1.7.6
166
-
167
- - **Fresh-tail pinned context injection.** `hippo context --pinned-only --include-recent <n>` now includes the last N writes regardless of pinning, so memories saved mid-session can appear in the next Claude Code `UserPromptSubmit` injection before they are explicitly pinned. New Claude hook installs use `--include-recent 5`; legacy pinned-only hooks are migrated on `hippo hook install`.
168
- - **Calibration sweep on the sequential-learning benchmark.** Adds `--budget` plumbing through the runner + a calibration script (`calibrate.mjs`) with a mechanical B* selection rule. Used to test "would smaller budget recover headroom for the goal-stack hypothesis?" on the v1.7.5 floor.
169
- - **Calibration verdict: budget reduction does not produce a discriminating workload.** 5 budgets × 10 seeds = 50 single-seed runs all returned 0% late-phase trap rate. Floor effect is structural, not budget-tunable. B\* = NULL. Per pre-registered escalation, v1.7.7 will sweep `--restrict-late-to last-4` instead.
170
- - **Bug-fix on `calibrate.mjs` starvation guard.** Read a non-existent JSON field; false-positive `starved=true` on every candidate. Did not affect the verdict (lateMean=0% was load-bearing). Fix: drop the broken extraction.
171
- - **Hypothesis still untested.** The −10pp goal-stack lift claim remains unsupported by a discriminating workload. Mechanism still shipped from v1.7.4. Honest reporting: see `docs/evals/2026-05-09-v1.7.6-calibration-result.md`.
172
-
173
- > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
174
-
175
- ### What's new in v1.7.5
176
-
177
- - **Sequential-learning benchmark gains `pushGoal`/`completeGoal` hooks** + a multi-seed eval harness with seeded category-to-slot variance, exact paired permutation CI, and `--eval-strict` mode. The dlPFC goal-stack mechanism is now exercisable on the public benchmark.
178
- - **Tag-fix on memory store** so the goal-stack boost can actually match. Pre-fix the boost would have matched zero memories.
179
- - **Eval ran but stopped per pre-registered sanity gate.** Both hippo-base and hippo+goal-stack hit 0% late-phase trap rate across 20 seeds — floor effect prevents H1/H0 discrimination. The −10pp hypothesis remains untested on a discriminating workload. Mechanism shipped, hypothesis open. Pre-reg + result in `docs/evals/`.
180
-
181
- > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
182
-
183
- ### What's new in v1.7.4
184
-
185
- - **Goal-stack boost on MCP + HTTP.** Set `RecallOpts.sessionId` (or HTTP `?session_id=...`, or MCP `hippo_recall { session_id }`) and the dlPFC goal-stack boost — previously CLI-only — applies on MCP and HTTP too. Both `api.recall` (primary BM25 band, before fresh-tail / summary appendix) AND MCP's separate `physicsSearch`/`hybridSearch` path are boosted. New `RecallOpts.goalTag` lets callers opt out per-call.
186
- - **`goal complete --no-propagate`.** New CLI flag and `CompleteGoalOpts.noPropagate` field for users who want to close a goal without strength side-effects on recalled memories. Default unchanged (propagate).
187
- - **Internal: `applyGoalStackBoost` and `enforceDepthCapWithinTx` helpers.** Lifted ~140 lines of duplicated logic into shared helpers. `@internal`, not on the public API surface.
188
-
189
- ### What's new in v1.7.3
190
-
191
- - **Hygiene release.** Closes the v1.7.2 review-tail: module-load assertion runtime test, `summarize_overflow=0` thin-client pin, internal `scopeFilter` rename, and a README "What's new" backfill for v1.7.0 and v1.6.5.
192
- - No public API change. No behaviour change. No schema change.
193
-
194
- ### What's new in v1.7.2
195
-
196
- - **`scorerWindow` over the wire.** HTTP `/v1/memories?scorer_window=N`, MCP `hippo_recall.scorer_window`, thin-client serializes `scorerWindow`. Validation unchanged (`recall()` rejects 0/negative/non-finite/non-numeric with `RecallContractError code: invalid_scorer_window`).
197
- - **Thin-client parity sweep.** `client.ts` now serializes all four RecallOpts transport fields (`fresh_tail_count`, `fresh_tail_session_id`, `summarize_overflow`, `scorer_window`); previously only the first three over HTTP, all missing in client.
198
- - **Single source of truth for default-deny recall scopes.** `RECALL_DEFAULT_DENY_SCOPES` constant shared by SQL clause + 5 JS sites (api.recall, MCP physics-scorer, MCP assemble, CLI continuity, api continuity). Adding a literal deny scope is a one-place change.
199
- - **Internal type cleanup.** `loadSearchRows::recallScope` is a discriminated union (`@internal`); can't construct an invalid intermediate.
200
-
201
- ### What's new in v1.7.1
202
-
203
- - Fixed `unknown:legacy` scope leak in BM25 base recall, at the **producer layer** (SQL predicate in `loadSearchRows` via new `loadRecallSearchEntries` helper). Future recall consumers cannot silently re-introduce the leak. Operators investigating the quarantine bucket should pass explicit `scope: 'unknown:legacy'`.
204
- - Hardened test coverage on the v1.7.0 foundations: `scorerWindow=1` lower bound, no-terms `ORDER BY`, tenant isolation across FTS / no-terms / LIKE-fallback paths, HTTP `windowSize` serialization.
205
- - Deterministic LIKE-fallback testing via new `HIPPO_FORCE_LIKE_PATH=1` env hook (read-only — never poisons the on-disk FTS index).
206
-
207
- ### What's new in v1.7.0
208
-
209
- - **`MemoryEntry.bm25_score?: number`.** Raw FTS5 `bm25()` score surfaced as provenance metadata on the FTS path of `loadSearchEntries`. `undefined` on every other path (empty query, FTS unavailable, LIKE fallback, full-store fallback, `readEntry`, `loadAllEntries`, deserialize). NOT a drop-in for the JS-side BM25 scorer in `src/search.ts` — different tokenizer, scale, sign convention. Provenance only.
210
- - **`RecallOpts.scorerWindow?: number`.** Decouples scorer candidate pool from `limit`. Default `undefined` preserves the existing 200-row store-internal default. Useful when `summarizeOverflow=true` and you want a wider candidate pool to detect more level-2 parent clusters.
211
- - **`RecallResult.windowSize?: number`.** Reports the scorer window actually used so callers can introspect "did the scorer see enough candidates?" without re-deriving the value.
212
- - **API contract fix (CRITICAL).** `RecallContractError` HTTP serialization aligned to `{error: <message>, code: <code>}` to match every other v1/* error. The v1.6.5 one-off shape (`{error: <code>, message: <text>}`) was a public-contract drift caught by the api-contract specialist in `/review`. **Breaking for v1.6.5 callers reading `body.error` for the typed code value** — migrate to `body.code`.
213
- - Three review-chain rounds (`/plan-eng-review`, `/codex review --model gpt-5.5`, `/review`) shaped this release: 4 P0s killed mk1 (including a fabricated `bm25_score` column), 2 P0s killed mk2 (including an MCP cap addressing a non-existent contract), and the 5-specialist `/review` pass added the api-contract fix and 4 INFO-level test improvements.
214
-
215
- ### What's new in v1.6.5
216
-
217
- - **`RecallContractError` exported class with `.code` field.** Thrown by `api.recall` when `HIPPO_REQUIRE_SESSION_SCOPED_FRESH_TAIL=1` AND `freshTailCount > 0` AND `freshTailSessionId` is unset. HTTP returns 400 with the typed error; MCP propagates via `-32603`; CLI exits 1. Default env unset preserves v1.6.x tenant-wide back-compat.
218
- - **Timestamp invariant documented** in `src/memory.ts`: all in-process `MemoryEntry` and session-state timestamps are canonical `Date.prototype.toISOString()` (24 chars, UTC, ms, trailing `Z`). Importers preserving local-time offsets MUST normalize on write.
219
- - **`assemble` ISO sort uses byte compare** instead of `localeCompare` — ~50× faster on canonical UTC ISO with no semantic change given the in-process invariant. Caveat documented: `deserializeEntry` / `rebuildIndex` round-trip frontmatter timestamps as-is, so legacy markdown with non-canonical offsets propagates without normalization.
220
- - **`loadFreshRawMemories` JSDoc-deprecated** for tenant-wide use (no `sessionId`). NO runtime `console.warn` — codex C9 rejected library-level stderr noise. Direct callers bypass the `api.recall` guard, so the JSDoc is the only nudge at that layer.
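The byte-compare sort in the v1.6.5 notes relies on canonical `toISOString()` output ordering the same way lexically and chronologically. The sketch below illustrates that property and the documented caveat; `compareIso` is a hypothetical name, not the function used in `assemble`.

```typescript
// Plain code-unit comparison. Only valid when every timestamp is the canonical
// 24-character UTC form produced by Date.prototype.toISOString().
function compareIso(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

const stamps = [
  "2026-05-10T09:00:00.000Z",
  "2025-12-31T23:59:59.999Z",
  "2026-05-10T08:59:59.999Z",
];

// Sorted by string comparison and, for reference, by parsed instant.
const sortedByBytes = [...stamps].sort(compareIso);
const sortedByInstant = [...stamps].sort((a, b) => Date.parse(a) - Date.parse(b));

// Caveat: a non-canonical offset breaks the equivalence. This value is
// 08:00 UTC, earlier than 09:00Z, yet it compares as later byte-wise.
const withOffset = "2026-05-10T10:00:00.000+02:00";
const byteOrder = compareIso(withOffset, "2026-05-10T09:00:00.000Z");
const instantOrder = Math.sign(Date.parse(withOffset) - Date.parse("2026-05-10T09:00:00.000Z"));
```

On canonical input the two sorted arrays agree; for the offset value, `byteOrder` and `instantOrder` disagree, which is why importers are told to normalize on write.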
221
-
222
- ### What's new in v1.6.4
223
-
224
- - **`drillDown` returns a discriminated outcome.** `not_found` / `not_drillable` / `scope_blocked` instead of `null`. HTTP maps `not_drillable` to 422; cross-tenant and scope-blocked stay at 404 (no info-leak). Breaking for JS callers that did `result === null`; migrate to `'failure' in result`.
225
- - **HTTP `:id` segment validation.** Reject URL-encoded slashes (`%2F`/`%2f`) before path matching; reject illegal charset and >256 chars after. Applied across all `:id` routes.
226
- - Plan-stage `/codex` + `/review` caught 2 P0s in the initial draft (unscoped cross-tenant probe; validator ordering bug) before any code landed. Discipline pays.
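The `'failure' in result` migration from the v1.6.4 notes can be sketched as follows. Only the `failure` key and the three outcome names come from the notes; the `DrillSuccess` payload and the `httpStatusFor` helper are hypothetical, and the mapping simply mirrors the status codes listed above.

```typescript
// The three failure outcomes named in the release notes.
type DrillFailure = { failure: "not_found" | "not_drillable" | "scope_blocked" };

// Hypothetical success payload; the real shape is not shown in the README.
type DrillSuccess = { children: string[] };

type DrillResult = DrillSuccess | DrillFailure;

// Callers that used to test `result === null` narrow with the `in` operator.
function httpStatusFor(result: DrillResult): number {
  if ("failure" in result) {
    // Only not_drillable gets its own status; not_found and scope_blocked
    // share 404 so a caller cannot tell a hidden row from a missing one.
    return result.failure === "not_drillable" ? 422 : 404;
  }
  return 200;
}
```

Keeping `scope_blocked` on the same status as `not_found` is what the notes mean by "no info-leak".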
227
-
228
- ### What's new in v1.6.3
229
-
230
- - **One P0 + four P1s caught by `/review` after v1.6.2 shipped.** The user noticed `/review` had been skipped across multiple releases. Running it retroactively surfaced a misleading `assemble.totalRaw` semantic on long sessions, three transport-surface drifts on the new RecallOpts, and an HTTP input-validation gap. All addressed. Process correction documented honestly in CHANGELOG.
231
-
232
- ### What's new in v1.6.2
233
-
234
- - **Two functional bugs caught by `/codex review` after v1.6.1.** (1) `loadSessionRawMemories` cap was returning the OLDEST rows instead of the newest, silently breaking fresh-tail protection in `assemble` for sessions > cap. Now reversed. (2) `loadFreshRawMemories` was tenant-wide only; multi-session tenants surfaced cross-session rows as `isFreshTail`. Now accepts `sessionId`; `RecallOpts.freshTailSessionId` lets callers scope fresh-tail correctly.
235
-
236
- ### What's new in v1.6.1
237
-
238
- - **Retroactive patch from a senior cross-model review of v1.5.1 + v1.5.2 + v1.6.0.** `assemble` gained a 5000-row cap on session loads (configurable, surfaces `truncated`), `totalRaw` is now post-scope-filter so all-private sessions don't look like missing-session bugs, and `AssembleOpts.scope` reaches parity with `RecallOpts.scope` so authorised callers can assemble a private session by passing scope explicitly.
239
-
240
- ### What's new in v1.6.0
241
-
242
- - **`hippo assemble --session <id>`** + `api.assemble` + MCP `hippo_assemble` + HTTP `GET /v1/sessions/:id/assemble`. Phase 2 of the DAG plan: build a chronologically-ordered context window for a session — fresh-tail raw rows + level-2 summary substitutions for older rows + budget-fit. Adapted from [lossless-claw](https://github.com/Martian-Engineering/lossless-claw) but with bio-aware eviction: when over-budget, Hippo drops the lowest-strength non-fresh-tail item first instead of oldest-first, so high-importance older context survives while low-strength recent rows get evicted.
243
- - Phase 3 (sub-agent expansion, large file externalization) deferred as a non-fit for Hippo's memory-store role; `drillDown` already covers detail recovery.
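The eviction rule in the v1.6.0 notes (drop the lowest-strength non-fresh-tail item first, not the oldest) might look like the sketch below. The `ContextItem` shape, the token accounting, and the function name `fitToBudget` are assumptions made for illustration; the actual `assemble` implementation is not shown in the README.

```typescript
// Hypothetical item shape for illustration only.
interface ContextItem {
  id: string;
  strength: number;
  tokens: number;
  isFreshTail: boolean;
}

// Items arrive in chronological order and the survivors keep that order.
function fitToBudget(items: ContextItem[], budget: number): ContextItem[] {
  let kept = [...items];
  let total = kept.reduce((sum, item) => sum + item.tokens, 0);
  while (total > budget) {
    const evictable = kept.filter((item) => !item.isFreshTail);
    if (evictable.length === 0) break; // only fresh-tail rows remain
    // Pick the weakest evictable item, regardless of its age.
    const victim = evictable.reduce((lo, item) => (item.strength < lo.strength ? item : lo));
    kept = kept.filter((item) => item !== victim);
    total -= victim.tokens;
  }
  return kept;
}
```

With oldest-first eviction, a strong early memory would be the first to go. Here it survives as long as weaker non-fresh-tail items remain to absorb the cut.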
244
-
245
- ### What's new in v1.5.2
246
-
247
- - **Fresh-tail recall.** New `RecallOpts.freshTailCount` (default 0): when > 0, recall surfaces the last N `kind='raw'` rows tagged with `isFreshTail=true` regardless of whether the query matched them. Useful for "what did I just see in this session" continuity on top of the query path. Dual-membership: when a recent row also hits BM25, the existing result is flagged in place rather than duplicated.
248
-
249
- ### What's new in v1.5.1
250
-
251
- - **`hippo_drill` MCP tool + `GET /v1/recall/drill/:id` HTTP route.** Completes the v1.5.0 drillDown surface — the function was reachable via CLI + JS API; now it's also a first-class MCP tool and a Bearer-auth HTTP endpoint. Same shape, same tenant + scope guards, same 404 semantics on leaves and cross-tenant ids.
252
-
253
- ### What's new in v1.5.0
254
-
255
- - **DAG-aware recall.** When a query's matched leaves overflow the result limit and ≥2 of them share a level-2 parent summary, recall appends the summary so you see a compact pointer to the missing detail instead of silently dropping it. Capped at ceil(limit * 0.3) extras so a runaway DAG can't bloat results. Tenant-scoped, scope-filtered, opt-out via `summarizeOverflow: false`.
256
- - **`hippo drill <summary-id>`.** Companion command. Walks one step down the DAG from a level-2 summary to its direct children. `--limit N` and `--budget N` options for budgeted recovery. JSON output via `--json`.
257
- - **Schema v25** caches `descendant_count`, `earliest_at`, `latest_at` on summary rows. Idempotent ALTER + backfill on existing v24 DBs; no `min_compatible_binary` bump.
258
- - **Lifted from [lossless-claw](https://github.com/Martian-Engineering/lossless-claw)** (LCM paper, Voltropy / Martian Engineering): depth-stratified summaries + drill-down. Adapted to Hippo's score-ranked recall instead of conversation-order assembly. Phase 2 (context-engine assembler) and Phase 3 (sub-agent expansion) on the roadmap.
259
- - 1256 tests passing (+19 from v1.4.0).
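The overflow rule in the v1.5.0 notes (append a level-2 summary when at least two overflowed leaves share it, capped at `ceil(limit * 0.3)` extras) can be sketched as below. The `Leaf` shape and `overflowSummaries` name are hypothetical, and the sketch assumes the "at least two" count is taken over the leaves that fell past `limit`, which the notes do not state explicitly.

```typescript
// Hypothetical leaf shape: a ranked hit that may point at a level-2 summary.
interface Leaf {
  id: string;
  parentSummaryId?: string;
}

// `ranked` is the full score-ordered match list; the first `limit` entries
// are returned as usual and the rest are the overflow considered here.
function overflowSummaries(ranked: Leaf[], limit: number): string[] {
  const counts = new Map<string, number>();
  for (const leaf of ranked.slice(limit)) {
    if (leaf.parentSummaryId === undefined) continue;
    counts.set(leaf.parentSummaryId, (counts.get(leaf.parentSummaryId) ?? 0) + 1);
  }
  // Cap the extras so a large DAG cannot bloat the result set.
  const cap = Math.ceil(limit * 0.3);
  return [...counts.entries()]
    .filter(([, count]) => count >= 2)
    .map(([summaryId]) => summaryId)
    .slice(0, cap);
}
```

For `limit = 10` the cap works out to 3 extra summaries; for `limit = 3` it is 1.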
260
-
261
- ### What's new in v1.4.0
262
-
263
- - **First repo-level CI workflow + provenance gate enforced on every PR.** `.github/workflows/ci.yml` runs build + 1237 vitest cases + a CI-only seed that ingests one GitHub webhook + one Slack message through the real connectors then runs `hippo provenance --strict`. Drop a connector's owner stamp and the PR fails. Read-only permissions, 25-minute timeout, uploads `provenance-coverage.json` as a workflow artifact.
264
- - **Slack `bot_message` provenance gap closed.** `slack/transform.ts` shipped `owner: undefined` for userless bot messages, which would have failed any future strict gate. Now derives `owner: bot:<bot_id>` (or `bot:unknown` as a sentinel). Skipping userless messages was rejected during plan review because `slack/ingest.ts:54-65` would have silently dropped existing bot ingestion via the "skipped but seen" path.
265
- - **`SlackMessageEvent.bot_id` added** as an optional field on the public type.
266
- - **Slack provenance parity test** mirrors `tests/github-provenance-parity.test.ts` and covers user, `bot_message`, threaded replies, and `message_changed` edits.
267
-
268
- ### What's new in v1.3.2
269
-
270
- - **Hotfix for v1.3.1.** Codex round 3 + senior code reviewer caught residual bugs in v1.3.1's own fix.
271
- - **Deletion idempotency uses a `deleted:` namespace** so it doesn't collide with the ingest path's key for the same artifact. v1.3.1 made them share a key (the obvious fix), which broke the deletion path: every first-deletion-of-an-ingested-comment short-circuited as `'duplicate'`. v1.3.2 splits the namespaces.
272
- - **DLQ replay routes deleted comments correctly.** v1.3.1 wrote replayed `*.deleted` rows as fresh raw memories; v1.3.2 dispatches to the deletion handler.
273
- - **`compareSemver` is loud on pre-release tags** instead of silently miscomparing `1.3.2-beta` as less than `1.3.2`. Defends the rollback guard.
274
- - **`IngestHook` type cleaned up** — phantom `idempotencyKey` arg removed.
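
One way to make a version comparator "loud" on pre-release tags is to reject anything that is not plain `MAJOR.MINOR.PATCH`. The sketch below assumes that "loud" means throwing; `compareSemverStrict` is a hypothetical name and is not the package's `compareSemver`.

```typescript
// Hypothetical strict comparator; the real compareSemver may behave differently.
function compareSemverStrict(a: string, b: string): number {
  const parse = (v: string): number[] => {
    // Refuse pre-release or build suffixes instead of guessing an ordering.
    if (!/^\d+\.\d+\.\d+$/.test(v)) {
      throw new Error("unsupported version string: " + v);
    }
    return v.split(".").map(Number);
  };
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    // Compare numerically so that 1.3.10 sorts after 1.3.2.
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}
```

Failing fast suits a rollback guard: an unparseable version stops the open instead of letting a wrong ordering through.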
275
-
276
- ### What's new in v1.3.1
277
-
278
- - **Hotfix for v1.3.0.** Retroactive `/codex review` (round 2) + `/review` (senior code reviewer) caught 3 P0s and 6 P1s the plan-only review missed. All addressed.
279
- - **Rollback guard is now actually enforced.** Older binaries opening a v1.3-migrated DB throw on open instead of silently leaking private rows.
280
- - **Multi-row comment deletion is atomic.** Edit histories archive in one transaction; any failure rolls back the whole batch and leaves idempotency unset for retry.
281
- - **Backfill and webhook share an idempotency key.** Same source revision delivered via either path collapses to one memory row. Key derives from `sha256(artifact_ref + ':' + updated_at)` instead of `sha256(eventName + ':' + rawBody)`.
282
- - **DLQ replay actually replays** instead of being a dry-run that printed "replay ok". Plus `GITHUB_WEBHOOK_SECRET_PREVIOUS` support and proper HTTP/MCP version reporting.
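
The shared idempotency key above depends only on the artifact and its revision, not on how the event arrived. A minimal sketch, assuming Node's built-in `crypto` module and hex output; `idempotencyKey` is a hypothetical helper name, and the encoding is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: sha256(artifact_ref + ':' + updated_at), hex-encoded.
// The key ignores the transport, so webhook and backfill deliveries of the
// same source revision produce the same value.
function idempotencyKey(artifactRef: string, updatedAt: string): string {
  return createHash("sha256")
    .update(artifactRef + ":" + updatedAt)
    .digest("hex");
}
```

An edit to the source artifact changes `updated_at`, so the edit gets a fresh key while a redelivery of the same revision does not.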
283
-
284
- ### What's new in v1.3.0
285
-
286
- - **GitHub connector.** Stream issues, issue comments, PRs, and PR review comments into hippo as `kind='raw'` rows. Webhook route at `POST /v1/connectors/github/events` (HMAC-verified). CLI: `hippo github backfill --repo <owner/name>`, `hippo github dlq list`, `hippo github dlq replay <id>`. Required env: `GITHUB_WEBHOOK_SECRET` (route), `GITHUB_TOKEN` (backfill).
287
- - **Replay-safe idempotency.** Keyed on `sha256(eventName + ':' + rawBody)`, not `X-GitHub-Delivery` (which GitHub does not sign). Attackers cannot bypass dedupe by minting fresh delivery UUIDs.
288
- - **Three-stream backfill.** Issues, issue comments, PR review comments each have their own high-water mark. A crash mid-stream-2 leaves stream-1's HWM intact and stream-2's HWM unchanged so resume is safe and idempotent.
289
- - **PAT and App tenant routing.** `github_installations` for App webhooks; `github_repositories` for PAT-mode multi-tenant. Fail-closed: an unknown installation in a multi-tenant install routes to the DLQ rather than to `HIPPO_TENANT`.
290
- - **Schema v24** with rollback safety. The migration writes `meta.min_compatible_binary='1.2.1'` so older binaries refuse to open the DB and cannot leak private rows.
291
- - 1214 tests passing. Independent codex audit on the plan caught 5 P0 and 8 P1 issues before coding began; all addressed.
292
-
293
- ### What's new in v1.2.1
294
-
295
- - **Source-agnostic default-deny.** The v1.2 filter only blocked `slack:private:*`. v1.2.1 generalizes to ANY `<source>:private:*` scope so the v1.3 GitHub connector (and future Jira/Linear/etc.) cannot leak private rows to no-scope callers. Single source of truth via the new `isPrivateScope` export.
296
- - **Pre-flight for v1.3.** Codex audit on the v1.3 plan flagged this as a P0 to fix BEFORE any GitHub work began, so rollback after v1.3 ships stays safe.
297
- - **No behavior change for existing users.** Public scopes, null scope, and exact-match queries are unchanged. Only no-scope callers facing a private row from any future connector see different behavior (which is the entire point).
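
The `<source>:private:*` rule can be illustrated with a small predicate. This is a sketch, not the real `isPrivateScope` export: `looksPrivateScope` and `visibleToCaller` are hypothetical names, and the quarantined legacy scope from v1.2.0 is left out for brevity.

```typescript
// Hypothetical stand-in for isPrivateScope; the real export may differ.
function looksPrivateScope(scope: string | null | undefined): boolean {
  // Any source name followed by ":private:" counts as private.
  return typeof scope === "string" && /^[^:]+:private:/.test(scope);
}

// Explicit scope means exact match; no scope means private rows are denied.
function visibleToCaller(rowScope: string | null, requestedScope?: string): boolean {
  if (requestedScope !== undefined) return rowScope === requestedScope;
  return !looksPrivateScope(rowScope);
}
```

Matching on the `:private:` segment rather than a hard-coded `slack:` prefix is what lets a future connector inherit the default-deny without a new filter.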
298
-
299
- ### What's new in v1.2.0
300
-
301
- - **Continuity exposed everywhere.** MCP `hippo_recall` accepts `include_continuity: true`, HTTP `GET /v1/memories` accepts `?include_continuity=1`. CLI `--continuity` shipped in v1.1.
302
- - **Scope filter, end-to-end.** `api.recall`, `cmdRecall`, MCP `hippo_recall`, MCP `hippo_context`, and HTTP `GET /v1/memories` all enforce the same rule: explicit scope = exact match, no scope = default-deny on `slack:private:*` and quarantined legacy rows.
303
- - **Schema v23.** `task_snapshots` gains `scope`. Pre-existing rows with NULL scope are quarantined as `'unknown:legacy'` so they default-deny.
304
- - **Closes v1.0.0 + v1.1.0 known limitations.** Continuity tables no longer carry NULL scope as a load-bearing data state. The "Deferred to v1.2.0" line is gone.
305
- - **Security note.** Codex review caught two real issues that this release fixes: (1) v1.1's "explicit scope" actually meant "allow all" (latent leak), now exact-match; (2) `loadLatestHandoff` SELECTs were missing scope, would have leaked private handoffs to no-scope callers post-writers.
306
-
307
- ### What's new in v1.1.0
308
-
309
- - **Continuity-first recall.** `api.recall` accepts `includeContinuity: true` to return the active task snapshot, latest matching session handoff, and last 5 session events alongside the ranked memories. One call, agent boot ready. CLI: `hippo recall <query> --continuity`.
310
- - **Anchored, no resurrection.** Continuity is anchored to the active snapshot's session_id. No anchor = no handoff/events. The explicit handoff-without-snapshot path is still `hippo session resume`.
311
- - **Hot path unchanged.** Default-off everywhere. Existing recall callers see no behavior change.
312
- - **Deferred to v1.2.0.** MCP `hippo_recall` continuity and HTTP `GET /v1/memories?include_continuity=true` ship together with the `scope` read-side filter on continuity tables.
313
-
314
- ### What's new in v1.0.0
315
-
316
- - **Tenant-isolation security release.** v0.40.0's measurement gates surfaced a real cross-tenant data leak on the continuity tables (`task_snapshots`, `session_events`, `session_handoffs`). Schema migration v22 closes the gap: every continuity helper now scopes reads and writes by `tenantId`. Markdown mirror files (`buffer/active-task.md`, `buffer/recent-session.md`) are tenant-scoped too; the default tenant keeps the unsuffixed filename for on-disk back-compat.
317
- - **Slack envelope fix.** `messageToRememberOpts` now sets `owner: 'user:<slack_user_id>'` so ingested Slack rows pass the v0.40.0 `hippo provenance --strict` gate.
318
- - **Breaking change for JS callers.** 10 store helpers (`saveActiveTaskSnapshot`, `loadLatestHandoff`, `appendSessionEvent`, etc.) gained a required `tenantId` second argument. TypeScript callers get a compile error. JS callers get a runtime guard error (`assertTenantId`) that detects the most common misbinding (passing a `sess-*` session id) and points at the migration. See CHANGELOG for the full helper list.
319
- - **Schema v22 migration.** Idempotent, transactional, with smart tenant backfill via unambiguous `task_snapshots.session_id` joins.
320
-
321
- ### What's new in v0.40.0
322
-
323
- - **Company Brain measurement gates.** Two new diagnostic commands close the last blocked rows of the Company Brain scorecard. `hippo provenance [--json] [--strict]` audits every `kind='raw'` row for `owner` + `artifact_ref`; `--strict` exits non-zero so CI can block on coverage regressions. `hippo correction-latency [--json]` reports p50 / p95 / max wall-clock lag from receipt to supersession across `superseded_by` chains. Both helpers (`buildProvenanceCoverage`, `buildCorrectionLatency`) are importable from `src/`.
324
- - **No behavioral change to remember / recall.** Additive only: schema unchanged, retrieval untouched.
325
-
326
- ### What's new in v0.39.0
327
-
328
- - Security hardening release: 5 CRITICAL cross-tenant fixes (CVE candidates), GDPR Path A on archive (true RTBF), MCP per-client isolation, Slack ingestion race + idempotency hardening, auth timing leak reduction.
329
-
330
- ### What's new in v0.38.0
331
-
332
- - **B3 dlPFC persistent goal stack (depth 3).** Schema v18 adds `goal_stack`, `retrieval_policy`, `goal_recall_log`. New CLI subcommands: `hippo goal push|list|complete|suspend|resume`. With `HIPPO_SESSION_ID` set, `hippo recall` auto-boosts memories tagged with the active goal (final multiplier hard-capped at 3.0x). Retrieval policies (`error-prioritized`, `schema-fit-biased`, `recency-first`, `hybrid`) further shape ranking.
333
- - **Outcome propagation with lifespan window.** `hippo goal complete --outcome <score>` adjusts strength only on memories actually recalled while the goal was alive. `outcome >= 0.7` boosts (×1.10), `outcome < 0.3` decays (×0.85), neutral band leaves strength alone. UNIQUE(memory_id, goal_id) prevents double-propagation.
334
- - **B3 cluster-discrimination micro-benchmark.** `benchmarks/micro/fixtures/dlpfc_depth.json` — 3 disjoint memory clusters under 3 named goals. Each query asserts the active goal's cluster is in top-3 AND the other two clusters are NOT, a deterministic test BM25 alone cannot pass. Receipt: 3/3 queries pass in [`benchmarks/micro/results/b3-depth.json`](benchmarks/micro/results/b3-depth.json).
335
- - **Deferred to v0.39:** sequential-learning trap-rate lift (needs adapter contract change), MCP/REST `session_id` plumbing, vlPFC interference handling, `--no-propagate` flag.
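
The outcome bands in the propagation note map to three multipliers. A minimal sketch of that arithmetic, assuming strengths are held in a plain object keyed by memory id; `outcomeMultiplier` and `propagateOutcome` are hypothetical names, not Hippo's API.

```typescript
// Thresholds and multipliers are taken from the release note above.
function outcomeMultiplier(outcome: number): number {
  if (outcome >= 0.7) return 1.10; // boost
  if (outcome < 0.3) return 0.85;  // decay
  return 1.0;                      // neutral band leaves strength alone
}

// Hypothetical propagation: adjust only memories recalled while the goal was
// alive, and touch each memory at most once per goal.
function propagateOutcome(
  strengths: Record<string, number>,
  recalledIds: string[],
  outcome: number
): Record<string, number> {
  const m = outcomeMultiplier(outcome);
  const next: Record<string, number> = { ...strengths };
  const seen: Record<string, boolean> = {};
  for (const id of recalledIds) {
    if (seen[id] || !(id in next)) continue;
    seen[id] = true;
    next[id] = next[id] * m;
  }
  return next;
}
```

The `seen` map plays the role that the `UNIQUE(memory_id, goal_id)` constraint plays in the schema: a memory recalled twice under one goal is still adjusted once.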
336
-
337
- ### What's new in v0.37.0
338
-
339
- - **Slack ingestion (E1.3).** First end-to-end ingestion connector. `POST /v1/connectors/slack/events` accepts HMAC-signed Events API webhooks; messages land as `kind='raw'` memories with `slack://team/channel/ts` provenance and a `slack:public:*` or `slack:private:*` scope. Source deletions route through `archiveRawMemory` (GDPR). Backfill via `hippo slack backfill --channel <id>`; malformed events to `hippo slack dlq list`.
340
- - **Schema v17.** New tables: `slack_event_log` (idempotency), `slack_cursors` (backfill resume), `slack_dlq` (parse failures), `slack_workspaces` (team_id to tenant_id routing).
341
- - **`PUBLIC_ROUTES` allow-list + `HIPPO_REQUIRE_AUTH` knob.** The Slack webhook is the first explicit public `/v1/*` route (HMAC-signed, no Bearer). Every other `/v1/*` route returns 401 without auth when `HIPPO_REQUIRE_AUTH=1`.
342
- - **Recall default-deny on private scopes.** No-scope queries cannot see `slack:private:*` memories. Frontend callers passing undefined scope no longer leak private content.
343
- - **`api.remember.afterWrite` hook.** Connectors stamp idempotency rows atomically with the memory row via a SAVEPOINT-scoped callback.
344
-
345
- For everything since v0.8.0, see [CHANGELOG.md](./CHANGELOG.md).
92
+ Full release history: **[CHANGELOG.md](./CHANGELOG.md)** · [GitHub Releases](https://github.com/kitfunso/hippo-memory/releases)
346
93
 
347
94
 
348
95
  ### Zero-config agent integration
@@ -502,34 +249,45 @@ hippo recall "data pipeline" --why --limit 5
502
249
 
503
250
  Input enters the buffer. Important things get encoded into episodic memory. During "sleep," repeated episodes compress into semantic patterns. Weak memories decay and disappear.
504
251
 
505
- ```
- New information
-       |
-       v
- +-----------+
- |  Buffer   |  Working memory. Current session only. No decay.
- | (session) |
- +-----+-----+
-       | encoded (tags, strength, half-life assigned)
-       v
- +-----------+
- | Episodic  |  Timestamped memories. Decay by default.
- |   Store   |  Retrieval strengthens. Errors stick longer.
- +-----+-----+
-       | consolidation (hippo sleep)
-       v
- +-----------+
- | Semantic  |  Compressed patterns. Stable. Schema-aware.
- |   Store   |  Extracted from repeated episodes.
- +-----------+
-
- hippo sleep: decay + replay + merge
252
+ ```mermaid
+ flowchart TD
+     I[New information] --> B[Buffer<br/>session-only, no decay]
+     B -->|encode: tags, strength, half-life| E[Episodic Store<br/>timestamped, decay by default<br/>retrieval strengthens, errors stick]
+     E -->|hippo sleep<br/>replay + merge| S[Semantic Store<br/>compressed patterns, stable<br/>schema-aware]
+     E -.->|decay| X[forgotten]
+     S -.->|recall| E
+     classDef bio fill:#fff4dc,stroke:#a8742d,color:#2b1b00
+     classDef forgotten fill:#f5f5f5,stroke:#999,color:#666,stroke-dasharray:5 5
+     class B,E,S bio
+     class X forgotten
527
263
  ```
528
264
 
529
265
  ---
530
266
 
531
267
  ## Key Features
532
268
 
269
+ A memory's life across a typical session, before walking each feature in turn:
270
+
271
+ ```mermaid
+ sequenceDiagram
+     autonumber
+     actor Agent
+     participant B as Buffer
+     participant E as Episodic
+     participant S as Semantic
+     Agent->>B: hippo remember "cache dropped tips_10y" --error
+     B->>E: encode (half_life=14d, valence=neg)
+     Note over E: strength=1.0
+     Agent->>E: hippo recall "data pipeline"
+     E-->>Agent: returns memory (rank 1)
+     Note over E: half_life 14d → 16d, retrieval_count++
+     Agent->>E: hippo outcome --good
+     Note over E: reward_factor 1.0 → 1.15
+     Agent->>S: hippo sleep
+     S->>E: merge 3 related episodic → 1 semantic
+     Note over E,S: original episodic decays, pattern survives
+ ```
290
+
533
291
  ### Decay by default
534
292
 
535
293
  Every memory has a half-life. 7 days by default. Persistence is earned.
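
A half-life means strength halves once per period. The sketch below shows only that textbook curve with the 7-day default; `decayedStrength` is a hypothetical name, and Hippo's own strength calculation may also fold in retrieval and reward factors that are not modelled here.

```typescript
// Hypothetical decay curve: strength halves once per half-life.
function decayedStrength(
  initial: number,
  ageDays: number,
  halfLifeDays: number = 7 // default half-life stated in the README
): number {
  if (halfLifeDays <= 0) throw new Error("halfLifeDays must be positive");
  return initial * Math.pow(0.5, ageDays / halfLifeDays);
}
```

Under this curve a memory that is never recalled sits at half strength after one week and a quarter after two.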
@@ -868,7 +626,7 @@ hippo watch "npm run build"
868
626
  | Codex | `AGENTS.md` or `.codex` | `AGENTS.md` + automatic in-place Codex launcher wrapper |
869
627
  | Cursor | `.cursorrules` or `.cursor/rules` | `.cursorrules` |
870
628
  | OpenClaw | `.openclaw` or `AGENTS.md` | native OpenClaw plugin or `AGENTS.md` |
871
- | OpenCode | `.opencode/` or `opencode.json` | `AGENTS.md` |
629
+ | OpenCode | `.opencode/` or `opencode.json` | `AGENTS.md` + TS plugin at `~/.config/opencode/plugins/hippo.ts` (subscribes to `session.idle` + `session.created`) |
872
630
 
873
631
  No extra commands needed. Just `hippo init` and your agent knows about Hippo.
874
632
 
@@ -881,7 +639,7 @@ hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/Session
881
639
  hippo hook install codex # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
882
640
  hippo hook install cursor # patches .cursorrules
883
641
  hippo hook install openclaw # patches AGENTS.md
884
- hippo hook install opencode # patches AGENTS.md
642
+ hippo hook install opencode # patches AGENTS.md + installs the opencode TS plugin
885
643
  ```
886
644
 
887
645
  This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the agent to:
@@ -988,29 +746,36 @@ For how these mechanisms connect to LLM training, continual learning, and open r
988
746
 
989
747
  ## Comparison
990
748
 
991
- | Feature | Hippo | MemPalace | Mem0 | Basic Memory |
- |---------|-------|-----------|------|-------------|
- | Decay by default | Yes | No | No | No |
- | Retrieval strengthening | Yes | No | No | No |
- | Reward-proportional decay | Yes | No | No | No |
- | Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No |
- | Schema acceleration | Yes | No | No | No |
- | Conflict detection + resolution | Yes | No | No | No |
- | Multi-agent shared memory | Yes | No | No | No |
- | Transfer scoring | Yes | No | No | No |
- | Outcome tracking | Yes | No | No | No |
- | Confidence tiers | Yes | No | No | No |
- | Spatial organization | No | Yes (wings/halls/rooms) | No | No |
- | Lossless compression | No | Yes (AAAK, 30x) | No | No |
- | Cross-tool import | Yes | No | No | No |
- | Auto-hook install | Yes | No | No | No |
- | MCP server | Yes | Yes | No | No |
- | Zero dependencies | Yes | No (ChromaDB) | No | No |
- | LongMemEval R@5 (retrieval) | 73.8% (hybrid, v0.28) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
- | Git-friendly | Yes | No | No | Yes |
- | Framework agnostic | Yes | Yes | Partial | Yes |
1012
-
1013
- Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." Hippo implements "forget by default, earn persistence through use." These are complementary approaches: MemPalace's retrieval precision + Hippo's lifecycle management would be stronger than either alone.
749
+ The AI-memory category matured fast in 2026. Hippo's specific take (bio-decay, strengthen-on-use, outcome-weighted half-lives) is one stance among several. The table below is a feature snapshot, not a verdict: graph-first systems ([gbrain](https://hermesatlas.com/projects/garrytan/gbrain), [Zep](https://www.getzep.com/), [Cognee](https://www.cognee.ai/)), agent-managed systems ([Letta](https://github.com/letta-ai/letta)), and version-control / skill-distillation takes ([Memoria](https://github.com/matrixorigin/Memoria), [EverMind](https://evermind.ai/)) all solve adjacent problems with different mechanics.
750
+
751
+ | Feature | Hippo | [MemPalace](https://github.com/milla-jovovich/mempalace) | [Mem0](https://github.com/mem0ai/mem0) | [Basic Memory](https://github.com/basicmachines-co/basic-memory) | [gbrain](https://hermesatlas.com/projects/garrytan/gbrain) | [Zep](https://www.getzep.com/) | [Letta](https://github.com/letta-ai/letta) | [Cognee](https://www.cognee.ai/) | [Memoria](https://github.com/matrixorigin/Memoria) | [EverMind](https://evermind.ai/) |
+ |---------|-------|-----------|------|-------------|--------|-----|-------|--------|---------|----------|
+ | Decay by default | Yes | No | No | No | No | No | No | No | No | No |
+ | Retrieval strengthening | Yes | No | No | No | No | No | No | Partial (recall tuning) | No | Partial (Skill Memory distills patterns) |
+ | Reward-proportional decay | Yes | No | No | No | No | No | No | No | No | No |
+ | Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No | Yes (vec + rerank + graph) | Yes (graph + vec) | ? | Yes (GraphRAG) | Yes (vector + full-text) | Yes (mRAG, multi-modal) |
+ | Schema acceleration / knowledge graph | Yes (schema) | No | No | No | Yes (typed KG, self-wiring) | Yes (temporal KG) | No | Yes (auto-ontologies) | No (typed claims) | Yes (hierarchical: user/group/agent) |
+ | Conflict detection + resolution | Yes | No | No | No | Yes (eval-surfaced) | Yes (auto-invalidate stale facts) | No | No | Yes (auto-detect + quarantine) | Partial (temporal tracking) |
+ | Multi-agent shared memory | Yes | No | No | No | Yes (brain repo, team mounts) | Yes | No (single-agent state) | Yes | Yes (branch/merge across sessions) | Yes (multi-agent coordination) |
+ | Transfer scoring | Yes | No | No | No | No | No | No | No | No | No |
+ | Outcome tracking | Yes | No | No | No | No | No | No | No | No | Partial (Cases: agent trajectories) |
+ | Confidence tiers | Yes | No | No | No | No (typed facts) | No | No | No | No | No |
+ | Spatial organization | No | Yes (wings/halls/rooms) | No | No | No | No | No | No | No | No |
+ | Lossless compression | No | Yes (AAAK, 30x) | No | No | No | No | No | No | No | No |
+ | Cross-tool import (ChatGPT/Claude/Cursor) | Yes | No | No | No | Partial (data sources) | ? | No | Partial (28 data sources) | No (Git ops) | Partial (mRAG: PDFs/images/URLs) |
+ | Auto-hook install | Yes | No | No | No | No | No | No | No | No | No |
+ | MCP server | Yes | Yes | No | No | Yes (stdio + HTTP/OAuth) | Partial (managed) | Yes (via Letta Code) | Yes (first-party Claude/LangGraph) | Yes | ? |
+ | Zero runtime deps | Yes | No (ChromaDB) | No | No | No (PGLite or PG+pgvector) | No (managed service) | No (Python deps) | No (Python deps) | Yes (single Rust binary) | No (managed + OSS) |
+ | LongMemEval (best published) | 86.8% R@5 (F13+F9, oracle\*) | 96.6% raw / 100% reranked R@5 | ~49-85% R@5 | N/A | 97.6-97.9% R@5 (s_cleaned\*) | N/A (LoCoMo 80.3%) | N/A | N/A | 88.78% overall accuracy w/ reader\*\* | 83.00% overall\*\* (LoCoMo 93.05%, HaluMem 93.04%) |
+ | Git-friendly | Yes | No | No | Yes | Yes | No | No | No | Yes (Git is the model) | ? |
+ | Framework agnostic | Yes | Yes | Partial | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
+ | License | MIT | (open) | Apache-2.0 | (open) | MIT | Apache-2.0 (community) | Apache-2.0 | MIT (core) | Apache-2.0 | Apache-2.0 (OSS) + cloud |
773
+
774
+ \* Split-mismatched: Hippo's 86.8% is on `longmemeval_oracle` (3 sessions per haystack); gbrain's 97.6% is on `longmemeval_s_cleaned` (~40 sessions per haystack). Different splits, different difficulty. Not directly comparable.
775
+
776
+ \*\* Different metric: Memoria's 88.78% and EverMind's 83% are reported as overall accuracy with a reader LLM, not retrieval R@5. Higher denominator + LLM helps. Not directly comparable to retrieval-only R@5 numbers above.
777
+
778
+ Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." gbrain, Zep, and Cognee implement "extract typed entities and relationships into a knowledge graph." Letta implements "the agent edits its own memory blocks." Memoria implements "Git-style version control over the memory state itself." EverMind implements "self-evolving Skill Memory + multi-modal retrieval over hierarchical scopes." Hippo implements "forget by default, earn persistence through use." These are complementary takes, not a single-axis ranking: bio-lifecycle (Hippo) + GraphRAG (gbrain/Cognee/Zep) + agent-self-edit (Letta) + memory-VCS (Memoria) + skill-distillation (EverMind) cover different parts of the same problem.
1014
779
 
1015
780
  ---
1016
781
 
package/dist/cli.js CHANGED
@@ -30,7 +30,7 @@ import * as fs from 'fs';
30
30
  import * as os from 'os';
31
31
  import { fileURLToPath } from 'node:url';
32
32
  import { execFileSync, execSync, spawn } from 'child_process';
33
- import { installJsonHooks, uninstallJsonHooks, resolveJsonHookPaths, detectInstalledTools, defaultSleepLogPath, ensureCodexWrapperInstalled, installCodexWrapper, uninstallCodexWrapper, resolveCodexSessionTranscript, resolveCodexWrapperPaths, } from './hooks.js';
33
+ import { installJsonHooks, uninstallJsonHooks, resolveJsonHookPaths, detectInstalledTools, defaultSleepLogPath, ensureCodexWrapperInstalled, installCodexWrapper, uninstallCodexWrapper, resolveCodexSessionTranscript, resolveCodexWrapperPaths, installOpencodePlugin, uninstallOpencodePlugin, resolveOpencodePluginPath, } from './hooks.js';
34
34
  import { createMemory, calculateStrength, calculateRewardFactor, deriveHalfLife, resolveConfidence, applyOutcome, computeSchemaFit, Layer, DECISION_HALF_LIFE_DAYS, } from './memory.js';
35
35
  import { getHippoRoot, isInitialized, initStore, writeEntry, readEntry, deleteEntry, loadAllEntries, loadSearchEntries, loadIndex, saveIndex, loadStats, updateStats, saveActiveTaskSnapshot, loadActiveTaskSnapshot, clearActiveTaskSnapshot, appendSessionEvent, listSessionEvents, listMemoryConflicts, resolveConflict, saveSessionHandoff, loadLatestHandoff, loadHandoffById, RECALL_DEFAULT_DENY_SCOPES, } from './store.js';
36
36
  import { markRetrieved, estimateTokens, hybridSearch, physicsSearch, explainMatch, textOverlap } from './search.js';
@@ -391,10 +391,10 @@ function autoInstallHooks(quiet) {
391
391
  installed.add(targetPath);
392
392
  console.log(` Auto-installed ${hook} hook in ${hookDef.file}`);
393
393
  }
394
- // For JSON-hook tools, also install SessionEnd+SessionStart entries.
395
- // Keeps `hippo init` in lockstep with `hippo hook install <target>` and
396
- // `hippo setup`, which both cover claude-code + opencode now.
397
- if (hook === 'claude-code' || hook === 'opencode') {
394
+ // For Claude Code, also install SessionEnd+SessionStart entries in its
395
+ // settings.json. Keeps `hippo init` in lockstep with `hippo hook install
396
+ // claude-code` and `hippo setup`.
397
+ if (hook === 'claude-code') {
398
398
  const result = installJsonHooks(hook);
399
399
  if (result.installedSessionEnd) {
400
400
  console.log(` Auto-installed hippo session-end SessionEnd hook in ${hook} settings`);
@@ -415,6 +415,21 @@ function autoInstallHooks(quiet) {
415
415
  console.log(` Migrated legacy SessionEnd entry to the new detached form`);
416
416
  }
417
417
  }
418
+ else if (hook === 'opencode') {
419
+ // opencode uses a TS plugin, not Claude Code's JSON-hook schema.
420
+ // See OPENCODE_PLUGIN_SOURCE in src/hooks.ts for the plugin file
421
+ // content + design rationale.
422
+ const result = installOpencodePlugin();
423
+ if (result.installed) {
424
+ console.log(` Auto-installed hippo opencode plugin -> ${result.pluginPath}`);
425
+ }
426
+ if (result.migratedLegacyHooks) {
427
+ console.log(` Removed legacy Claude Code-style hooks block from opencode.json — opencode can now launch`);
428
+ }
429
+ if (result.jsonRepairFailed) {
430
+ console.log(` WARNING: opencode.json is unparseable; legacy hooks block could not be auto-removed. Fix the file manually.`);
431
+ }
432
+ }
418
433
  }
419
434
  }
420
435
  /**
@@ -3780,9 +3795,9 @@ function cmdHook(args, flags) {
3780
3795
  console.log(`${hook.file} not found in ${process.cwd()} — skipping agent-instructions patch.`);
3781
3796
  console.log(` Create ${hook.file} and re-run \`hippo hook install ${target}\` if you want the agent prompt.`);
3782
3797
  }
3783
- // For tools with JSON hook systems, also install SessionEnd+SessionStart
3784
- // entries in their settings file. Currently: claude-code + opencode.
3785
- if (target === 'claude-code' || target === 'opencode') {
3798
+ // For Claude Code, also install SessionEnd+SessionStart entries in its
3799
+ // settings file.
3800
+ if (target === 'claude-code') {
3786
3801
  const result = installJsonHooks(target);
3787
3802
  if (result.installedSessionEnd) {
3788
3803
  console.log(`Installed hippo session-end SessionEnd hook in ${result.target} settings`);
@@ -3803,6 +3818,22 @@ function cmdHook(args, flags) {
3803
3818
  console.log(`Migrated legacy SessionEnd entry to the new detached form`);
3804
3819
  }
3805
3820
  }
3821
+ else if (target === 'opencode') {
3822
+ // opencode uses a TS plugin, not JSON hooks. See src/hooks.ts.
3823
+ const result = installOpencodePlugin();
3824
+ if (result.installed) {
3825
+ console.log(`Installed hippo opencode plugin at ${result.pluginPath}`);
3826
+ }
3827
+ else {
3828
+ console.log(`opencode plugin already up to date at ${result.pluginPath}`);
3829
+ }
3830
+ if (result.migratedLegacyHooks) {
3831
+ console.log(`Removed legacy Claude Code-style hooks block from opencode.json — opencode can now launch`);
3832
+ }
3833
+ if (result.jsonRepairFailed) {
3834
+ console.log(`WARNING: opencode.json is unparseable; legacy hooks block could not be auto-removed. Fix the file manually.`);
3835
+ }
3836
+ }
3806
3837
  else if (target === 'codex') {
3807
3838
  const result = installCodexWrapper();
3808
3839
  console.log(`Installed Codex session-end integration -> ${result.metadataPath}`);
@@ -3832,12 +3863,20 @@ function cmdHook(args, flags) {
3832
3863
  else {
3833
3864
  console.log(`${hook.file} not found, skipping agent-instructions uninstall.`);
3834
3865
  }
3835
- // For JSON-hook tools, also strip their SessionEnd/SessionStart entries.
3836
- if (target === 'claude-code' || target === 'opencode') {
3866
+ // For Claude Code, also strip its SessionEnd/SessionStart entries.
3867
+ if (target === 'claude-code') {
3837
3868
  if (uninstallJsonHooks(target)) {
3838
3869
  console.log(`Removed hippo hooks from ${target} settings`);
3839
3870
  }
3840
3871
  }
3872
+ else if (target === 'opencode') {
3873
+ // opencode uses a TS plugin; uninstall removes the plugin file AND
3874
+ // also runs the legacy-hooks migration so the downgrade/remove path
3875
+ // leaves opencode launchable.
3876
+ if (uninstallOpencodePlugin()) {
3877
+ console.log(`Removed hippo opencode plugin (and any legacy hooks block from opencode.json)`);
3878
+ }
3879
+ }
3841
3880
  else if (target === 'codex') {
3842
3881
  if (uninstallCodexWrapper()) {
3843
3882
  console.log('Removed Codex wrapper integration');
@@ -3866,7 +3905,7 @@ function cmdSetup(flags) {
3866
3905
  const markdownTools = tools.filter((t) => t.kind === 'markdown-instruction' && t.detected);
3867
3906
  const pluginTools = tools.filter((t) => t.kind === 'plugin' && t.detected);
3868
3907
  if (jsonTools.length === 0 && !forceAll) {
3869
- console.log('No JSON-hook-capable tools detected (checked: claude-code, opencode).');
3908
+ console.log('No JSON-hook-capable tools detected (checked: claude-code).');
3870
3909
  console.log('Run with --all to install hooks anyway.');
3871
3910
  }
3872
3911
  for (const tool of jsonTools) {
@@ -3923,7 +3962,31 @@ function cmdSetup(flags) {
3923
3962
  console.log('');
3924
3963
  console.log('Plugin-based tools (hook API via plugin, not JSON):');
3925
3964
  for (const tool of pluginTools) {
3926
- console.log(` ${tool.name.padEnd(14)} ${tool.notes}`);
3965
+ if (tool.name === 'opencode') {
3966
+ if (dryRun) {
3967
+ console.log(` ${tool.name.padEnd(14)} [dry-run] would install hippo plugin at ${resolveOpencodePluginPath()}`);
3968
+ continue;
3969
+ }
3970
+ const result = installOpencodePlugin();
3971
+ const bits = [];
3972
+ if (result.installed)
3973
+ bits.push('installed plugin');
3974
+ if (result.migratedLegacyHooks)
3975
+ bits.push('migrated legacy hooks block');
3976
+ if (result.jsonRepairFailed)
3977
+ bits.push('WARNING: opencode.json unparseable — manual fix needed');
3978
+ if (bits.length === 0) {
3979
+ console.log(` ${tool.name.padEnd(14)} already configured (${result.pluginPath})`);
3980
+ }
3981
+ else {
3982
+ console.log(` ${tool.name.padEnd(14)} ${bits.join(', ')} -> ${result.pluginPath}`);
3983
+ }
3984
+ }
3985
+ else {
3986
+ // Other plugin tools (openclaw) have their own installer; the notes
3987
+ // line points the user at it.
3988
+ console.log(` ${tool.name.padEnd(14)} ${tool.notes}`);
3989
+ }
3927
3990
  }
3928
3991
  }
3929
3992
  if (markdownTools.length > 0) {