hippo-memory 1.7.7 → 1.7.9

package/README.md CHANGED
@@ -36,7 +36,7 @@ It also fixes the portability problem. Your ChatGPT memories don't travel to Cla
36
36
 
37
37
  Numbers, not adjectives. Every claim links to the benchmark or the test that proves it.
38
38
 
39
- - **78% → 14% trap rate.** [Sequential Learning Benchmark](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents actually learn from past mistakes, not just retrieve text.
39
+ - **Sequential Learning Benchmark.** [benchmarks/sequential-learning/](benchmarks/sequential-learning/). 50 tasks, 10 buried traps. Measures whether agents learn from past mistakes, not just retrieve text. v0.11.0 informal magnitude RETRACTED v1.7.9; mechanism remains shipped. See "What's new in v1.7.9".
40
40
  - **R@5 = 74.0%** on [LongMemEval](benchmarks/longmemeval/). 500-question industry retrieval benchmark, BM25 only, no embeddings.
41
41
  - **10 of 10 incident scenarios beat transcript replay** on a staged Slack corpus ([benchmarks/e1.3/](benchmarks/e1.3/)). Recall surfaces the cause faster than scrolling the last N messages.
42
42
  - **0 outbound HTTP** on the 1000-event ingestion smoke. Proven by a `globalThis.fetch` spy that throws on call, not a hardcoded zero.
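The zero-outbound-HTTP claim above rests on a `globalThis.fetch` spy that throws on call. A minimal sketch of that idea follows. `installFetchSpy` and its return shape are hypothetical names chosen for illustration; they are not taken from hippo's test suite.

```javascript
// Hypothetical helper, not part of hippo-memory: swaps globalThis.fetch for a
// spy that counts calls and throws, so any outbound HTTP attempt fails loudly.
function installFetchSpy() {
  const original = globalThis.fetch;
  let calls = 0;
  globalThis.fetch = (...args) => {
    calls += 1;
    throw new Error(`outbound HTTP attempted: ${String(args[0])}`);
  };
  return {
    // Number of fetch attempts observed since installation.
    get calls() { return calls; },
    // Put the original fetch (or undefined) back.
    restore() { globalThis.fetch = original; },
  };
}

// Usage: wrap the code under test, then report the observed count.
const spy = installFetchSpy();
try {
  // ... run the ingestion code under test here ...
} finally {
  spy.restore();
}
console.log(`outbound fetch calls: ${spy.calls}`);
```

A spy of this kind only covers code paths that go through `fetch`. Code that uses `node:http` or another client directly would need its own guard.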
@@ -85,14 +85,32 @@ hippo recall "data pipeline issues" --budget 2000
85
85
 
86
86
  ---
87
87
 
88
+ ### What's new in v1.7.9
89
+
90
+ - **−10pp goal-stack lift magnitude RETRACTED.** Three pre-registered workload variants (v1.7.5 full-late SANITY_FAIL, v1.7.6 budget sweep B*=NULL, v1.7.7 `--restrict-late-to 4` SANITY_FAIL) all returned C2 hippo-base late mean = 0.0% across every seed. The 78% → 14% headline does not reproduce on the formal harness. Mechanism (dlPFC goal-stack) remains shipped; **no magnitude is currently claimed.**
91
+ - **Pre-emptive retraction (deliberate departure from v1.7.7 prereg).** The prereg explicitly distinguished SANITY_FAIL (no retraction) from NOT_SUPPORTED (retraction). v1.7.9 deviates on cumulative-evidence grounds; the deviation is declared, not silent. v1.8 still runs as planned; retraction is independent of v1.8 outcome.
92
+ - **`docs/RETRACTION.md`** pinned this release as a magnitude-smuggling guard for v1.8 and beyond.
93
+ - **3 P2 polish items folded in** from the v1.7.8 audit (README/result.md rounding consistency with raw-data disclosure, `pairedPermutationCI` docstring, `BAND_LOW`/`BAND_HIGH` provenance comment). The 4th (Float64Array micro-opt in `analyze-v1.7.7.mjs`) is **deferred to v1.7.10** to keep this release doc-only and audit-clean.
94
+
95
+ ### What's new in v1.7.8
96
+
97
+ - **Audit-fix patch.** Retroactive `/review` on v1.7.5/v1.7.6/v1.7.7 found 9 P0+P1 items (the review chain was partially skipped on those releases). All 9 fixed surgically across 3 atomic commits. No behavior change for end users; integrity fixes for the eval audit trail.
98
+ - **(P0)** Analyzer sanity gate now matches the v1.7.7 pre-reg (N=4 lattice rule: mean ∈ [5%, 50%] AND ≥3 distinct seeds non-zero, not the inherited [4%, 24%] band). v1.7.6 calibration result doc replaces overstated "pre-registration discipline" framing with explicit citation of the plan v2 commit + calibrate.mjs commit as the actual pre-registration anchors.
99
+ - **(P1)** Hippo benchmark adapter instance state hoisted from module-level to per-instance fields (race-condition-free for future parallel benchmarks). `selectBStar` reason string honesty fix. v1.7.7 prereg SUPPORTED template band corrected. ROADMAP-RESEARCH:156 status update on the −10pp claim. Defensive throw in `runOneBudget`. Verdict-precedence and selectBStar defensive tests added.
100
+ - **Tests:** 1480 passing (+4 from v1.7.7), 0 regressions.
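The (P0) item above describes the corrected gate as a conjunction: late mean within [5%, 50%] and at least three seeds with a non-zero late rate. A rough sketch of such a check is given here, assuming inclusive bounds and per-seed late rates expressed as fractions. `checkSanityGate` and the constant names are hypothetical and do not come from the analyzer.

```javascript
// Thresholds copied from the release note; names are illustrative assumptions.
const LATE_MEAN_MIN = 0.05;
const LATE_MEAN_MAX = 0.5;
const MIN_NONZERO_SEEDS = 3;

// lateRates: one late-phase trap rate per seed, each in [0, 1].
function checkSanityGate(lateRates) {
  if (lateRates.length === 0) {
    throw new Error('at least one seed is required');
  }
  const mean = lateRates.reduce((sum, r) => sum + r, 0) / lateRates.length;
  const nonZeroSeeds = lateRates.filter((r) => r > 0).length;
  // Both conditions must hold (assumed inclusive on the band edges).
  const pass =
    mean >= LATE_MEAN_MIN &&
    mean <= LATE_MEAN_MAX &&
    nonZeroSeeds >= MIN_NONZERO_SEEDS;
  return { mean, nonZeroSeeds, pass };
}

// 20 seeds that all score 0% late, as reported for v1.7.7, fail the gate.
const allZero = new Array(20).fill(0);
console.log(checkSanityGate(allZero).pass); // false
```

With a last-4 window each seed's rate can only take the values 0, 0.25, 0.5, 0.75, or 1, which is presumably why the note calls this a lattice rule.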
101
+
102
+ > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
103
+
88
104
  ### What's new in v1.7.7
89
105
 
90
106
  - **`--restrict-late-to <int>` flag** on the sequential-learning runner. Narrows the late-phase metric to the last N trap encounters; early/mid re-split (Option A) so the three slices stay disjoint. Default null preserves chronological-third behavior.
91
- - **C2 sanity preflight at N=4 lattice — FAILED.** 20 seeds at `--restrict-late-to 4`. Late mean = 0.00% across all seeds; floor effect persists at last-4 just as it did at last-7. **C3 (goal-stack ON) was NOT collected** — no goal-stack data leak under SANITY_FAIL. Adapter not starved (early=77%, mid=5%); the workload is structurally easy in late phase regardless of window size.
107
+ - **C2 sanity preflight at N=4 lattice — FAILED.** 20 seeds at `--restrict-late-to 4`. Late mean = 0.00% across all seeds; floor effect persists at last-4 just as it did at last-7. **C3 (goal-stack ON) was NOT collected** — no goal-stack data leak under SANITY_FAIL. Adapter not starved (early=77.3%, mid=4.5%); the workload is structurally easy in late phase regardless of window size.
92
108
  - **Cumulative evidence:** three pre-registered workload variants tested (v1.7.5 full-late, v1.7.6 budget sweep, v1.7.7 window restriction); none discriminating. The −10pp goal-stack lift claim remains untested. Hard-stop retraction fires on NOT_SUPPORTED, not SANITY_FAIL — magnitude is not auto-retracted yet. v1.8 (adversarial trap categories) is the last pre-registered escalation.
93
109
  - **`run.mjs` + `calibrate.mjs` now import-safe.** Stripped leading shebangs that broke vitest's importer; `run.mjs` wraps `main()` in an `invokedAsScript` guard. Latent fix from v1.7.6.
94
110
  - **17 new tests** (11 slice-math + 6 verdict). 1476 total passing.
95
111
 
112
+ > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
113
+
96
114
  ### What's new in v1.7.6
97
115
 
98
116
  - **Fresh-tail pinned context injection.** `hippo context --pinned-only --include-recent <n>` now includes the last N writes regardless of pinning, so memories saved mid-session can appear in the next Claude Code `UserPromptSubmit` injection before they are explicitly pinned. New Claude hook installs use `--include-recent 5`; legacy pinned-only hooks are migrated on `hippo hook install`.
@@ -101,12 +119,16 @@ hippo recall "data pipeline issues" --budget 2000
101
119
  - **Bug-fix on `calibrate.mjs` starvation guard.** Read a non-existent JSON field; false-positive `starved=true` on every candidate. Did not affect the verdict (lateMean=0% was load-bearing). Fix: drop the broken extraction.
102
120
  - **Hypothesis still untested.** The −10pp goal-stack lift claim remains unsupported by a discriminating workload. Mechanism still shipped from v1.7.4. Honest reporting: see `docs/evals/2026-05-09-v1.7.6-calibration-result.md`.
103
121
 
122
+ > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
123
+
104
124
  ### What's new in v1.7.5
105
125
 
106
126
  - **Sequential-learning benchmark gains `pushGoal`/`completeGoal` hooks** + a multi-seed eval harness with seeded category-to-slot variance, exact paired permutation CI, and `--eval-strict` mode. The dlPFC goal-stack mechanism is now exercisable on the public benchmark.
107
127
  - **Tag-fix on memory store** so the goal-stack boost can actually match. Pre-fix the boost would have matched zero memories.
108
128
  - **Eval ran but stopped per pre-registered sanity gate.** Both hippo-base and hippo+goal-stack hit 0% late-phase trap rate across 20 seeds — floor effect prevents H1/H0 discrimination. The −10pp hypothesis remains untested on a discriminating workload. Mechanism shipped, hypothesis open. Pre-reg + result in `docs/evals/`.
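The floor effect in the last bullet can be made concrete with an exact paired permutation test: when both conditions score 0% on every seed, every paired difference is zero and no sign flip can separate them. The sketch below computes a two-sided exact p-value by enumerating all sign flips. It is a generic illustration of the technique and computes a p-value rather than a CI; it is not the harness's own code, and `exactPairedPermutationP` is a hypothetical name.

```javascript
// Generic exact paired permutation (sign-flip) test on the sum of differences.
// a[i] and b[i] are the two conditions' scores for the same seed i.
function exactPairedPermutationP(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('inputs must be non-empty and the same length');
  }
  const n = a.length;
  if (n > 20) {
    throw new Error('this sketch enumerates 2^n flips; keep n <= 20');
  }
  const diffs = a.map((x, i) => x - b[i]);
  const observed = Math.abs(diffs.reduce((s, d) => s + d, 0));
  const total = 2 ** n;
  let asExtreme = 0;
  for (let mask = 0; mask < total; mask += 1) {
    let sum = 0;
    for (let i = 0; i < n; i += 1) {
      // Bit i of mask decides whether pair i has its sign flipped.
      sum += ((mask >> i) & 1) ? -diffs[i] : diffs[i];
    }
    // Small tolerance guards against floating-point ties.
    if (Math.abs(sum) >= observed - 1e-12) asExtreme += 1;
  }
  return asExtreme / total;
}

// Both arms at 0% across 20 seeds: every flip ties the observed statistic.
const base = new Array(20).fill(0);
const goalStack = new Array(20).fill(0);
console.log(exactPairedPermutationP(base, goalStack)); // 1
```

A p-value of 1 here reflects the absence of any variation to test, which matches the note's reading that the workload cannot discriminate between H1 and H0.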
109
129
 
130
+ > Updated v1.7.9: the −10pp magnitude is RETRACTED. See "What's new in v1.7.9" above and `CHANGELOG.md` v1.7.9 entry.
131
+
110
132
  ### What's new in v1.7.4
111
133
 
112
134
  - **Goal-stack boost on MCP + HTTP.** Set `RecallOpts.sessionId` (or HTTP `?session_id=...`, or MCP `hippo_recall { session_id }`) and the dlPFC goal-stack boost — previously CLI-only — applies on MCP and HTTP too. Both `api.recall` (primary BM25 band, before fresh-tail / summary appendix) AND MCP's separate `physicsSearch`/`hybridSearch` path are boosted. New `RecallOpts.goalTag` lets callers opt out per-call.
@@ -987,15 +1009,16 @@ No other public benchmark tests whether memory systems produce learning curves.
987
1009
 
988
1010
  50 tasks, 10 trap categories, each appearing 2-3 times across the sequence.
989
1011
 
990
- **Hippo v0.11.0 results:**
1012
+ > **v0.11.0 informal results — RETRACTED v1.7.9.** The 78% → 14% magnitude does NOT reproduce on the formal sequential-learning benchmark. Three pre-registered workload variants (v1.7.5 full-late, v1.7.6 budget sweep, v1.7.7 `--restrict-late-to 4`) all returned C2 hippo-base late mean = 0.0% across every seed (the workload's late phase saturates structurally). The mechanism (dlPFC goal-stack: `pushGoal`/`completeGoal` hooks, `--use-goal-stack`) is shipped and exercisable. **The magnitude is RETRACTED. The mechanism is shipped; no magnitude is currently claimed.** v1.8.0 (queued) explores adversarial trap categories as mechanism characterisation under the magnitude-smuggling guard in `docs/RETRACTION.md`. Pre-registration trail: `docs/evals/2026-05-07-v1.7.5-goal-stack-eval-prereg.md`, `docs/evals/2026-05-09-v1.7.6-calibration-result.md`, `docs/evals/2026-05-09-v1.7.7-goal-stack-eval-result.md`. CHANGELOG: see v1.7.9 entry.
1013
+
1014
+ <details>
1015
+ <summary>Original v0.11.0 informal numbers (RETRACTED — preserved as audit trail in git, not reproduced here)</summary>
1016
+
1017
+ v0.11.0 reported a single-run informal headline citing late-phase trap-rate decline on the sequential-learning benchmark. The specific numbers are archived at git tag `v0.11.0` and the corresponding `CHANGELOG.md` historical entry. Retained in version control, not reproduced here, since reproduction risks accidental re-citation. See `git show v0.11.0 -- README.md` for the original wording.
991
1018
 
992
- | Condition | Overall | Early | Mid | Late | Learns? |
993
- |-----------|---------|-------|-----|------|---------|
994
- | No memory | 100% | 100% | 100% | 100% | No |
995
- | Static memory | 20% | 33% | 11% | 14% | No |
996
- | Hippo | 40% | 78% | 22% | 14% | Yes |
1019
+ </details>
997
1020
 
998
- The hippo agent's trap-hit rate drops from 78% to 14% as it accumulates error memories with 2x half-life. Static pre-loaded memory helps from the start but doesn't improve. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).
1021
+ The benchmark, harness, and adapter contract remain shipped. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).
999
1022
 
1000
1023
  ```bash
1001
1024
  cd benchmarks/sequential-learning
@@ -16,7 +16,7 @@
16
16
  * an ESM `import` can resolve cleanly, and a hardcoded constant survives
17
17
  * any packager that drops .json files.
18
18
  */
19
- export const PACKAGE_VERSION = '1.7.7';
19
+ export const PACKAGE_VERSION = '1.7.9';
20
20
  // Bump on every release alongside the 4 manifests + lockfile.
21
21
  /**
22
22
  * Compare two semver strings. Returns positive if a > b, 0 if equal, negative
package/dist/version.d.ts CHANGED
@@ -16,7 +16,7 @@
16
16
  * an ESM `import` can resolve cleanly, and a hardcoded constant survives
17
17
  * any packager that drops .json files.
18
18
  */
19
- export declare const PACKAGE_VERSION = "1.7.7";
19
+ export declare const PACKAGE_VERSION = "1.7.9";
20
20
  /**
21
21
  * Compare two semver strings. Returns positive if a > b, 0 if equal, negative
22
22
  * if a < b.
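The doc comment above specifies the comparator's contract: positive if a > b, 0 if equal, negative if a < b. A minimal sketch that satisfies that contract is shown here, assuming plain `major.minor.patch` strings. `compareSemverSketch` is a hypothetical name, and the package's actual implementation may differ, for example in how it treats pre-release suffixes.

```javascript
// Illustrative only, not the package's implementation. Assumes plain
// "major.minor.patch" input; pre-release and build suffixes are not handled.
function compareSemverSketch(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i += 1) {
    // Missing components are treated as 0, so "1.7" compares equal to "1.7.0".
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

console.log(compareSemverSketch('1.7.9', '1.7.7') > 0); // true
console.log(compareSemverSketch('1.10.0', '1.9.9') > 0); // true
```

Comparing components as numbers rather than strings matters for the second call: a lexicographic comparison would rank `1.9.9` above `1.10.0`.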
package/dist/version.js CHANGED
@@ -16,7 +16,7 @@
16
16
  * an ESM `import` can resolve cleanly, and a hardcoded constant survives
17
17
  * any packager that drops .json files.
18
18
  */
19
- export const PACKAGE_VERSION = '1.7.7';
19
+ export const PACKAGE_VERSION = '1.7.9';
20
20
  // Bump on every release alongside the 4 manifests + lockfile.
21
21
  /**
22
22
  * Compare two semver strings. Returns positive if a > b, 0 if equal, negative
@@ -2,7 +2,7 @@
2
2
  "id": "hippo-memory",
3
3
  "name": "Hippo Memory",
4
4
  "description": "Biologically-inspired memory for AI agents. Decay by default, retrieval strengthening, sleep consolidation.",
5
- "version": "1.7.7",
5
+ "version": "1.7.9",
6
6
 
7
7
  "configSchema": {
8
8
  "type": "object",
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "hippo-memory",
3
- "version": "1.7.7",
3
+ "version": "1.7.9",
4
4
  "description": "Hippo Memory plugin for OpenClaw - biologically-inspired agent memory",
5
5
  "main": "index.ts",
6
6
  "openclaw": {
@@ -2,7 +2,7 @@
2
2
  "id": "hippo-memory",
3
3
  "name": "Hippo Memory",
4
4
  "description": "Biologically-inspired memory for AI agents. Decay by default, retrieval strengthening, sleep consolidation.",
5
- "version": "1.7.7",
5
+ "version": "1.7.9",
6
6
  "configSchema": {
7
7
  "type": "object",
8
8
  "additionalProperties": false,
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "hippo-memory",
3
- "version": "1.7.7",
3
+ "version": "1.7.9",
4
4
  "description": "Biologically-inspired memory system for AI agents. Decay by default, strength through use.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",