hippo-memory 1.8.1 → 1.9.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. package/README.md +27 -0
  2. package/bin/hippo.js +2 -2
  3. package/dist/cli.js +44 -0
  4. package/dist/cli.js.map +1 -1
  5. package/dist/embeddings.d.ts +34 -1
  6. package/dist/embeddings.d.ts.map +1 -1
  7. package/dist/embeddings.js +74 -14
  8. package/dist/embeddings.js.map +1 -1
  9. package/dist/rerankers/cross-encoder.d.ts +18 -0
  10. package/dist/rerankers/cross-encoder.d.ts.map +1 -0
  11. package/dist/rerankers/cross-encoder.js +90 -0
  12. package/dist/rerankers/cross-encoder.js.map +1 -0
  13. package/dist/rerankers/index.d.ts +4 -0
  14. package/dist/rerankers/index.d.ts.map +1 -0
  15. package/dist/rerankers/index.js +16 -0
  16. package/dist/rerankers/index.js.map +1 -0
  17. package/dist/rerankers/llm.d.ts +15 -0
  18. package/dist/rerankers/llm.d.ts.map +1 -0
  19. package/dist/rerankers/llm.js +77 -0
  20. package/dist/rerankers/llm.js.map +1 -0
  21. package/dist/rerankers/types.d.ts +30 -0
  22. package/dist/rerankers/types.d.ts.map +1 -0
  23. package/dist/rerankers/types.js +2 -0
  24. package/dist/rerankers/types.js.map +1 -0
  25. package/dist/search.d.ts +17 -0
  26. package/dist/search.d.ts.map +1 -1
  27. package/dist/search.js +13 -2
  28. package/dist/search.js.map +1 -1
  29. package/dist/src/cli.js +44 -0
  30. package/dist/src/cli.js.map +1 -1
  31. package/dist/src/embeddings.js +74 -14
  32. package/dist/src/embeddings.js.map +1 -1
  33. package/dist/src/rerankers/cross-encoder.js +90 -0
  34. package/dist/src/rerankers/cross-encoder.js.map +1 -0
  35. package/dist/src/rerankers/index.js +16 -0
  36. package/dist/src/rerankers/index.js.map +1 -0
  37. package/dist/src/rerankers/llm.js +77 -0
  38. package/dist/src/rerankers/llm.js.map +1 -0
  39. package/dist/src/rerankers/types.js +2 -0
  40. package/dist/src/rerankers/types.js.map +1 -0
  41. package/dist/src/search.js +13 -2
  42. package/dist/src/search.js.map +1 -1
  43. package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
  44. package/extensions/openclaw-plugin/package.json +1 -1
  45. package/openclaw.plugin.json +1 -1
  46. package/package.json +2 -1
package/README.md CHANGED
@@ -85,6 +85,33 @@ hippo recall "data pipeline issues" --budget 2000
 
  ---
 
+ ### What's new in v1.9.3
+
+ - **Reranker review-tail patch.** Closes the three follow-ups raised on PR #25: `src/rerankers/llm.ts` now wires `AbortController` + `setTimeout` around the fetch (default 30 s, overridable via `HIPPO_LLM_RERANKER_TIMEOUT_MS`) so recall never hangs on a wedged endpoint; `src/rerankers/cross-encoder.ts` emits a single `console.warn` on first identity-fallback per process so silent fallback no longer masquerades as a working reranker; the orphan `RerankSignals` type (sole consumer retracted in v1.9.1) is removed at both the re-export and the definition.
+ - **Version alignment.** `package.json` bumped 1.8.1 → 1.9.3. v1.9.0 / v1.9.1 / v1.9.2 were on-master research milestones never published to npm; v1.9.3 is the first published `1.9.x` release and carries the cumulative scope from F6 (rerankers) through F13 (chunk-per-turn) plus the F10 HARD RETRACTION.
+ - **Mechanism cumulative-null status unaffected.** Per `docs/RETRACTION.md:94-113`. No `src/` change in this patch touches the dlPFC goal-stack mechanism. **This release does not re-assert the retracted −10pp magnitude.**
+
+ ### What's new in v1.9.2
+
+ - **F13 chunk-per-turn LongMemEval R@5 = 86.8 on oracle (Gate-B PASS).** Plan F13 (`docs/evals/2026-05-12-r5-track6-chunk-per-turn-prereg.md`) addresses the structural pathology that limited every prior LongMemEval track (F8–F12): sessions in `data/longmemeval_oracle.json` are ~14k chars median (~3,500 tokens), but the embedders we can reach (MiniLM, BGE-base, multilingual-e5-large) cap at 512–514 tokens. Every prior track embedded only the first ~2 turns of each 12-turn session and truncated the rest. F13 replaces session-level embedding with turn-level embedding (10,866 turns over the 940 oracle sessions, max-pool by `session_id` at retrieval). Gate-A PASS (10,866 turns, all 940 sessions covered, 768-dim normalized). **Gate-B PASS:** F13 + F9 sub-agent rerank stack R@5 = 86.8 on `data/longmemeval_oracle.json` (threshold ≥ 83.2 = F11+F9 deployable best 78.2 + 5pp; margin 3.6). R@1 = 70.8, R@10 = 90.2, R@20 = 93.4.
+ - **Roadmap target met (oracle split).** R@5 ≥ 85% was NON-binding per every prior prereg; observed 86.8 on `data/longmemeval_oracle.json` as of this release. Descriptive characterisation; not a re-assertion of any retracted magnitude.
+ - **Split-mismatch with gbrain (unchanged).** `longmemeval_oracle` carries 3 sessions per haystack; gbrain v0.28.8's 97.60 figure is on `longmemeval_s_cleaned` (~40 sessions per haystack) with OpenAI `text-embedding-3-large@1536`. Both HF Hub and OpenAI API are host-blocked from this sandbox (verified 2026-05-12). F13's 86.8 is NOT directly comparable to gbrain's 97.60.
+ - **F12 retracted.** Plan F12 (`docs/evals/2026-05-11-r5-track5-e5-large-top100-prereg.md`) vendored `intfloat/multilingual-e5-large` and widened the candidate pool to top-100. Gate-A PASS; Gate-B FAIL with best variant R@5 = 78.8 (threshold 83.2). HARD RETRACTION executed: `hippo_store2/` reverted to BGE-base; the `prefixFor` / `preferredBackend` dispatch helpers stay in `src/embeddings.ts` per the dispatch-shape carve-out (they return the legacy behaviour for non-e5 models).
+ - **No `src/` changes in v1.9.2.** F13 is implemented as `benchmarks/longmemeval/chunk_per_turn_{embed,retrieve}.mjs` and reuses F11/F12's existing dispatch helpers. The cumulative-null status of the dlPFC goal-stack mechanism (`docs/RETRACTION.md:94-113`) is unaffected. **This release does not re-assert the retracted −10pp magnitude.**
+
+ ### What's new in v1.9.1
+
+ - **F10 features-reranker retraction.** Plan F10 (`docs/plans/2026-05-11-r5-track3-richer-ingest.md`) tested whether populating entry-level signals via 19 Claude-sub-agent invocations would let the features reranker move R@5 above features-default + 5pp on LongMemEval. Observed: features-enriched R@5 = 59.2 vs features-default R@5 = 75.8 (same bge-base embedding model), a 21.6pp shortfall against the binding gate. Per the prereg's HARD RETRACTION clause, `src/rerankers/features.ts` + its test + its micro-fixture + its dispatcher case are removed in v1.9.1. The Track 2 cross-encoder and Track 3 LLM-rerank skeletons are preserved. **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`.
+ - **F11 embedding upgrade tested and documented (not shipped as default).** Plan F11 (`docs/plans/2026-05-11-r5-track4-embedding-upgrade.md`) swapped `Xenova/all-MiniLM-L6-v2` for `BAAI/bge-base-en-v1.5` (768-dim, CLS pooling). Gate-A PASS; Gate-B FAIL (R@5 = 77.0% vs threshold 81.8%). The `poolingFor` per-model dispatch in `src/embeddings.ts` and the `--model` flag in `scripts/fetch_embedding_model.mjs` ship; MiniLM remains the project default.
+ - **Cross-track R@5 status (as of v1.9.1):** F8 hybrid tuning (MiniLM) 76.8, F9 v2 sub-agent LLM rerank (MiniLM) 78.0, F11 bge-base baseline 77.0, F11+F9 stack (BGE-base + sub-agent rerank) 78.2 — cross-track best at v1.9.1 — F10 features-enriched (retracted) 59.2. Roadmap target R@5 ≥ 85% was NOT MET at v1.9.1. NON-binding per each prereg. *(Superseded in v1.9.2 by F13 + F9 stack R@5 = 86.8 on oracle.)*
+
+ ### What's new in v1.9.0
+
+ - **F6 reranker hardening shipped.** New `RerankerFn` seam in `hybridSearch` with three reranker tracks: Track 1 features (`MemoryEntry`-level signals, no external deps), Track 2 cross-encoder (MS-MARCO MiniLM via optional `@xenova/transformers`, identity-fallback on load failure), Track 3 LLM (env-gated skeleton against an OpenAI-compatible endpoint). Opt in via `hippo recall --reranker <name>`.
+ - **Workload-validity verdicts on the LongMemEval sweep** (`docs/evals/2026-05-10-f6-reranker-result.md`, prereg `docs/evals/2026-05-10-f6-reranker-prereg.md`): Gate-A (firing rate, binding) PASS for the features track, PASS-with-caveat for cross-encoder (500/500 invocations all took the identity-fallback branch — HF model download was blocked in the test environment, so this is NOT a real cross-encoder evaluation). Gate-B (hyperparameter discrimination, binding) FAIL — features_topk{20,50,100} produced byte-identical R@K, so no per-hyperparameter R@5 effect is claimed.
+ - **Roadmap R@5 ≥ 85% target NOT met on the workload tested.** Observed R@5 = 75.4% (features, all three top-K settings) and 75.6% (baseline). Per the prereg this is descriptive characterisation, not a binding gate; the mechanism ships, and a real attempt at the target requires either a real cross-encoder evaluation (HF access) or a richer ingest path that populates entry-level reranker signals.
+ - **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`. The dlPFC goal-stack cumulative-null status (`docs/RETRACTION.md:94-113`) is independent of this release.
+
  ### What's new in v1.8.1
 
  - **v1.8 prereg's v1.9 LongMemEval cross-validation pre-commitment RETRACTED.** Outside-voice review on two iterations of the v1.9 plan found six structural barriers (canonical harness bypasses the boost path; ingest tag namespace excludes content-derived stems; pushGoal API field mismatch; depth-cap suspension; trigger AND clause unreachable; workload-validity gate ceremonial). Per Root Cause Over Patches, public retraction over re-architecture. **This release does not re-assert the retracted −10pp magnitude.** Per `docs/RETRACTION.md`.
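The v1.9.2 notes in the README hunk above describe F13 as embedding each turn separately and then max-pooling by `session_id` at retrieval. A minimal JavaScript sketch of that pooling step follows. It is an illustration under stated assumptions, not the code in `benchmarks/longmemeval/chunk_per_turn_retrieve.mjs`: the function name `maxPoolBySession`, the `turn` and `score` fields, and the sample scores are all hypothetical.

```javascript
// Hypothetical sketch: collapse turn-level similarity scores into one score
// per session by keeping the best-scoring turn (max-pool on session_id).
// Field names other than session_id are assumptions for illustration.
function maxPoolBySession(turnHits) {
  const best = new Map();
  for (const hit of turnHits) {
    const prev = best.get(hit.session_id);
    if (prev === undefined || hit.score > prev) {
      best.set(hit.session_id, hit.score);
    }
  }
  // Return sessions ordered from highest to lowest pooled score.
  return [...best.entries()]
    .map(([session_id, score]) => ({ session_id, score }))
    .sort((a, b) => b.score - a.score);
}

// Made-up turn-level hits: session s1 appears twice.
const turnHits = [
  { session_id: 's1', turn: 0, score: 0.41 },
  { session_id: 's2', turn: 3, score: 0.77 },
  { session_id: 's1', turn: 7, score: 0.83 },
  { session_id: 's3', turn: 1, score: 0.52 },
];

const sessions = maxPoolBySession(turnHits);
console.log(sessions.map((s) => s.session_id).join(',')); // prints "s1,s2,s3"
```

Session `s1` ranks first because its turn 7 scores 0.83, even though its turn 0 scores only 0.41. This is the property the notes rely on: a relevant turn late in a long session can still surface that session.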
package/bin/hippo.js CHANGED
@@ -1,2 +1,2 @@
- #!/usr/bin/env node
- import '../dist/cli.js';
+ #!/usr/bin/env node
+ import '../dist/cli.js';
package/dist/cli.js CHANGED
@@ -64,6 +64,7 @@ import { runFeatureEval, formatResult, resultToBaseline, detectRegressions } fro
  import { refineStore } from './refine-llm.js';
  import { wmPush, wmRead, wmClear, wmFlush } from './working-memory.js';
  import { multihopSearch } from './multihop.js';
+ import { getReranker } from './rerankers/index.js';
  import { computeSalience } from './salience.js';
  import { computeAmbientState, renderAmbientSummary } from './ambient.js';
  import { listDlq, replayDlqEntry } from './connectors/slack/dlq.js';
@@ -701,17 +702,21 @@ async function cmdRecall(hippoRoot, query, flags) {
              physicsConfig: config.physics,
              minResults,
              scope: recallActiveScope,
+             includeSuperseded,
+             asOf,
          });
      }
      else if (hasGlobal) {
          // Use searchBothHybrid for merged results with embedding support
          results = await searchBothHybrid(query, hippoRoot, globalRoot, {
              budget, mmr: mmrEnabled, mmrLambda, localBump, minResults, scope: recallActiveScope, tenantId,
+             includeSuperseded, asOf,
          });
      }
      else {
          results = await hybridSearch(query, localEntries, {
              budget, hippoRoot, mmr: mmrEnabled, mmrLambda, minResults, scope: recallActiveScope,
+             includeSuperseded, asOf,
          });
      }
      // ACC EVC-adaptive recall (RESEARCH.md §PFC.ACC). When the initial top-K is
@@ -818,6 +823,39 @@ async function cmdRecall(hippoRoot, query, flags) {
          })
              .sort((a, b) => b.score - a.score);
      }
+     // F6 reranker pass (docs/plans/2026-05-10-f6-reranker-hardening.md). When
+     // --reranker <name> is set, look up the reranker fn from the registry
+     // (src/rerankers/index.ts) and apply it to the top-K candidates. The
+     // reranker reorders (and may rescale) results; the post-budget set is
+     // returned. Default off; opt-in via --reranker <cross-encoder|llm>. The
+     // structurally similar --rerank-utility block above is the OFC MVP and is
+     // independent — both can run in the same recall, with --rerank-utility
+     // applied first. Available rerankers: cross-encoder, llm (see
+     // src/rerankers/index.ts). The Track 1 `features` reranker was removed in
+     // v1.9.1 per the F10 HARD RETRACTION; it is no longer a valid value.
+     const rerankerName = flags['reranker'] !== undefined ? String(flags['reranker']).trim() : '';
+     if (rerankerName) {
+         const rerankerFn = getReranker(rerankerName);
+         if (rerankerFn) {
+             const topK = flags['reranker-top-k'] !== undefined
+                 ? parseInt(String(flags['reranker-top-k']), 10)
+                 : 50;
+             const head = results.slice(0, topK);
+             const tail = results.slice(topK);
+             const rerankInput = head.map((r, i) => ({ ...r, preRerankRank: i + 1 }));
+             const reranked = await rerankerFn(query, rerankInput, { topK });
+             // Copy rerankScore into score so downstream blocks (--goal, goal-stack,
+             // salience) that sort by `r.score` honor the reranker's order rather
+             // than unwinding it. Original score is preserved on rerankScore's
+             // input, but downstream sorters key on `score`.
+             const withPostRank = reranked.map((r, i) => ({
+                 ...r,
+                 score: r.rerankScore,
+                 postRerankRank: i + 1,
+             }));
+             results = [...withPostRank, ...tail];
+         }
+     }
      // dlPFC goal-conditioned recall MVP (RESEARCH.md §PFC.dlPFC). When --goal
      // <tag> is set, memories whose `tags` array contains the goal tag receive
      // a 1.5x score boost and results are re-sorted. The full dlPFC spec
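The added block above reranks only the first `topK` results, copies `rerankScore` into `score`, and appends the untouched tail. The sketch below isolates that flow as a standalone function. It is a simplified illustration, not the shipped code: `applyReranker`, `toyReranker`, and the sample entries are hypothetical, and the reranker is called synchronously here whereas `cmdRecall` awaits it.

```javascript
// Simplified sketch of the head/tail rerank pass. Assumption: the reranker is
// synchronous here; the shipped cmdRecall awaits an async reranker instead.
function applyReranker(query, results, rerankerFn, topK = 50) {
  const head = results.slice(0, topK); // candidates the reranker sees
  const tail = results.slice(topK);    // passed through unchanged
  const rerankInput = head.map((r, i) => ({ ...r, preRerankRank: i + 1 }));
  const reranked = rerankerFn(query, rerankInput, { topK });
  // Overwrite score with rerankScore so later sort-by-score steps keep
  // the reranker's order for the head.
  const withPostRank = reranked.map((r, i) => ({
    ...r,
    score: r.rerankScore,
    postRerankRank: i + 1,
  }));
  return [...withPostRank, ...tail];
}

// Hypothetical toy reranker: counts how many query words occur in the text.
function toyReranker(query, candidates) {
  const words = query.toLowerCase().split(/\s+/);
  return candidates
    .map((c) => ({
      ...c,
      rerankScore: words.filter((w) => c.text.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.rerankScore - a.rerankScore);
}

const results = [
  { id: 'a', text: 'cache eviction notes', score: 0.9 },
  { id: 'b', text: 'data pipeline issues in staging', score: 0.8 },
  { id: 'c', text: 'pipeline retry policy', score: 0.7 },
];

const out = applyReranker('data pipeline', results, toyReranker, 2);
console.log(out.map((r) => r.id).join(',')); // prints "b,a,c"
```

With `topK = 2`, entry `c` is never shown to the reranker and keeps its original `score` of 0.7, while the reranked head now carries `rerankScore` values on a different scale. Any later step that sorts by `score` could therefore interleave tail and head entries, which may be worth keeping in mind when choosing `--reranker-top-k`.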
@@ -4649,6 +4687,12 @@ Commands:
                           = score * (0.5 + 0.5 * strength) * (1 - cost_factor)
                           where cost_factor = min(0.3, tokens / 10000). Re-sorts
                           results by utility. Default off. RESEARCH.md §PFC.OFC.
+   --reranker <name>      Apply a reranker pass after retrieval
+                          (cross-encoder|llm). Looks up the named
+                          reranker from src/rerankers/index.ts and re-orders
+                          the top-K candidates. Default unset (no reranker).
+                          See docs/plans/2026-05-10-f6-reranker-hardening.md.
+   --reranker-top-k <n>   Cap candidates passed to the reranker (default 50).
    --goal <tag>           dlPFC goal-conditioned recall: memories tagged with
                           the goal tag get a 1.5x score boost and results are
                           re-sorted. Default off. RESEARCH.md §PFC.dlPFC.
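The context lines at the top of this hunk give a scoring formula that appears to belong to the `--rerank-utility` option: `score * (0.5 + 0.5 * strength) * (1 - cost_factor)`, with `cost_factor = min(0.3, tokens / 10000)`. The worked example below evaluates it for two made-up entries. The function name `utility` is an assumption (the left-hand side of the formula is cut off by the hunk boundary), and the input values are hypothetical.

```javascript
// Worked example of the formula quoted in the help text:
//   score * (0.5 + 0.5 * strength) * (1 - cost_factor)
//   cost_factor = min(0.3, tokens / 10000)
// The name "utility" is assumed from "Re-sorts results by utility".
function utility(score, strength, tokens) {
  const costFactor = Math.min(0.3, tokens / 10000);
  return score * (0.5 + 0.5 * strength) * (1 - costFactor);
}

// Same score and strength, different token costs (made-up values).
const shortEntry = utility(0.8, 0.5, 2000); // cost_factor = 0.2
const longEntry = utility(0.8, 0.5, 8000);  // 0.8 is capped to cost_factor = 0.3

console.log(shortEntry.toFixed(2), longEntry.toFixed(2)); // prints "0.48 0.42"
```

Because `cost_factor` is capped at 0.3, token cost can reduce an entry's value by at most 30%, and every entry above 3,000 tokens receives the same penalty.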