hippo-memory 0.28.0 → 0.29.1
- package/README.md +38 -22
- package/dist/cli.js +98 -13
- package/dist/cli.js.map +1 -1
- package/dist/config.d.ts +12 -0
- package/dist/config.d.ts.map +1 -1
- package/dist/config.js +9 -0
- package/dist/config.js.map +1 -1
- package/dist/consolidate.d.ts +1 -0
- package/dist/consolidate.d.ts.map +1 -1
- package/dist/consolidate.js +38 -1
- package/dist/consolidate.js.map +1 -1
- package/dist/hooks.d.ts +1 -0
- package/dist/hooks.d.ts.map +1 -1
- package/dist/hooks.js +24 -0
- package/dist/hooks.js.map +1 -1
- package/dist/replay.d.ts +41 -0
- package/dist/replay.d.ts.map +1 -0
- package/dist/replay.js +117 -0
- package/dist/replay.js.map +1 -0
- package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
- package/extensions/openclaw-plugin/package.json +1 -1
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -60,6 +60,17 @@ hippo recall "data pipeline issues" --budget 2000
|
|
|
60
60
|
|
|
61
61
|
---
|
|
62
62
|
|
|
63
|
+
### What's new in v0.29.1
|
|
64
|
+
|
|
65
|
+
- **Raise default `pinnedInject.budget` to 1500.** Smoke-testing on a real 10-pinned-memory store showed 500 tokens truncated new invariants off the bottom. 1500 matches `defaultContextBudget` and fits typical mature installs. Explicit `.hippo/config.json` overrides are untouched; only the default changes.
|
|
66
|
+
|
|
67
|
+
### What's new in v0.29.0
|
|
68
|
+
|
|
69
|
+
- **Mid-session pinned re-injection (Claude Code).** Pinned memories now re-enter context every turn via a new `UserPromptSubmit` hook — not just at SessionStart — so invariants survive long sessions where Opus 4.7 might otherwise forget them. `hippo context --pinned-only --format additional-context` is the command the hook runs; it's read-only so retrieval_count doesn't inflate. Existing users must re-run `hippo hook install claude-code` to pick it up. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
|
|
70
|
+
- **Replay consolidation pass.** `hippo sleep` now rehearses 5 high-value memories per cycle (weighted by outcome feedback, emotional valence, under-rehearsal, idle time, strength). Closes the "replay" gap in the 7 hippocampal mechanisms. Non-destructive; opt out with `{"replay":{"count":0}}`.
|
|
71
|
+
- **Model profile benchmark (null result).** New reusable eval harness at `evals/model-profile-bench.json` + `scripts/run-model-profile-bench.mjs` measures invariant honor, hallucination guard, noise rejection, and contradiction rejection. 4.6 and 4.7 both score 100% with hippo context injection — no per-model profile tuning needed. See `docs/plans/2026-04-21-phase-a-decision.md`.
|
|
72
|
+
- **Physics soak test harness.** `scripts/soak-test.mjs` + 10 synthetic workload profiles. All 10 bounded at 100-tick smoke scale; grant-scale 100hr runs are separate follow-up work.
|
|
73
|
+
|
|
63
74
|
### What's new in v0.28.0
|
|
64
75
|
|
|
65
76
|
- **Budget saturation fix.** Large memories (14k+ chars) no longer starve retrieval. New `minResults` option guarantees at least N results regardless of token budget. `hippo recall <q> --min-results 5`.
|
|
@@ -748,7 +759,7 @@ No extra commands needed. Just `hippo init` and your agent knows about Hippo.
|
|
|
748
759
|
If you prefer explicit control:
|
|
749
760
|
|
|
750
761
|
```bash
|
|
751
|
-
hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd hooks
|
|
762
|
+
hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd + UserPromptSubmit hooks
|
|
752
763
|
hippo hook install codex # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
|
|
753
764
|
hippo hook install cursor # patches .cursorrules
|
|
754
765
|
hippo hook install openclaw # patches AGENTS.md
|
|
@@ -760,7 +771,10 @@ This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the
|
|
|
760
771
|
2. Run `hippo remember "<lesson>" --error` on errors
|
|
761
772
|
3. Run `hippo outcome --good` on completion
|
|
762
773
|
|
|
763
|
-
For Claude Code, it also adds
|
|
774
|
+
For Claude Code, it also adds:
|
|
775
|
+
- a `SessionEnd` hook so `hippo sleep` runs automatically when the session exits
|
|
776
|
+
- a `SessionStart` hook that prints the previous session's consolidation output
|
|
777
|
+
- a `UserPromptSubmit` hook that re-injects pinned memories (`hippo remember <text> --pin`) into every turn's context — so invariants survive long sessions where Opus 4.7 might otherwise "forget" them. Budget: 500 tokens per turn, skipped entirely when no pinned memories exist. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
|
|
764
778
|
|
|
765
779
|
To remove: `hippo hook uninstall claude-code`
|
|
766
780
|
|
|
@@ -874,7 +888,7 @@ For how these mechanisms connect to LLM training, continual learning, and open r
|
|
|
874
888
|
| Auto-hook install | Yes | No | No | No |
|
|
875
889
|
| MCP server | Yes | Yes | No | No |
|
|
876
890
|
| Zero dependencies | Yes | No (ChromaDB) | No | No |
|
|
877
|
-
| LongMemEval R@5 (retrieval) |
|
|
891
|
+
| LongMemEval R@5 (retrieval) | 73.8% (hybrid, v0.28) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
|
|
878
892
|
| Git-friendly | Yes | No | No | Yes |
|
|
879
893
|
| Framework agnostic | Yes | Yes | Partial | Yes |
|
|
880
894
|
|
|
@@ -890,28 +904,30 @@ Two benchmarks testing two different things. Full details in [`benchmarks/`](ben
|
|
|
890
904
|
|
|
891
905
|
[LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
|
|
892
906
|
|
|
893
|
-
**Hippo v0.
|
|
907
|
+
**Hippo v0.28.0 results (hybrid BM25 + cosine, full 500 questions):**
|
|
908
|
+
|
|
909
|
+
| Metric | v0.28 | v0.11 (BM25 only) |
|
|
910
|
+
|--------|-------|-------------------|
|
|
911
|
+
| Recall@1 | 46.6% | 50.4% |
|
|
912
|
+
| Recall@3 | **67.0%** | 66.6% |
|
|
913
|
+
| Recall@5 | 73.8% | 74.0% |
|
|
914
|
+
| Recall@10 | 81.0% | 82.6% |
|
|
915
|
+
| Answer in content@5 | **49.6%** | 46.6% |
|
|
894
916
|
|
|
895
|
-
|
|
|
896
|
-
|
|
897
|
-
|
|
|
898
|
-
|
|
|
899
|
-
|
|
|
900
|
-
|
|
|
901
|
-
|
|
|
917
|
+
| Question Type | Count | R@5 | R@10 |
|
|
918
|
+
|---------------|-------|-----|------|
|
|
919
|
+
| single-session-assistant | 56 | 100.0% | 100.0% |
|
|
920
|
+
| knowledge-update | 78 | 89.7% | 96.2% |
|
|
921
|
+
| multi-session | 133 | 72.2% | 82.0% |
|
|
922
|
+
| temporal-reasoning | 133 | 72.9% | 78.9% |
|
|
923
|
+
| single-session-user | 70 | 62.9% | 71.4% |
|
|
924
|
+
| single-session-preference | 30 | 20.0% | 33.3% |
|
|
902
925
|
|
|
903
|
-
|
|
904
|
-
|---------------|-------|-----|
|
|
905
|
-
| single-session-assistant | 56 | 94.6% |
|
|
906
|
-
| knowledge-update | 78 | 88.5% |
|
|
907
|
-
| temporal-reasoning | 133 | 73.7% |
|
|
908
|
-
| multi-session | 133 | 72.2% |
|
|
909
|
-
| single-session-user | 70 | 65.7% |
|
|
910
|
-
| single-session-preference | 30 | 26.7% |
|
|
926
|
+
For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo v0.28 achieves 73.8% R@5 with hybrid BM25 + cosine. Hybrid scoring trades a little R@1 accuracy for better top-5 content relevance (answer_in_content@5 +3pp vs v0.11).
|
|
911
927
|
|
|
912
|
-
|
|
928
|
+
Hippo's strongest categories (single-session-assistant 100% R@5, knowledge-update 89.7%) are where keyword overlap between question and stored content is highest. The weakest (preference 20%) involves indirect references that need deeper semantic understanding.
|
|
913
929
|
|
|
914
|
-
|
|
930
|
+
> Note: v0.28 R@10 is 1.6pp below v0.11's BM25-only result. The earlier v0.27 benchmark showed an apparent 35pp regression — that was a methodology bug (budget-limited retrieval vs unlimited), fixed in v0.28 with the `minResults` option. See [`evals/README.md`](evals/README.md) for the full investigation and per-type breakdown.
|
|
915
931
|
|
|
916
932
|
```bash
|
|
917
933
|
cd benchmarks/longmemeval
|
|
@@ -948,7 +964,7 @@ node run.mjs --adapter all
|
|
|
948
964
|
Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
|
|
949
965
|
|
|
950
966
|
The interesting problems:
|
|
951
|
-
- **Improve LongMemEval score.** Current R@5 is
|
|
967
|
+
- **Improve LongMemEval score.** Current R@5 is 73.8% with hybrid BM25 + cosine (v0.28). Gap to MemPalace's 96.6% likely needs better chunking, reranking, or semantic compression — not just more of the same retrieval.
|
|
952
968
|
- Better consolidation heuristics (LLM-powered merge vs current text overlap)
|
|
953
969
|
- Web UI / dashboard for visualizing decay curves and memory health
|
|
954
970
|
- Optimal decay parameter tuning from real usage data
|
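Note on the README changes above: the v0.29.1 entry says only the default `pinnedInject.budget` changes and explicit `.hippo/config.json` overrides are left alone. The following is a minimal sketch of that behaviour, not the package's actual `config.js` (which this diff does not show). The `DEFAULTS` object and `resolveConfig` helper are hypothetical names; the default values are inferred from the README text (`budget` 1500, injection enabled unless opted out, 5 replayed memories per cycle).

```javascript
// Assumed defaults, inferred from the README notes for v0.29.0 / v0.29.1.
const DEFAULTS = {
  pinnedInject: { enabled: true, budget: 1500 },
  replay: { count: 5 },
};

// Hypothetical helper: shallow-merge each section so that any value the user
// set explicitly wins, and anything left unset falls back to the default.
function resolveConfig(user = {}) {
  const out = {};
  for (const key of Object.keys(DEFAULTS)) {
    out[key] = { ...DEFAULTS[key], ...(user[key] ?? {}) };
  }
  return out;
}

// A user who pinned the old budget keeps it; everyone else gets the new default.
const explicit = resolveConfig({ pinnedInject: { budget: 500 } });
const untouched = resolveConfig({});
const optedOut = resolveConfig({ pinnedInject: { enabled: false }, replay: { count: 0 } });

console.log(explicit.pinnedInject, untouched.pinnedInject, optedOut);
```

Under these assumptions, upgrading from 0.29.0 to 0.29.1 changes behaviour only for stores that never set `pinnedInject.budget` themselves.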
package/dist/cli.js
CHANGED
|
@@ -299,6 +299,9 @@ function autoInstallHooks(quiet) {
|
|
|
299
299
|
if (result.installedSessionStart) {
|
|
300
300
|
console.log(` Auto-installed hippo last-sleep SessionStart hook in ${hook} settings`);
|
|
301
301
|
}
|
|
302
|
+
if (result.installedUserPromptSubmit) {
|
|
303
|
+
console.log(` Auto-installed hippo pinned-inject UserPromptSubmit hook in ${hook} settings`);
|
|
304
|
+
}
|
|
302
305
|
if (result.migratedFromStop) {
|
|
303
306
|
console.log(` Migrated legacy Stop hook → SessionEnd (no longer runs every turn)`);
|
|
304
307
|
}
|
|
@@ -2102,7 +2105,41 @@ async function cmdContext(hippoRoot, args, flags) {
|
|
|
2102
2105
|
const recentSessionEvents = activeSnapshot?.session_id
|
|
2103
2106
|
? listSessionEvents(hippoRoot, { session_id: activeSnapshot.session_id, limit: 5 })
|
|
2104
2107
|
: [];
|
|
2105
|
-
|
|
2108
|
+
// --pinned-only: restrict to pinned entries only. Used by the Claude Code
|
|
2109
|
+
// UserPromptSubmit hook so invariants stay in context every turn.
|
|
2110
|
+
const pinnedOnly = flags['pinned-only'] === true;
|
|
2111
|
+
if (pinnedOnly) {
|
|
2112
|
+
const pinnedCfg = loadConfig(hippoRoot);
|
|
2113
|
+
if (!pinnedCfg.pinnedInject.enabled)
|
|
2114
|
+
return; // user disabled via config
|
|
2115
|
+
// Effective budget: explicit --budget wins over config.
|
|
2116
|
+
const effBudget = flags['budget'] !== undefined ? budget : pinnedCfg.pinnedInject.budget;
|
|
2117
|
+
const pinnedLocal = localEntries.filter((e) => e.pinned);
|
|
2118
|
+
const pinnedGlobal = globalEntries.filter((e) => e.pinned);
|
|
2119
|
+
if (pinnedLocal.length === 0 && pinnedGlobal.length === 0)
|
|
2120
|
+
return; // zero output
|
|
2121
|
+
const nowP = new Date();
|
|
2122
|
+
const rankedPinned = [
|
|
2123
|
+
...pinnedLocal.map((e) => ({ entry: e, isGlobal: false })),
|
|
2124
|
+
...pinnedGlobal.map((e) => ({ entry: e, isGlobal: true })),
|
|
2125
|
+
]
|
|
2126
|
+
.map(({ entry, isGlobal }) => ({
|
|
2127
|
+
entry,
|
|
2128
|
+
score: calculateStrength(entry, nowP) * (isGlobal ? 1 / 1.2 : 1),
|
|
2129
|
+
tokens: estimateTokens(entry.content),
|
|
2130
|
+
isGlobal,
|
|
2131
|
+
}))
|
|
2132
|
+
.sort((a, b) => b.score - a.score);
|
|
2133
|
+
let usedP = 0;
|
|
2134
|
+
for (const r of rankedPinned) {
|
|
2135
|
+
if (usedP + r.tokens > effBudget)
|
|
2136
|
+
continue;
|
|
2137
|
+
selectedItems.push(r);
|
|
2138
|
+
usedP += r.tokens;
|
|
2139
|
+
}
|
|
2140
|
+
totalTokens = usedP;
|
|
2141
|
+
}
|
|
2142
|
+
else if (query === '*') {
|
|
2106
2143
|
// No query: return strongest memories by strength, up to budget
|
|
2107
2144
|
const now = new Date();
|
|
2108
2145
|
const localRanked = localEntries
|
|
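Note on the `--pinned-only` hunk above: the loop uses `continue`, not `break`, when an entry would overflow the budget, so a smaller, lower-ranked pinned memory can still be selected after a larger one is skipped. The sketch below isolates that selection step with made-up entries. `selectWithinBudget` is a hypothetical name and the token counts are invented for illustration; the real code computes scores with `calculateStrength` and sizes with `estimateTokens`.

```javascript
// Greedy budget fill: visit entries by descending score and skip (rather than
// stop at) any entry that would push the running total past the budget.
function selectWithinBudget(ranked, budget) {
  const selected = [];
  let used = 0;
  const byScore = [...ranked].sort((a, b) => b.score - a.score);
  for (const r of byScore) {
    if (used + r.tokens > budget) continue; // too big for the remaining room
    selected.push(r);
    used += r.tokens;
  }
  return { selected, used };
}

// Invented example data: three pinned memories with precomputed scores/sizes.
const ranked = [
  { id: 'a', score: 0.9, tokens: 300 },
  { id: 'b', score: 0.8, tokens: 300 },
  { id: 'c', score: 0.7, tokens: 150 },
];

const small = selectWithinBudget(ranked, 500);
const large = selectWithinBudget(ranked, 1500);
console.log(small.selected.map((r) => r.id), small.used); // [ 'a', 'c' ] 450
console.log(large.selected.map((r) => r.id), large.used); // [ 'a', 'b', 'c' ] 750
```

With a 500-token budget the second-ranked entry is dropped entirely, which is the kind of truncation the v0.29.1 note describes; at 1500 all three fit.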
@@ -2165,17 +2202,26 @@ async function cmdContext(hippoRoot, args, flags) {
|
|
|
2165
2202
|
}
|
|
2166
2203
|
if (selectedItems.length === 0 && !activeSnapshot && recentSessionEvents.length === 0)
|
|
2167
2204
|
return;
|
|
2168
|
-
//
|
|
2169
|
-
|
|
2170
|
-
|
|
2171
|
-
|
|
2172
|
-
|
|
2173
|
-
|
|
2174
|
-
|
|
2205
|
+
// --pinned-only is called by the UserPromptSubmit hook every turn. Treat it
|
|
2206
|
+
// as read-only so pinned memories don't inflate retrieval_count or extend
|
|
2207
|
+
// their half_life by 2 days * turn-count over a long session.
|
|
2208
|
+
let updatedEntries;
|
|
2209
|
+
if (pinnedOnly) {
|
|
2210
|
+
updatedEntries = selectedItems.map((s) => s.entry);
|
|
2211
|
+
}
|
|
2212
|
+
else {
|
|
2213
|
+
// Mark retrieved and persist
|
|
2214
|
+
const toUpdate = selectedItems.map((s) => s.entry);
|
|
2215
|
+
updatedEntries = markRetrieved(toUpdate);
|
|
2216
|
+
const localIndex = loadIndex(hippoRoot);
|
|
2217
|
+
for (const u of updatedEntries) {
|
|
2218
|
+
const targetRoot = localIndex.entries[u.id] ? hippoRoot : (hasGlobal ? globalRoot : hippoRoot);
|
|
2219
|
+
writeEntry(targetRoot, u);
|
|
2220
|
+
}
|
|
2221
|
+
localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
|
|
2222
|
+
saveIndex(hippoRoot, localIndex);
|
|
2223
|
+
updateStats(hippoRoot, { recalled: selectedItems.length });
|
|
2175
2224
|
}
|
|
2176
|
-
localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
|
|
2177
|
-
saveIndex(hippoRoot, localIndex);
|
|
2178
|
-
updateStats(hippoRoot, { recalled: selectedItems.length });
|
|
2179
2225
|
const format = String(flags['format'] ?? 'markdown');
|
|
2180
2226
|
const framing = String(flags['framing'] ?? 'observe');
|
|
2181
2227
|
if (format === 'json') {
|
|
@@ -2190,6 +2236,38 @@ async function cmdContext(hippoRoot, args, flags) {
|
|
|
2190
2236
|
}));
|
|
2191
2237
|
console.log(JSON.stringify({ query, activeSnapshot, recentSessionEvents, memories: output, tokens: totalTokens }));
|
|
2192
2238
|
}
|
|
2239
|
+
else if (format === 'additional-context') {
|
|
2240
|
+
// Claude Code UserPromptSubmit hook JSON shape. Capture the markdown that
|
|
2241
|
+
// printContextMarkdown would write and wrap it as `additionalContext`.
|
|
2242
|
+
const lines = [];
|
|
2243
|
+
const realLog = console.log;
|
|
2244
|
+
console.log = (...parts) => { lines.push(parts.map(String).join(' ')); };
|
|
2245
|
+
try {
|
|
2246
|
+
if (activeSnapshot)
|
|
2247
|
+
printActiveTaskSnapshot(activeSnapshot);
|
|
2248
|
+
if (recentSessionEvents.length > 0)
|
|
2249
|
+
printSessionEvents(recentSessionEvents);
|
|
2250
|
+
printContextMarkdown(selectedItems.map((r) => ({
|
|
2251
|
+
entry: updatedEntries.find((u) => u.id === r.entry.id) ?? r.entry,
|
|
2252
|
+
score: r.score,
|
|
2253
|
+
tokens: r.tokens,
|
|
2254
|
+
isGlobal: r.isGlobal ?? false,
|
|
2255
|
+
})), totalTokens, framing);
|
|
2256
|
+
}
|
|
2257
|
+
finally {
|
|
2258
|
+
console.log = realLog;
|
|
2259
|
+
}
|
|
2260
|
+
const textBlock = lines.join('\n');
|
|
2261
|
+
if (!textBlock.trim())
|
|
2262
|
+
return;
|
|
2263
|
+
const payload = {
|
|
2264
|
+
hookSpecificOutput: {
|
|
2265
|
+
hookEventName: 'UserPromptSubmit',
|
|
2266
|
+
additionalContext: textBlock,
|
|
2267
|
+
},
|
|
2268
|
+
};
|
|
2269
|
+
process.stdout.write(JSON.stringify(payload));
|
|
2270
|
+
}
|
|
2193
2271
|
else {
|
|
2194
2272
|
if (activeSnapshot) {
|
|
2195
2273
|
printActiveTaskSnapshot(activeSnapshot);
|
|
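Note on the `additional-context` hunk above: it reuses the existing markdown printers by temporarily swapping `console.log` for a collector, then wraps the captured text in a `hookSpecificOutput` object. Below is a standalone sketch of that pattern. `captureLog` and `toHookPayload` are hypothetical helper names, and the renderer is a stand-in for `printContextMarkdown`; only the payload field names are taken from the diff.

```javascript
// Redirect console.log into a buffer while `render` runs, and restore the
// real console.log afterwards even if `render` throws.
function captureLog(render) {
  const lines = [];
  const realLog = console.log;
  console.log = (...parts) => { lines.push(parts.map(String).join(' ')); };
  try {
    render();
  } finally {
    console.log = realLog;
  }
  return lines.join('\n');
}

// Wrap captured text in the JSON shape the diff writes to stdout.
// Returns null when there is nothing to inject, mirroring the early return.
function toHookPayload(text) {
  if (!text.trim()) return null;
  return {
    hookSpecificOutput: {
      hookEventName: 'UserPromptSubmit',
      additionalContext: text,
    },
  };
}

// Stand-in renderer with invented content.
const text = captureLog(() => {
  console.log('## Pinned memories');
  console.log('-', 'never force-push to main');
});
const payload = toHookPayload(text);
process.stdout.write(JSON.stringify(payload) + '\n');
```

The `try`/`finally` matters here: without it, an exception inside a printer would leave `console.log` permanently redirected for the rest of the process.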
@@ -2783,6 +2861,9 @@ function cmdHook(args, flags) {
|
|
|
2783
2861
|
if (result.installedSessionStart) {
|
|
2784
2862
|
console.log(`Installed hippo last-sleep SessionStart hook in ${result.target} settings`);
|
|
2785
2863
|
}
|
|
2864
|
+
if (result.installedUserPromptSubmit) {
|
|
2865
|
+
console.log(`Installed hippo pinned-inject UserPromptSubmit hook in ${result.target} settings`);
|
|
2866
|
+
}
|
|
2786
2867
|
if (result.migratedFromStop) {
|
|
2787
2868
|
console.log(`Migrated legacy Stop hook → SessionEnd (was running every turn; now fires once on session exit)`);
|
|
2788
2869
|
}
|
|
@@ -2873,6 +2954,8 @@ function cmdSetup(flags) {
|
|
|
2873
2954
|
bits.push('SessionEnd (session-end)');
|
|
2874
2955
|
if (result.installedSessionStart)
|
|
2875
2956
|
bits.push('SessionStart');
|
|
2957
|
+
if (result.installedUserPromptSubmit)
|
|
2958
|
+
bits.push('UserPromptSubmit (pinned-inject)');
|
|
2876
2959
|
if (result.migratedFromStop)
|
|
2877
2960
|
bits.push('migrated legacy Stop');
|
|
2878
2961
|
if (result.migratedSplitSessionEnd)
|
|
@@ -2968,7 +3051,8 @@ function installClaudeCodeSessionEndHook() {
|
|
|
2968
3051
|
const result = installJsonHooks('claude-code');
|
|
2969
3052
|
return {
|
|
2970
3053
|
installed: result.installedSessionEnd ||
|
|
2971
|
-
result.installedSessionStart
|
|
3054
|
+
result.installedSessionStart ||
|
|
3055
|
+
result.installedUserPromptSubmit,
|
|
2972
3056
|
migratedFromStop: result.migratedFromStop,
|
|
2973
3057
|
};
|
|
2974
3058
|
}
|
|
@@ -3104,7 +3188,8 @@ Commands:
|
|
|
3104
3188
|
context Smart context injection for AI agents
|
|
3105
3189
|
--auto Auto-detect task from git state
|
|
3106
3190
|
--budget <n> Token budget (default: 1500)
|
|
3107
|
-
--
|
|
3191
|
+
--pinned-only Only inject pinned memories (used by UserPromptSubmit hook)
|
|
3192
|
+
--format <fmt> Output format: markdown (default), json, or additional-context (Claude Code hook JSON)
|
|
3108
3193
|
--framing <mode> Framing: observe (default), suggest, assert
|
|
3109
3194
|
sleep Run consolidation pass (auto-learns + dedup + auto-shares)
|
|
3110
3195
|
--dry-run Preview without writing
|