hippo-memory 0.27.0 → 0.29.0
- package/README.md +42 -22
- package/dist/cli.js +214 -17
- package/dist/cli.js.map +1 -1
- package/dist/config.d.ts +17 -0
- package/dist/config.d.ts.map +1 -1
- package/dist/config.js +13 -0
- package/dist/config.js.map +1 -1
- package/dist/consolidate.d.ts +1 -0
- package/dist/consolidate.d.ts.map +1 -1
- package/dist/consolidate.js +38 -1
- package/dist/consolidate.js.map +1 -1
- package/dist/eval.d.ts +35 -0
- package/dist/eval.d.ts.map +1 -1
- package/dist/eval.js +68 -8
- package/dist/eval.js.map +1 -1
- package/dist/hooks.d.ts +1 -0
- package/dist/hooks.d.ts.map +1 -1
- package/dist/hooks.js +24 -0
- package/dist/hooks.js.map +1 -1
- package/dist/refine-llm.d.ts +53 -0
- package/dist/refine-llm.d.ts.map +1 -0
- package/dist/refine-llm.js +147 -0
- package/dist/refine-llm.js.map +1 -0
- package/dist/replay.d.ts +41 -0
- package/dist/replay.d.ts.map +1 -0
- package/dist/replay.js +117 -0
- package/dist/replay.js.map +1 -0
- package/dist/search.d.ts +26 -0
- package/dist/search.d.ts.map +1 -1
- package/dist/search.js +70 -26
- package/dist/search.js.map +1 -1
- package/dist/shared.d.ts +4 -0
- package/dist/shared.d.ts.map +1 -1
- package/dist/shared.js +19 -18
- package/dist/shared.js.map +1 -1
- package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
- package/extensions/openclaw-plugin/package.json +1 -1
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -60,6 +60,21 @@ hippo recall "data pipeline issues" --budget 2000
 
 ---
 
+### What's new in v0.29.0
+
+- **Mid-session pinned re-injection (Claude Code).** Pinned memories now re-enter context every turn via a new `UserPromptSubmit` hook — not just at SessionStart — so invariants survive long sessions where Opus 4.7 might otherwise forget them. `hippo context --pinned-only --format additional-context` is the command the hook runs; it's read-only so retrieval_count doesn't inflate. Existing users must re-run `hippo hook install claude-code` to pick it up. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
+- **Replay consolidation pass.** `hippo sleep` now rehearses 5 high-value memories per cycle (weighted by outcome feedback, emotional valence, under-rehearsal, idle time, strength). Closes the "replay" gap in the 7 hippocampal mechanisms. Non-destructive; opt out with `{"replay":{"count":0}}`.
+- **Model profile benchmark (null result).** New reusable eval harness at `evals/model-profile-bench.json` + `scripts/run-model-profile-bench.mjs` measures invariant honor, hallucination guard, noise rejection, and contradiction rejection. 4.6 and 4.7 both score 100% with hippo context injection — no per-model profile tuning needed. See `docs/plans/2026-04-21-phase-a-decision.md`.
+- **Physics soak test harness.** `scripts/soak-test.mjs` + 10 synthetic workload profiles. All 10 bounded at 100-tick smoke scale; grant-scale 100hr runs are separate follow-up work.
+
+### What's new in v0.28.0
+
+- **Budget saturation fix.** Large memories (14k+ chars) no longer starve retrieval. New `minResults` option guarantees at least N results regardless of token budget. `hippo recall <q> --min-results 5`.
+- **LongMemEval parity restored.** The 35pp R@10 gap vs v0.11 was a benchmark methodology issue (budget-limited vs unlimited comparison). Corrected: v0.28 R@3 67.0% (+0.4pp), answer_in_content@5 49.6% (+3.0pp), R@10 81.0% (-1.6pp). Top-5 results now more often contain the actual answer.
+- **MMR performance.** Re-ranking capped at top-100 candidates, dropping per-query time from ~50s to ~9s. `preparedCorpus` option skips per-query tokenization for batch callers.
+- **RRF scoring option.** `hybridSearch` accepts `scoring: 'rrf'` for reciprocal rank fusion as an alternative to score blending.
+- **`hippo refine` command.** LLM-powered semantic rewrite of memories for improved recall quality.
+
 ### What's new in v0.27.0
 
 - **Recall is now debuggable.** `hippo explain <query>` prints the full score breakdown for each retrieved memory: BM25 + cosine, every multiplier (strength, recency, decision, path, source-bump, outcome), age, and final composite. Read-only so it's safe to run as a diagnostic.
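Reviewer note on the `scoring: 'rrf'` option mentioned in the hunk above: reciprocal rank fusion combines several rankings by position instead of blending raw scores, so BM25 and cosine values never need to share a scale. The following is a minimal sketch of the standard formulation, not hippo's `hybridSearch` code (which is not part of this diff). The function name `rrfFuse`, the smoothing constant `k = 60`, and the sample memory IDs are all assumptions made for illustration.

```javascript
// Reciprocal rank fusion: every ranking contributes 1 / (k + rank) per item.
// ASSUMPTION: k = 60 is a commonly used default; hippo's actual constant is
// not visible in this diff.
function rrfFuse(rankings, k = 60) {
    const scores = new Map();
    for (const ranking of rankings) {
        ranking.forEach((id, index) => {
            const rank = index + 1; // ranks are 1-based
            scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
        });
    }
    // Highest fused score first.
    return [...scores.entries()]
        .map(([id, score]) => ({ id, score }))
        .sort((a, b) => b.score - a.score);
}

// Hypothetical rankings from two retrievers (best match first).
const bm25Ranking = ['m1', 'm2', 'm3'];
const cosineRanking = ['m3', 'm1', 'm4'];
const fused = rrfFuse([bm25Ranking, cosineRanking]);
console.log(fused.map((r) => r.id));
```

In this example `m1` comes out on top because it sits near the head of both lists, while `m4` trails because only one retriever returned it, and in last place.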
@@ -740,7 +755,7 @@ No extra commands needed. Just `hippo init` and your agent knows about Hippo.
 If you prefer explicit control:
 
 ```bash
-hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd hooks
+hippo hook install claude-code # patches CLAUDE.md + adds SessionStart/SessionEnd + UserPromptSubmit hooks
 hippo hook install codex # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
 hippo hook install cursor # patches .cursorrules
 hippo hook install openclaw # patches AGENTS.md
@@ -752,7 +767,10 @@ This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the
 2. Run `hippo remember "<lesson>" --error` on errors
 3. Run `hippo outcome --good` on completion
 
-For Claude Code, it also adds
+For Claude Code, it also adds:
+- a `SessionEnd` hook so `hippo sleep` runs automatically when the session exits
+- a `SessionStart` hook that prints the previous session's consolidation output
+- a `UserPromptSubmit` hook that re-injects pinned memories (`hippo remember <text> --pin`) into every turn's context — so invariants survive long sessions where Opus 4.7 might otherwise "forget" them. Budget: 500 tokens per turn, skipped entirely when no pinned memories exist. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
 
 To remove: `hippo hook uninstall claude-code`
 
@@ -866,7 +884,7 @@ For how these mechanisms connect to LLM training, continual learning, and open r
 | Auto-hook install | Yes | No | No | No |
 | MCP server | Yes | Yes | No | No |
 | Zero dependencies | Yes | No (ChromaDB) | No | No |
-| LongMemEval R@5 (retrieval) |
+| LongMemEval R@5 (retrieval) | 73.8% (hybrid, v0.28) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
 | Git-friendly | Yes | No | No | Yes |
 | Framework agnostic | Yes | Yes | Partial | Yes |
 
@@ -882,28 +900,30 @@ Two benchmarks testing two different things. Full details in [`benchmarks/`](ben
 
 [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
 
-**Hippo v0.
+**Hippo v0.28.0 results (hybrid BM25 + cosine, full 500 questions):**
+
+| Metric | v0.28 | v0.11 (BM25 only) |
+|--------|-------|-------------------|
+| Recall@1 | 46.6% | 50.4% |
+| Recall@3 | **67.0%** | 66.6% |
+| Recall@5 | 73.8% | 74.0% |
+| Recall@10 | 81.0% | 82.6% |
+| Answer in content@5 | **49.6%** | 46.6% |
 
-|
-
-|
-|
-|
-|
-|
+| Question Type | Count | R@5 | R@10 |
+|---------------|-------|-----|------|
+| single-session-assistant | 56 | 100.0% | 100.0% |
+| knowledge-update | 78 | 89.7% | 96.2% |
+| multi-session | 133 | 72.2% | 82.0% |
+| temporal-reasoning | 133 | 72.9% | 78.9% |
+| single-session-user | 70 | 62.9% | 71.4% |
+| single-session-preference | 30 | 20.0% | 33.3% |
 
-
-|---------------|-------|-----|
-| single-session-assistant | 56 | 94.6% |
-| knowledge-update | 78 | 88.5% |
-| temporal-reasoning | 133 | 73.7% |
-| multi-session | 133 | 72.2% |
-| single-session-user | 70 | 65.7% |
-| single-session-preference | 30 | 26.7% |
+For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo v0.28 achieves 73.8% R@5 with hybrid BM25 + cosine. Hybrid scoring trades a little R@1 accuracy for better top-5 content relevance (answer_in_content@5 +3pp vs v0.11).
 
-
+Hippo's strongest categories (single-session-assistant 100% R@5, knowledge-update 89.7%) are where keyword overlap between question and stored content is highest. The weakest (preference 20%) involves indirect references that need deeper semantic understanding.
 
-
+> Note: v0.28 R@10 is 1.6pp below v0.11's BM25-only result. The earlier v0.27 benchmark showed an apparent 35pp regression — that was a methodology bug (budget-limited retrieval vs unlimited), fixed in v0.28 with the `minResults` option. See [`evals/README.md`](evals/README.md) for the full investigation and per-type breakdown.
 
 ```bash
 cd benchmarks/longmemeval
@@ -940,7 +960,7 @@ node run.mjs --adapter all
 Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
 
 The interesting problems:
-- **Improve LongMemEval score.** Current R@5 is
+- **Improve LongMemEval score.** Current R@5 is 73.8% with hybrid BM25 + cosine (v0.28). Gap to MemPalace's 96.6% likely needs better chunking, reranking, or semantic compression — not just more of the same retrieval.
 - Better consolidation heuristics (LLM-powered merge vs current text overlap)
 - Web UI / dashboard for visualizing decay curves and memory health
 - Optimal decay parameter tuning from real usage data
package/dist/cli.js
CHANGED
@@ -47,7 +47,8 @@ import { DAILY_TASK_NAME, buildDailyRunnerCommand, listRegisteredWorkspaces, reg
 import { importChatGPT, importClaude, importCursor, importGenericFile, importMarkdown, } from './importers.js';
 import { cmdCapture } from './capture.js';
 import { auditMemories } from './audit.js';
-import { runEval, bootstrapCorpus } from './eval.js';
+import { runEval, bootstrapCorpus, compareSummaries } from './eval.js';
+import { refineStore } from './refine-llm.js';
 import { wmPush, wmRead, wmClear, wmFlush } from './working-memory.js';
 // ---------------------------------------------------------------------------
 // Helpers
@@ -298,6 +299,9 @@ function autoInstallHooks(quiet) {
     if (result.installedSessionStart) {
         console.log(` Auto-installed hippo last-sleep SessionStart hook in ${hook} settings`);
     }
+    if (result.installedUserPromptSubmit) {
+        console.log(` Auto-installed hippo pinned-inject UserPromptSubmit hook in ${hook} settings`);
+    }
     if (result.migratedFromStop) {
         console.log(` Migrated legacy Stop hook → SessionEnd (no longer runs every turn)`);
     }
@@ -445,23 +449,32 @@ async function cmdRecall(hippoRoot, query, flags) {
         ? parseFloat(String(flags['mmr-lambda']))
         : config.mmr.lambda;
     const mmrEnabled = !noMmr && config.mmr.enabled;
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : config.search.localBump;
+    const minResults = flags['min-results'] !== undefined
+        ? parseInt(String(flags['min-results']), 10)
+        : undefined;
     let results;
     if (usePhysics && !hasGlobal) {
         results = await physicsSearch(query, localEntries, {
             budget,
             hippoRoot,
             physicsConfig: config.physics,
+            minResults,
         });
     }
     else if (hasGlobal) {
         // Use searchBothHybrid for merged results with embedding support
         results = await searchBothHybrid(query, hippoRoot, globalRoot, {
-            budget, mmr: mmrEnabled, mmrLambda,
+            budget, mmr: mmrEnabled, mmrLambda, localBump, minResults,
         });
     }
     else {
         results = await hybridSearch(query, localEntries, {
-            budget, hippoRoot, mmr: mmrEnabled, mmrLambda,
+            budget, hippoRoot, mmr: mmrEnabled, mmrLambda, minResults,
         });
     }
     if (limit < results.length) {
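Reviewer note on `minResults`: the hunk above only threads the option through to the search functions; the selection logic itself lives in `search.js`, which is not shown here. The sketch below is one plausible way a result floor could interact with a token budget. It is an assumption for illustration only: `selectWithinBudget`, the `tokens` field, and the skip-rather-than-stop behaviour are hypothetical and may differ from what hippo actually does.

```javascript
// HYPOTHETICAL helper, not hippo's search.js. `ranked` is sorted best-first
// and each item carries a precomputed `tokens` estimate.
function selectWithinBudget(ranked, budget, minResults = 1) {
    const selected = [];
    let used = 0;
    for (const item of ranked) {
        const fits = used + item.tokens <= budget;
        // Below the floor, accept the item even if it overshoots the budget.
        if (fits || selected.length < minResults) {
            selected.push(item);
            used += item.tokens;
        }
    }
    return { selected, used };
}

// One oversized memory ranked first, followed by two small ones.
const rankedSample = [
    { id: 'big', tokens: 5000 },
    { id: 'a', tokens: 300 },
    { id: 'b', tokens: 300 },
];
const withFloor = selectWithinBudget(rankedSample, 2000, 2);
const withoutFloor = selectWithinBudget(rankedSample, 2000, 0);
console.log(withFloor.selected.map((r) => r.id), withoutFloor.selected.map((r) => r.id));
```

With a floor of 2, the oversized top hit and the next result are both returned even though the budget is exceeded; with a floor of 0, only the entries that fit are kept.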
@@ -553,6 +566,11 @@ async function cmdExplain(hippoRoot, query, flags) {
         ? parseFloat(String(flags['mmr-lambda']))
         : config.mmr.lambda;
     const mmrEnabled = !noMmr && config.mmr.enabled;
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : config.search.localBump;
     let results;
     let modeUsed;
     if (usePhysics && !hasGlobal) {
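Reviewer note: the same nested ternary for `localBump` appears in `cmdRecall`, `cmdExplain`, and `cmdEval`. The precedence it encodes is `--equal-sources` first, then `--local-bump`, then the config value. A small sketch of that precedence as a standalone function follows; the name `resolveLocalBump` is hypothetical (the package inlines the expression), and the `flags` object shape is assumed from how the diff reads it.

```javascript
// HYPOTHETICAL helper name; the logic mirrors the ternary in the diff.
function resolveLocalBump(flags, configLocalBump) {
    // --equal-sources wins outright and neutralises the local preference.
    if (flags['equal-sources']) return 1.0;
    // An explicit --local-bump overrides the configured default.
    if (flags['local-bump'] !== undefined) {
        return parseFloat(String(flags['local-bump']));
    }
    return configLocalBump;
}

const bumpFromShortcut = resolveLocalBump({ 'equal-sources': true, 'local-bump': '2' }, 1.2);
const bumpFromFlag = resolveLocalBump({ 'local-bump': '1.5' }, 1.2);
const bumpFromConfig = resolveLocalBump({}, 1.2);
console.log(bumpFromShortcut, bumpFromFlag, bumpFromConfig);
```

Note that, as in the diffed code, a non-numeric `--local-bump` value would pass through `parseFloat` as `NaN`; nothing in the hunks shown validates it.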
@@ -566,7 +584,7 @@ async function cmdExplain(hippoRoot, query, flags) {
     }
     else if (hasGlobal) {
         results = await searchBothHybrid(query, hippoRoot, globalRoot, {
-            budget, explain: true, mmr: mmrEnabled, mmrLambda,
+            budget, explain: true, mmr: mmrEnabled, mmrLambda, localBump,
         });
         modeUsed = 'searchBothHybrid';
     }
@@ -663,6 +681,7 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
     const asJson = Boolean(flags['json']);
     const minMrr = flags['min-mrr'] !== undefined ? parseFloat(String(flags['min-mrr'])) : null;
     const showCases = Boolean(flags['show-cases']);
+    const comparePath = flags['compare'] ? String(flags['compare']) : null;
     const noMmr = Boolean(flags['no-mmr']);
     const mmrLambda = flags['mmr-lambda'] !== undefined ? parseFloat(String(flags['mmr-lambda'])) : undefined;
     const embeddingWeight = flags['embedding-weight'] !== undefined ? parseFloat(String(flags['embedding-weight'])) : undefined;
@@ -702,11 +721,19 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
         console.error(`Failed to read corpus: ${err instanceof Error ? err.message : err}`);
         process.exit(1);
     }
+    const globalRoot = getGlobalRoot();
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : loadConfig(hippoRoot).search.localBump;
     const summary = await runEval(cases, entries, {
         hippoRoot,
+        globalRoot,
         mmr: !noMmr,
         mmrLambda,
         embeddingWeight,
+        localBump,
     });
     if (asJson) {
         console.log(JSON.stringify(summary, null, 2));
@@ -752,6 +779,52 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
         console.error(`MRR ${fmt(summary.meanMrr, 4)} below threshold ${minMrr}`);
         process.exit(1);
     }
+    if (comparePath) {
+        if (!fs.existsSync(comparePath)) {
+            console.error(`Baseline file not found: ${comparePath}`);
+            process.exit(1);
+        }
+        let baseline;
+        try {
+            baseline = JSON.parse(fs.readFileSync(comparePath, 'utf8'));
+        }
+        catch (err) {
+            console.error(`Failed to parse baseline: ${err instanceof Error ? err.message : err}`);
+            process.exit(1);
+        }
+        const cmp = compareSummaries(baseline, summary);
+        if (asJson) {
+            // The main JSON output already emitted; append comparison to stderr so
+            // both can be captured independently.
+            console.error(JSON.stringify({ compare: cmp }, null, 2));
+        }
+        else {
+            console.log();
+            console.log('Compare vs baseline:');
+            const sign = (d) => (d >= 0 ? '+' : '') + fmt(d, 4);
+            console.log(` MRR: ${sign(cmp.aggregate.mrr)}`);
+            console.log(` Recall@5: ${sign(cmp.aggregate.recallAt5)}`);
+            console.log(` Recall@10: ${sign(cmp.aggregate.recallAt10)}`);
+            console.log(` NDCG@10: ${sign(cmp.aggregate.ndcgAt10)}`);
+            console.log();
+            console.log(` improved: ${cmp.improved.length} regressed: ${cmp.regressed.length} unchanged: ${cmp.unchanged}`);
+            if (cmp.onlyInBaseline.length > 0)
+                console.log(` only in baseline: ${cmp.onlyInBaseline.length}`);
+            if (cmp.onlyInCurrent.length > 0)
+                console.log(` only in current: ${cmp.onlyInCurrent.length}`);
+            const showPerCase = cmp.improved.length + cmp.regressed.length > 0;
+            if (showPerCase) {
+                for (const d of cmp.improved.slice(0, 5)) {
+                    const delta = d.ndcgAfter - d.ndcgBefore;
+                    console.log(` + [${d.id}] NDCG ${fmt(d.ndcgBefore, 2)} -> ${fmt(d.ndcgAfter, 2)} (+${fmt(delta, 3)})`);
+                }
+                for (const d of cmp.regressed.slice(0, 5)) {
+                    const delta = d.ndcgAfter - d.ndcgBefore;
+                    console.log(` - [${d.id}] NDCG ${fmt(d.ndcgBefore, 2)} -> ${fmt(d.ndcgAfter, 2)} (${fmt(delta, 3)})`);
+                }
+            }
+        }
+    }
 }
 function cmdTrace(hippoRoot, id, flags) {
     requireInit(hippoRoot);
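Reviewer note on `--compare`: the CLI code above reads a fixed set of fields from the comparison result (`aggregate.mrr`, `aggregate.recallAt5`, `aggregate.recallAt10`, `aggregate.ndcgAt10`, `improved`, `regressed`, `unchanged`, `onlyInBaseline`, `onlyInCurrent`, and per-case `id`, `ndcgBefore`, `ndcgAfter`). The real `compareSummaries` is in `eval.js` and is not shown. The sketch below produces an object of that shape under loudly stated assumptions: the summary field names other than `meanMrr`, the `cases` array, the per-case `ndcgAt10` field, and the `epsilon` tolerance are all hypothetical.

```javascript
// HYPOTHETICAL stand-in for compareSummaries. ASSUMED summary shape:
// { meanMrr, meanRecallAt5, meanRecallAt10, meanNdcgAt10, cases: [{ id, ndcgAt10 }] }
// Only `meanMrr` is confirmed by the diff; the rest is illustrative.
function sketchCompare(baseline, current, epsilon = 1e-9) {
    const before = new Map(baseline.cases.map((c) => [c.id, c]));
    const after = new Map(current.cases.map((c) => [c.id, c]));
    const improved = [];
    const regressed = [];
    let unchanged = 0;
    for (const [id, cur] of after) {
        const base = before.get(id);
        if (!base) continue; // counted under onlyInCurrent below
        const delta = cur.ndcgAt10 - base.ndcgAt10;
        const row = { id, ndcgBefore: base.ndcgAt10, ndcgAfter: cur.ndcgAt10 };
        if (delta > epsilon) improved.push(row);
        else if (delta < -epsilon) regressed.push(row);
        else unchanged += 1;
    }
    return {
        aggregate: {
            mrr: current.meanMrr - baseline.meanMrr,
            recallAt5: current.meanRecallAt5 - baseline.meanRecallAt5,
            recallAt10: current.meanRecallAt10 - baseline.meanRecallAt10,
            ndcgAt10: current.meanNdcgAt10 - baseline.meanNdcgAt10,
        },
        improved,
        regressed,
        unchanged,
        onlyInBaseline: [...before.keys()].filter((id) => !after.has(id)),
        onlyInCurrent: [...after.keys()].filter((id) => !before.has(id)),
    };
}

const baselineRun = {
    meanMrr: 0.5, meanRecallAt5: 0.5, meanRecallAt10: 0.75, meanNdcgAt10: 0.5,
    cases: [{ id: 'q1', ndcgAt10: 0.5 }, { id: 'q2', ndcgAt10: 1 }, { id: 'q3', ndcgAt10: 0.25 }],
};
const currentRun = {
    meanMrr: 0.75, meanRecallAt5: 0.5, meanRecallAt10: 0.5, meanNdcgAt10: 0.625,
    cases: [{ id: 'q1', ndcgAt10: 1 }, { id: 'q2', ndcgAt10: 0.5 }, { id: 'q4', ndcgAt10: 0.75 }],
};
const cmpSketch = sketchCompare(baselineRun, currentRun);
console.log(JSON.stringify(cmpSketch, null, 2));
```

A typical workflow implied by the help text would be to save one `eval --json` run to a file and pass that file to `--compare` on a later run.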
@@ -854,6 +927,34 @@ function cmdTrace(hippoRoot, id, flags) {
         }
     }
 }
+async function cmdRefine(hippoRoot, flags) {
+    requireInit(hippoRoot);
+    const apiKey = process.env.ANTHROPIC_API_KEY;
+    if (!apiKey) {
+        console.error('hippo refine needs ANTHROPIC_API_KEY in the environment.');
+        process.exit(1);
+    }
+    const dryRun = Boolean(flags['dry-run']);
+    const all = Boolean(flags['all']);
+    const limit = flags['limit'] !== undefined ? parseInt(String(flags['limit']), 10) : undefined;
+    const model = flags['model'] ? String(flags['model']) : undefined;
+    const asJson = Boolean(flags['json']);
+    const result = await refineStore(hippoRoot, { apiKey, model, limit, dryRun, all });
+    if (asJson) {
+        console.log(JSON.stringify(result, null, 2));
+        return;
+    }
+    console.log(`Scanned: ${result.scanned} consolidated semantic memories`);
+    console.log(`Refined: ${result.refined}${dryRun ? ' (dry-run — no writes)' : ''}`);
+    console.log(`Skipped: ${result.skipped}`);
+    console.log(`Failed: ${result.failed}`);
+    if (result.failed > 0) {
+        console.log('\nFailures:');
+        for (const d of result.details.filter((x) => x.status === 'failed').slice(0, 5)) {
+            console.log(` ${d.id}: ${d.reason}`);
+        }
+    }
+}
 /**
  * Scan for Claude Code MEMORY.md files and import new entries into hippo.
  * Looks in ~/.claude/projects/<project>/memory/ for .md files with YAML frontmatter.
@@ -2004,7 +2105,41 @@ async function cmdContext(hippoRoot, args, flags) {
     const recentSessionEvents = activeSnapshot?.session_id
         ? listSessionEvents(hippoRoot, { session_id: activeSnapshot.session_id, limit: 5 })
         : [];
-
+    // --pinned-only: restrict to pinned entries only. Used by the Claude Code
+    // UserPromptSubmit hook so invariants stay in context every turn.
+    const pinnedOnly = flags['pinned-only'] === true;
+    if (pinnedOnly) {
+        const pinnedCfg = loadConfig(hippoRoot);
+        if (!pinnedCfg.pinnedInject.enabled)
+            return; // user disabled via config
+        // Effective budget: explicit --budget wins over config.
+        const effBudget = flags['budget'] !== undefined ? budget : pinnedCfg.pinnedInject.budget;
+        const pinnedLocal = localEntries.filter((e) => e.pinned);
+        const pinnedGlobal = globalEntries.filter((e) => e.pinned);
+        if (pinnedLocal.length === 0 && pinnedGlobal.length === 0)
+            return; // zero output
+        const nowP = new Date();
+        const rankedPinned = [
+            ...pinnedLocal.map((e) => ({ entry: e, isGlobal: false })),
+            ...pinnedGlobal.map((e) => ({ entry: e, isGlobal: true })),
+        ]
+            .map(({ entry, isGlobal }) => ({
+                entry,
+                score: calculateStrength(entry, nowP) * (isGlobal ? 1 / 1.2 : 1),
+                tokens: estimateTokens(entry.content),
+                isGlobal,
+            }))
+            .sort((a, b) => b.score - a.score);
+        let usedP = 0;
+        for (const r of rankedPinned) {
+            if (usedP + r.tokens > effBudget)
+                continue;
+            selectedItems.push(r);
+            usedP += r.tokens;
+        }
+        totalTokens = usedP;
+    }
+    else if (query === '*') {
         // No query: return strongest memories by strength, up to budget
         const now = new Date();
         const localRanked = localEntries
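Reviewer note on the `--pinned-only` branch: pinned entries are ranked by strength, global entries are discounted by a factor of `1 / 1.2`, and the budget loop uses `continue` rather than `break`, so an entry that is too large is skipped while smaller, lower-ranked entries can still be packed in. The sketch below isolates that behaviour. It is a simplified stand-in: `selectPinned` is a hypothetical name, and `strength` and `tokens` are supplied as plain numbers in place of the `calculateStrength` and `estimateTokens` calls seen in the diff.

```javascript
// Simplified stand-in for the pinned-only selection in the hunk above.
// ASSUMPTION: `strength` and `tokens` are given directly on each entry.
function selectPinned(pinnedLocal, pinnedGlobal, budget) {
    const ranked = [
        ...pinnedLocal.map((e) => ({ ...e, isGlobal: false })),
        ...pinnedGlobal.map((e) => ({ ...e, isGlobal: true })),
    ]
        // Global entries are discounted by 1/1.2, as in the diff.
        .map((e) => ({ ...e, score: e.strength * (e.isGlobal ? 1 / 1.2 : 1) }))
        .sort((a, b) => b.score - a.score);
    const selected = [];
    let used = 0;
    for (const r of ranked) {
        if (used + r.tokens > budget) continue; // skip, but keep trying smaller entries
        selected.push(r);
        used += r.tokens;
    }
    return { selected, used };
}

const pinnedPick = selectPinned(
    [{ id: 'L1', strength: 0.9, tokens: 300 }, { id: 'L2', strength: 0.5, tokens: 150 }],
    [{ id: 'G1', strength: 1.0, tokens: 400 }],
    500,
);
console.log(pinnedPick.selected.map((r) => r.id), pinnedPick.used);
```

Here the global entry `G1` has the highest raw strength but drops to second place after the discount, then fails to fit in the remaining budget, so the smaller `L2` is selected instead.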
@@ -2067,17 +2202,26 @@ async function cmdContext(hippoRoot, args, flags) {
     }
     if (selectedItems.length === 0 && !activeSnapshot && recentSessionEvents.length === 0)
         return;
-    //
-
-
-
-
-
-
+    // --pinned-only is called by the UserPromptSubmit hook every turn. Treat it
+    // as read-only so pinned memories don't inflate retrieval_count or extend
+    // their half_life by 2 days * turn-count over a long session.
+    let updatedEntries;
+    if (pinnedOnly) {
+        updatedEntries = selectedItems.map((s) => s.entry);
+    }
+    else {
+        // Mark retrieved and persist
+        const toUpdate = selectedItems.map((s) => s.entry);
+        updatedEntries = markRetrieved(toUpdate);
+        const localIndex = loadIndex(hippoRoot);
+        for (const u of updatedEntries) {
+            const targetRoot = localIndex.entries[u.id] ? hippoRoot : (hasGlobal ? globalRoot : hippoRoot);
+            writeEntry(targetRoot, u);
+        }
+        localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
+        saveIndex(hippoRoot, localIndex);
+        updateStats(hippoRoot, { recalled: selectedItems.length });
     }
-    localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
-    saveIndex(hippoRoot, localIndex);
-    updateStats(hippoRoot, { recalled: selectedItems.length });
     const format = String(flags['format'] ?? 'markdown');
     const framing = String(flags['framing'] ?? 'observe');
     if (format === 'json') {
@@ -2092,6 +2236,38 @@ async function cmdContext(hippoRoot, args, flags) {
         }));
         console.log(JSON.stringify({ query, activeSnapshot, recentSessionEvents, memories: output, tokens: totalTokens }));
     }
+    else if (format === 'additional-context') {
+        // Claude Code UserPromptSubmit hook JSON shape. Capture the markdown that
+        // printContextMarkdown would write and wrap it as `additionalContext`.
+        const lines = [];
+        const realLog = console.log;
+        console.log = (...parts) => { lines.push(parts.map(String).join(' ')); };
+        try {
+            if (activeSnapshot)
+                printActiveTaskSnapshot(activeSnapshot);
+            if (recentSessionEvents.length > 0)
+                printSessionEvents(recentSessionEvents);
+            printContextMarkdown(selectedItems.map((r) => ({
+                entry: updatedEntries.find((u) => u.id === r.entry.id) ?? r.entry,
+                score: r.score,
+                tokens: r.tokens,
+                isGlobal: r.isGlobal ?? false,
+            })), totalTokens, framing);
+        }
+        finally {
+            console.log = realLog;
+        }
+        const textBlock = lines.join('\n');
+        if (!textBlock.trim())
+            return;
+        const payload = {
+            hookSpecificOutput: {
+                hookEventName: 'UserPromptSubmit',
+                additionalContext: textBlock,
+            },
+        };
+        process.stdout.write(JSON.stringify(payload));
+    }
     else {
         if (activeSnapshot) {
             printActiveTaskSnapshot(activeSnapshot);
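Reviewer note on `--format additional-context`: the branch above temporarily replaces `console.log` to capture the markdown output, then wraps it in a `hookSpecificOutput` object and writes nothing at all when the captured text is blank. The sketch below reproduces only the wrapping step with the field names taken from the diff; `toHookPayload` is a hypothetical helper name and the sample text is invented.

```javascript
// HYPOTHETICAL helper name; the payload fields are copied from the diff.
function toHookPayload(textBlock) {
    // Mirror the diff: emit nothing when there is no meaningful text.
    if (!textBlock.trim()) return null;
    return JSON.stringify({
        hookSpecificOutput: {
            hookEventName: 'UserPromptSubmit',
            additionalContext: textBlock,
        },
    });
}

const emptyPayload = toHookPayload('  \n');
const samplePayload = toHookPayload('## Pinned memories\n- Never force-push to main');
console.log(samplePayload);
```

Returning `null` for blank input corresponds to the early `return` in the diff, which leaves stdout empty so the hook contributes no context for that turn.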
@@ -2685,6 +2861,9 @@ function cmdHook(args, flags) {
     if (result.installedSessionStart) {
         console.log(`Installed hippo last-sleep SessionStart hook in ${result.target} settings`);
     }
+    if (result.installedUserPromptSubmit) {
+        console.log(`Installed hippo pinned-inject UserPromptSubmit hook in ${result.target} settings`);
+    }
     if (result.migratedFromStop) {
         console.log(`Migrated legacy Stop hook → SessionEnd (was running every turn; now fires once on session exit)`);
     }
@@ -2775,6 +2954,8 @@ function cmdSetup(flags) {
         bits.push('SessionEnd (session-end)');
     if (result.installedSessionStart)
         bits.push('SessionStart');
+    if (result.installedUserPromptSubmit)
+        bits.push('UserPromptSubmit (pinned-inject)');
     if (result.migratedFromStop)
         bits.push('migrated legacy Stop');
     if (result.migratedSplitSessionEnd)
@@ -2870,7 +3051,8 @@ function installClaudeCodeSessionEndHook() {
     const result = installJsonHooks('claude-code');
     return {
         installed: result.installedSessionEnd ||
-            result.installedSessionStart
+            result.installedSessionStart ||
+            result.installedUserPromptSubmit,
         migratedFromStop: result.migratedFromStop,
     };
 }
@@ -2968,6 +3150,7 @@ Commands:
     --global                Store in global store ($HIPPO_HOME or ~/.hippo/)
   recall <query>          Search and retrieve memories (local + global)
     --budget <n>            Token budget (default: 4000)
+    --min-results <n>       Minimum results regardless of budget (default: 1)
     --json                  Output as JSON
     --why                   Show match reasons and source annotations
     --no-mmr                Disable MMR diversity re-ranking
@@ -2982,20 +3165,31 @@ Commands:
   trace <id>              Memory dossier: content, decay trajectory, retrievals,
                           outcomes, consolidation parents, open conflicts
     --json                  Output as JSON
+  refine                  Rewrite consolidated semantic memories with Claude
+    --limit <n>             Cap the number of memories processed this run
+    --all                   Ignore \`llm-refined\` tag and re-refine everything
+    --dry-run               Call the API but don't write results back
+    --model <id>            Override the default model (claude-sonnet-4-6)
+    --json                  Output summary as JSON
+    (requires ANTHROPIC_API_KEY in env)
   eval [<corpus.json>]    Measure recall quality against a test corpus
     --bootstrap             Generate a synthetic corpus from current memories
     --out <path>            With --bootstrap, write to file instead of stdout
     --max-cases <n>         With --bootstrap, cap case count (default: 50)
     --show-cases            Print per-case details (query, R@10, missed, top 3)
+    --compare <path>        JSON from a prior \`eval --json\` run; print deltas
     --no-mmr                Disable MMR for this eval run
     --mmr-lambda <f>        Override MMR lambda for this run
     --embedding-weight <f>  Override cosine weight (default: 0.6)
+    --local-bump <f>        Local-over-global priority multiplier (default: 1.2)
+    --equal-sources         Shortcut for --local-bump 1.0
     --min-mrr <f>           Exit non-zero if mean MRR falls below this
     --json                  Output full summary as JSON
   context                 Smart context injection for AI agents
     --auto                  Auto-detect task from git state
     --budget <n>            Token budget (default: 1500)
-    --
+    --pinned-only           Only inject pinned memories (used by UserPromptSubmit hook)
+    --format <fmt>          Output format: markdown (default), json, or additional-context (Claude Code hook JSON)
     --framing <mode>        Framing: observe (default), suggest, assert
   sleep                   Run consolidation pass (auto-learns + dedup + auto-shares)
     --dry-run               Preview without writing
@@ -3216,6 +3410,9 @@ async function main() {
             cmdTrace(hippoRoot, id, flags);
             break;
         }
+        case 'refine':
+            await cmdRefine(hippoRoot, flags);
+            break;
         case 'sleep':
             cmdSleep(hippoRoot, flags);
             break;