npm - hippo-memory - Versions diffs - 0.27.0 → 0.29.0 - Mend

hippo-memory 0.27.0 → 0.29.0

Files changed (39)

package/README.md +42 -22
package/dist/cli.js +214 -17
package/dist/cli.js.map +1 -1
package/dist/config.d.ts +17 -0
package/dist/config.d.ts.map +1 -1
package/dist/config.js +13 -0
package/dist/config.js.map +1 -1
package/dist/consolidate.d.ts +1 -0
package/dist/consolidate.d.ts.map +1 -1
package/dist/consolidate.js +38 -1
package/dist/consolidate.js.map +1 -1
package/dist/eval.d.ts +35 -0
package/dist/eval.d.ts.map +1 -1
package/dist/eval.js +68 -8
package/dist/eval.js.map +1 -1
package/dist/hooks.d.ts +1 -0
package/dist/hooks.d.ts.map +1 -1
package/dist/hooks.js +24 -0
package/dist/hooks.js.map +1 -1
package/dist/refine-llm.d.ts +53 -0
package/dist/refine-llm.d.ts.map +1 -0
package/dist/refine-llm.js +147 -0
package/dist/refine-llm.js.map +1 -0
package/dist/replay.d.ts +41 -0
package/dist/replay.d.ts.map +1 -0
package/dist/replay.js +117 -0
package/dist/replay.js.map +1 -0
package/dist/search.d.ts +26 -0
package/dist/search.d.ts.map +1 -1
package/dist/search.js +70 -26
package/dist/search.js.map +1 -1
package/dist/shared.d.ts +4 -0
package/dist/shared.d.ts.map +1 -1
package/dist/shared.js +19 -18
package/dist/shared.js.map +1 -1
package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
package/extensions/openclaw-plugin/package.json +1 -1
package/openclaw.plugin.json +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -60,6 +60,21 @@ hippo recall "data pipeline issues" --budget 2000
 ---
+### What's new in v0.29.0
+- **Mid-session pinned re-injection (Claude Code).** Pinned memories now re-enter context every turn via a new `UserPromptSubmit` hook — not just at SessionStart — so invariants survive long sessions where Opus 4.7 might otherwise forget them. `hippo context --pinned-only --format additional-context` is the command the hook runs; it's read-only so retrieval_count doesn't inflate. Existing users must re-run `hippo hook install claude-code` to pick it up. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
+- **Replay consolidation pass.** `hippo sleep` now rehearses 5 high-value memories per cycle (weighted by outcome feedback, emotional valence, under-rehearsal, idle time, strength). Closes the "replay" gap in the 7 hippocampal mechanisms. Non-destructive; opt out with `{"replay":{"count":0}}`.
+- **Model profile benchmark (null result).** New reusable eval harness at `evals/model-profile-bench.json` + `scripts/run-model-profile-bench.mjs` measures invariant honor, hallucination guard, noise rejection, and contradiction rejection. 4.6 and 4.7 both score 100% with hippo context injection — no per-model profile tuning needed. See `docs/plans/2026-04-21-phase-a-decision.md`.
+- **Physics soak test harness.** `scripts/soak-test.mjs` + 10 synthetic workload profiles. All 10 bounded at 100-tick smoke scale; grant-scale 100hr runs are separate follow-up work.
+### What's new in v0.28.0
+- **Budget saturation fix.** Large memories (14k+ chars) no longer starve retrieval. New `minResults` option guarantees at least N results regardless of token budget. `hippo recall <q> --min-results 5`.
+- **LongMemEval parity restored.** The 35pp R@10 gap vs v0.11 was a benchmark methodology issue (budget-limited vs unlimited comparison). Corrected: v0.28 R@3 67.0% (+0.4pp), answer_in_content@5 49.6% (+3.0pp), R@10 81.0% (-1.6pp). Top-5 results now more often contain the actual answer.
+- **MMR performance.** Re-ranking capped at top-100 candidates, dropping per-query time from ~50s to ~9s. `preparedCorpus` option skips per-query tokenization for batch callers.
+- **RRF scoring option.** `hybridSearch` accepts `scoring: 'rrf'` for reciprocal rank fusion as an alternative to score blending.
+- **`hippo refine` command.** LLM-powered semantic rewrite of memories for improved recall quality.
 ### What's new in v0.27.0
 - **Recall is now debuggable.** `hippo explain <query>` prints the full score breakdown for each retrieved memory: BM25 + cosine, every multiplier (strength, recency, decision, path, source-bump, outcome), age, and final composite. Read-only so it's safe to run as a diagnostic.
@@ -740,7 +755,7 @@ No extra commands needed. Just `hippo init` and your agent knows about Hippo.
 If you prefer explicit control:
 ```bash
-hippo hook install claude-code   # patches CLAUDE.md + adds SessionStart/SessionEnd hooks
+hippo hook install claude-code   # patches CLAUDE.md + adds SessionStart/SessionEnd + UserPromptSubmit hooks
 hippo hook install codex         # optional repair/manual run: patches AGENTS.md + wraps the detected Codex launcher
 hippo hook install cursor        # patches .cursorrules
 hippo hook install openclaw      # patches AGENTS.md
@@ -752,7 +767,10 @@ This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the
 2. Run `hippo remember "<lesson>" --error` on errors
 3. Run `hippo outcome --good` on completion
-For Claude Code, it also adds a Stop hook to `~/.claude/settings.json` so `hippo sleep` runs automatically when the session exits.
+For Claude Code, it also adds:
+- a `SessionEnd` hook so `hippo sleep` runs automatically when the session exits
+- a `SessionStart` hook that prints the previous session's consolidation output
+- a `UserPromptSubmit` hook that re-injects pinned memories (`hippo remember <text> --pin`) into every turn's context — so invariants survive long sessions where Opus 4.7 might otherwise "forget" them. Budget: 500 tokens per turn, skipped entirely when no pinned memories exist. Opt out with `{"pinnedInject":{"enabled":false}}` in `.hippo/config.json`.
 To remove: `hippo hook uninstall claude-code`
@@ -866,7 +884,7 @@ For how these mechanisms connect to LLM training, continual learning, and open r
 | Auto-hook install | Yes | No | No | No |
 | MCP server | Yes | Yes | No | No |
 | Zero dependencies | Yes | No (ChromaDB) | No | No |
-| LongMemEval R@5 (retrieval) | 74.0% (BM25 only) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
+| LongMemEval R@5 (retrieval) | 73.8% (hybrid, v0.28) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
 | Git-friendly | Yes | No | No | Yes |
 | Framework agnostic | Yes | Yes | Partial | Yes |
@@ -882,28 +900,30 @@ Two benchmarks testing two different things. Full details in [`benchmarks/`](ben
 [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
-**Hippo v0.11.0 results (BM25 only, zero dependencies):**
+**Hippo v0.28.0 results (hybrid BM25 + cosine, full 500 questions):**
+| Metric | v0.28 | v0.11 (BM25 only) |
+|--------|-------|-------------------|
+| Recall@1 | 46.6% | 50.4% |
+| Recall@3 | **67.0%** | 66.6% |
+| Recall@5 | 73.8% | 74.0% |
+| Recall@10 | 81.0% | 82.6% |
+| Answer in content@5 | **49.6%** | 46.6% |
-| Metric | Score |
-|--------|-------|
-| Recall@1 | 50.4% |
-| Recall@3 | 66.6% |
-| Recall@5 | 74.0% |
-| Recall@10 | 82.6% |
-| Answer in content@5 | 46.6% |
+| Question Type | Count | R@5 | R@10 |
+|---------------|-------|-----|------|
+| single-session-assistant | 56 | 100.0% | 100.0% |
+| knowledge-update | 78 | 89.7% | 96.2% |
+| multi-session | 133 | 72.2% | 82.0% |
+| temporal-reasoning | 133 | 72.9% | 78.9% |
+| single-session-user | 70 | 62.9% | 71.4% |
+| single-session-preference | 30 | 20.0% | 33.3% |
-| Question Type | Count | R@5 |
-|---------------|-------|-----|
-| single-session-assistant | 56 | 94.6% |
-| knowledge-update | 78 | 88.5% |
-| temporal-reasoning | 133 | 73.7% |
-| multi-session | 133 | 72.2% |
-| single-session-user | 70 | 65.7% |
-| single-session-preference | 30 | 26.7% |
+For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo v0.28 achieves 73.8% R@5 with hybrid BM25 + cosine. Hybrid scoring trades a little R@1 accuracy for better top-5 content relevance (answer_in_content@5 +3pp vs v0.11).
-For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo achieves 74.0% using BM25 keyword matching alone with zero runtime dependencies. Adding embeddings via `hippo embed` (optional `@xenova/transformers` peer dep) enables hybrid search and should close the gap.
+Hippo's strongest categories (single-session-assistant 100% R@5, knowledge-update 89.7%) are where keyword overlap between question and stored content is highest. The weakest (preference 20%) involves indirect references that need deeper semantic understanding.
-Hippo's strongest categories (knowledge-update 88.5%, single-session-assistant 94.6%) are the ones where keyword overlap between question and stored content is highest. The weakest (preference 26.7%) involves indirect references that need semantic understanding.
+> Note: v0.28 R@10 is 1.6pp below v0.11's BM25-only result. The earlier v0.27 benchmark showed an apparent 35pp regression — that was a methodology bug (budget-limited retrieval vs unlimited), fixed in v0.28 with the `minResults` option. See [`evals/README.md`](evals/README.md) for the full investigation and per-type breakdown.
 ```bash
 cd benchmarks/longmemeval
@@ -940,7 +960,7 @@ node run.mjs --adapter all
 Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
 The interesting problems:
-- **Improve LongMemEval score.** Current R@5 is 74.0% with BM25 only. Adding embeddings (`hippo embed`) and hybrid search should close the gap toward MemPalace's 96.6%.
+- **Improve LongMemEval score.** Current R@5 is 73.8% with hybrid BM25 + cosine (v0.28). Gap to MemPalace's 96.6% likely needs better chunking, reranking, or semantic compression — not just more of the same retrieval.
 - Better consolidation heuristics (LLM-powered merge vs current text overlap)
 - Web UI / dashboard for visualizing decay curves and memory health
 - Optimal decay parameter tuning from real usage data
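
The v0.28.0 notes in this README diff mention `scoring: 'rrf'` as an alternative to score blending in `hybridSearch`. The diff does not show that implementation, so the following is only a minimal sketch of generic reciprocal rank fusion, not hippo's code. The function name `rrfFuse`, the constant `k = 60` (a commonly used default), and the sample ids are assumptions for illustration.

```javascript
// Generic reciprocal rank fusion (RRF) sketch; not hippo's implementation.
// Each input list holds document ids ordered best-first.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      // Rank is 1-based, so the top item contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical rankings from a keyword ranker and an embedding ranker.
const bm25Order = ['a', 'b', 'c'];
const cosineOrder = ['b', 'c', 'a'];
const fused = rrfFuse([bm25Order, cosineOrder]);
console.log(fused.map((r) => r.id)); // [ 'b', 'a', 'c' ]
```

Because RRF uses only rank positions, it does not need the BM25 and cosine scores to be on comparable scales, which is the usual reason to prefer it over weighted blending.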

package/dist/cli.js CHANGED

@@ -47,7 +47,8 @@ import { DAILY_TASK_NAME, buildDailyRunnerCommand, listRegisteredWorkspaces, reg
 import { importChatGPT, importClaude, importCursor, importGenericFile, importMarkdown, } from './importers.js';
 import { cmdCapture } from './capture.js';
 import { auditMemories } from './audit.js';
-import { runEval, bootstrapCorpus } from './eval.js';
+import { runEval, bootstrapCorpus, compareSummaries } from './eval.js';
+import { refineStore } from './refine-llm.js';
 import { wmPush, wmRead, wmClear, wmFlush } from './working-memory.js';
 // ---------------------------------------------------------------------------
 // Helpers
@@ -298,6 +299,9 @@ function autoInstallHooks(quiet) {
             if (result.installedSessionStart) {
                 console.log(`   Auto-installed hippo last-sleep SessionStart hook in ${hook} settings`);
             }
+            if (result.installedUserPromptSubmit) {
+                console.log(`   Auto-installed hippo pinned-inject UserPromptSubmit hook in ${hook} settings`);
+            }
             if (result.migratedFromStop) {
                 console.log(`   Migrated legacy Stop hook → SessionEnd (no longer runs every turn)`);
             }
@@ -445,23 +449,32 @@ async function cmdRecall(hippoRoot, query, flags) {
         ? parseFloat(String(flags['mmr-lambda']))
         : config.mmr.lambda;
     const mmrEnabled = !noMmr && config.mmr.enabled;
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : config.search.localBump;
+    const minResults = flags['min-results'] !== undefined
+        ? parseInt(String(flags['min-results']), 10)
+        : undefined;
     let results;
     if (usePhysics && !hasGlobal) {
         results = await physicsSearch(query, localEntries, {
             budget,
             hippoRoot,
             physicsConfig: config.physics,
+            minResults,
         });
     }
     else if (hasGlobal) {
         // Use searchBothHybrid for merged results with embedding support
         results = await searchBothHybrid(query, hippoRoot, globalRoot, {
-            budget, mmr: mmrEnabled, mmrLambda,
+            budget, mmr: mmrEnabled, mmrLambda, localBump, minResults,
         });
     }
     else {
         results = await hybridSearch(query, localEntries, {
-            budget, hippoRoot, mmr: mmrEnabled, mmrLambda,
+            budget, hippoRoot, mmr: mmrEnabled, mmrLambda, minResults,
         });
     }
     if (limit < results.length) {
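
The `cmdRecall` hunk above resolves `localBump` in a fixed order: `--equal-sources` forces 1.0, then an explicit `--local-bump` value is used, then `config.search.localBump`. A standalone sketch of that precedence follows; `resolveLocalBump` is a hypothetical helper name, and the plain `flags` object stands in for the CLI's parsed flags.

```javascript
// Sketch of the precedence used in the hunk above. `resolveLocalBump` is a
// hypothetical helper name, not a function exported by hippo-memory.
function resolveLocalBump(flags, configDefault) {
  if (flags['equal-sources']) return 1.0; // shortcut wins over everything
  if (flags['local-bump'] !== undefined) {
    return parseFloat(String(flags['local-bump']));
  }
  return configDefault; // stands in for config.search.localBump
}

console.log(resolveLocalBump({ 'equal-sources': true, 'local-bump': '2' }, 1.2)); // 1
console.log(resolveLocalBump({ 'local-bump': '1.5' }, 1.2)); // 1.5
console.log(resolveLocalBump({}, 1.2)); // 1.2
```

As in the hunk, a non-numeric `--local-bump` value would come back as `NaN` from `parseFloat`, so a caller may want to validate it before passing it to search.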
@@ -553,6 +566,11 @@ async function cmdExplain(hippoRoot, query, flags) {
         ? parseFloat(String(flags['mmr-lambda']))
         : config.mmr.lambda;
     const mmrEnabled = !noMmr && config.mmr.enabled;
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : config.search.localBump;
     let results;
     let modeUsed;
     if (usePhysics && !hasGlobal) {
@@ -566,7 +584,7 @@ async function cmdExplain(hippoRoot, query, flags) {
     }
     else if (hasGlobal) {
         results = await searchBothHybrid(query, hippoRoot, globalRoot, {
-            budget, explain: true, mmr: mmrEnabled, mmrLambda,
+            budget, explain: true, mmr: mmrEnabled, mmrLambda, localBump,
         });
         modeUsed = 'searchBothHybrid';
     }
@@ -663,6 +681,7 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
     const asJson = Boolean(flags['json']);
     const minMrr = flags['min-mrr'] !== undefined ? parseFloat(String(flags['min-mrr'])) : null;
     const showCases = Boolean(flags['show-cases']);
+    const comparePath = flags['compare'] ? String(flags['compare']) : null;
     const noMmr = Boolean(flags['no-mmr']);
     const mmrLambda = flags['mmr-lambda'] !== undefined ? parseFloat(String(flags['mmr-lambda'])) : undefined;
     const embeddingWeight = flags['embedding-weight'] !== undefined ? parseFloat(String(flags['embedding-weight'])) : undefined;
@@ -702,11 +721,19 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
         console.error(`Failed to read corpus: ${err instanceof Error ? err.message : err}`);
         process.exit(1);
     }
+    const globalRoot = getGlobalRoot();
+    const localBump = flags['equal-sources']
+        ? 1.0
+        : flags['local-bump'] !== undefined
+            ? parseFloat(String(flags['local-bump']))
+            : loadConfig(hippoRoot).search.localBump;
     const summary = await runEval(cases, entries, {
         hippoRoot,
+        globalRoot,
         mmr: !noMmr,
         mmrLambda,
         embeddingWeight,
+        localBump,
     });
     if (asJson) {
         console.log(JSON.stringify(summary, null, 2));
@@ -752,6 +779,52 @@ async function cmdEval(hippoRoot, corpusPath, flags) {
         console.error(`MRR ${fmt(summary.meanMrr, 4)} below threshold ${minMrr}`);
         process.exit(1);
     }
+    if (comparePath) {
+        if (!fs.existsSync(comparePath)) {
+            console.error(`Baseline file not found: ${comparePath}`);
+            process.exit(1);
+        }
+        let baseline;
+        try {
+            baseline = JSON.parse(fs.readFileSync(comparePath, 'utf8'));
+        }
+        catch (err) {
+            console.error(`Failed to parse baseline: ${err instanceof Error ? err.message : err}`);
+            process.exit(1);
+        }
+        const cmp = compareSummaries(baseline, summary);
+        if (asJson) {
+            // The main JSON output already emitted; append comparison to stderr so
+            // both can be captured independently.
+            console.error(JSON.stringify({ compare: cmp }, null, 2));
+        }
+        else {
+            console.log();
+            console.log('Compare vs baseline:');
+            const sign = (d) => (d >= 0 ? '+' : '') + fmt(d, 4);
+            console.log(`  MRR:        ${sign(cmp.aggregate.mrr)}`);
+            console.log(`  Recall@5:   ${sign(cmp.aggregate.recallAt5)}`);
+            console.log(`  Recall@10:  ${sign(cmp.aggregate.recallAt10)}`);
+            console.log(`  NDCG@10:    ${sign(cmp.aggregate.ndcgAt10)}`);
+            console.log();
+            console.log(`  improved: ${cmp.improved.length}   regressed: ${cmp.regressed.length}   unchanged: ${cmp.unchanged}`);
+            if (cmp.onlyInBaseline.length > 0)
+                console.log(`  only in baseline: ${cmp.onlyInBaseline.length}`);
+            if (cmp.onlyInCurrent.length > 0)
+                console.log(`  only in current:  ${cmp.onlyInCurrent.length}`);
+            const showPerCase = cmp.improved.length + cmp.regressed.length > 0;
+            if (showPerCase) {
+                for (const d of cmp.improved.slice(0, 5)) {
+                    const delta = d.ndcgAfter - d.ndcgBefore;
+                    console.log(`  + [${d.id}] NDCG ${fmt(d.ndcgBefore, 2)} -> ${fmt(d.ndcgAfter, 2)} (+${fmt(delta, 3)})`);
+                }
+                for (const d of cmp.regressed.slice(0, 5)) {
+                    const delta = d.ndcgAfter - d.ndcgBefore;
+                    console.log(`  - [${d.id}] NDCG ${fmt(d.ndcgBefore, 2)} -> ${fmt(d.ndcgAfter, 2)} (${fmt(delta, 3)})`);
+                }
+            }
+        }
+    }
 }
 function cmdTrace(hippoRoot, id, flags) {
     requireInit(hippoRoot);
@@ -854,6 +927,34 @@ function cmdTrace(hippoRoot, id, flags) {
         }
     }
 }
+async function cmdRefine(hippoRoot, flags) {
+    requireInit(hippoRoot);
+    const apiKey = process.env.ANTHROPIC_API_KEY;
+    if (!apiKey) {
+        console.error('hippo refine needs ANTHROPIC_API_KEY in the environment.');
+        process.exit(1);
+    }
+    const dryRun = Boolean(flags['dry-run']);
+    const all = Boolean(flags['all']);
+    const limit = flags['limit'] !== undefined ? parseInt(String(flags['limit']), 10) : undefined;
+    const model = flags['model'] ? String(flags['model']) : undefined;
+    const asJson = Boolean(flags['json']);
+    const result = await refineStore(hippoRoot, { apiKey, model, limit, dryRun, all });
+    if (asJson) {
+        console.log(JSON.stringify(result, null, 2));
+        return;
+    }
+    console.log(`Scanned:  ${result.scanned} consolidated semantic memories`);
+    console.log(`Refined:  ${result.refined}${dryRun ? '  (dry-run — no writes)' : ''}`);
+    console.log(`Skipped:  ${result.skipped}`);
+    console.log(`Failed:   ${result.failed}`);
+    if (result.failed > 0) {
+        console.log('\nFailures:');
+        for (const d of result.details.filter((x) => x.status === 'failed').slice(0, 5)) {
+            console.log(`  ${d.id}: ${d.reason}`);
+        }
+    }
+}
 /**
  * Scan for Claude Code MEMORY.md files and import new entries into hippo.
  * Looks in ~/.claude/projects/<project>/memory/ for .md files with YAML frontmatter.
@@ -2004,7 +2105,41 @@ async function cmdContext(hippoRoot, args, flags) {
     const recentSessionEvents = activeSnapshot?.session_id
         ? listSessionEvents(hippoRoot, { session_id: activeSnapshot.session_id, limit: 5 })
         : [];
-    if (query === '*') {
+    // --pinned-only: restrict to pinned entries only. Used by the Claude Code
+    // UserPromptSubmit hook so invariants stay in context every turn.
+    const pinnedOnly = flags['pinned-only'] === true;
+    if (pinnedOnly) {
+        const pinnedCfg = loadConfig(hippoRoot);
+        if (!pinnedCfg.pinnedInject.enabled)
+            return; // user disabled via config
+        // Effective budget: explicit --budget wins over config.
+        const effBudget = flags['budget'] !== undefined ? budget : pinnedCfg.pinnedInject.budget;
+        const pinnedLocal = localEntries.filter((e) => e.pinned);
+        const pinnedGlobal = globalEntries.filter((e) => e.pinned);
+        if (pinnedLocal.length === 0 && pinnedGlobal.length === 0)
+            return; // zero output
+        const nowP = new Date();
+        const rankedPinned = [
+            ...pinnedLocal.map((e) => ({ entry: e, isGlobal: false })),
+            ...pinnedGlobal.map((e) => ({ entry: e, isGlobal: true })),
+        ]
+            .map(({ entry, isGlobal }) => ({
+            entry,
+            score: calculateStrength(entry, nowP) * (isGlobal ? 1 / 1.2 : 1),
+            tokens: estimateTokens(entry.content),
+            isGlobal,
+        }))
+            .sort((a, b) => b.score - a.score);
+        let usedP = 0;
+        for (const r of rankedPinned) {
+            if (usedP + r.tokens > effBudget)
+                continue;
+            selectedItems.push(r);
+            usedP += r.tokens;
+        }
+        totalTokens = usedP;
+    }
+    else if (query === '*') {
         // No query: return strongest memories by strength, up to budget
         const now = new Date();
         const localRanked = localEntries
@@ -2067,17 +2202,26 @@ async function cmdContext(hippoRoot, args, flags) {
     }
     if (selectedItems.length === 0 && !activeSnapshot && recentSessionEvents.length === 0)
         return;
-    // Mark retrieved and persist
-    const toUpdate = selectedItems.map((s) => s.entry);
-    const updatedEntries = markRetrieved(toUpdate);
-    const localIndex = loadIndex(hippoRoot);
-    for (const u of updatedEntries) {
-        const targetRoot = localIndex.entries[u.id] ? hippoRoot : (hasGlobal ? globalRoot : hippoRoot);
-        writeEntry(targetRoot, u);
+    // --pinned-only is called by the UserPromptSubmit hook every turn. Treat it
+    // as read-only so pinned memories don't inflate retrieval_count or extend
+    // their half_life by 2 days * turn-count over a long session.
+    let updatedEntries;
+    if (pinnedOnly) {
+        updatedEntries = selectedItems.map((s) => s.entry);
+    }
+    else {
+        // Mark retrieved and persist
+        const toUpdate = selectedItems.map((s) => s.entry);
+        updatedEntries = markRetrieved(toUpdate);
+        const localIndex = loadIndex(hippoRoot);
+        for (const u of updatedEntries) {
+            const targetRoot = localIndex.entries[u.id] ? hippoRoot : (hasGlobal ? globalRoot : hippoRoot);
+            writeEntry(targetRoot, u);
+        }
+        localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
+        saveIndex(hippoRoot, localIndex);
+        updateStats(hippoRoot, { recalled: selectedItems.length });
     }
-    localIndex.last_retrieval_ids = updatedEntries.map((u) => u.id);
-    saveIndex(hippoRoot, localIndex);
-    updateStats(hippoRoot, { recalled: selectedItems.length });
     const format = String(flags['format'] ?? 'markdown');
     const framing = String(flags['framing'] ?? 'observe');
     if (format === 'json') {
@@ -2092,6 +2236,38 @@ async function cmdContext(hippoRoot, args, flags) {
         }));
         console.log(JSON.stringify({ query, activeSnapshot, recentSessionEvents, memories: output, tokens: totalTokens }));
     }
+    else if (format === 'additional-context') {
+        // Claude Code UserPromptSubmit hook JSON shape. Capture the markdown that
+        // printContextMarkdown would write and wrap it as `additionalContext`.
+        const lines = [];
+        const realLog = console.log;
+        console.log = (...parts) => { lines.push(parts.map(String).join(' ')); };
+        try {
+            if (activeSnapshot)
+                printActiveTaskSnapshot(activeSnapshot);
+            if (recentSessionEvents.length > 0)
+                printSessionEvents(recentSessionEvents);
+            printContextMarkdown(selectedItems.map((r) => ({
+                entry: updatedEntries.find((u) => u.id === r.entry.id) ?? r.entry,
+                score: r.score,
+                tokens: r.tokens,
+                isGlobal: r.isGlobal ?? false,
+            })), totalTokens, framing);
+        }
+        finally {
+            console.log = realLog;
+        }
+        const textBlock = lines.join('\n');
+        if (!textBlock.trim())
+            return;
+        const payload = {
+            hookSpecificOutput: {
+                hookEventName: 'UserPromptSubmit',
+                additionalContext: textBlock,
+            },
+        };
+        process.stdout.write(JSON.stringify(payload));
+    }
     else {
         if (activeSnapshot) {
             printActiveTaskSnapshot(activeSnapshot);
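
The `additional-context` branch above temporarily swaps `console.log` to capture what the markdown printers would write, then wraps that text in the hook JSON. A self-contained sketch of the capture pattern follows; `captureLog` and `printDemo` are hypothetical names and the sample text is made up. Only the payload shape is taken from the hunk.

```javascript
// Capture console.log output from a callback and restore it even on throw.
function captureLog(fn) {
  const lines = [];
  const realLog = console.log;
  console.log = (...parts) => { lines.push(parts.map(String).join(' ')); };
  try {
    fn();
  } finally {
    console.log = realLog;
  }
  return lines.join('\n');
}

// Stand-in for printContextMarkdown and the other printers (hypothetical).
function printDemo() {
  console.log('## Pinned memories');
  console.log('-', 'never force-push to main');
}

const textBlock = captureLog(printDemo);
const payload = {
  hookSpecificOutput: {
    hookEventName: 'UserPromptSubmit',
    additionalContext: textBlock,
  },
};
process.stdout.write(JSON.stringify(payload) + '\n');
```

The `finally` block matters here: if a printer throws, `console.log` is still restored, so later output is not silently swallowed.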
@@ -2685,6 +2861,9 @@ function cmdHook(args, flags) {
             if (result.installedSessionStart) {
                 console.log(`Installed hippo last-sleep SessionStart hook in ${result.target} settings`);
             }
+            if (result.installedUserPromptSubmit) {
+                console.log(`Installed hippo pinned-inject UserPromptSubmit hook in ${result.target} settings`);
+            }
             if (result.migratedFromStop) {
                 console.log(`Migrated legacy Stop hook → SessionEnd (was running every turn; now fires once on session exit)`);
             }
@@ -2775,6 +2954,8 @@ function cmdSetup(flags) {
             bits.push('SessionEnd (session-end)');
         if (result.installedSessionStart)
             bits.push('SessionStart');
+        if (result.installedUserPromptSubmit)
+            bits.push('UserPromptSubmit (pinned-inject)');
         if (result.migratedFromStop)
             bits.push('migrated legacy Stop');
         if (result.migratedSplitSessionEnd)
@@ -2870,7 +3051,8 @@ function installClaudeCodeSessionEndHook() {
     const result = installJsonHooks('claude-code');
     return {
         installed: result.installedSessionEnd ||
-            result.installedSessionStart,
+            result.installedSessionStart ||
+            result.installedUserPromptSubmit,
         migratedFromStop: result.migratedFromStop,
     };
 }
@@ -2968,6 +3150,7 @@ Commands:
     --global               Store in global store ($HIPPO_HOME or ~/.hippo/)
   recall <query>           Search and retrieve memories (local + global)
     --budget <n>           Token budget (default: 4000)
+    --min-results <n>      Minimum results regardless of budget (default: 1)
     --json                 Output as JSON
     --why                  Show match reasons and source annotations
     --no-mmr               Disable MMR diversity re-ranking
@@ -2982,20 +3165,31 @@ Commands:
   trace <id>               Memory dossier: content, decay trajectory, retrievals,
                            outcomes, consolidation parents, open conflicts
     --json                 Output as JSON
+  refine                   Rewrite consolidated semantic memories with Claude
+    --limit <n>            Cap the number of memories processed this run
+    --all                  Ignore \`llm-refined\` tag and re-refine everything
+    --dry-run              Call the API but don't write results back
+    --model <id>           Override the default model (claude-sonnet-4-6)
+    --json                 Output summary as JSON
+    (requires ANTHROPIC_API_KEY in env)
   eval [<corpus.json>]     Measure recall quality against a test corpus
     --bootstrap            Generate a synthetic corpus from current memories
     --out <path>           With --bootstrap, write to file instead of stdout
     --max-cases <n>        With --bootstrap, cap case count (default: 50)
     --show-cases           Print per-case details (query, R@10, missed, top 3)
+    --compare <path>       JSON from a prior \`eval --json\` run; print deltas
     --no-mmr               Disable MMR for this eval run
     --mmr-lambda <f>       Override MMR lambda for this run
     --embedding-weight <f> Override cosine weight (default: 0.6)
+    --local-bump <f>       Local-over-global priority multiplier (default: 1.2)
+    --equal-sources        Shortcut for --local-bump 1.0
     --min-mrr <f>          Exit non-zero if mean MRR falls below this
     --json                 Output full summary as JSON
   context                  Smart context injection for AI agents
     --auto                 Auto-detect task from git state
     --budget <n>           Token budget (default: 1500)
-    --format <fmt>         Output format: markdown (default) or json
+    --pinned-only          Only inject pinned memories (used by UserPromptSubmit hook)
+    --format <fmt>         Output format: markdown (default), json, or additional-context (Claude Code hook JSON)
     --framing <mode>       Framing: observe (default), suggest, assert
   sleep                    Run consolidation pass (auto-learns + dedup + auto-shares)
     --dry-run              Preview without writing
@@ -3216,6 +3410,9 @@ async function main() {
             cmdTrace(hippoRoot, id, flags);
             break;
         }
+        case 'refine':
+            await cmdRefine(hippoRoot, flags);
+            break;
         case 'sleep':
             cmdSleep(hippoRoot, flags);
             break;
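
The `--pinned-only` branch of `cmdContext` in this diff ranks pinned entries by score and then fills the token budget greedily, skipping (rather than stopping at) any entry that does not fit. A small sketch of that selection loop follows; `packByBudget` and the sample items are hypothetical, and the scores and token counts are made up.

```javascript
// Greedy budget packing in the style of the --pinned-only branch.
// `packByBudget` and the sample items are illustrative assumptions.
function packByBudget(items, budget) {
  const ranked = [...items].sort((a, b) => b.score - a.score);
  const selected = [];
  let used = 0;
  for (const item of ranked) {
    // `continue` rather than `break`: a smaller, lower-ranked item may still fit.
    if (used + item.tokens > budget) continue;
    selected.push(item);
    used += item.tokens;
  }
  return { selected, used };
}

const { selected, used } = packByBudget(
  [
    { id: 'big', score: 0.9, tokens: 400 },
    { id: 'huge', score: 0.8, tokens: 300 },
    { id: 'small', score: 0.5, tokens: 80 },
  ],
  500,
);
console.log(selected.map((s) => s.id), used); // [ 'big', 'small' ] 480
```

With a 500-token budget, the second-ranked item is skipped because it would overflow, while the lower-ranked but smaller item still gets in.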