hippo-memory 0.10.0 → 0.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (53)
  1. package/README.md +97 -15
  2. package/dist/cli.js +93 -12
  3. package/dist/cli.js.map +1 -1
  4. package/dist/config.d.ts +2 -0
  5. package/dist/config.d.ts.map +1 -1
  6. package/dist/config.js +3 -0
  7. package/dist/config.js.map +1 -1
  8. package/dist/consolidate.d.ts +1 -0
  9. package/dist/consolidate.d.ts.map +1 -1
  10. package/dist/consolidate.js +61 -2
  11. package/dist/consolidate.js.map +1 -1
  12. package/dist/db.d.ts.map +1 -1
  13. package/dist/db.js +19 -1
  14. package/dist/db.js.map +1 -1
  15. package/dist/embeddings.d.ts +1 -0
  16. package/dist/embeddings.d.ts.map +1 -1
  17. package/dist/embeddings.js +31 -1
  18. package/dist/embeddings.js.map +1 -1
  19. package/dist/index.d.ts +1 -1
  20. package/dist/index.d.ts.map +1 -1
  21. package/dist/index.js +1 -1
  22. package/dist/index.js.map +1 -1
  23. package/dist/mcp/server.js +9 -3
  24. package/dist/mcp/server.js.map +1 -1
  25. package/dist/memory.d.ts +26 -1
  26. package/dist/memory.d.ts.map +1 -1
  27. package/dist/memory.js +44 -7
  28. package/dist/memory.js.map +1 -1
  29. package/dist/physics-config.d.ts +37 -0
  30. package/dist/physics-config.d.ts.map +1 -0
  31. package/dist/physics-config.js +26 -0
  32. package/dist/physics-config.js.map +1 -0
  33. package/dist/physics-state.d.ts +43 -0
  34. package/dist/physics-state.d.ts.map +1 -0
  35. package/dist/physics-state.js +158 -0
  36. package/dist/physics-state.js.map +1 -0
  37. package/dist/physics.d.ts +115 -0
  38. package/dist/physics.d.ts.map +1 -0
  39. package/dist/physics.js +354 -0
  40. package/dist/physics.js.map +1 -0
  41. package/dist/search.d.ts +13 -0
  42. package/dist/search.d.ts.map +1 -1
  43. package/dist/search.js +105 -0
  44. package/dist/search.js.map +1 -1
  45. package/dist/store.d.ts.map +1 -1
  46. package/dist/store.js +14 -5
  47. package/dist/store.js.map +1 -1
  48. package/extensions/openclaw-plugin/README.md +11 -1
  49. package/extensions/openclaw-plugin/index.ts +62 -1
  50. package/extensions/openclaw-plugin/openclaw.plugin.json +1 -1
  51. package/extensions/openclaw-plugin/package.json +1 -1
  52. package/openclaw.plugin.json +1 -1
  53. package/package.json +6 -6
package/README.md CHANGED
@@ -43,10 +43,22 @@ hippo recall "data pipeline issues" --budget 2000
 
  That's it. You have a memory system.
 
+ ### What's new in v0.11.1
+
+ - **OpenClaw error capture filtering.** The `autoLearn` hook now applies three filters before storing tool errors: a noise-pattern filter for known transient errors, per-session rate limiting (max 5 errors), and per-session deduplication. This prevents memory pollution from infrastructure noise.
+
+ ### What's new in v0.11.0
+
+ - **Reward-proportional decay.** Outcome feedback now modulates decay rate continuously instead of applying fixed half-life deltas. Memories with consistently positive outcomes decay up to 1.5x slower; consistently negative ones decay up to 2x faster. Mixed outcomes converge toward neutral. Inspired by R-STDP in spiking neural networks. `hippo inspect` now shows cumulative outcome counts and the computed reward factor.
+ - **Public benchmarks.** Two benchmarks in `benchmarks/`: a [Sequential Learning Benchmark](benchmarks/sequential-learning/) (50 tasks, 10 traps, measures agent improvement over time) and a [LongMemEval integration](benchmarks/longmemeval/) (industry-standard 500-question retrieval benchmark, R@5=74.0% with BM25 only). The sequential learning benchmark is unique: no other public benchmark tests whether memory systems produce learning curves.
+
  ### What's new in v0.10.0
 
  - **Active invalidation.** `hippo learn --git` detects migration and breaking-change commits and actively weakens memories referencing the old pattern. Manual invalidation via `hippo invalidate "REST API" --reason "migrated to GraphQL"`.
  - **Architectural decisions.** `hippo decide` stores one-off decisions with 90-day half-life and verified confidence. Supports `--context` for reasoning and `--supersedes` to chain decisions when the architecture evolves.
+ - **Path-based memory triggers.** Memories are auto-tagged with `path:<segment>` from your working directory. Recall boosts memories from the same location (up to 1.3x). Working in `src/api/`? API-related memories surface first.
+ - **OpenCode integration.** `hippo hook install opencode` patches AGENTS.md. Auto-detected during `hippo init`. Includes an integration guide with MCP config and a skill for progressive discovery.
+ - **`hippo export`** outputs all memories as JSON or markdown.
  - **Decision recall boost.** 1.2x scoring multiplier for decision-tagged memories so they surface despite low retrieval frequency.
 
  ### What's new in v0.9.1
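The path-based trigger bullet above can be sketched in a few lines. This is an illustrative model only: `pathTags`, `pathBoost`, and the 0.1-per-shared-segment increment are hypothetical names and numbers; only the `path:<segment>` tag format and the 1.3x cap come from the changelog (the real logic lives in `dist/path-context.js` and `dist/search.js`).

```typescript
// Hypothetical sketch of the path-trigger idea described above; the shipped
// implementation in dist/path-context.js and dist/search.js may differ.
type Memory = { id: string; tags: string[] };

// Tag derivation: one path:<segment> tag per directory segment of cwd.
function pathTags(cwd: string): string[] {
  return cwd
    .split("/")
    .filter((seg) => seg.length > 0)
    .map((seg) => `path:${seg}`);
}

// Boost recall scores for memories sharing path tags with the current
// working directory, capped at 1.3x as the changelog states.
function pathBoost(memory: Memory, cwd: string): number {
  const current = new Set(pathTags(cwd));
  const shared = memory.tags.filter((t) => t.startsWith("path:") && current.has(t)).length;
  return Math.min(1.3, 1 + 0.1 * shared);
}
```

Under this sketch, a memory tagged `path:src` and `path:api` gets a 1.2x boost while you work in `/repo/src/api`, and the boost saturates at 1.3x however deep the overlap.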
@@ -421,14 +433,15 @@ hippo recall "why is the gold model broken"
 
  hippo outcome --good
  # Applied positive outcome to 3 memories
- # half_life +5d on each
+ # reward factor increases, decay slows
 
  hippo outcome --bad
  # Applied negative outcome to 3 memories
- # half_life -3d on each
- # irrelevant memories decay faster
+ # reward factor decreases, decay accelerates
  ```
 
+ Outcomes are cumulative. A memory with 5 positive outcomes and 0 negative has a reward factor of ~1.42, making its effective half-life 42% longer. A memory with 0 positive and 3 negative has a factor of ~0.63, decaying nearly twice as fast. Mixed outcomes converge toward neutral (1.0).
+
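The cumulative figures quoted above are consistent with a simple saturating formula. This is a sketch, not the shipped `calculateRewardFactor` in `dist/memory.js`: any function bounded in [0.5, 1.5] that saturates in net outcomes reproduces the numbers, and the `net / (|net| + 1)` form below is an assumption.

```typescript
// Sketch only: reproduces the README figures (~1.42 for +5/-0, ~0.63 for
// +0/-3), stays within [0.5, 1.5], and is neutral (1.0) when outcomes balance.
function rewardFactor(positive: number, negative: number): number {
  const net = positive - negative;
  // net/(|net|+1) saturates toward +/-1, bounding the factor in [0.5, 1.5]
  return 1 + 0.5 * (net / (Math.abs(net) + 1));
}

// Effective half-life scales by the factor: >1 decays slower, <1 faster.
function effectiveHalfLife(halfLifeDays: number, positive: number, negative: number): number {
  return halfLifeDays * rewardFactor(positive, negative);
}
```

Under this sketch, `rewardFactor(5, 0)` is about 1.417 and `rewardFactor(0, 3)` is 0.625, matching the ~1.42 and ~0.63 in the text; equal positive and negative counts give exactly 1.0.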
  ---
 
  ### Token budgets
@@ -687,34 +700,100 @@ The 7 mechanisms in full: [PLAN.md#core-principles](PLAN.md#core-principles)
 
  For how these mechanisms connect to LLM training, continual learning, and open research problems: **[RESEARCH.md](RESEARCH.md)**
 
- **Related work:** [HippoRAG](https://arxiv.org/abs/2405.14831) (Gutierrez et al., 2024) applies hippocampal indexing to RAG via knowledge graphs. Complementary approach HippoRAG optimizes retrieval quality, Hippo optimizes memory lifecycle. Same brain region, different mechanisms.
+ **Why does reward modulate decay?** In spiking neural networks, reward-modulated STDP strengthens synapses that contribute to positive outcomes and weakens those that don't. Hippo's reward-proportional decay (v0.11.0) implements this: memories with consistent positive outcomes decay slower, negatives decay faster, with no fixed deltas. Inspired by [MH-FLOCKE](https://github.com/MarcHesse/mhflocke)'s R-STDP architecture for quadruped locomotion, where the same mechanism produces stable learning with 11.6x lower variance than PPO.
+
+ **Prior art in agent memory simulation.** The idea that human-like memory produces human-like behavior as an emergent property was explored in IEEE research from 2010-2011 ([5952114](https://ieeexplore.ieee.org/document/5952114), [5548405](https://ieeexplore.ieee.org/document/5548405), [5953964](https://ieeexplore.ieee.org/document/5953964)). Walking between rooms and forgetting why you went there doesn't need direct simulation; it emerges naturally from a memory system with capacity limits and decay. Hippo's design follows the same principle: implement the mechanisms, and the behavior follows.
+
+ **Related work:** [HippoRAG](https://arxiv.org/abs/2405.14831) (Gutierrez et al., 2024) applies hippocampal indexing to RAG via knowledge graphs. [MemPalace](https://github.com/milla-jovovich/mempalace) (Sigman & Jovovich, 2026) organizes memory spatially (wings/halls/rooms) with AAAK compression, achieving 100% on [LongMemEval](https://arxiv.org/abs/2410.10813). [MH-FLOCKE](https://github.com/MarcHesse/mhflocke) (Hesse, 2026) uses spiking neurons with R-STDP for embodied cognition. Each system tackles a different facet: HippoRAG optimizes retrieval quality, MemPalace optimizes retrieval organization, MH-FLOCKE optimizes embodied learning, and Hippo optimizes memory lifecycle.
 
  ---
 
  ## Comparison
 
- | Feature | Hippo | Mem0 | Basic Memory | Claude-Mem |
- |---------|-------|------|-------------|-----------|
+ | Feature | Hippo | MemPalace | Mem0 | Basic Memory |
+ |---------|-------|-----------|------|-------------|
  | Decay by default | Yes | No | No | No |
  | Retrieval strengthening | Yes | No | No | No |
- | Hybrid search (BM25 + embeddings) | Yes | Embeddings only | No | No |
+ | Reward-proportional decay | Yes | No | No | No |
+ | Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No |
  | Schema acceleration | Yes | No | No | No |
  | Conflict detection + resolution | Yes | No | No | No |
  | Multi-agent shared memory | Yes | No | No | No |
  | Transfer scoring | Yes | No | No | No |
  | Outcome tracking | Yes | No | No | No |
  | Confidence tiers | Yes | No | No | No |
+ | Spatial organization | No | Yes (wings/halls/rooms) | No | No |
+ | Lossless compression | No | Yes (AAAK, 30x) | No | No |
  | Cross-tool import | Yes | No | No | No |
- | Conversation capture | Yes | No | No | No |
  | Auto-hook install | Yes | No | No | No |
- | MCP server | Yes | No | No | No |
- | Native plugins | OpenClaw + Claude Code | No | No | No |
- | Multi-repo git learn | Yes | No | No | No |
- | Zero dependencies | Yes | No | No | No |
- | Git-friendly | Yes | No | Yes | No |
- | Framework agnostic | Yes | Partial | Yes | No |
+ | MCP server | Yes | Yes | No | No |
+ | Zero dependencies | Yes | No (ChromaDB) | No | No |
+ | LongMemEval R@5 (retrieval) | 74.0% (BM25 only) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
+ | Git-friendly | Yes | No | No | Yes |
+ | Framework agnostic | Yes | Yes | Partial | Yes |
+
+ Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." Hippo implements "forget by default, earn persistence through use." These are complementary approaches: MemPalace's retrieval precision + Hippo's lifecycle management would be stronger than either alone.
+
+ ---
+
+ ## Benchmarks
+
+ Two benchmarks testing two different things. Full details in [`benchmarks/`](benchmarks/).
+
+ ### LongMemEval (retrieval accuracy)
+
+ [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
+
+ **Hippo v0.11.0 results (BM25 only, zero dependencies):**
 
- Mem0, Basic Memory, and Claude-Mem all implement "save everything, search later." Hippo implements all 7 hippocampal mechanisms: two-speed storage, decay, retrieval strengthening, schema acceleration, conflict detection, multi-agent transfer, and explicit working memory. It's the only tool that models what memories are worth keeping.
+ | Metric | Score |
+ |--------|-------|
+ | Recall@1 | 50.4% |
+ | Recall@3 | 66.6% |
+ | Recall@5 | 74.0% |
+ | Recall@10 | 82.6% |
+ | Answer in content@5 | 46.6% |
+
+ | Question Type | Count | R@5 |
+ |---------------|-------|-----|
+ | single-session-assistant | 56 | 94.6% |
+ | knowledge-update | 78 | 88.5% |
+ | temporal-reasoning | 133 | 73.7% |
+ | multi-session | 133 | 72.2% |
+ | single-session-user | 70 | 65.7% |
+ | single-session-preference | 30 | 26.7% |
+
+ For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo achieves 74.0% using BM25 keyword matching alone with zero runtime dependencies. Adding embeddings via `hippo embed` (optional `@xenova/transformers` peer dep) enables hybrid search and should close the gap.
+
+ Hippo's strongest categories (knowledge-update 88.5%, single-session-assistant 94.6%) are the ones where keyword overlap between question and stored content is highest. The weakest (preference 26.7%) involves indirect references that need semantic understanding.
+
+ ```bash
+ cd benchmarks/longmemeval
+ python ingest_direct.py --data data/longmemeval_oracle.json --store-dir ./store
+ python retrieve_fast.py --data data/longmemeval_oracle.json --store-dir ./store --output results/retrieval.jsonl
+ python evaluate_retrieval.py --retrieval results/retrieval.jsonl --data data/longmemeval_oracle.json
+ ```
+
+ ### Sequential Learning Benchmark (agent improvement over time)
+
+ No other public benchmark tests whether memory systems produce learning curves. LongMemEval tests retrieval on a fixed corpus. This benchmark tests whether an agent with memory *performs better on task 40 than task 5*.
+
+ 50 tasks, 10 trap categories, each appearing 2-3 times across the sequence.
+
+ **Hippo v0.11.0 results:**
+
+ | Condition | Overall | Early | Mid | Late | Learns? |
+ |-----------|---------|-------|-----|------|---------|
+ | No memory | 100% | 100% | 100% | 100% | No |
+ | Static memory | 20% | 33% | 11% | 14% | No |
+ | Hippo | 40% | 78% | 22% | 14% | Yes |
+
+ The hippo agent's trap-hit rate drops from 78% to 14% as it accumulates error memories with 2x half-life. Static pre-loaded memory helps from the start but doesn't improve. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).
+
+ ```bash
+ cd benchmarks/sequential-learning
+ node run.mjs --adapter all
+ ```
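A minimal adapter sketch for the benchmark described above, assuming a recall/record shape. The authoritative contract is `benchmarks/sequential-learning/adapters/interface.mjs`; the interface and method names here are hypothetical, not the actual API.

```typescript
// Hypothetical adapter shape; illustrative only. The real contract lives in
// benchmarks/sequential-learning/adapters/interface.mjs.
interface MemoryAdapter {
  name: string;
  // Called before each task with the task prompt; returns memories to inject.
  recall(query: string, budgetTokens: number): Promise<string[]>;
  // Called after each task with what happened, so the system can learn.
  record(taskId: string, outcome: "trap-hit" | "trap-avoided", lesson: string): Promise<void>;
}

// A no-op adapter reproduces the "No memory" baseline row in the table above:
// it never recalls anything and never records anything.
const noMemory: MemoryAdapter = {
  name: "no-memory",
  recall: async () => [],
  record: async () => {},
};
```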
 
  ---
 
@@ -723,10 +802,13 @@ Mem0, Basic Memory, and Claude-Mem all implement "save everything, search later.
 
  Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
 
  The interesting problems:
+ - **Improve LongMemEval score.** Current R@5 is 74.0% with BM25 only. Adding embeddings (`hippo embed`) and hybrid search should close the gap toward MemPalace's 96.6%.
  - Better consolidation heuristics (LLM-powered merge vs current text overlap)
  - Web UI / dashboard for visualizing decay curves and memory health
  - Optimal decay parameter tuning from real usage data
  - Cross-agent transfer learning evaluation
+ - **MemPalace-style spatial organization.** Could spatial structure (wings/halls/rooms) improve hippo's semantic layer?
+ - **AAAK-style compression for semantic memories.** Lossless token compression for context injection.
 
  ## License
 
package/dist/cli.js CHANGED
@@ -28,11 +28,15 @@ import * as path from 'path';
  import * as fs from 'fs';
  import * as os from 'os';
  import { execSync } from 'child_process';
- import { createMemory, calculateStrength, deriveHalfLife, resolveConfidence, applyOutcome, computeSchemaFit, Layer, DECISION_HALF_LIFE_DAYS, } from './memory.js';
+ import { createMemory, calculateStrength, calculateRewardFactor, deriveHalfLife, resolveConfidence, applyOutcome, computeSchemaFit, Layer, DECISION_HALF_LIFE_DAYS, } from './memory.js';
  import { getHippoRoot, isInitialized, initStore, writeEntry, readEntry, deleteEntry, loadAllEntries, loadSearchEntries, loadIndex, saveIndex, loadStats, updateStats, saveActiveTaskSnapshot, loadActiveTaskSnapshot, clearActiveTaskSnapshot, appendSessionEvent, listSessionEvents, listMemoryConflicts, resolveConflict, saveSessionHandoff, loadLatestHandoff, loadHandoffById, } from './store.js';
- import { markRetrieved, estimateTokens, hybridSearch, explainMatch } from './search.js';
+ import { markRetrieved, estimateTokens, hybridSearch, physicsSearch, explainMatch } from './search.js';
  import { consolidate } from './consolidate.js';
  import { isEmbeddingAvailable, embedAll, embedMemory, loadEmbeddingIndex, } from './embeddings.js';
+ import { loadPhysicsState, resetAllPhysicsState } from './physics-state.js';
+ import { computeSystemEnergy, vecNorm } from './physics.js';
+ import { loadConfig } from './config.js';
+ import { openHippoDb, closeHippoDb } from './db.js';
  import { captureError, extractLessons, deduplicateLesson, runWatched, fetchGitLog, isGitRepo, } from './autolearn.js';
  import { extractInvalidationTarget, invalidateMatching } from './invalidation.js';
  import { extractPathTags } from './path-context.js';
@@ -297,12 +301,26 @@ async function cmdRecall(hippoRoot, query, flags) {
      const limit = parseLimitFlag(flags['limit']);
      const asJson = Boolean(flags['json']);
      const showWhy = Boolean(flags['why']);
+     const forcePhysics = Boolean(flags['physics']);
+     const forceClassic = Boolean(flags['classic']);
      const globalRoot = getGlobalRoot();
      const localEntries = loadSearchEntries(hippoRoot, query);
      const globalEntries = isInitialized(globalRoot) ? loadSearchEntries(globalRoot, query) : [];
      const hasGlobal = globalEntries.length > 0;
+     // Determine search mode: --physics forces physics, --classic forces BM25+cosine,
+     // default uses physics if config.physics.enabled is not false
+     const config = loadConfig(hippoRoot);
+     const usePhysics = forcePhysics
+         || (!forceClassic && config.physics.enabled !== false);
      let results;
-     if (hasGlobal) {
+     if (usePhysics && !hasGlobal) {
+         results = await physicsSearch(query, localEntries, {
+             budget,
+             hippoRoot,
+             physicsConfig: config.physics,
+         });
+     }
+     else if (hasGlobal) {
          // Use searchBothHybrid for merged results with embedding support
          results = await searchBothHybrid(query, hippoRoot, globalRoot, { budget });
      }
@@ -464,8 +482,42 @@ function cmdStatus(hippoRoot) {
      console.log(`Embeddings: ${embAvail ? 'available' : 'not installed (BM25 only)'}`);
      if (embAvail) {
          const embIndex = loadEmbeddingIndex(hippoRoot);
-         const embCount = Object.keys(embIndex).length;
-         console.log(`Embedded: ${embCount}/${entries.length} memories`);
+         const activeIds = new Set(entries.map((e) => e.id));
+         const activeEmbedded = Object.keys(embIndex).filter((id) => activeIds.has(id)).length;
+         const orphaned = Object.keys(embIndex).length - activeEmbedded;
+         let line = `Embedded: ${activeEmbedded}/${entries.length} memories`;
+         if (orphaned > 0)
+             line += ` (${orphaned} orphaned — run \`hippo embed\` to prune)`;
+         console.log(line);
+     }
+     // Physics status
+     try {
+         const db = openHippoDb(hippoRoot);
+         try {
+             const physicsMap = loadPhysicsState(db);
+             if (physicsMap.size > 0) {
+                 const particles = Array.from(physicsMap.values());
+                 const physConfig = loadConfig(hippoRoot);
+                 const energy = computeSystemEnergy(particles, physConfig.physics.G_memory);
+                 let sumVelMag = 0;
+                 let maxVelMag = 0;
+                 for (const p of particles) {
+                     const mag = vecNorm(p.velocity);
+                     sumVelMag += mag;
+                     if (mag > maxVelMag)
+                         maxVelMag = mag;
+                 }
+                 const avgVelMag = sumVelMag / particles.length;
+                 console.log('');
+                 console.log(`Physics: ${particles.length} particles, energy: ${fmt(energy.total, 4)} (KE: ${fmt(energy.kinetic, 4)}, PE: ${fmt(energy.potential, 4)}), avg vel: ${fmt(avgVelMag, 4)}`);
+             }
+         }
+         finally {
+             closeHippoDb(db);
+         }
+     }
+     catch {
+         // Physics table may not exist yet — degrade gracefully
      }
  }
  function cmdOutcome(hippoRoot, flags) {
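The `Physics:` status line above prints total, kinetic, and potential energy for the particle system. A sketch of what `computeSystemEnergy` plausibly computes, assuming the standard gravitational analogy implied by the `G_memory` parameter; the shipped `dist/physics.js` is the source of truth, and the masses and pairwise potential here are assumptions.

```typescript
// Sketch only: KE + pairwise gravitational PE, matching the shape of the
// { total, kinetic, potential } object the status command prints. The real
// dist/physics.js implementation may differ.
type Particle = { mass: number; position: number[]; velocity: number[] };

// Euclidean norm of a vector (mirrors the vecNorm imported by cli.js).
function vecNorm(v: number[]): number {
  return Math.sqrt(v.reduce((s, x) => s + x * x, 0));
}

function computeSystemEnergy(particles: Particle[], G: number) {
  // Kinetic energy: sum of (1/2) m |v|^2 over all particles.
  let kinetic = 0;
  for (const p of particles) kinetic += 0.5 * p.mass * vecNorm(p.velocity) ** 2;
  // Potential energy: -G m_i m_j / r_ij over all unordered pairs.
  let potential = 0;
  for (let i = 0; i < particles.length; i++) {
    for (let j = i + 1; j < particles.length; j++) {
      const d = particles[i].position.map((x, k) => x - particles[j].position[k]);
      const r = vecNorm(d);
      if (r > 0) potential -= (G * particles[i].mass * particles[j].mass) / r;
    }
  }
  return { kinetic, potential, total: kinetic + potential };
}
```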
@@ -533,7 +585,13 @@ function cmdInspect(hippoRoot, id) {
      console.log(`Schema fit: ${entry.schema_fit}`);
      console.log(`Pinned: ${entry.pinned}`);
      console.log(`Tags: ${entry.tags.join(', ') || 'none'}`);
-     console.log(`Outcome score: ${entry.outcome_score ?? 'none'}`);
+     const rewardFactor = calculateRewardFactor(entry);
+     const pos = entry.outcome_positive ?? 0;
+     const neg = entry.outcome_negative ?? 0;
+     const outcomeLabel = pos === 0 && neg === 0
+         ? 'none'
+         : `+${pos} / -${neg} (reward factor: ${fmt(rewardFactor)})`;
+     console.log(`Outcomes: ${outcomeLabel}`);
      if (entry.conflicts_with.length > 0) {
          console.log(`Conflicts with: ${entry.conflicts_with.join(', ')}`);
      }
@@ -1021,7 +1079,12 @@ async function cmdContext(hippoRoot, args, flags) {
          }));
      }
      else {
-         results = (await hybridSearch(query, localEntries, { budget, hippoRoot })).map((r) => ({
+         const ctxConfig = loadConfig(hippoRoot);
+         const usePhysicsCtx = ctxConfig.physics?.enabled !== false;
+         const ctxResults = usePhysicsCtx
+             ? await physicsSearch(query, localEntries, { budget, hippoRoot, physicsConfig: ctxConfig.physics })
+             : await hybridSearch(query, localEntries, { budget, hippoRoot });
+         results = ctxResults.map((r) => ({
              entry: r.entry,
              score: r.score,
              tokens: r.tokens,
@@ -1151,11 +1214,29 @@ async function cmdEmbed(hippoRoot, flags) {
          console.log(' npm install @xenova/transformers');
          return;
      }
+     if (flags['reset-physics']) {
+         const entries = loadAllEntries(hippoRoot);
+         const embIndex = loadEmbeddingIndex(hippoRoot);
+         const db = openHippoDb(hippoRoot);
+         try {
+             const count = resetAllPhysicsState(db, entries, embIndex);
+             console.log(`Reset physics state: ${count} particles re-initialized from embeddings.`);
+         }
+         finally {
+             closeHippoDb(db);
+         }
+         return;
+     }
      if (flags['status']) {
          const entries = loadAllEntries(hippoRoot);
          const embIndex = loadEmbeddingIndex(hippoRoot);
-         const embCount = Object.keys(embIndex).length;
-         console.log(`Embedding status: ${embCount}/${entries.length} memories embedded`);
+         const activeIds = new Set(entries.map((e) => e.id));
+         const activeEmbedded = Object.keys(embIndex).filter((id) => activeIds.has(id)).length;
+         const orphaned = Object.keys(embIndex).length - activeEmbedded;
+         console.log(`Embedding status: ${activeEmbedded}/${entries.length} memories embedded`);
+         if (orphaned > 0) {
+             console.log(` ${orphaned} orphaned embeddings (run \`hippo embed\` to prune)`);
+         }
          const missing = entries.filter((e) => !embIndex[e.id]);
          if (missing.length > 0) {
              console.log(` ${missing.length} memories need embedding (run \`hippo embed\` to embed them)`);
@@ -1164,9 +1245,9 @@ async function cmdEmbed(hippoRoot, flags) {
      }
      console.log('Embedding all memories (this may take a moment on first run to download model)...');
      const count = await embedAll(hippoRoot);
-     const entries = loadAllEntries(hippoRoot);
-     const embIndex = loadEmbeddingIndex(hippoRoot);
-     console.log(`Done. ${count} new embeddings created. ${Object.keys(embIndex).length}/${entries.length} total.`);
+     const entriesAfter = loadAllEntries(hippoRoot);
+     const embIndexAfter = loadEmbeddingIndex(hippoRoot);
+     console.log(`Done. ${count} new embeddings created. ${Object.keys(embIndexAfter).length}/${entriesAfter.length} total.`);
  }
  // ---------------------------------------------------------------------------
  // Watch command