hippo-memory 0.9.0 → 0.11.0

package/README.md CHANGED
@@ -6,7 +6,7 @@
  [![license](https://img.shields.io/badge/license-MIT-blue)](./LICENSE)
 
  ```
- Works with: Claude Code, Codex, Cursor, OpenClaw, any CLI agent
+ Works with: Claude Code, Codex, Cursor, OpenClaw, OpenCode, any CLI agent
  Imports from: ChatGPT, Claude (CLAUDE.md), Cursor (.cursorrules), any markdown
  Storage: SQLite backbone + markdown/YAML mirrors. Git-trackable and human-readable.
  Dependencies: Zero runtime deps. Requires Node.js 22.5+. Optional embeddings via @xenova/transformers.
@@ -43,6 +43,24 @@ hippo recall "data pipeline issues" --budget 2000
 
  That's it. You have a memory system.
 
+ ### What's new in v0.11.0
+
+ - **Reward-proportional decay.** Outcome feedback now modulates decay rate continuously instead of fixed half-life deltas. Memories with consistent positive outcomes decay up to 1.5x slower; consistent negatives decay up to 2x faster. Mixed outcomes converge toward neutral. Inspired by R-STDP in spiking neural networks. `hippo inspect` now shows cumulative outcome counts and the computed reward factor.
+ - **Public benchmarks.** Two benchmarks in `benchmarks/`: a [Sequential Learning Benchmark](benchmarks/sequential-learning/) (50 tasks, 10 traps, measures agent improvement over time) and a [LongMemEval integration](benchmarks/longmemeval/) (industry-standard 500-question retrieval benchmark, R@5=74.0% with BM25 only). The sequential learning benchmark is unique: no other public benchmark tests whether memory systems produce learning curves.
+
+ ### What's new in v0.10.0
+
+ - **Active invalidation.** `hippo learn --git` detects migration and breaking-change commits and actively weakens memories referencing the old pattern. Manual invalidation via `hippo invalidate "REST API" --reason "migrated to GraphQL"`.
+ - **Architectural decisions.** `hippo decide` stores one-off decisions with 90-day half-life and verified confidence. Supports `--context` for reasoning and `--supersedes` to chain decisions when the architecture evolves.
+ - **Path-based memory triggers.** Memories auto-tagged with `path:<segment>` from your working directory. Recall boosts memories from the same location (up to 1.3x). Working in `src/api/`? API-related memories surface first.
+ - **OpenCode integration.** `hippo hook install opencode` patches AGENTS.md. Auto-detected during `hippo init`. Integration guide with MCP config and skill for progressive discovery.
+ - **`hippo export`** outputs all memories as JSON or markdown.
+ - **Decision recall boost.** 1.2x scoring multiplier for decision-tagged memories so they surface despite low retrieval frequency.
+
+ ### What's new in v0.9.1
+
+ - **Auto-sleep on session exit.** `hippo hook install claude-code` now installs a Stop hook in `~/.claude/settings.json` so `hippo sleep` runs automatically when Claude Code exits. `hippo init` does this too when Claude Code is detected. No cron needed, no manual sleep.
+
  ### What's new in v0.9.0
 
  - **Working memory layer** (`hippo wm push/read/clear/flush`). Bounded buffer (max 20 per scope) with importance-based eviction. Current-state notes live separately from long-term memory.
@@ -76,7 +94,7 @@ hippo init
  # Auto-installed claude-code hook in CLAUDE.md
  ```
 
- If you have a `CLAUDE.md`, it patches it. `AGENTS.md` for Codex/OpenClaw. `.cursorrules` for Cursor. No manual `hook install` needed. Your agent starts using Hippo on its next session.
+ If you have a `CLAUDE.md`, it patches it. `AGENTS.md` for Codex/OpenClaw/OpenCode. `.cursorrules` for Cursor. No manual `hook install` needed. Your agent starts using Hippo on its next session.
 
  It also sets up a daily cron job (6:15am) that runs `hippo learn --git` and `hippo sleep` automatically. Memories get captured from your commits and consolidated every day without you thinking about it.
 
@@ -274,6 +292,44 @@ hippo recall "cache issues" # again next week
 
  ---
 
+ ### Active invalidation
+
+ When you migrate from one tool to another, old memories about the replaced tool should die immediately. Hippo detects migration and breaking-change commits during `hippo learn --git` and actively weakens matching memories.
+
+ ```bash
+ hippo learn --git
+ # feat: migrate from webpack to vite
+ # Invalidated 3 memories referencing "webpack"
+ # Learned: migrate from webpack to vite
+ ```
+
+ You can also invalidate manually:
+
+ ```bash
+ hippo invalidate "REST API" --reason "migrated to GraphQL"
+ # Invalidated 5 memories referencing "REST API".
+ ```
+
+ ---
+
+ ### Architectural decisions
+
+ One-off decisions don't repeat, so they can't earn their keep through retrieval alone. `hippo decide` stores them with a 90-day half-life and verified confidence so they survive long enough to matter.
+
+ ```bash
+ hippo decide "Use PostgreSQL for all new services" --context "JSONB support"
+ # Decision recorded: mem_a1b2c3
+
+ # Later, when the decision changes:
+ hippo decide "Use CockroachDB for global services" \
+ --context "Need multi-region" \
+ --supersedes mem_a1b2c3
+ # Superseded mem_a1b2c3 (half-life halved, marked stale)
+ # Decision recorded: mem_d4e5f6
+ ```
+
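To make the half-life figures concrete, the sketch below assumes memory strength follows standard exponential half-life decay. That curve shape is an assumption (the README does not state it); only the 90-day half-life and the "half-life halved" on supersede come from the text above. `retention` is a hypothetical helper, not part of Hippo's API.

```typescript
// Standard exponential half-life decay: fraction of strength left after ageDays.
// Assumption: Hippo's decay has this shape; only the half-life values
// (90 days, halved when superseded) are taken from the README text.
function retention(ageDays: number, halfLifeDays: number): number {
  if (halfLifeDays <= 0) throw new RangeError("halfLifeDays must be positive");
  return Math.pow(0.5, ageDays / halfLifeDays);
}

const decisionHalfLife = 90;                      // days, from `hippo decide`
const supersededHalfLife = decisionHalfLife / 2;  // "half-life halved"

console.log(retention(90, decisionHalfLife));   // 0.5
console.log(retention(90, supersededHalfLife)); // 0.25
```

Under this assumption, a superseded decision is at a quarter strength by the time a live one reaches half.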
+ ---
+
  ### Error memories stick
 
  Tag a memory as an error and it gets 2x the half-life automatically.
@@ -373,14 +429,15 @@ hippo recall "why is the gold model broken"
 
  hippo outcome --good
  # Applied positive outcome to 3 memories
- # half_life +5d on each
+ # reward factor increases, decay slows
 
  hippo outcome --bad
  # Applied negative outcome to 3 memories
- # half_life -3d on each
- # irrelevant memories decay faster
+ # reward factor decreases, decay accelerates
  ```
 
+ Outcomes are cumulative. A memory with 5 positive outcomes and 0 negative has a reward factor of ~1.42, making its effective half-life 42% longer. A memory with 0 positive and 3 negative has a factor of ~0.63, decaying nearly twice as fast. Mixed outcomes converge toward neutral (1.0).
+
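The two worked figures above (about 1.42 for 5 positive / 0 negative, about 0.63 for 0 positive / 3 negative) and the stated bounds (up to 1.5x slower, up to 2x faster) are all reproduced by one simple bounded formula. The sketch below is reverse-fitted to those numbers and is an assumption, not Hippo's confirmed implementation; `rewardFactor` and `effectiveHalfLifeDays` are hypothetical names.

```typescript
// Hypothetical reconstruction, NOT taken from Hippo's source.
// Assumes reward = (pos - neg) / (pos + neg + 1), a smoothed value in (-1, 1),
// and factor = 1 + 0.5 * reward, which stays strictly between 0.5 and 1.5.
function rewardFactor(positive: number, negative: number): number {
  const reward = (positive - negative) / (positive + negative + 1);
  return 1 + 0.5 * reward;
}

// Assumption: the effective half-life scales linearly with the factor.
function effectiveHalfLifeDays(baseDays: number, positive: number, negative: number): number {
  return baseDays * rewardFactor(positive, negative);
}

console.log(rewardFactor(5, 0).toFixed(2)); // 1.42
console.log(rewardFactor(0, 3).toFixed(2)); // 0.63
console.log(rewardFactor(2, 2).toFixed(2)); // 1.00
```

With no outcomes recorded the factor is exactly 1.0, so under this assumption untouched memories keep their base half-life.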
  ---
 
  ### Token budgets
@@ -502,8 +559,13 @@ hippo watch "npm run build"
  | `hippo share --auto --dry-run` | Preview what would be shared |
  | `hippo peers` | List projects contributing to global store |
  | `hippo sync` | Pull global memories into local project |
+ | `hippo invalidate "<pattern>"` | Actively weaken memories matching an old pattern |
+ | `hippo invalidate "<pattern>" --reason "<why>"` | Include what replaced it |
+ | `hippo decide "<decision>"` | Record architectural decision (90-day half-life) |
+ | `hippo decide "<decision>" --context "<why>"` | Include reasoning |
+ | `hippo decide "<decision>" --supersedes <id>` | Supersede a previous decision |
  | `hippo hook list` | Show available framework hooks |
- | `hippo hook install <target>` | Install hook (claude-code, codex, cursor, openclaw) |
+ | `hippo hook install <target>` | Install hook (claude-code also adds Stop hook for auto-sleep) |
  | `hippo hook uninstall <target>` | Remove hook |
  | `hippo handoff create --summary "..."` | Create a session handoff |
  | `hippo handoff latest` | Show the most recent handoff |
@@ -529,10 +591,11 @@ hippo watch "npm run build"
 
  | Framework | Detected by | Patches |
  |-----------|------------|---------|
- | Claude Code | `CLAUDE.md` or `.claude/settings.json` | `CLAUDE.md` |
+ | Claude Code | `CLAUDE.md` or `.claude/settings.json` | `CLAUDE.md` + Stop hook in `settings.json` |
  | Codex | `AGENTS.md` or `.codex` | `AGENTS.md` |
  | Cursor | `.cursorrules` or `.cursor/rules` | `.cursorrules` |
  | OpenClaw | `.openclaw` or `AGENTS.md` | `AGENTS.md` |
+ | OpenCode | `.opencode/` or `opencode.json` | `AGENTS.md` |
 
  No extra commands needed. Just `hippo init` and your agent knows about Hippo.
 
@@ -541,10 +604,11 @@ No extra commands needed. Just `hippo init` and your agent knows about Hippo.
  If you prefer explicit control:
 
  ```bash
- hippo hook install claude-code # patches CLAUDE.md
+ hippo hook install claude-code # patches CLAUDE.md + adds Stop hook to settings.json
  hippo hook install codex # patches AGENTS.md
  hippo hook install cursor # patches .cursorrules
  hippo hook install openclaw # patches AGENTS.md
+ hippo hook install opencode # patches AGENTS.md
  ```
 
  This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the agent to:
@@ -552,6 +616,8 @@ This adds a `<!-- hippo:start -->` ... `<!-- hippo:end -->` block that tells the
  2. Run `hippo remember "<lesson>" --error` on errors
  3. Run `hippo outcome --good` on completion
 
+ For Claude Code, it also adds a Stop hook to `~/.claude/settings.json` so `hippo sleep` runs automatically when the session exits.
+
  To remove: `hippo hook uninstall claude-code`
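
Delimiting the injected text with `<!-- hippo:start -->` and `<!-- hippo:end -->` markers is what lets an install be repeated safely and an uninstall leave the rest of the file alone. A minimal sketch of that idea, assuming plain string replacement between the markers; `upsertBlock` and `removeBlock` are hypothetical helpers and the block body is placeholder text, not what Hippo actually writes:

```typescript
const START = "<!-- hippo:start -->";
const END = "<!-- hippo:end -->";

// Insert the managed block, or replace it in place if one already exists.
function upsertBlock(doc: string, body: string): string {
  const block = `${START}\n${body}\n${END}`;
  const s = doc.indexOf(START);
  const e = doc.indexOf(END);
  if (s !== -1 && e > s) {
    return doc.slice(0, s) + block + doc.slice(e + END.length);
  }
  const sep = doc.length === 0 || doc.endsWith("\n") ? "" : "\n";
  return doc + sep + block + "\n";
}

// Remove the managed block; everything outside the markers is untouched.
function removeBlock(doc: string): string {
  const s = doc.indexOf(START);
  const e = doc.indexOf(END);
  if (s === -1 || e < s) return doc;
  return doc.slice(0, s) + doc.slice(e + END.length).replace(/^\n/, "");
}

const once = upsertBlock("# Project notes\n", "placeholder agent instructions");
const twice = upsertBlock(once, "placeholder agent instructions");
console.log(once === twice);                             // true: no duplicate block
console.log(removeBlock(twice) === "# Project notes\n"); // true
```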
 
  ### What the hook adds (Claude Code example)
@@ -630,32 +696,100 @@ The 7 mechanisms in full: [PLAN.md#core-principles](PLAN.md#core-principles)
 
  For how these mechanisms connect to LLM training, continual learning, and open research problems: **[RESEARCH.md](RESEARCH.md)**
 
+ **Why does reward modulate decay?** In spiking neural networks, reward-modulated STDP strengthens synapses that contribute to positive outcomes and weakens those that don't. Hippo's reward-proportional decay (v0.11.0) implements this: memories with consistent positive outcomes decay slower, negatives decay faster, with no fixed deltas. Inspired by [MH-FLOCKE](https://github.com/MarcHesse/mhflocke)'s R-STDP architecture for quadruped locomotion, where the same mechanism produces stable learning with 11.6x lower variance than PPO.
+
+ **Prior art in agent memory simulation.** The idea that human-like memory produces human-like behavior as an emergent property was explored in IEEE research from 2010-2011 ([5952114](https://ieeexplore.ieee.org/document/5952114), [5548405](https://ieeexplore.ieee.org/document/5548405), [5953964](https://ieeexplore.ieee.org/document/5953964)). Walking between rooms and forgetting why you went there doesn't need direct simulation; it emerges naturally from a memory system with capacity limits and decay. Hippo's design follows the same principle: implement the mechanisms, and the behavior follows.
+
+ **Related work:** [HippoRAG](https://arxiv.org/abs/2405.14831) (Gutierrez et al., 2024) applies hippocampal indexing to RAG via knowledge graphs. [MemPalace](https://github.com/milla-jovovich/mempalace) (Sigman & Jovovich, 2026) organizes memory spatially (wings/halls/rooms) with AAAK compression, achieving 100% on [LongMemEval](https://arxiv.org/abs/2410.10813). [MH-FLOCKE](https://github.com/MarcHesse/mhflocke) (Hesse, 2026) uses spiking neurons with R-STDP for embodied cognition. Each system tackles a different facet: HippoRAG optimizes retrieval quality, MemPalace optimizes retrieval organization, MH-FLOCKE optimizes embodied learning, and Hippo optimizes memory lifecycle.
+
  ---
 
  ## Comparison
 
- | Feature | Hippo | Mem0 | Basic Memory | Claude-Mem |
- |---------|-------|------|-------------|-----------|
+ | Feature | Hippo | MemPalace | Mem0 | Basic Memory |
+ |---------|-------|-----------|------|-------------|
  | Decay by default | Yes | No | No | No |
  | Retrieval strengthening | Yes | No | No | No |
- | Hybrid search (BM25 + embeddings) | Yes | Embeddings only | No | No |
+ | Reward-proportional decay | Yes | No | No | No |
+ | Hybrid search (BM25 + embeddings) | Yes | Embeddings + spatial | Embeddings only | No |
  | Schema acceleration | Yes | No | No | No |
  | Conflict detection + resolution | Yes | No | No | No |
  | Multi-agent shared memory | Yes | No | No | No |
  | Transfer scoring | Yes | No | No | No |
  | Outcome tracking | Yes | No | No | No |
  | Confidence tiers | Yes | No | No | No |
+ | Spatial organization | No | Yes (wings/halls/rooms) | No | No |
+ | Lossless compression | No | Yes (AAAK, 30x) | No | No |
  | Cross-tool import | Yes | No | No | No |
- | Conversation capture | Yes | No | No | No |
  | Auto-hook install | Yes | No | No | No |
- | MCP server | Yes | No | No | No |
- | Native plugins | OpenClaw + Claude Code | No | No | No |
- | Multi-repo git learn | Yes | No | No | No |
- | Zero dependencies | Yes | No | No | No |
- | Git-friendly | Yes | No | Yes | No |
- | Framework agnostic | Yes | Partial | Yes | No |
+ | MCP server | Yes | Yes | No | No |
+ | Zero dependencies | Yes | No (ChromaDB) | No | No |
+ | LongMemEval R@5 (retrieval) | 74.0% (BM25 only) | 96.6% (raw) / 100% (reranked) | ~49-85% | N/A |
+ | Git-friendly | Yes | No | No | Yes |
+ | Framework agnostic | Yes | Yes | Partial | Yes |
+
+ Different tools answer different questions. Mem0 and Basic Memory implement "save everything, search later." MemPalace implements "store everything, organize spatially for retrieval." Hippo implements "forget by default, earn persistence through use." These are complementary approaches: MemPalace's retrieval precision + Hippo's lifecycle management would be stronger than either alone.
+
+ ---
+
+ ## Benchmarks
+
+ Two benchmarks testing two different things. Full details in [`benchmarks/`](benchmarks/).
+
+ ### LongMemEval (retrieval accuracy)
+
+ [LongMemEval](https://arxiv.org/abs/2410.10813) (ICLR 2025) is the industry-standard benchmark: 500 questions across 5 memory abilities, embedded in 115k+ token chat histories.
+
+ **Hippo v0.11.0 results (BM25 only, zero dependencies):**
+
+ | Metric | Score |
+ |--------|-------|
+ | Recall@1 | 50.4% |
+ | Recall@3 | 66.6% |
+ | Recall@5 | 74.0% |
+ | Recall@10 | 82.6% |
+ | Answer in content@5 | 46.6% |
+
+ | Question Type | Count | R@5 |
+ |---------------|-------|-----|
+ | single-session-assistant | 56 | 94.6% |
+ | knowledge-update | 78 | 88.5% |
+ | temporal-reasoning | 133 | 73.7% |
+ | multi-session | 133 | 72.2% |
+ | single-session-user | 70 | 65.7% |
+ | single-session-preference | 30 | 26.7% |
 
- Mem0, Basic Memory, and Claude-Mem all implement "save everything, search later." Hippo implements all 7 hippocampal mechanisms: two-speed storage, decay, retrieval strengthening, schema acceleration, conflict detection, multi-agent transfer, and explicit working memory. It's the only tool that models what memories are worth keeping.
+ For context: MemPalace scores 96.6% (raw) using ChromaDB embeddings + spatial indexing. Hippo achieves 74.0% using BM25 keyword matching alone with zero runtime dependencies. Adding embeddings via `hippo embed` (optional `@xenova/transformers` peer dep) enables hybrid search and should close the gap.
+
+ Hippo's strongest categories (knowledge-update 88.5%, single-session-assistant 94.6%) are the ones where keyword overlap between question and stored content is highest. The weakest (preference 26.7%) involves indirect references that need semantic understanding.
+
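The per-type table and the headline figure can be cross-checked against each other: the question counts sum to 500, and their count-weighted mean R@5 reproduces the overall 74.0%. The sketch below uses only numbers from the two tables above; the interface and function names are illustrative.

```typescript
// Per-question-type results copied from the R@5 table above.
interface TypeResult { name: string; count: number; recallAt5: number }

const results: TypeResult[] = [
  { name: "single-session-assistant", count: 56, recallAt5: 94.6 },
  { name: "knowledge-update", count: 78, recallAt5: 88.5 },
  { name: "temporal-reasoning", count: 133, recallAt5: 73.7 },
  { name: "multi-session", count: 133, recallAt5: 72.2 },
  { name: "single-session-user", count: 70, recallAt5: 65.7 },
  { name: "single-session-preference", count: 30, recallAt5: 26.7 },
];

// Count-weighted mean of the per-type recall, in percent.
function weightedRecall(rows: TypeResult[]): number {
  const total = rows.reduce((sum, r) => sum + r.count, 0);
  const weighted = rows.reduce((sum, r) => sum + r.count * r.recallAt5, 0);
  return weighted / total;
}

const totalQuestions = results.reduce((sum, r) => sum + r.count, 0);
console.log(totalQuestions);                     // 500
console.log(weightedRecall(results).toFixed(1)); // 74.0
```

The two largest categories (temporal-reasoning and multi-session, 266 of 500 questions) sit close to the overall score, which is why they dominate the headline number.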
+ ```bash
+ cd benchmarks/longmemeval
+ python ingest_direct.py --data data/longmemeval_oracle.json --store-dir ./store
+ python retrieve_fast.py --data data/longmemeval_oracle.json --store-dir ./store --output results/retrieval.jsonl
+ python evaluate_retrieval.py --retrieval results/retrieval.jsonl --data data/longmemeval_oracle.json
+ ```
+
+ ### Sequential Learning Benchmark (agent improvement over time)
+
+ No other public benchmark tests whether memory systems produce learning curves. LongMemEval tests retrieval on a fixed corpus. This benchmark tests whether an agent with memory *performs better on task 40 than task 5*.
+
+ 50 tasks, 10 trap categories, each appearing 2-3 times across the sequence.
+
+ **Hippo v0.11.0 results:**
+
+ | Condition | Overall | Early | Mid | Late | Learns? |
+ |-----------|---------|-------|-----|------|---------|
+ | No memory | 100% | 100% | 100% | 100% | No |
+ | Static memory | 20% | 33% | 11% | 14% | No |
+ | Hippo | 40% | 78% | 22% | 14% | Yes |
+
+ The hippo agent's trap-hit rate drops from 78% to 14% as it accumulates error memories with 2x half-life. Static pre-loaded memory helps from the start but doesn't improve. Any memory system can run this benchmark by implementing the [adapter interface](benchmarks/sequential-learning/adapters/interface.mjs).
+
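The table's columns are trap-hit rates (lower is better), and the Overall column is a pooled rate rather than the mean of the three phases. The rounded percentages in the Hippo row are consistent with 25 trap encounters split 9 / 9 / 7 across early / mid / late. That split and the hit counts below are inferred from the rounded figures, not stated in the source, so treat this as an arithmetic illustration only.

```typescript
// Inferred, NOT stated in the source: 25 trap encounters split 9 / 9 / 7,
// with hit counts chosen to reproduce the rounded percentages in the table.
interface Phase { hits: number; encounters: number }

const hippo: Phase[] = [
  { hits: 7, encounters: 9 }, // early
  { hits: 2, encounters: 9 }, // mid
  { hits: 1, encounters: 7 }, // late
];

// Trap-hit rate for one phase, rounded to a whole percent.
const pct = (p: Phase): number => Math.round((100 * p.hits) / p.encounters);

// Pooled rate over all encounters (not the mean of the phase rates).
function overall(phases: Phase[]): number {
  const hits = phases.reduce((sum, p) => sum + p.hits, 0);
  const encounters = phases.reduce((sum, p) => sum + p.encounters, 0);
  return Math.round((100 * hits) / encounters);
}

console.log(hippo.map(pct)); // [ 78, 22, 14 ]
console.log(overall(hippo)); // 40
```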
+ ```bash
+ cd benchmarks/sequential-learning
+ node run.mjs --adapter all
+ ```
 
  ---
 
@@ -664,10 +798,13 @@ Mem0, Basic Memory, and Claude-Mem all implement "save everything, search later.
  Issues and PRs welcome. Before contributing, run `hippo status` in the repo root to see the project's own memory.
 
  The interesting problems:
+ - **Improve LongMemEval score.** Current R@5 is 74.0% with BM25 only. Adding embeddings (`hippo embed`) and hybrid search should close the gap toward MemPalace's 96.6%.
  - Better consolidation heuristics (LLM-powered merge vs current text overlap)
  - Web UI / dashboard for visualizing decay curves and memory health
  - Optimal decay parameter tuning from real usage data
  - Cross-agent transfer learning evaluation
+ - **MemPalace-style spatial organization.** Could spatial structure (wings/halls/rooms) improve hippo's semantic layer?
+ - **AAAK-style compression for semantic memories.** Lossless token compression for context injection.
 
  ## License
 
package/dist/cli.d.ts CHANGED
@@ -21,6 +21,7 @@
  * hippo learn --git [--days <n>] [--repos <paths>]
  * hippo promote <id>
  * hippo sync
+ * hippo decide "<decision>" [--context "<why>"] [--supersedes <id>]
  * hippo wm <push|read|clear|flush>
  */
  export {};
package/dist/cli.d.ts.map CHANGED
@@ -1 +1 @@
- {"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AACA;;;;;;;;;;;;;;;;;;;;;;;GAuBG"}
+ {"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AACA;;;;;;;;;;;;;;;;;;;;;;;;GAwBG"}