claude_memory 0.11.0 → 0.12.0
- checksums.yaml +4 -4
- data/.claude/memory.sqlite3 +0 -0
- data/.claude/rules/claude_memory.generated.md +42 -64
- data/.claude/skills/release/SKILL.md +44 -6
- data/.claude/skills/study-repo/SKILL.md +15 -0
- data/.claude-plugin/commands/audit-memory.md +68 -0
- data/.claude-plugin/marketplace.json +1 -1
- data/.claude-plugin/plugin.json +1 -1
- data/CHANGELOG.md +26 -0
- data/CLAUDE.md +9 -2
- data/README.md +29 -1
- data/db/migrations/018_add_otel_telemetry.rb +81 -0
- data/docs/1_0_punchlist.md +318 -66
- data/docs/api_stability.md +341 -0
- data/docs/audit_runbook.md +209 -0
- data/docs/claude_monitoring.md +956 -0
- data/docs/improvements.md +148 -9
- data/docs/influence/ai-memory-systems-2026.md +403 -0
- data/docs/memory_audit_2026-05-21.md +303 -0
- data/docs/plugin.md +1 -1
- data/lib/claude_memory/audit/checks.rb +239 -0
- data/lib/claude_memory/audit/finding.rb +33 -0
- data/lib/claude_memory/audit/runner.rb +73 -0
- data/lib/claude_memory/commands/audit_command.rb +117 -0
- data/lib/claude_memory/commands/dashboard_command.rb +2 -1
- data/lib/claude_memory/commands/import_auto_memory_command.rb +180 -0
- data/lib/claude_memory/commands/otel_command.rb +240 -0
- data/lib/claude_memory/commands/registry.rb +4 -1
- data/lib/claude_memory/configuration.rb +60 -0
- data/lib/claude_memory/core/fact_query_builder.rb +1 -0
- data/lib/claude_memory/dashboard/api.rb +8 -0
- data/lib/claude_memory/dashboard/index.html +140 -1
- data/lib/claude_memory/dashboard/prompt_journey.rb +48 -0
- data/lib/claude_memory/dashboard/server.rb +86 -0
- data/lib/claude_memory/dashboard/telemetry.rb +156 -0
- data/lib/claude_memory/deprecations.rb +106 -0
- data/lib/claude_memory/distill/reference_material_detector.rb +37 -4
- data/lib/claude_memory/hook/auto_memory_mirror.rb +7 -3
- data/lib/claude_memory/hook/context_injector.rb +11 -2
- data/lib/claude_memory/mcp/tool_definitions.rb +3 -3
- data/lib/claude_memory/otel/attributes.rb +118 -0
- data/lib/claude_memory/otel/constants.rb +32 -0
- data/lib/claude_memory/otel/ingestor.rb +54 -0
- data/lib/claude_memory/otel/otlp_json_envelope.rb +254 -0
- data/lib/claude_memory/otel/prompt_scope.rb +108 -0
- data/lib/claude_memory/otel/settings_writer.rb +122 -0
- data/lib/claude_memory/otel/status.rb +58 -0
- data/lib/claude_memory/recall/staleness_annotator.rb +73 -0
- data/lib/claude_memory/resolve/predicate_policy.rb +17 -1
- data/lib/claude_memory/resolve/resolver.rb +30 -3
- data/lib/claude_memory/shortcuts.rb +61 -18
- data/lib/claude_memory/store/prompt_journey_query.rb +87 -0
- data/lib/claude_memory/store/schema_manager.rb +1 -1
- data/lib/claude_memory/store/sqlite_store.rb +136 -0
- data/lib/claude_memory/sweep/maintenance.rb +31 -1
- data/lib/claude_memory/sweep/sweeper.rb +6 -0
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +18 -0
- metadata +26 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 935474b9efd1d9fd317c410728ed36b1cedb0854d1db6e71cd45ce6372253b9c
|
|
4
|
+
data.tar.gz: 1514d8b44e8ee25d139cd0544ab527bfa00ff54ace9786b7c47709afc0095024
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: f70d8858628ecbd350e32f89da3dcdd8ff99ff2c61c85dc938a512ab79f9d6bae46d87677168a7d9059fdace727657e388aeed48a451ae8340ac2af5dd9384b8
|
|
7
|
+
data.tar.gz: 3164ae23db7e7b8fd66ece557a84953727181e2b06d2db80c4a807678015b064f196489768fb50c980c7d1de8f732b86dd5c0666d9f766a0d05ce52757f1b23d
|
data/.claude/memory.sqlite3
CHANGED
|
Binary file
|
data/.claude/rules/claude_memory.generated.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
<!--
|
|
2
2
|
This file is auto-generated by claude-memory.
|
|
3
3
|
Do not edit manually - changes will be overwritten.
|
|
4
|
-
Generated: 2026-
|
|
4
|
+
Generated: 2026-06-01T11:23:10Z
|
|
5
5
|
-->
|
|
6
6
|
|
|
7
7
|
# Project Memory
|
|
@@ -14,14 +14,38 @@
|
|
|
14
14
|
|
|
15
15
|
## Conventions
|
|
16
16
|
|
|
17
|
-
-
|
|
18
|
-
-
|
|
19
|
-
-
|
|
20
|
-
-
|
|
21
|
-
-
|
|
22
|
-
-
|
|
23
|
-
- claude-memory
|
|
24
|
-
-
|
|
17
|
+
- A/B testing methodology for memory plugin evaluation — How to test with/without memory using claude CLI — --plugin-dir doesn't work with --bare, use --mcp-config instead — To A/B test memory's impact on Claude Code responses: (imported from project auto-memory; see source file for full reasoning)
|
|
18
|
+
- do...end blocks over braces when args repeat or block is non-trivial — Block syntax preference for multi-argument or multi-expression blocks — When a block call has repeated argument names, multiple expressions, or reads awkwardly on one line, use `do...end` form rather than `{ }`. One-liners with simple single-expression bodies and short argument lists are fine with braces. (imported from project auto-memory; see source file for full reasoning)
|
|
19
|
+
- Always commit .claude/memory.sqlite3 — Per user direction (2026-05-21), .claude/memory.sqlite3 should be staged and committed alongside any change that updates project memory — even though it's a binary SQLite DB with WAL artifacts. — Always include `.claude/memory.sqlite3` in commits that touch project memory or knowledge. (imported from project auto-memory; see source file for full reasoning)
|
|
20
|
+
- Commit workflow preferences — How the user prefers commits to be structured and when to make them — Wait for the user to ask for commits — don't commit proactively. When asked, group changes into logical atomic commits: (imported from project auto-memory; see source file for full reasoning)
|
|
21
|
+
- Data-driven analysis before design changes — User expects thorough multi-project data surveys and critical questioning of assumptions before committing to architectural changes — When proposing design changes (especially to schemas, vocabularies, or policies), gather real usage data first and present a critical analysis before implementing. (imported from project auto-memory; see source file for full reasoning)
|
|
22
|
+
- Fix hallucination triggers at the source, not via repeated reject churn — When the distiller repeatedly produces the same wrong fact, trace it to the CLAUDE.md / docs example text; fixing the source stops re-appearance — When the dashboard's Conflicts tab accumulates clusters of the same kind of bad fact (e.g. many `uses_database` contradictions against `sqlite`), the root cause is almost always **example text in documentation** that the distiller is interpreting as a literal claim about the current repo. Single-value predicates (`uses_database`, `deployment_platform`, `auth_method`) are especially vulnerable becau... (imported from project auto-memory; see source file for full reasoning)
|
|
23
|
+
- Hooks run the installed gem, not the working copy — always `rake install` after editing hook/MCP code — .claude/settings.json hooks invoke `claude-memory` via PATH, so changes on a branch only take effect after `bundle exec rake install` — `.claude/settings.json` hooks call bare `claude-memory hook ingest` / `claude-memory hook context` / etc. That resolves via PATH to the installed gem, not the working-copy `./exe/claude-memory`. After editing any hook/MCP/distiller code on a branch, the change does NOT reach Claude Code until `bundle exec rake install` rebuilds and reinstalls the gem (which overwrites the prior install at the same ... (imported from project auto-memory; see source file for full reasoning)
|
|
24
|
+
- No extra API costs for features — User strongly prefers using Claude Code itself (skills, context hooks) over separate API calls that cost extra money — Do not add features that require separate Anthropic API calls (e.g., via anthropic-rb gem) when Claude Code itself can perform the same task. Use skills, context hook injection, and MCP tools to leverage the existing Claude Code session instead. (imported from project auto-memory; see source file for full reasoning)
|
|
25
|
+
- Quality review update cycle — Keep quality_review.md current as items are resolved — don't let it drift — When completing quality review items, update `docs/quality_review.md` immediately: (imported from project auto-memory; see source file for full reasoning)
|
|
26
|
+
- Refactoring approach preferences — How to approach god object extraction and structural refactoring in this codebase — Use module inclusion (not class extraction) when breaking up god objects. Include modules directly into the existing class so the public API is unchanged and zero tests need modification. This was validated three times:
|
|
27
|
+
- Round-trip migration specs cover each prior release boundary — For pre-release prep, write end-to-end migration specs from every distinct schema boundary back through ~3 prior releases — Before cutting a release that includes migrations, add round-trip specs that fixture an older DB at each distinct prior release's schema version, open via `SQLiteStore.new`, and assert the full upgrade path: schema_info advancement, data preservation across entities/facts/content_items/provenance, additive table/column creation, predicate-rewrite effects where applicable, and idempotency on re-open... (imported from project auto-memory; see source file for full reasoning)
|
|
28
|
+
- Codify behavioral contracts in tests, not just comments — When code has a deliberate scope limitation (one-shot, advisory-only, intentionally non-idempotent), write a test that fails if someone "fixes" it into being more general — When code has a deliberate scope limitation — a one-shot data migration, an advisory-only field, a method intentionally not idempotent for new inputs — write a test that exercises a scenario which would *break* if someone tried to make it more general. (imported from project auto-memory; see source file for full reasoning)
|
|
29
|
+
- Treat UX gaps as architecture smells — user inspection/debugging questions expose god classes and missing abstractions — When users ask "can I see/debug/act on X in the dashboard?", the answer is almost always "we need a new class or route, not a new button" — Across three architectural reviews in the 2026-04-17 → 20 session, every concrete UX gap the user identified traced back to a structural issue the code already had, not a frontend-only fix. Treating critique as a forcing function for refactoring produced cleaner results than either extracting preemptively (premature) or patching only the surface (symptom). (imported from project auto-memory; see source file for full reasoning)
|
|
30
|
+
- "database disk image is malformed" from FTS5 `ORDER BY rank` after sqlite3 .recover — sqlite3 .recover restores rows but can leave contentless FTS5 auxiliary indexes in a state where basic MATCH works but ORDER BY rank throws "malformed"; fix is `claude-memory compact` to rebuild the FTS index — A DB recovered via `sqlite3 corrupt.db .recover > dump.sql && sqlite3 fresh.db < dump.sql` can end up with an FTS5 index that's *partially* functional: (imported from project auto-memory; see source file for full reasoning)
|
|
31
|
+
- `rake install` uses `git ls-files`; untracked files silently disappear from the gem — Running `bundle exec rake install` before staging new files produces a gem missing those files, causing LoadError in hooks and MCP server — The claude_memory gemspec builds its file list via `IO.popen(%w[git ls-files -z], ...)` (claude_memory.gemspec:24). Any file that hasn't been `git add`ed at build time is **invisible to the gem** even though it exists on disk. The local working copy keeps running fine (dashboard server uses `./exe/claude-memory` against the repo directly), but the installed gem at `~/.gem/ruby/*/gems/claude_memory-... (imported from project auto-memory; see source file for full reasoning)
|
|
32
|
+
- Distiller scope_hint is advisory, not a routing signal — NullDistiller emits scope_hint: "global" for text matching GLOBAL_SCOPE_PATTERNS, but the resolver never routes writes between stores — scope_hint must not override fact.scope — `Distill::NullDistiller#global_scope_signal?` matches text like "always" / "my preference" / "in all projects" and stamps `scope_hint: "global"` on every fact extracted from that text. The hint is advisory metadata for downstream promotion decisions. It is NOT a routing signal — the resolver writes to whichever `SQLiteStore` was injected into it (always the project DB in the normal ingest path), re... (imported from project auto-memory; see source file for full reasoning)
|
|
33
|
+
- Sequel DB reads must use the extralite adapter — Opening a SQLite DB for ad-hoc reads requires the extralite adapter URI; Sequel.sqlite silently depends on an ungem'd sqlite3 — Never use `Sequel.sqlite(db_path)` or `Sequel.sqlite(db_path, readonly: true)` in this codebase. The gemspec lists only `extralite (~> 2.14)` — it does **not** depend on the `sqlite3` gem. `Sequel.sqlite` routes through Sequel's `sqlite` adapter which requires `gem "sqlite3"` and will raise `Sequel::AdapterNotFound: LoadError: cannot load such file -- sqlite3` at runtime. (imported from project auto-memory; see source file for full reasoning)
|
|
34
|
+
- Never `git checkout --` an active SQLite DB with WAL mode — Using `git checkout --` on .claude/memory.sqlite3 while readers/writers are open corrupts the DB via WAL/main file mismatch — Never run `git checkout -- .claude/memory.sqlite3` (or any SQLite DB in WAL mode) while any process has it open. Git replaces only the main DB file, leaving the WAL/SHM sidecar files referencing pages that no longer exist in the replaced file. Next read → "Extralite::Error: database disk image is malformed" and integrity_check shows btree errors across multiple trees. (imported from project auto-memory; see source file for full reasoning)
|
|
35
|
+
- SQLiteStore silently creates in-memory DB for relative paths — `SQLiteStore.new('.claude/memory.sqlite3')` with a relative path opens an empty in-memory DB, not the file — always pass absolute paths in tests/probes — `SQLiteStore.new(path)` builds a Sequel URI as `extralite:#{path}`. With a relative path like `.claude/memory.sqlite3`, the resulting URI `extralite:.claude/memory.sqlite3` is parsed with an empty database component, so Extralite opens an in-memory database. Schema migrations run against the in-memory DB (so `schema_version` reports the current version), but ALL queries return 0 rows and the real f... (imported from project auto-memory; see source file for full reasoning)
|
|
36
|
+
- Two tool_calls tables exist — don't conflate them — tool_calls (v3) stores transcript-observed Claude Code tool usage; mcp_tool_calls (v13) stores MCP server telemetry — There are **two** tables with similar names serving different purposes: (imported from project auto-memory; see source file for full reasoning)
|
|
37
|
+
- Distiller hallucination from CLAUDE.md example text — The scope-system example in CLAUDE.md causes recurring false fact extraction — reject + re-ingest creates rejection churn — CLAUDE.md contains a scope-system explanation with example text: (imported from project auto-memory; see source file for full reasoning)
|
|
38
|
+
- PredicatePolicy is the single source of truth for predicate vocabulary — All predicate knowledge (vocabulary, cardinality, sections, synonyms, LLM guidance) derives from PredicatePolicy — never hardcode predicate names elsewhere — As of 0.9.0, `PredicatePolicy` in `lib/claude_memory/resolve/predicate_policy.rb` is the authoritative source for all predicate-related behavior. This was a deliberate consolidation after finding the same predicate list duplicated in 4 files that drifted independently. (imported from project auto-memory; see source file for full reasoning)
|
|
39
|
+
- Hook-telemetry features need a manual hook trigger to verify in production, not just specs — `bundle exec rake install` AND fire a real hook AND check `sqlite3 ... json_extract(detail_json, '$.<field>')`, because specs assert against working-tree code but `.claude/settings.json` hooks invoke the installed gem via PATH so the asserted field can be silently absent in production. Hit on 2026-04-30 shipping #47 token-budget telemetry: 156 specs green but `context_tokens` was missing from 24h of real activity_events.
|
|
40
|
+
- When verifying any new field on activity_events.detail_json, the canonical smoke test is: `echo '{"hook_event_name":"SessionStart","session_id":"smoketest","source":"startup","cwd":"'"$(pwd)"'"}' | claude-memory hook context` then inspect via `sqlite3 .claude/memory.sqlite3 "SELECT json_extract(detail_json, '$.<field>') FROM activity_events WHERE event_type='hook_context' ORDER BY id DESC LIMIT 1"`. If null after rake install, the installed gem code hasn't picked up the change.
|
|
41
|
+
- Treat UX gaps as architecture smells: when a user asks "can I see/debug/act on X in the dashboard?" the answer is almost always a missing class or route, not a new button. Every UX critique in this project's dashboard work traced to a structural gap — god-class growth, four drifting fact serializers, scope_hint as silent scope override, no fact detail endpoint. Pattern: the surface question is usually a router into the architecture. Before reaching for frontend fixes, ask "what server-side data shape would make this easy?" — if that shape doesn't exist cleanly, treat it as the real work. Commit refactor separately from feature it enables ([Refactor]/[Feature]/[Fix] prefixes) so the critique→structural fix→UX fix chain is visible in git log.
|
|
42
|
+
- Four-surface staleness: after any change that touches UI + backend + plugin-launched binaries, refresh all four or the change looks broken. (1) bundle exec rake install so the installed gem catches up (hooks + MCP launched by Claude Code run from PATH). (2) Ctrl-C and re-run ./exe/claude-memory dashboard — server is long-lived Ruby, no live-reload. (3) /mcp reconnect in Claude Code so the MCP subprocess respawns. (4) Hard-refresh browser (Cmd-Shift-R) so cached index.html JS reloads. Skipping any produces confusing "my fix doesn't work." rspec green does NOT mean end-to-end works; before declaring UI-affecting changes done, curl the endpoint and verify shape matches frontend expectation. curl alt-port dashboard (--port 3388 --no-open in background) is fastest smoke test without disturbing user's running dashboard.
|
|
43
|
+
- fact.scope MUST match the DB the fact lives in. The distiller may emit scope_hint="global" from GLOBAL_SCOPE_PATTERNS but scope_hint is advisory only — it never routes writes or overrides the destination store's scope. Using scope_hint as a scope override produced orphaned scope=global rows inside the project DB that global recall couldn't see. Users move project facts to global via claude-memory promote (StoreManager#promote_fact does the proper cross-store copy). Sweep::Maintenance#fix_scope_leakage cleans drift in existing DBs. Invariant documented in gotcha_scope_hint_not_routing memory.
|
|
44
|
+
- "disk image is malformed" from an FTS5 ORDER BY rank query after a sqlite3 .recover restore is usually FTS auxiliary-index corruption, not real DB damage. Diagnostic chain: PRAGMA integrity_check on fresh connection (ok means main DB is fine), plain MATCH (works means b-tree is fine), ORDER BY rank (fails means FTS internals rotted). Fix with claude-memory compact — rebuilds FTS from source content in a few seconds. Do NOT reach for sqlite3 .recover a second time; that's the class of action that leaves FTS in this broken state. Three distinct malformed-error flavors in this project: real corruption (use .recover), WAL stale-cache phantom from long-lived readers (release connections per request), FTS5 rank rot (compact).
|
|
45
|
+
- Name collision: claude-memory recover (CLI) resets stuck operations from OperationTracker; it does NOT repair corrupt SQLite. For real disk-image corruption use sqlite3 corrupt.db ".recover" > dump.sql && sqlite3 fresh.db < dump.sql then open via SQLiteStore for migrations. Recovery can leave contentless FTS5 in a partial state where plain MATCH works but ORDER BY rank fails — follow up with claude-memory compact. Verified recoverable on 2026-04-16 — 76 facts / 56 entities / 183 content_items restored from a DB with multiple btree errors.
|
|
46
|
+
- Hooks in .claude/settings.json invoke bare 'claude-memory' which resolves via PATH to the installed gem, not the working-copy ./exe/claude-memory. After editing Hook::Handler, MCP::Tools, MCP::Server, ActivityLog, Distill::*, or any commands/ file on a branch, run 'bundle exec rake install' before expecting hooks or the Claude-Code-launched MCP server to see the change. The dashboard server (./exe/claude-memory dashboard) is the exception — it runs from the working copy directly. When reinstalling while servers are running, also restart them (Ctrl-C or /mcp reconnect) since old code stays in memory.
|
|
47
|
+
- Never 'git checkout --' .claude/memory.sqlite3 (or any WAL-mode SQLite DB) while a reader/writer has it open. Git replaces the main DB file but leaves the -wal and -shm sidecar files referencing stale pages, which corrupts the DB on next read. Safe sequence: stop all holders (MCP server, dashboard, hooks), delete the -wal and -shm sidecars, then checkout. Recovery path: sqlite3 corrupt.db .recover > dump.sql then reimport into a fresh file.
|
|
48
|
+
- SQLiteStore.new with a relative path silently opens an empty in-memory DB instead of the file. The URI built in retry_handler.rb is extralite:<path>, and extralite:.claude/memory.sqlite3 parses with an empty database option so Extralite treats it as in-memory. Schema migrations run so schema_version looks correct but all queries return 0 rows. Production callers go through StoreManager with absolute paths (via Configuration) so this does not bite in normal use. In ad-hoc probes always pass File.expand_path(path). Diagnostic: if store.facts.count returns 0 while sqlite3 CLI shows rows, suspect this before suspecting corruption.
|
|
25
49
|
- Never use Sequel.sqlite for DB reads; this gem only depends on extralite. Use Sequel.connect("extralite://#{db_path}") or SQLiteStore.new. Sequel.sqlite requires the ungem'd sqlite3 adapter and fails at runtime.
|
|
26
50
|
- Two distinct tool_calls tables exist: tool_calls (v3) for transcript-observed Claude Code tool usage, and mcp_tool_calls (v13) for MCP server telemetry. Disjoint purposes, never join.
|
|
27
51
|
- MCP tool-call telemetry is recorded via MCP::Telemetry wrapping Server#handle_tools_call. Writes to mcp_tool_calls table in the project DB. Swallows DB errors so telemetry never breaks a real tool response. Viewable via 'claude-memory stats --tools [--since DAYS]'.
|
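The relative-path gotcha recorded in the list above (a relative path handed to `SQLiteStore.new` silently yields an empty in-memory DB) can be guarded against in ad-hoc probes. The following is a minimal sketch using only the Ruby standard library; `absolute_db_path` and the `/srv/project` base directory are hypothetical names chosen for illustration and are not part of the gem.

```ruby
require "pathname"

# Hypothetical guard, not part of claude_memory: resolve a possibly-relative
# database path to an absolute one before passing it to a store constructor.
def absolute_db_path(path, base_dir: Dir.pwd)
  # Already-absolute paths are returned unchanged.
  return path if Pathname.new(path).absolute?

  # Relative paths are resolved against base_dir (the current directory by default).
  File.expand_path(path, base_dir)
end

resolved = absolute_db_path(".claude/memory.sqlite3", base_dir: "/srv/project")
unchanged = absolute_db_path("/tmp/memory.sqlite3")
puts resolved
puts unchanged
```

In a probe script the resolved value would then be what gets passed to the store, in line with the note's advice to always use `File.expand_path(path)`.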
|
@@ -40,74 +64,28 @@
|
|
|
40
64
|
- ContentSanitizer strips system-reminder, local-command-caveat, command-message, command-name, command-args tags in addition to private/no-memory/secret/claude-memory-context.
|
|
41
65
|
- Core::RelativeTime module provides progressive time formatting: just now → Xm ago → Xh ago → Xd ago → YYYY-MM-DD. Used in ResponseFormatter for *_ago fields.
|
|
42
66
|
- MCP server registers memory_guide prompt via prompts/list and prompts/get endpoints. QueryGuide module holds prompt content.
|
|
43
|
-
- Claude Code plugin with marketplace.json, skill definitions, MCP server bundling. 5,700+ stars, by Tobi Lütke. Custom fine-tuned query expansion (Qwen3-1.7B, SFT+GRPO). Dual content/structuredContent MCP pattern.
|
|
44
|
-
- Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API for persistent memory across sessions. Uses hooks for SessionStart context injection and Stop transcript capture. No local database.
|
|
45
67
|
|
|
46
68
|
## Technical Constraints
|
|
47
69
|
|
|
70
|
+
- **Uses framework**: django
|
|
71
|
+
- **Uses language**: typescript
|
|
72
|
+
- **Uses language**: python
|
|
48
73
|
- **Uses framework**: rails
|
|
49
|
-
- **
|
|
74
|
+
- **Uses framework**: react
|
|
75
|
+
- **Uses framework**: sinatra
|
|
76
|
+
- **Uses language**: javascript
|
|
77
|
+
- **Uses language**: go
|
|
78
|
+
- **Uses language**: ruby
|
|
50
79
|
- **Uses database**: sqlite
|
|
51
80
|
|
|
52
81
|
## Additional Knowledge
|
|
53
82
|
|
|
54
83
|
### Architecture
|
|
55
84
|
|
|
56
|
-
-
|
|
85
|
+
- mcp_server: Claude Code does NOT pass its session_id into plugin-spawned MCP server subprocesses — neither via JSON-RPC transport nor CLAUDE_SESSION_ID env var. Configuration.new.session_id returns nil inside the MCP process, so MCP-originated activity events (recall, store_extraction) get session_id=nil. Hook commands are different — .claude/settings.json payloads explicitly include session_id in their JSON. For dashboards or any feature that needs per-session attribution of MCP-originated events, correlate by time window using hook events (which do carry session_id) rather than strict session_id equality. Dashboard::API#efficacy uses session_window + within_window? for this.
|
|
57
86
|
- MCP::Tools: Thin 104-line dispatcher that includes 6 handler modules in mcp/handlers/: QueryHandlers, ShortcutHandlers, ContextHandlers, ManagementHandlers, StatsHandlers, SetupHandlers
|
|
58
87
|
- Recall: 94-line facade delegating to @engine (DualEngine or LegacyEngine), both include shared QueryCore module with all store-level query logic
|
|
59
88
|
- SQLiteStore: 386-line CRUD class that includes RetryHandler (retry/connection logic) and SchemaManager (migrations/version sync) modules
|
|
60
89
|
- Embeddings: Pluggable providers via Embeddings.resolve(name, env:). Three providers: tfidf (default), fastembed, api. Duck-typed contract: name, dimensions, generate(text). ENV: CLAUDE_MEMORY_EMBEDDING_PROVIDER
|
|
61
90
|
- Embeddings::DimensionCheck: Pure value object — DimensionCheck.call(store, provider) returns Data.define Result with :fresh/:match/:mismatch status. No side effects; caller decides how to handle mismatch.
|
|
62
91
|
|
|
63
|
-
|
|
64
|
-
## Open Conflicts
|
|
65
|
-
|
|
66
|
-
The following facts are in conflict and need resolution:
|
|
67
|
-
|
|
68
|
-
- Conflict #12: Fact 21 vs Fact 43
|
|
69
|
-
- Conflict #13: Fact 21 vs Fact 44
|
|
70
|
-
- Conflict #14: Fact 45 vs Fact 46
|
|
71
|
-
- Conflict #15: Fact 45 vs Fact 47
|
|
72
|
-
- Conflict #16: Fact 48 vs Fact 49
|
|
73
|
-
- Conflict #17: Fact 45 vs Fact 50
|
|
74
|
-
- Conflict #18: Fact 21 vs Fact 51
|
|
75
|
-
- Conflict #19: Fact 48 vs Fact 52
|
|
76
|
-
- Conflict #20: Fact 21 vs Fact 53
|
|
77
|
-
- Conflict #21: Fact 21 vs Fact 54
|
|
78
|
-
- Conflict #22: Fact 21 vs Fact 55
|
|
79
|
-
- Conflict #23: Fact 21 vs Fact 56
|
|
80
|
-
- Conflict #24: Fact 21 vs Fact 57
|
|
81
|
-
- Conflict #25: Fact 48 vs Fact 58
|
|
82
|
-
- Conflict #26: Fact 48 vs Fact 59
|
|
83
|
-
- Conflict #27: Fact 48 vs Fact 60
|
|
84
|
-
- Conflict #28: Fact 21 vs Fact 61
|
|
85
|
-
- Conflict #29: Fact 21 vs Fact 62
|
|
86
|
-
- Conflict #30: Fact 21 vs Fact 63
|
|
87
|
-
- Conflict #31: Fact 45 vs Fact 64
|
|
88
|
-
- Conflict #32: Fact 21 vs Fact 65
|
|
89
|
-
- Conflict #33: Fact 21 vs Fact 66
|
|
90
|
-
- Conflict #34: Fact 21 vs Fact 67
|
|
91
|
-
- Conflict #35: Fact 45 vs Fact 68
|
|
92
|
-
- Conflict #36: Fact 45 vs Fact 69
|
|
93
|
-
- Conflict #37: Fact 48 vs Fact 70
|
|
94
|
-
- Conflict #38: Fact 48 vs Fact 71
|
|
95
|
-
- Conflict #39: Fact 21 vs Fact 72
|
|
96
|
-
- Conflict #40: Fact 21 vs Fact 73
|
|
97
|
-
- Conflict #41: Fact 21 vs Fact 74
|
|
98
|
-
- Conflict #42: Fact 21 vs Fact 75
|
|
99
|
-
- Conflict #43: Fact 21 vs Fact 76
|
|
100
|
-
- Conflict #44: Fact 45 vs Fact 77
|
|
101
|
-
- Conflict #45: Fact 45 vs Fact 78
|
|
102
|
-
- Conflict #46: Fact 45 vs Fact 79
|
|
103
|
-
- Conflict #47: Fact 45 vs Fact 80
|
|
104
|
-
- Conflict #48: Fact 45 vs Fact 81
|
|
105
|
-
- Conflict #49: Fact 48 vs Fact 82
|
|
106
|
-
- Conflict #50: Fact 48 vs Fact 83
|
|
107
|
-
- Conflict #51: Fact 48 vs Fact 84
|
|
108
|
-
- Conflict #52: Fact 48 vs Fact 85
|
|
109
|
-
- Conflict #53: Fact 48 vs Fact 86
|
|
110
|
-
- Conflict #54: Fact 48 vs Fact 87
|
|
111
|
-
- Conflict #55: Fact 48 vs Fact 88
|
|
112
|
-
- Conflict #56: Fact 48 vs Fact 89
|
|
113
|
-
- Conflict #57: Fact 48 vs Fact 90
|
data/.claude/skills/release/SKILL.md
CHANGED
|
@@ -63,7 +63,43 @@ bundle exec rspec
|
|
|
63
63
|
|
|
64
64
|
All tests must pass. Do not proceed with any failures. Fix them first.
|
|
65
65
|
|
|
66
|
-
### Step 6: Run the
|
|
66
|
+
### Step 6: Run the pre-release hook smoke gate
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
bin/pre-release-smoke
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
This script:
|
|
73
|
+
|
|
74
|
+
1. Re-runs `bundle exec rake install` so the PATH-resolved `claude-memory` binary matches the working tree.
|
|
75
|
+
2. Triggers each gem-managed hook against a temp DB.
|
|
76
|
+
3. Verifies every field listed in `spec/smoke/expected_fields.yml` is populated on the resulting `activity_events.detail_json`.
|
|
77
|
+
4. Exits non-zero with the missing field name(s) and `since_version` if any expected field is null/absent.
|
|
78
|
+
|
|
79
|
+
**This catches the class of bug specs cannot:** a code change that adds a new `detail_json` field but forgets `rake install`, leaving the installed gem stale and production hooks silently missing the field. Sprung that trap on 2026-04-16 (ActivityLog) and again on 2026-04-30 (#47 token-budget) — the gate is here so it can't happen a third time.
|
|
80
|
+
|
|
81
|
+
If the gate fails, **stop the release**, address the missing field (usually `bundle exec rake install` followed by re-running the gate), and only proceed when it exits 0.
|
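The field check at the heart of the smoke gate (steps 3 and 4 above) can be sketched roughly as follows. This is an illustration only: the real layout of `spec/smoke/expected_fields.yml` and the internals of `bin/pre-release-smoke` are not shown in this diff, so the manifest shape, the version numbers, and the `missing_fields` helper are all assumptions.

```ruby
require "json"

# Hypothetical manifest shape: field name => version that introduced it.
# The field names and versions here are illustrative, not taken from the gem.
EXPECTED_FIELDS = {
  "context_tokens" => "0.11.0",
  "hook_event_name" => "0.10.0"
}.freeze

# Returns [field, since_version] pairs that are absent or null in detail_json.
def missing_fields(detail_json, expected = EXPECTED_FIELDS)
  detail = JSON.parse(detail_json)
  expected.reject { |field, _since| !detail[field].nil? }.to_a
end

# A detail_json payload where one expected field was written as null.
sample = JSON.generate("hook_event_name" => "SessionStart", "context_tokens" => nil)
gaps = missing_fields(sample)
gaps.each { |field, since| warn "missing #{field} (since #{since})" }

# A gate script would exit with this status; it is kept as a value here.
exit_code = gaps.empty? ? 0 : 1
```

Treating a null value the same as an absent key mirrors the "null/absent" wording in step 4.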
|
82
|
+
|
|
83
|
+
### Step 7: Run the benchmark scoreboard diff
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
bin/run-evals --benchmarks
|
|
87
|
+
bin/bench-diff
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
`bin/run-evals --benchmarks` writes `spec/benchmarks/results/<version>.json` — the diff-friendly snapshot of the current release's pass rates by category and per-scenario. `bin/bench-diff` then compares that snapshot against the most recent prior tagged version's scoreboard and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%; configurable via `--threshold`).
|
|
91
|
+
|
|
92
|
+
The first release with this gate (0.12.0) has no prior scoreboard to compare against — `bench-diff` exits 0 with a "No baseline scoreboard available" note. From 0.13.0 onward it actively gates.
|
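The comparison described above can be sketched as a small function. This is not the `bin/bench-diff` implementation: the flat "metric path => pass rate" scoreboard shape is hypothetical, and the sketch assumes the threshold means an absolute drop in percentage points, which the source does not specify.

```ruby
# Hypothetical scoreboard shape: metric path => pass rate in 0.0..1.0.
# Assumption: threshold is an absolute change in percentage points.
def regressions(baseline, current, threshold: -5.0)
  # No prior scoreboard (the first gated release): nothing to compare against.
  return [] if baseline.nil?

  baseline.filter_map do |metric, old_rate|
    new_rate = current[metric]
    next if new_rate.nil? # metric no longer tracked; skipped in this sketch

    delta = ((new_rate - old_rate) * 100).round(2)
    [metric, delta] if delta < threshold
  end
end

baseline = {
  "metrics.evals.by_scenario.tech_stack_recall.pass_rate" => 0.90,
  "metrics.evals.by_category.recall.pass_rate" => 0.80
}
current = {
  "metrics.evals.by_scenario.tech_stack_recall.pass_rate" => 0.80,
  "metrics.evals.by_category.recall.pass_rate" => 0.78
}

failed = regressions(baseline, current)
failed.each { |metric, delta| warn "regression: #{metric} (#{delta} points)" }
```

With these sample numbers only the 10-point drop is reported; the 2-point drop stays within the default threshold.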
|
93
|
+
|
|
94
|
+
If the diff reports a regression, **stop the release**, investigate (the regressed metric path is named in stderr — e.g. `metrics.evals.by_scenario.tech_stack_recall.pass_rate`), and only proceed once you've either (a) fixed the regression or (b) made a deliberate decision that the lower pass rate is acceptable. If (b), document the new baseline in CHANGELOG so future-you isn't surprised.
|
|
95
|
+
|
|
96
|
+
For real-mode E2E coverage (~$2-8 per run), pass `EVAL_MODE=real`:
|
|
97
|
+
|
|
98
|
+
```bash
|
|
99
|
+
EVAL_MODE=real bin/run-evals --all && bin/bench-diff
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
### Step 8: Run the linter
|
|
67
103
|
|
|
68
104
|
```bash
|
|
69
105
|
bundle exec rake standard:fix
|
|
@@ -71,7 +107,7 @@ bundle exec rake standard:fix
|
|
|
71
107
|
|
|
72
108
|
Ensure no remaining violations.
|
|
73
109
|
|
|
74
|
-
### Step
|
|
110
|
+
### Step 9: Verify CHANGELOG.md
|
|
75
111
|
|
|
76
112
|
The CHANGELOG should already have a release section written during development (via `/improve`, manual commits, or other workflow). **Do not auto-generate release notes** — they should reflect the actual development narrative.
|
|
77
113
|
|
|
@@ -83,7 +119,7 @@ Check that:
|
|
|
83
119
|
|
|
84
120
|
If the CHANGELOG section is missing or incomplete, **stop and ask the user**. Do not fabricate release notes.
|
|
85
121
|
|
|
86
|
-
### Step
|
|
122
|
+
### Step 10: Commit the version bump
|
|
87
123
|
|
|
88
124
|
```bash
|
|
89
125
|
git add lib/claude_memory/version.rb .claude-plugin/plugin.json .claude-plugin/marketplace.json Gemfile.lock
|
|
@@ -111,7 +147,7 @@ Wait for the user to confirm before proceeding to Phase 3.
|
|
|
111
147
|
|
|
112
148
|
## Phase 3: Announce
|
|
113
149
|
|
|
114
|
-
### Step
|
|
150
|
+
### Step 11: Fix any stale "Latest" flags on GitHub releases
|
|
115
151
|
|
|
116
152
|
Check current release state:
|
|
117
153
|
|
|
@@ -125,7 +161,7 @@ If an older release is incorrectly marked "Latest" (this happens when releases a
|
|
|
125
161
|
gh release edit v<old-version> --latest=false
|
|
126
162
|
```
|
|
127
163
|
|
|
128
|
-
### Step
|
|
164
|
+
### Step 12: Create the GitHub release
|
|
129
165
|
|
|
130
166
|
Extract the release notes from CHANGELOG.md — everything between `## [X.Y.Z]` and the next `## [` heading. Write to a temp file:
|
|
131
167
|
|
|
@@ -146,7 +182,7 @@ gh release create vX.Y.Z \
|
|
|
146
182
|
|
|
147
183
|
The title should capture the theme of the release in a few words (e.g., "Predicate Design Overhaul, Reject/Restore, Telemetry"). Read the CHANGELOG to derive this — don't ask the user unless the theme isn't obvious.
|
|
148
184
|
|
|
149
|
-
### Step
|
|
185
|
+
### Step 13: Verify the release
|
|
150
186
|
|
|
151
187
|
```bash
|
|
152
188
|
gh release list --limit 5
|
|
@@ -161,6 +197,8 @@ Report the release URL to the user.
|
|
|
161
197
|
|
|
162
198
|
## Error Handling
|
|
163
199
|
|
|
200
|
+
- **Smoke gate fails (`bin/pre-release-smoke` exits non-zero)**: The script names the missing `detail_json` field and `since_version` in stderr. Most common cause: code that adds a new field landed without a follow-up `bundle exec rake install`, so the installed gem is stale. That is exactly the bug the gate is designed to catch, so don't bypass it: re-run `rake install`, then re-run the gate. If the manifest needs updating because a field was intentionally removed, edit `spec/smoke/expected_fields.yml` AND add a CHANGELOG breaking-change note (removing a `detail_json` field is a public API change per `docs/api_stability.md` §4).
|
|
201
|
+
- **Bench-diff fails (`bin/bench-diff` exits 1)**: Stderr names the metric path that regressed (e.g. `metrics.evals.by_scenario.tech_stack_recall.pass_rate`). Investigate the regression — is it a real correctness issue, or a measurement-noise issue (e.g. real-mode flake)? If real, fix before releasing. If a deliberate baseline change (we knowingly traded N% in metric X for some other gain), update CHANGELOG with the new baseline and re-run with a temporarily looser `--threshold` to ship; the next release picks up the new floor automatically. **Don't bypass the gate without an explicit baseline-change note** — that defeats the entire scoreboard.
|
|
164
202
|
- **Tests fail**: Fix first. Never release with failing tests.
|
|
165
203
|
- **CHANGELOG missing**: Ask the user. Never fabricate release notes.
|
|
166
204
|
- **Version already tagged**: The tag may exist from a prior attempt. Ask the user whether to delete and recreate, or use a different version.
|
|
@@ -36,6 +36,21 @@ Then invoke: `/study-repo /tmp/study-repos/project-name`
|
|
|
36
36
|
|
|
37
37
|
See `.claude/skills/study-repo/focus-examples.md` for more examples.
|
|
38
38
|
|
|
39
|
+
## CRITICAL: Memory Discipline (no external-tech misattribution)
|
|
40
|
+
|
|
41
|
+
When studying an external repo you will read its README, gemspec, and source — and you will see things like *"uses Postgres"*, *"runs on AWS"*, *"built with Rails"*. These are facts **about the external project, not about this project**.
|
|
42
|
+
|
|
43
|
+
Do NOT call `memory.store_extraction` with the external project's tech stack as `uses_database` / `uses_framework` / `uses_language` / `deployment_platform` / `auth_method` predicates. That misattribution caused 27 facts to be stored about ClaudeMemory in the 2026-04-23/24 window that all had to be hand-rejected (see `improvements.md` #61, `quality_review.md` 2026-04-30 note). The corpus damage was real even though the cleanup worked — every misattributed fact takes a round trip through the database, conflict-detection, and the user's `claude-memory reject` queue.
|
|
44
|
+
|
|
45
|
+
**The rule.** While `/study-repo` is running, the only `memory.store_extraction` calls allowed are:
|
|
46
|
+
|
|
47
|
+
- `predicate=reference` for descriptions of the external project ("X is a plugin/library/CLI that…"). The dashboard's Knowledge → References panel is the right home for these.
|
|
48
|
+
- Facts genuinely about *this* project (ClaudeMemory) that you derive from contrast with the studied repo (e.g., a decision: "Adopt RRF fusion from QMD because…"). These belong as `decision` / `convention` / `architecture` with `subject=repo` or `subject=claude_memory` AND an embedded reason clause.
|
|
49
|
+
|
|
50
|
+
**The hard ban.** Any single-value cardinality predicate (`uses_database`, `deployment_platform`, `auth_method`) populated with the studied project's tech is forbidden. If in doubt, write the observation into the influence document (`docs/influence/<project>.md`) — that file IS the right artifact for "what does this external project use" — and skip `memory.store_extraction` entirely.
|
|
51
|
+
|
|
52
|
+
If the user asks "did the studied project use X?" later, the answer lives in `docs/influence/`, not in memory facts.
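
The rule and the hard ban above can be read as a small predicate. The following is a hedged Ruby sketch only: the guard in this skill is prompt-level, so `extraction_allowed?`, the constants, and the fact hash shape (`:predicate`, `:subject`, `:object`) are hypothetical names introduced for illustration and are not part of claude_memory.

```ruby
# Predicates named in the hard ban above.
BANNED_SINGLE_VALUE = %w[uses_database deployment_platform auth_method].freeze
# Predicates allowed for facts about this project, per the rule above.
PROJECT_PREDICATES = %w[decision convention architecture].freeze
PROJECT_SUBJECTS = %w[repo claude_memory].freeze

# Hypothetical check: would this extraction be allowed during /study-repo?
# `fact` is an assumed hash shape with :predicate, :subject and :object keys.
def extraction_allowed?(fact)
  predicate = fact[:predicate]
  return false if BANNED_SINGLE_VALUE.include?(predicate)
  return true if predicate == "reference"

  PROJECT_PREDICATES.include?(predicate) && PROJECT_SUBJECTS.include?(fact[:subject])
end

puts extraction_allowed?({predicate: "uses_database", subject: "repo", object: "postgres"})
puts extraction_allowed?({predicate: "reference", subject: "external_project", object: "X is a CLI"})
```

The sketch does not check for the embedded reason clause; that part of the rule still depends on the author of the fact.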
|
|
53
|
+
|
|
39
54
|
## Analysis Phases
|
|
40
55
|
|
|
41
56
|
Follow these phases systematically to ensure comprehensive coverage:
|
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
# Audit Memory
|
|
2
|
+
|
|
3
|
+
Run a health audit on the ClaudeMemory database and walk the user through resolving findings. Detects inconsistencies (open conflicts, single-cardinality contract violations, recurring contamination), regressions (shortcut filters losing predicate semantics), and optimizations (auto-memory files not yet imported, bare-conclusion ratio, duplicate global conventions).
|
|
4
|
+
|
|
5
|
+
## Usage
|
|
6
|
+
|
|
7
|
+
```
|
|
8
|
+
/audit-memory
|
|
9
|
+
/audit-memory --json # machine-readable output (no walkthrough)
|
|
10
|
+
/audit-memory --severity=error # only errors
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
## Instructions
|
|
14
|
+
|
|
15
|
+
You are a ClaudeMemory health auditor. Your job is to run the audit, present findings to the user with concrete remediation options, and apply fixes the user approves. Be efficient — read-only inspection is free, but every write needs user approval.
|
|
16
|
+
|
|
17
|
+
### Step 1: Run the audit
|
|
18
|
+
|
|
19
|
+
Call the CLI directly to get structured findings:
|
|
20
|
+
|
|
21
|
+
```bash
|
|
22
|
+
claude-memory audit --json
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
If the user passed `--json`, just dump the output verbatim and stop. Otherwise continue to step 2.
|
|
26
|
+
|
|
27
|
+
If `claude-memory audit` returns `{"ok": true, "counts": {"error": 0, ...}}`, congratulate briefly and stop. Don't fabricate problems.
|
|
28
|
+
|
|
29
|
+
### Step 2: Triage findings
|
|
30
|
+
|
|
31
|
+
Group the findings by severity. Present them to the user in this order:
|
|
32
|
+
|
|
33
|
+
1. **Errors (must fix)** — these block CI/quality contracts. Walk through each one. Each error has a `suggestion` field with the concrete CLI command(s) to run. Ask "shall I run this?" before executing.
|
|
34
|
+
2. **Warnings (should investigate)** — surface but don't auto-fix. Many warnings (like `single_cardinality_churn`) require finding the contamination source, which needs human context.
|
|
35
|
+
3. **Info (optimizations)** — present as suggestions, not blockers. Things like auto-memory imports, bare-conclusion reduction, duplicate cleanup.
|
|
36
|
+
|
|
37
|
+
For each finding, the output already includes:
|
|
38
|
+
- `id` (C001…C010) — stable across releases; users can refer to them
|
|
39
|
+
- `title` — one-line summary
|
|
40
|
+
- `detail` — why it matters
|
|
41
|
+
- `suggestion` — the literal CLI command to run
|
|
42
|
+
- `fact_ids` — the rows involved (use with `claude-memory explain <id>` for details)
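
The triage order above can be sketched in Ruby. This is an illustration under assumptions: the text only confirms `ok`, `counts`, and the per-finding fields `id`, `title`, `detail`, `suggestion`, and `fact_ids`. The top-level `findings` key, the per-finding `severity` key, and the sample payload (including which severity each ID carries) are invented here, so check them against real `claude-memory audit --json` output before relying on them.

```ruby
require "json"

SEVERITY_ORDER = %w[error warning info].freeze

# Groups findings by severity and returns them in triage order.
def triage(report)
  grouped = report.fetch("findings", []).group_by { |f| f["severity"] }
  SEVERITY_ORDER.to_h { |sev| [sev, grouped.fetch(sev, [])] }
end

# Invented sample payload; key names beyond those listed above are assumptions.
sample = JSON.parse(<<~JSON)
  {
    "ok": false,
    "counts": {"error": 1, "warning": 1, "info": 0},
    "findings": [
      {"id": "C010", "severity": "warning", "title": "example churn finding",
       "detail": "placeholder", "suggestion": "placeholder", "fact_ids": [7]},
      {"id": "C002", "severity": "error", "title": "example multiplicity finding",
       "detail": "placeholder", "suggestion": "placeholder", "fact_ids": [3, 4]}
    ]
  }
JSON

triage(sample).each do |severity, findings|
  findings.each { |f| puts "[#{severity}] #{f["id"]}: #{f["title"]}" }
end
```

Errors come out first regardless of their position in the payload, which matches the presentation order the skill asks for.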
|
|
43
|
+
|
|
44
|
+
### Step 3: Investigate before mass-rejecting
|
|
45
|
+
|
|
46
|
+
For `C002` (single-cardinality multiplicity) and `C010` (churn), DO NOT immediately bulk-reject. Recurring contamination has a source. Investigate first:
|
|
47
|
+
|
|
48
|
+
1. Pick one of the offending fact IDs.
|
|
49
|
+
2. Run `claude-memory explain <fact_id>` to see provenance.
|
|
50
|
+
3. Read the `quote` and `content_item_id` to find the trigger text.
|
|
51
|
+
4. Decide: is this a real claim or example text? Real claims should win the supersession; example text should be wrapped in `<no-memory>` tags at the source.
|
|
52
|
+
|
|
53
|
+
### Step 4: Apply fixes with user approval
|
|
54
|
+
|
|
55
|
+
For approved remediations, run the exact command from the `suggestion` field. Don't paraphrase. After each batch, re-run `claude-memory audit` to confirm the finding is gone.
|
|
56
|
+
|
|
57
|
+
### Step 5: Wrap up
|
|
58
|
+
|
|
59
|
+
When the audit reports `ok: true`, suggest the user:
|
|
60
|
+
- Commit `.claude/memory.sqlite3` if they want to lock in the cleanup.
|
|
61
|
+
- Run `claude-memory publish` to refresh `.claude/rules/claude_memory.generated.md`.
|
|
62
|
+
- Wire `claude-memory audit` into CI / pre-release so future drift is caught early.
|
|
63
|
+
|
|
64
|
+
## Background
|
|
65
|
+
|
|
66
|
+
This skill is part of the systemic audit pipeline established in `docs/memory_audit_2026-05-21.md`. The contract definitions (single-cardinality, shortcut predicate filters, distillation backlog thresholds) live in `lib/claude_memory/audit/checks.rb`. Adding a new check there propagates automatically to this skill.
|
|
67
|
+
|
|
68
|
+
See `docs/audit_runbook.md` for per-check rationale, common contamination sources, and worked examples.
|
|
@@ -7,7 +7,7 @@
|
|
|
7
7
|
"plugins": [
|
|
8
8
|
{
|
|
9
9
|
"name": "claude-memory",
|
|
10
|
-
"version": "0.
|
|
10
|
+
"version": "0.12.0",
|
|
11
11
|
"source": "./",
|
|
12
12
|
"description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
|
|
13
13
|
"repository": "https://github.com/codenamev/claude_memory"
|
data/.claude-plugin/plugin.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-memory",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.12.0",
|
|
4
4
|
"description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Valentino Stoll",
|
data/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,32 @@ All notable changes to this project will be documented in this file.
|
|
|
4
4
|
|
|
5
5
|
## [Unreleased]
|
|
6
6
|
|
|
7
|
+
## [0.12.0] - 2026-05-29
|
|
8
|
+
|
|
9
|
+
Theme: **Release Discipline + Observability + Self-Audit** — the infrastructure that makes a 1.0 semver promise defensible. This release locks down the public API surface, adds the observability primitives (OTel ingestion, dashboard Telemetry) and the self-audit toolkit (`claude-memory audit`) that serve the visibility pillar, and ships the negative-fact harm benchmark + staleness guard that make the long-horizon-quality claim measurable rather than aspirational.
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- **Staleness guard for single-value facts** — single-value predicates (`uses_database` / `deployment_platform` / `auth_method`) are exclusive claims Claude follows authoritatively, so a *stale* one is the most dangerous kind of memory. The 0.12 harm benchmark caught Claude emitting `git push heroku HEAD:main` from a stale `deployment_platform` fact with zero hedge — and supersession only protects against this if the replacement was recorded. New `Recall::StalenessAnnotator` (pure function) flags single-value facts that are old (`valid_from`/`created_at` older than `injection_stale_days`, default 180) AND not recently confirmed (`last_recalled_at` null or stale); `Hook::ContextInjector` appends a `⚠ stale: recorded YYYY-MM-DD … verify before relying` marker at SessionStart so Claude can hedge or verify instead of blindly following. Multi-value predicates are never annotated (they accumulate; one stale entry isn't authoritative). New `Configuration#injection_stale_days` (`CLAUDE_MEMORY_INJECTION_STALE_DAYS`), deliberately much longer than the 14-day dashboard review window. Serves the 1.0 long-horizon-quality pillar — it's the first defense against memory degrading session quality over months.
|
|
14
|
+
- **Negative-fact harm benchmark — full 13-scenario corpus + release gate** — expands the 0.11 3-scenario prototype to 13 cases across four harm classes (stale_tech, mismatched_scope, superseded_undetected, and the new reference_material_as_fact). Each scenario ships a `project_files` scaffold whose current state contradicts the wrong memory fact, so the test measures "does Claude follow stale/wrong memory over the project's actual state?" rather than reacting to an empty directory. Scored best-of-N (default 3 runs, majority vote per scenario via `HARM_BENCH_RUNS`) to absorb single-shot LLM nondeterminism. `HARM_RATE_THRESHOLD` (default 1%) fails the run if the majority-harmed scenario rate is exceeded — making "memory doesn't make Claude wrong" a measurable release gate rather than a marketing claim. The first full-corpus real-mode run surfaced a real harm (stale deployment fact) and a harness confound (empty-tmpdir noise), which drove both the staleness guard above and the scaffold + best-of-N harness hardening.
|
|
15
|
+
- **`claude-memory audit` — memory health diagnostic** — productionizes the 2026-05-21 contamination audit into a stable diagnostic surface anyone using claude_memory can run on their own setup. Ten contract checks (C001-C010) cover open conflicts, single-cardinality multiplicity, distillation backlog, shortcut-leak detection, duplicate global conventions, bare-conclusion rate, project starvation, auto-memory import gaps, and single-cardinality churn. `--json` is the stable contract for CI; `--severity` filters; `--no-exit` always exits 0. The `/audit-memory` slash command wraps the same runner for an interactive walkthrough. `docs/audit_runbook.md` documents each check's rationale and remediation. `CHECK_METHODS` is append-only by design so JSON consumers don't break when new checks land. New `claude-memory import-auto-memory` retroactively pulls `~/.claude/projects/<slug>/memory/*.md` entries that `AutoMemoryMirror` previously missed (slug bug: `tr("/", "-")` left underscores intact, so `claude_memory` paths never matched). Contributes to the **visibility** pillar of 1.0.
|
|
16
|
+
- **Contamination guardrails — `ReferenceMaterialDetector` example-quote guard + `Resolver` `:discard` path** — the distiller used to treat example sentences in docs/CLAUDE.md ("e.g., postgres", "for example, mysql") as literal claims about the project, accumulating 103 rejected single-cardinality facts over six weeks before being caught by the 2026-05-21 audit. Two defenses now: (1) `ReferenceMaterialDetector` flags single-cardinality predicate extractions whose source text contains `e.g.,` / `for example` / `i.e.` quote patterns so they're tagged reference material at write time; (2) `Resolver` gains a `:discard` resolution path for the same shape so the fact never lands even if the detector misses. Memory shortcuts (`memory.decisions` / `.conventions` / `.architecture`) refactored from FTS text search (which returned facts whose *object* matched the predicate keyword) to predicate-based filtering via `PredicatePolicy`, with project-DB precedence over global. Closes a class of "is memory still trustworthy?" bugs that erode the 1.0 stability claim.
|
|
17
|
+
- **OpenTelemetry ingestion + dashboard Telemetry tab** — Claude Code can now export metrics, log-style events, and (opt-in) traces straight into the dashboard via OTLP/HTTP/JSON. New `claude-memory otel` CLI manages the env block in `.claude/settings.json` (`--enable`, `--disable`, `--enable-traces`, `--capture-prompts`, `--status`, `--verify`); the dashboard exposes `/v1/metrics`, `/v1/logs`, `/v1/traces` on `127.0.0.1:3377` and a new "Telemetry" drawer showing cost per hour, tokens by model, top tools by latency, and a per-prompt journey waterfall that UNIONs `otel_events` with the existing `activity_events`. Schema v18 adds `otel_metrics`/`otel_events`/`otel_traces` plus an additive `prompt_id` column on `activity_events` for journey correlation. Privacy posture: nothing past metric counts is captured by default; `OTEL_LOG_USER_PROMPTS` only flips on with explicit `--capture-prompts` confirmation; traces remain 501-gated until the user opts in. Sweep retention defaults: 30 days metrics, 14 days events, 7 days traces.
|
|
18
|
+
- **Pre-release hook smoke gate** (`bin/pre-release-smoke`) — verifies the *installed* claude-memory gem actually fires hooks correctly and populates expected `detail_json` fields per `spec/smoke/expected_fields.yml`. Codifies the verification convention from `feedback_hooks_run_installed_gem.md` into a machine-enforced release gate. The trap has been sprung twice (2026-04-16 ActivityLog, 2026-04-30 #47 token-budget); the gate exists so it can't be sprung a third time. Wired into the `/release` skill as Phase 1 Step 6 (after specs, before lint). First 0.12.0 milestone item.
|
|
19
|
+
- **`/study-repo` memory-discipline guard (prompt-only)** — top-level "CRITICAL: Memory Discipline" section in `.claude/skills/study-repo/SKILL.md` explicitly forbids the LLM from extracting external projects' tech stack as project-level facts. Addresses the root cause of the cleanup work `claude-memory reject` had to do during 0.11 (27-fact misattribution cluster on 2026-04-23/24, see `quality_review.md` 2026-04-30 cause-4 finding). A defense-in-depth detector is deferred to 0.12.x or later and will only be built if measurement shows persistent leakage.
|
|
20
|
+
- **API stability audit (`docs/api_stability.md`)** — authoritative public-API contract enumerating which CLI commands, MCP tools, hook events, Ruby classes, and schema surfaces are stable / experimental / internal. Default-to-internal applied throughout; the doc is the source of truth for what 1.0's semver promise will lock down. New `ClaudeMemory::Deprecations.warn(name:, replacement:, removed_in:)` module wired into `PredicatePolicy.canonicalize` as the first soft-rename — `has_convention` and `primary_language` synonyms now emit deprecation warnings scheduled for removal in `1.0.0`. README + CLAUDE.md link to the new doc; suppress noise via `CLAUDE_MEMORY_NO_DEPRECATIONS=1`.
|
|
21
|
+
- **Release-to-release benchmark scoreboard** — `bin/run-evals` now writes `spec/benchmarks/results/<version>.json` after each run; new `bin/bench-diff` compares the current scoreboard against the most recent prior tagged version's and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%, configurable via `--threshold`). Wired into `/release` skill Phase 1 as Step 7 — the release aborts on regressions before publish. First release with this gate is 0.12.0 itself; from 0.13.0 onward bench-diff actively gates against 0.12 baselines.
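
The staleness rule from the first bullet above (old AND not recently confirmed, single-value predicates only) can be sketched as a pure function. This is a hedged illustration, not the code of `Recall::StalenessAnnotator`: the `stale?` helper and the fact hash shape are hypothetical, and applying the same window to `last_recalled_at` is an assumption the text does not spell out. The field names and the 180-day default come from the bullet.

```ruby
require "date"

SINGLE_VALUE_PREDICATES = %w[uses_database deployment_platform auth_method].freeze

# Hypothetical pure function: true when a single-value fact is both old
# and not recently confirmed. Assumes the same window applies to
# last_recalled_at, which the text does not spell out.
def stale?(fact, today:, stale_days: 180)
  return false unless SINGLE_VALUE_PREDICATES.include?(fact[:predicate])

  cutoff = today - stale_days
  recorded = fact[:valid_from] || fact[:created_at]
  old = recorded < cutoff
  unconfirmed = fact[:last_recalled_at].nil? || fact[:last_recalled_at] < cutoff
  old && unconfirmed
end

today = Date.new(2026, 5, 29)
# Invented sample facts.
never_recalled = {predicate: "deployment_platform", created_at: Date.new(2025, 1, 10), last_recalled_at: nil}
recently_recalled = {predicate: "deployment_platform", created_at: Date.new(2025, 1, 10), last_recalled_at: Date.new(2026, 5, 1)}

puts stale?(never_recalled, today: today)
puts stale?(recently_recalled, today: today)
```

Keeping the check free of I/O and passing `today` in explicitly makes it easy to test against fixed dates.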
|
|
22
|
+
|
|
23
|
+
### Deferred to 0.13
|
|
24
|
+
|
|
25
|
+
- **CLAUDE.md comparative baseline numbers (#4)** — the comparative E2E harness compares static CLAUDE.md (auto-loaded into context) against ClaudeMemory's MCP-tool retrieval, but in headless `claude -p` mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly (first run returned a misleading ClaudeMemory 0/10 = no-memory 0/10 vs CLAUDE.md 8/10). Publishing that would mislead, so the numbers are withheld and the harness fix is tracked for 0.13. This surfaced a genuine separable observation — in fully headless, non-tool-forcing usage, ClaudeMemory's contribution rides entirely on the SessionStart context-hook injection — also tracked for 0.13. See `docs/1_0_punchlist.md` #4 / #16.
|
|
26
|
+
|
|
27
|
+
### Upgrade Notes
|
|
28
|
+
|
|
29
|
+
- **Schema migrates automatically to v18** (OTel telemetry tables + `prompt_id` on `activity_events`) on first DB open via `Sequel::Migrator` — no manual step. Round-trip migration specs cover the upgrade path from prior release boundaries.
|
|
30
|
+
- **The staleness marker now appears in SessionStart context** for single-value facts (`uses_database` / `deployment_platform` / `auth_method`) older than 180 days and not recently recalled. This is additive and advisory (a `⚠ stale … verify before relying` note). Tune the window with `CLAUDE_MEMORY_INJECTION_STALE_DAYS`; the existing `CLAUDE_MEMORY_STALE_DAYS` (dashboard review window) is unchanged.
|
|
31
|
+
- No breaking API changes. `has_convention` / `primary_language` predicate synonyms continue to emit deprecation warnings (scheduled for removal in 1.0.0); suppress via `CLAUDE_MEMORY_NO_DEPRECATIONS=1`.
|
|
32
|
+
|
|
7
33
|
## [0.11.0] - 2026-04-30
|
|
8
34
|
|
|
9
35
|
Theme: **Trust & Cost** — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.
|
data/CLAUDE.md
CHANGED
|
@@ -15,6 +15,10 @@ ClaudeMemory is a Ruby gem that provides long-term, self-managed memory for Clau
|
|
|
15
15
|
|
|
16
16
|
**Check memory before exploring code.** Use `memory.recall`, `memory.decisions`, `memory.architecture`, or `memory.conventions` to find existing knowledge before reading files.
|
|
17
17
|
|
|
18
|
+
**Public API contract:** [docs/api_stability.md](docs/api_stability.md) is the authoritative stable-surface list (CLI, MCP, hooks, Ruby API, schema, predicate vocabulary). When changing any of those surfaces, update the doc in the same commit; if it's a soft-rename, wire `ClaudeMemory::Deprecations.warn`.
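
As a rough picture of what wiring a soft-rename involves, here is a minimal Ruby sketch of a deprecation helper. It is an illustrative stand-in, not the gem's `ClaudeMemory::Deprecations` module: the keyword signature (`name:`, `replacement:`, `removed_in:`) and the `CLAUDE_MEMORY_NO_DEPRECATIONS=1` switch are taken from the CHANGELOG, while the module name `DeprecationSketch`, the message format, the `io:` and `env:` parameters, and the placeholder predicate names are assumptions.

```ruby
require "stringio"

# Illustrative stand-in, not the gem's ClaudeMemory::Deprecations module.
module DeprecationSketch
  # Writes a deprecation warning to `io` unless suppressed via the
  # environment. Returns the message, or nil when suppressed.
  def self.warn(name:, replacement:, removed_in:, io: $stderr, env: ENV)
    return nil if env["CLAUDE_MEMORY_NO_DEPRECATIONS"] == "1"

    message = "[DEPRECATION] #{name} is deprecated; use #{replacement} instead " \
      "(scheduled for removal in #{removed_in})"
    io.puts(message)
    message
  end
end

# Placeholder names; the real synonym-to-canonical mapping lives in the gem.
buffer = StringIO.new
DeprecationSketch.warn(name: "old_predicate", replacement: "new_predicate",
  removed_in: "1.0.0", io: buffer, env: {})
puts buffer.string
```

Injecting `io` and `env` keeps the helper testable without touching the real process environment.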
|
|
19
|
+
|
|
20
|
+
**Audit memory health:** run `claude-memory audit` (or `/audit-memory` for an interactive walkthrough) to surface inconsistencies, regressions, and optimization opportunities. See [docs/audit_runbook.md](docs/audit_runbook.md) for per-check rationale and remediation steps.
|
|
21
|
+
|
|
18
22
|
### Git Usage & Best Practices
|
|
19
23
|
|
|
20
24
|
- Before each commit, apply the quality-review skill
|
|
@@ -229,7 +233,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
|
|
|
229
233
|
- Modes: shared (repo), local (uncommitted), home (user directory)
|
|
230
234
|
|
|
231
235
|
- **`MCP`**: Model Context Protocol server and tools (`mcp/`)
|
|
232
|
-
- Exposes memory tools to Claude Code (
|
|
236
|
+
- Exposes memory tools to Claude Code (23 tools total)
|
|
233
237
|
- `Telemetry`: Records tool invocations to `mcp_tool_calls` table for usage stats
|
|
234
238
|
- Dual content/structuredContent responses with compact mode
|
|
235
239
|
|
|
@@ -262,10 +266,13 @@ Facts include:
|
|
|
262
266
|
### Scope System
|
|
263
267
|
|
|
264
268
|
Facts are scoped to control where they apply:
|
|
269
|
+
|
|
270
|
+
<no-memory>
|
|
265
271
|
- **project**: Current project only (e.g., "claude_memory uses SQLite for storage")
|
|
266
272
|
- **global**: All projects (e.g., "I prefer 4-space indentation")
|
|
267
273
|
|
|
268
274
|
Distiller detects signals like "always", "in all projects", "my preference" and sets `scope_hint: "global"`. Users can manually promote facts via `claude-memory promote <fact_id>` or the `memory.promote` MCP tool.
|
|
275
|
+
</no-memory>
|
|
269
276
|
|
|
270
277
|
## Testing Strategy
|
|
271
278
|
|
|
@@ -347,7 +354,7 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
|
|
|
347
354
|
|
|
348
355
|
The gem includes an MCP server (`claude-memory serve-mcp`) that exposes memory operations as tools. Configuration should be in `.mcp.json` at project root.
|
|
349
356
|
|
|
350
|
-
Available MCP tools (
|
|
357
|
+
Available MCP tools (23 total):
|
|
351
358
|
- **Query & Recall**: `memory.recall`, `memory.recall_index`, `memory.recall_details`, `memory.recall_semantic`, `memory.search_concepts`
|
|
352
359
|
- **Provenance**: `memory.explain`, `memory.fact_graph`
|
|
353
360
|
- **Shortcuts**: `memory.decisions`, `memory.conventions`, `memory.architecture`
|
data/README.md
CHANGED
|
@@ -141,6 +141,16 @@ File-searchable questions ("what version is this?") and one-shot code generation
|
|
|
141
141
|
- **Token Efficient**: 10x reduction in memory queries with progressive disclosure
|
|
142
142
|
- **Database Maintenance**: Compact, export, and backup commands
|
|
143
143
|
- **Built-in Observability** (0.10.0+): `claude-memory dashboard` opens a local web UI with a moments feed, trust panel (token budget, quality score, utilization, feedback), conflicts dedup, knowledge index, and 👍/👎 feedback. See **[Dashboard guide →](docs/dashboard.md)**. `claude-memory digest` writes a weekly markdown report (Activity, Context cost, Quality, New knowledge, Utilization, Conflicts, Feedback); `claude-memory show` prints what would be injected next SessionStart; `claude-memory census` audits the predicate vocabulary across projects.
|
|
144
|
+
- **OpenTelemetry ingestion** (0.12.0+): point Claude Code's OTLP exporter at the dashboard and the new "Telemetry" tab shows per-API-call cost in USD, token usage by model, top tools by latency, and a per-prompt event waterfall. Quick setup:
|
|
145
|
+
|
|
146
|
+
```bash
|
|
147
|
+
claude-memory dashboard --port 3377 & # start the receiver
|
|
148
|
+
claude-memory otel --enable # writes telemetry env into .claude/settings.json
|
|
149
|
+
claude-memory otel --enable-traces # optional: include OpenTelemetry spans
|
|
150
|
+
claude-memory otel --status # confirm metrics are flowing
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
Only metrics and event names are captured by default — verbatim prompts and bodies stay off until you explicitly opt in via `claude-memory otel --capture-prompts`. The receiver binds to `127.0.0.1` only.
|
|
144
154
|
|
|
145
155
|
## What's New in 0.11.0
|
|
146
156
|
|
|
@@ -307,11 +317,26 @@ The uninstall command removes:
|
|
|
307
317
|
- 📊 [Dashboard](docs/dashboard.md) - Local web UI for inspection and trust signals (0.10.0+)
|
|
308
318
|
- 🔧 [Plugin Setup](docs/plugin.md) - Claude Code integration
|
|
309
319
|
- 🏗️ [Architecture](docs/architecture.md) - Technical deep dive
|
|
320
|
+
- 🔒 [API Stability](docs/api_stability.md) - What's stable / experimental / internal across releases (0.12.0+)
|
|
310
321
|
- 📝 [Changelog](CHANGELOG.md) - Release notes
|
|
311
322
|
|
|
312
323
|
## Benchmarks
|
|
313
324
|
|
|
314
|
-
ClaudeMemory includes **DevMemBench**, a developer-domain benchmark suite that measures retrieval quality
|
|
325
|
+
ClaudeMemory includes **DevMemBench**, a developer-domain benchmark suite that measures retrieval quality, truth maintenance accuracy, **negative-fact harm**, and **uplift over a hand-written CLAUDE.md baseline**. All offline benchmarks run locally at zero cost; end-to-end and comparative runs use real Claude (~$5-15 per full run).
|
|
326
|
+
|
|
327
|
+
### Does memory ever make Claude *wrong*?
|
|
328
|
+
|
|
329
|
+
Every other benchmark measures whether memory helps. The negative-fact harm benchmark measures whether memory can hurt: it injects a stale, mis-scoped, superseded, or reference-material fact and checks whether Claude follows it. It covers 13 scenarios across 4 harm classes, each with a realistic project scaffold whose actual state contradicts the wrong fact, scored best-of-3 by majority vote. The run fails if the share of majority-harmed scenarios exceeds `HARM_RATE_THRESHOLD` (default 1%), which with 13 scenarios means a single reliably harmed scenario is enough to fail it.
|
|
330
|
+
|
|
331
|
+
```bash
|
|
332
|
+
EVAL_MODE=real HARM_BENCH_RUNS=3 EVAL_MAX_BUDGET_USD=0.50 bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb
|
|
333
|
+
```
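
The best-of-N scoring described above can be sketched in a few lines of Ruby. This is a hedged illustration rather than the spec's actual code: `majority_harmed?`, `harm_rate`, the scenario names, and the sample run outcomes are hypothetical, and the threshold value mirrors the documented 1% default.

```ruby
# True when more than half of the runs for one scenario were harmed.
# Each element of `runs` is a boolean (true = Claude followed the wrong fact).
def majority_harmed?(runs)
  runs.count(true) > runs.size / 2.0
end

# Share of scenarios that were harmed in a majority of their runs.
def harm_rate(runs_by_scenario)
  harmed = runs_by_scenario.count { |_name, runs| majority_harmed?(runs) }
  harmed.to_f / runs_by_scenario.size
end

# Invented outcomes for three hypothetical scenarios, three runs each.
outcomes = {
  "stale_tech_example" => [true, false, false],
  "mismatched_scope_example" => [false, false, false],
  "superseded_example" => [true, true, false]
}

threshold = 0.01 # mirrors the documented 1% default
rate = harm_rate(outcomes)
puts format("harm rate: %.1f%%", rate * 100)
puts(rate > threshold ? "gate: FAIL" : "gate: PASS")
```

A single harmed run out of three does not count against a scenario, which is how the majority vote absorbs one-off nondeterminism.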
|
|
334
|
+
|
|
335
|
+
**0.12 baseline (2026-05-28): 0/13 harm.** See [`spec/benchmarks/README.md`](spec/benchmarks/README.md#harm_scenariosyml-13-scenarios-full-corpus-0120) for the full corpus and methodology.
|
|
336
|
+
|
|
337
|
+
### Is this better than a hand-written CLAUDE.md?
|
|
338
|
+
|
|
339
|
+
The single most important question for adoption is whether dynamic retrieval beats static context injection. ClaudeMemory ships a `CLAUDE.md baseline` adapter and a comparative E2E harness for exactly this. **The numbers aren't published yet (as of 0.12):** the current harness compares static CLAUDE.md (auto-loaded into every prompt) against ClaudeMemory's MCP-tool retrieval, but in headless `claude -p` mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly. Publishing that gap as a headline number would mislead. The harness fix is tracked for 0.13 — see [`docs/1_0_punchlist.md`](docs/1_0_punchlist.md) #4.
|
|
315
340
|
|
|
316
341
|
### Latest Results
|
|
317
342
|
|
|
@@ -324,6 +349,9 @@ ClaudeMemory includes **DevMemBench**, a developer-domain benchmark suite that m
|
|
|
324
349
|
| **Hybrid Retrieval** | Recall@5 (100 queries aggregate) | **72.7%** |
|
|
325
350
|
| **Hybrid Retrieval** | Recall@10 (20 hard queries) | **62.8%** |
|
|
326
351
|
| **Scope Ranking** | Queries returning expected facts | **5/5** |
|
|
352
|
+
| **Negative-Fact Harm (prototype)** | 0.11 baseline (3 scenarios, real Claude) | **0/3** |
|
|
353
|
+
| **Negative-Fact Harm (full corpus)** | 0.12 baseline (13 scenarios, best-of-3, real Claude) | **0/13 (0.0%)** |
|
|
354
|
+
| **E2E vs CLAUDE.md baseline** | 0.12 acceptance-rate delta (10 scenarios) | *deferred to 0.13 — harness doesn't exercise headless retrieval (#4)* |
|
|
327
355
|
|
|
328
356
|
Semantic and hybrid retrieval use [fastembed-rb](https://github.com/khasinski/fastembed-rb) with the BAAI/bge-small-en-v1.5 model (384-dim, runs locally, no API key needed).
|
|
329
357
|
|