RubyGems - claude_memory - Versions diffs - 0.10.0 → 0.12.0 - Mend

claude_memory 0.10.0 → 0.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (72)

checksums.yaml +4 -4
data/.claude/memory.sqlite3 +0 -0
data/.claude/rules/claude_memory.generated.md +42 -64
data/.claude/skills/release/SKILL.md +44 -6
data/.claude/skills/study-repo/SKILL.md +15 -0
data/.claude-plugin/commands/audit-memory.md +68 -0
data/.claude-plugin/marketplace.json +1 -1
data/.claude-plugin/plugin.json +1 -1
data/CHANGELOG.md +70 -0
data/CLAUDE.md +20 -5
data/README.md +64 -2
data/db/migrations/018_add_otel_telemetry.rb +81 -0
data/docs/1_0_punchlist.md +522 -89
data/docs/GETTING_STARTED.md +3 -1
data/docs/api_stability.md +341 -0
data/docs/architecture.md +3 -3
data/docs/audit_runbook.md +209 -0
data/docs/claude_monitoring.md +956 -0
data/docs/dashboard.md +23 -3
data/docs/improvements.md +329 -5
data/docs/influence/ai-memory-systems-2026.md +403 -0
data/docs/memory_audit_2026-05-21.md +303 -0
data/docs/plugin.md +1 -1
data/docs/quality_review.md +35 -0
data/lib/claude_memory/audit/checks.rb +239 -0
data/lib/claude_memory/audit/finding.rb +33 -0
data/lib/claude_memory/audit/runner.rb +73 -0
data/lib/claude_memory/commands/audit_command.rb +117 -0
data/lib/claude_memory/commands/dashboard_command.rb +2 -1
data/lib/claude_memory/commands/digest_command.rb +95 -3
data/lib/claude_memory/commands/hook_command.rb +27 -2
data/lib/claude_memory/commands/import_auto_memory_command.rb +180 -0
data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
data/lib/claude_memory/commands/otel_command.rb +240 -0
data/lib/claude_memory/commands/registry.rb +5 -1
data/lib/claude_memory/commands/show_command.rb +90 -0
data/lib/claude_memory/commands/stats_command.rb +94 -2
data/lib/claude_memory/configuration.rb +60 -0
data/lib/claude_memory/core/fact_query_builder.rb +1 -0
data/lib/claude_memory/dashboard/api.rb +8 -0
data/lib/claude_memory/dashboard/index.html +140 -1
data/lib/claude_memory/dashboard/prompt_journey.rb +48 -0
data/lib/claude_memory/dashboard/server.rb +86 -0
data/lib/claude_memory/dashboard/telemetry.rb +156 -0
data/lib/claude_memory/dashboard/trust.rb +180 -11
data/lib/claude_memory/deprecations.rb +106 -0
data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
data/lib/claude_memory/distill/reference_material_detector.rb +37 -4
data/lib/claude_memory/hook/auto_memory_mirror.rb +7 -3
data/lib/claude_memory/hook/context_injector.rb +11 -2
data/lib/claude_memory/hook/handler.rb +142 -1
data/lib/claude_memory/mcp/tool_definitions.rb +3 -3
data/lib/claude_memory/otel/attributes.rb +118 -0
data/lib/claude_memory/otel/constants.rb +32 -0
data/lib/claude_memory/otel/ingestor.rb +54 -0
data/lib/claude_memory/otel/otlp_json_envelope.rb +254 -0
data/lib/claude_memory/otel/prompt_scope.rb +108 -0
data/lib/claude_memory/otel/settings_writer.rb +122 -0
data/lib/claude_memory/otel/status.rb +58 -0
data/lib/claude_memory/recall/staleness_annotator.rb +73 -0
data/lib/claude_memory/resolve/predicate_policy.rb +17 -1
data/lib/claude_memory/resolve/resolver.rb +30 -3
data/lib/claude_memory/shortcuts.rb +61 -18
data/lib/claude_memory/store/prompt_journey_query.rb +87 -0
data/lib/claude_memory/store/schema_manager.rb +1 -1
data/lib/claude_memory/store/sqlite_store.rb +136 -0
data/lib/claude_memory/sweep/maintenance.rb +31 -1
data/lib/claude_memory/sweep/sweeper.rb +6 -0
data/lib/claude_memory/templates/hooks.example.json +5 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +20 -0
metadata +28 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a299c6ab2aeb95123dcb61f5c87a06b93d15a00a2ed9ff2c8343e7fde6b369cb
-  data.tar.gz: d09c02a2f5dcd4bd0dfcb625793505bd2218c7df04230411e813a7543e7e7382
+  metadata.gz: 935474b9efd1d9fd317c410728ed36b1cedb0854d1db6e71cd45ce6372253b9c
+  data.tar.gz: 1514d8b44e8ee25d139cd0544ab527bfa00ff54ace9786b7c47709afc0095024
 SHA512:
-  metadata.gz: 87fd7dab40cb2e5b190de071f99bcc1394e98e5f426951eedaff09b190fa66591b40f49580bca45f75819170ba939a0d3d9239f4825825b431fd4a83d388bb7d
-  data.tar.gz: ffb4ab50ba94a8f3c7bfb8129f01ea96fd27b981dd614f0addd7d65a9fc2b4b8562b9d23148bb5ea4ee90b5ae5a9fc183d1c82e68d3b009557967a00b96bfec1
+  metadata.gz: f70d8858628ecbd350e32f89da3dcdd8ff99ff2c61c85dc938a512ab79f9d6bae46d87677168a7d9059fdace727657e388aeed48a451ae8340ac2af5dd9384b8
+  data.tar.gz: 3164ae23db7e7b8fd66ece557a84953727181e2b06d2db80c4a807678015b064f196489768fb50c980c7d1de8f732b86dd5c0666d9f766a0d05ce52757f1b23d
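
The hunk above swaps the SHA256 and SHA512 digests recorded for `metadata.gz` and `data.tar.gz`. As a rough illustration of how such a file can be checked, the sketch below recomputes the digests for files that have already been extracted into a directory and compares them with the YAML. The helper name `verify_checksums` and the result shape are assumptions made for this example; this is not RubyGems' own verification code.

```ruby
require "digest"
require "yaml"

# Map the algorithm names used in checksums.yaml to Ruby digest classes.
DIGESTS = {"SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512}.freeze

# Hypothetical helper, not part of RubyGems or claude_memory.
# checksums_yaml: the text of a checksums.yaml file
# dir:            directory holding the extracted files it names
# Returns a hash such as {"SHA256/metadata.gz" => true, ...}.
def verify_checksums(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml)
  results = {}
  expected.each do |algorithm, files|
    digest_class = DIGESTS.fetch(algorithm)
    files.each do |name, hex|
      # Hash the file on disk and compare with the recorded hex digest.
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results["#{algorithm}/#{name}"] = (actual == hex.to_s)
    end
  end
  results
end
```

Any `false` value in the result means the file on disk does not match the digest recorded for that algorithm.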

data/.claude/memory.sqlite3 CHANGED

Binary file

data/.claude/rules/claude_memory.generated.md CHANGED

@@ -1,7 +1,7 @@
 <!--
   This file is auto-generated by claude-memory.
   Do not edit manually - changes will be overwritten.
-  Generated: 2026-04-16T17:54:51Z
+  Generated: 2026-06-01T11:23:10Z
 -->
 # Project Memory
@@ -14,14 +14,38 @@
 ## Conventions
-- Curated predicate vocabulary has 8 entries: multi-value (convention, decision, architecture, uses_framework, uses_language) and single-value (uses_database, deployment_platform, auth_method). Pruned 7 dead predicates after multi-project survey confirmed zero usage.
-- Before making design changes to schemas, vocabularies, or policies, survey actual usage data across multiple project databases under ~/src/ — single-project analysis can validate wrong assumptions. The uses_framework cardinality bug was only visible across multi-project data.
-- A/B testing memory plugin: use 'claude --bare --mcp-config=/tmp/mcp-test.json -p' with serve-mcp.sh path. --plugin-dir doesn't work with --bare despite docs claiming it should. Architecture/convention/preference questions differentiate best; grep-able and one-shot code-gen questions don't.
-- NullDistiller emits uses_language for language-type entities (added 0.9.0), alongside existing uses_database, uses_framework, deployment_platform. Migration v14 canonicalizes stale predicate names (has_convention → convention, primary_language → uses_language) in existing facts.
-- CLAUDE.md scope-system example text ('this app uses PostgreSQL') causes recurring distiller hallucinations. Reject + re-ingest creates rejection churn because rejection metadata doesn't block re-insertion at content level. Open product gap — workaround: wrap example text in <no-memory> tags.
-- claude-memory restore --predicate NAME recovers facts superseded by obsolete single-value classifications. Uses Jaccard token overlap (threshold 0.5) to distinguish bug-caused supersession from real corrections. Only operates on predicates currently classified multi-value. Opt-in per DB, supports --dry-run.
-- claude-memory reject <id_or_docid> marks facts as rejected and resolves associated open conflicts in a single transaction. Accepts integer fact IDs or 8-char hex docids. memory.reject_fact MCP tool mirrors the CLI.
-- The /release skill automates gem releases in three phases: prepare (version bump across 3 files, bundle install, MCP verify, tests, lint, CHANGELOG check, commit), publish (user-driven: git push + rake release), announce (fix GitHub Latest flags, create gh release from CHANGELOG). Never auto-pushes.
+- A/B testing methodology for memory plugin evaluation — How to test with/without memory using claude CLI — --plugin-dir doesn't work with --bare, use --mcp-config instead — To A/B test memory's impact on Claude Code responses: (imported from project auto-memory; see source file for full reasoning)
+- do...end blocks over braces when args repeat or block is non-trivial — Block syntax preference for multi-argument or multi-expression blocks — When a block call has repeated argument names, multiple expressions, or reads awkwardly on one line, use `do...end` form rather than `{ }`. One-liners with simple single-expression bodies and short argument lists are fine with braces. (imported from project auto-memory; see source file for full reasoning)
+- Always commit .claude/memory.sqlite3 — Per user direction (2026-05-21), .claude/memory.sqlite3 should be staged and committed alongside any change that updates project memory — even though it's a binary SQLite DB with WAL artifacts. — Always include `.claude/memory.sqlite3` in commits that touch project memory or knowledge. (imported from project auto-memory; see source file for full reasoning)
+- Commit workflow preferences — How the user prefers commits to be structured and when to make them — Wait for the user to ask for commits — don't commit proactively. When asked, group changes into logical atomic commits: (imported from project auto-memory; see source file for full reasoning)
+- Data-driven analysis before design changes — User expects thorough multi-project data surveys and critical questioning of assumptions before committing to architectural changes — When proposing design changes (especially to schemas, vocabularies, or policies), gather real usage data first and present a critical analysis before implementing. (imported from project auto-memory; see source file for full reasoning)
+- Fix hallucination triggers at the source, not via repeated reject churn — When the distiller repeatedly produces the same wrong fact, trace it to the CLAUDE.md / docs example text; fixing the source stops re-appearance — When the dashboard's Conflicts tab accumulates clusters of the same kind of bad fact (e.g. many `uses_database` contradictions against `sqlite`), the root cause is almost always **example text in documentation** that the distiller is interpreting as a literal claim about the current repo. Single-value predicates (`uses_database`, `deployment_platform`, `auth_method`) are especially vulnerable becau... (imported from project auto-memory; see source file for full reasoning)
+- Hooks run the installed gem, not the working copy — always `rake install` after editing hook/MCP code — .claude/settings.json hooks invoke `claude-memory` via PATH, so changes on a branch only take effect after `bundle exec rake install` — `.claude/settings.json` hooks call bare `claude-memory hook ingest` / `claude-memory hook context` / etc. That resolves via PATH to the installed gem, not the working-copy `./exe/claude-memory`. After editing any hook/MCP/distiller code on a branch, the change does NOT reach Claude Code until `bundle exec rake install` rebuilds and reinstalls the gem (which overwrites the prior install at the same ... (imported from project auto-memory; see source file for full reasoning)
+- No extra API costs for features — User strongly prefers using Claude Code itself (skills, context hooks) over separate API calls that cost extra money — Do not add features that require separate Anthropic API calls (e.g., via anthropic-rb gem) when Claude Code itself can perform the same task. Use skills, context hook injection, and MCP tools to leverage the existing Claude Code session instead. (imported from project auto-memory; see source file for full reasoning)
+- Quality review update cycle — Keep quality_review.md current as items are resolved — don't let it drift — When completing quality review items, update `docs/quality_review.md` immediately: (imported from project auto-memory; see source file for full reasoning)
+- Refactoring approach preferences — How to approach god object extraction and structural refactoring in this codebase — Use module inclusion (not class extraction) when breaking up god objects. Include modules directly into the existing class so the public API is unchanged and zero tests need modification. This was validated three times:
+- Round-trip migration specs cover each prior release boundary — For pre-release prep, write end-to-end migration specs from every distinct schema boundary back through ~3 prior releases — Before cutting a release that includes migrations, add round-trip specs that fixture an older DB at each distinct prior release's schema version, open via `SQLiteStore.new`, and assert the full upgrade path: schema_info advancement, data preservation across entities/facts/content_items/provenance, additive table/column creation, predicate-rewrite effects where applicable, and idempotency on re-open... (imported from project auto-memory; see source file for full reasoning)
+- Codify behavioral contracts in tests, not just comments — When code has a deliberate scope limitation (one-shot, advisory-only, intentionally non-idempotent), write a test that fails if someone "fixes" it into being more general — When code has a deliberate scope limitation — a one-shot data migration, an advisory-only field, a method intentionally not idempotent for new inputs — write a test that exercises a scenario which would *break* if someone tried to make it more general. (imported from project auto-memory; see source file for full reasoning)
+- Treat UX gaps as architecture smells — user inspection/debugging questions expose god classes and missing abstractions — When users ask "can I see/debug/act on X in the dashboard?", the answer is almost always "we need a new class or route, not a new button" — Across three architectural reviews in the 2026-04-17 → 20 session, every concrete UX gap the user identified traced back to a structural issue the code already had, not a frontend-only fix. Treating critique as a forcing function for refactoring produced cleaner results than either extracting preemptively (premature) or patching only the surface (symptom). (imported from project auto-memory; see source file for full reasoning)
+- "database disk image is malformed" from FTS5 `ORDER BY rank` after sqlite3 .recover — sqlite3 .recover restores rows but can leave contentless FTS5 auxiliary indexes in a state where basic MATCH works but ORDER BY rank throws "malformed"; fix is `claude-memory compact` to rebuild the FTS index — A DB recovered via `sqlite3 corrupt.db .recover > dump.sql && sqlite3 fresh.db < dump.sql` can end up with an FTS5 index that's *partially* functional: (imported from project auto-memory; see source file for full reasoning)
+- `rake install` uses `git ls-files`; untracked files silently disappear from the gem — Running `bundle exec rake install` before staging new files produces a gem missing those files, causing LoadError in hooks and MCP server — The claude_memory gemspec builds its file list via `IO.popen(%w[git ls-files -z], ...)` (claude_memory.gemspec:24). Any file that hasn't been `git add`ed at build time is **invisible to the gem** even though it exists on disk. The local working copy keeps running fine (dashboard server uses `./exe/claude-memory` against the repo directly), but the installed gem at `~/.gem/ruby/*/gems/claude_memory-... (imported from project auto-memory; see source file for full reasoning)
+- Distiller scope_hint is advisory, not a routing signal — NullDistiller emits scope_hint: "global" for text matching GLOBAL_SCOPE_PATTERNS, but the resolver never routes writes between stores — scope_hint must not override fact.scope — `Distill::NullDistiller#global_scope_signal?` matches text like "always" / "my preference" / "in all projects" and stamps `scope_hint: "global"` on every fact extracted from that text. The hint is advisory metadata for downstream promotion decisions. It is NOT a routing signal — the resolver writes to whichever `SQLiteStore` was injected into it (always the project DB in the normal ingest path), re... (imported from project auto-memory; see source file for full reasoning)
+- Sequel DB reads must use the extralite adapter — Opening a SQLite DB for ad-hoc reads requires the extralite adapter URI; Sequel.sqlite silently depends on an ungem'd sqlite3 — Never use `Sequel.sqlite(db_path)` or `Sequel.sqlite(db_path, readonly: true)` in this codebase. The gemspec lists only `extralite (~> 2.14)` — it does **not** depend on the `sqlite3` gem. `Sequel.sqlite` routes through Sequel's `sqlite` adapter which requires `gem "sqlite3"` and will raise `Sequel::AdapterNotFound: LoadError: cannot load such file -- sqlite3` at runtime. (imported from project auto-memory; see source file for full reasoning)
+- Never `git checkout --` an active SQLite DB with WAL mode — Using `git checkout --` on .claude/memory.sqlite3 while readers/writers are open corrupts the DB via WAL/main file mismatch — Never run `git checkout -- .claude/memory.sqlite3` (or any SQLite DB in WAL mode) while any process has it open. Git replaces only the main DB file, leaving the WAL/SHM sidecar files referencing pages that no longer exist in the replaced file. Next read → "Extralite::Error: database disk image is malformed" and integrity_check shows btree errors across multiple trees. (imported from project auto-memory; see source file for full reasoning)
+- SQLiteStore silently creates in-memory DB for relative paths — `SQLiteStore.new('.claude/memory.sqlite3')` with a relative path opens an empty in-memory DB, not the file — always pass absolute paths in tests/probes — `SQLiteStore.new(path)` builds a Sequel URI as `extralite:#{path}`. With a relative path like `.claude/memory.sqlite3`, the resulting URI `extralite:.claude/memory.sqlite3` is parsed with an empty database component, so Extralite opens an in-memory database. Schema migrations run against the in-memory DB (so `schema_version` reports the current version), but ALL queries return 0 rows and the real f... (imported from project auto-memory; see source file for full reasoning)
+- Two tool_calls tables exist — don't conflate them — tool_calls (v3) stores transcript-observed Claude Code tool usage; mcp_tool_calls (v13) stores MCP server telemetry — There are **two** tables with similar names serving different purposes: (imported from project auto-memory; see source file for full reasoning)
+- Distiller hallucination from CLAUDE.md example text — The scope-system example in CLAUDE.md causes recurring false fact extraction — reject + re-ingest creates rejection churn — CLAUDE.md contains a scope-system explanation with example text: (imported from project auto-memory; see source file for full reasoning)
+- PredicatePolicy is the single source of truth for predicate vocabulary — All predicate knowledge (vocabulary, cardinality, sections, synonyms, LLM guidance) derives from PredicatePolicy — never hardcode predicate names elsewhere — As of 0.9.0, `PredicatePolicy` in `lib/claude_memory/resolve/predicate_policy.rb` is the authoritative source for all predicate-related behavior. This was a deliberate consolidation after finding the same predicate list duplicated in 4 files that drifted independently. (imported from project auto-memory; see source file for full reasoning)
+- Hook-telemetry features need a manual hook trigger to verify in production, not just specs — `bundle exec rake install` AND fire a real hook AND check `sqlite3 ... json_extract(detail_json, '$.<field>')`, because specs assert against working-tree code but `.claude/settings.json` hooks invoke the installed gem via PATH so the asserted field can be silently absent in production. Hit on 2026-04-30 shipping #47 token-budget telemetry: 156 specs green but `context_tokens` was missing from 24h of real activity_events.
+- When verifying any new field on activity_events.detail_json, the canonical smoke test is: `echo '{"hook_event_name":"SessionStart","session_id":"smoketest","source":"startup","cwd":"$(pwd)"}' | claude-memory hook context` then inspect via `sqlite3 .claude/memory.sqlite3 "SELECT json_extract(detail_json, '$.<field>') FROM activity_events WHERE event_type='hook_context' ORDER BY id DESC LIMIT 1"`. If null after rake install, the installed gem code hasn't picked up the change.
+- Treat UX gaps as architecture smells: when a user asks "can I see/debug/act on X in the dashboard?" the answer is almost always a missing class or route, not a new button. Every UX critique in this project's dashboard work traced to a structural gap — god-class growth, four drifting fact serializers, scope_hint as silent scope override, no fact detail endpoint. Pattern: the surface question is usually a router into the architecture. Before reaching for frontend fixes, ask "what server-side data shape would make this easy?" — if that shape doesn't exist cleanly, treat it as the real work. Commit refactor separately from feature it enables ([Refactor]/[Feature]/[Fix] prefixes) so the critique→structural fix→UX fix chain is visible in git log.
+- Four-surface staleness: after any change that touches UI + backend + plugin-launched binaries, refresh all four or the change looks broken. (1) bundle exec rake install so the installed gem catches up (hooks + MCP launched by Claude Code run from PATH). (2) Ctrl-C and re-run ./exe/claude-memory dashboard — server is long-lived Ruby, no live-reload. (3) /mcp reconnect in Claude Code so the MCP subprocess respawns. (4) Hard-refresh browser (Cmd-Shift-R) so cached index.html JS reloads. Skipping any produces confusing "my fix doesn't work." rspec green does NOT mean end-to-end works; before declaring UI-affecting changes done, curl the endpoint and verify shape matches frontend expectation. curl alt-port dashboard (--port 3388 --no-open in background) is fastest smoke test without disturbing user's running dashboard.
+- fact.scope MUST match the DB the fact lives in. The distiller may emit scope_hint="global" from GLOBAL_SCOPE_PATTERNS but scope_hint is advisory only — it never routes writes or overrides the destination store's scope. Using scope_hint as a scope override produced orphaned scope=global rows inside the project DB that global recall couldn't see. Users move project facts to global via claude-memory promote (StoreManager#promote_fact does the proper cross-store copy). Sweep::Maintenance#fix_scope_leakage cleans drift in existing DBs. Invariant documented in gotcha_scope_hint_not_routing memory.
+- "disk image is malformed" from an FTS5 ORDER BY rank query after a sqlite3 .recover restore is usually FTS auxiliary-index corruption, not real DB damage. Diagnostic chain: PRAGMA integrity_check on fresh connection (ok means main DB is fine), plain MATCH (works means b-tree is fine), ORDER BY rank (fails means FTS internals rotted). Fix with claude-memory compact — rebuilds FTS from source content in a few seconds. Do NOT reach for sqlite3 .recover a second time; that's the class of action that leaves FTS in this broken state. Three distinct malformed-error flavors in this project: real corruption (use .recover), WAL stale-cache phantom from long-lived readers (release connections per request), FTS5 rank rot (compact).
+- Name collision: claude-memory recover (CLI) resets stuck operations from OperationTracker; it does NOT repair corrupt SQLite. For real disk-image corruption use sqlite3 corrupt.db ".recover" > dump.sql && sqlite3 fresh.db < dump.sql then open via SQLiteStore for migrations. Recovery can leave contentless FTS5 in a partial state where plain MATCH works but ORDER BY rank fails — follow up with claude-memory compact. Verified recoverable on 2026-04-16 — 76 facts / 56 entities / 183 content_items restored from a DB with multiple btree errors.
+- Hooks in .claude/settings.json invoke bare 'claude-memory' which resolves via PATH to the installed gem, not the working-copy ./exe/claude-memory. After editing Hook::Handler, MCP::Tools, MCP::Server, ActivityLog, Distill::*, or any commands/ file on a branch, run 'bundle exec rake install' before expecting hooks or the Claude-Code-launched MCP server to see the change. The dashboard server (./exe/claude-memory dashboard) is the exception — it runs from the working copy directly. When reinstalling while servers are running, also restart them (Ctrl-C or /mcp reconnect) since old code stays in memory.
+- Never 'git checkout --' .claude/memory.sqlite3 (or any WAL-mode SQLite DB) while a reader/writer has it open. Git replaces the main DB file but leaves the -wal and -shm sidecar files referencing stale pages, which corrupts the DB on next read. Safe sequence: stop all holders (MCP server, dashboard, hooks), delete the -wal and -shm sidecars, then checkout. Recovery path: sqlite3 corrupt.db .recover > dump.sql then reimport into a fresh file.
+- SQLiteStore.new with a relative path silently opens an empty in-memory DB instead of the file. The URI built in retry_handler.rb is extralite:<path>, and extralite:.claude/memory.sqlite3 parses with an empty database option so Extralite treats it as in-memory. Schema migrations run so schema_version looks correct but all queries return 0 rows. Production callers go through StoreManager with absolute paths (via Configuration) so this does not bite in normal use. In ad-hoc probes always pass File.expand_path(path). Diagnostic: if store.facts.count returns 0 while sqlite3 CLI shows rows, suspect this before suspecting corruption.
 - Never use Sequel.sqlite for DB reads; this gem only depends on extralite. Use Sequel.connect("extralite://#{db_path}") or SQLiteStore.new. Sequel.sqlite requires the ungem'd sqlite3 adapter and fails at runtime.
 - Two distinct tool_calls tables exist: tool_calls (v3) for transcript-observed Claude Code tool usage, and mcp_tool_calls (v13) for MCP server telemetry. Disjoint purposes, never join.
 - MCP tool-call telemetry is recorded via MCP::Telemetry wrapping Server#handle_tools_call. Writes to mcp_tool_calls table in the project DB. Swallows DB errors so telemetry never breaks a real tool response. Viewable via 'claude-memory stats --tools [--since DAYS]'.
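
One added note in the hunk above says that ad-hoc probes should always pass `File.expand_path(path)` because a relative path can silently open an empty in-memory database. The sketch below shows that normalisation step on its own, using only the Ruby standard library. The helper names `absolute_db_path` and `relative_path?` are hypothetical and do not come from claude_memory; the sketch does not open any database.

```ruby
require "pathname"

# Hypothetical helpers for ad-hoc probes; not part of claude_memory.

# Expand a possibly relative database path against base_dir.
# An already absolute path is returned unchanged.
def absolute_db_path(path, base_dir: Dir.pwd)
  File.expand_path(path, base_dir)
end

# True when the given path would need expanding before use.
def relative_path?(path)
  !Pathname.new(path).absolute?
end
```

In a probe script, the expanded value would be what gets handed to the store constructor, for example `absolute_db_path(".claude/memory.sqlite3")`.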
@@ -40,74 +64,28 @@
 - ContentSanitizer strips system-reminder, local-command-caveat, command-message, command-name, command-args tags in addition to private/no-memory/secret/claude-memory-context.
 - Core::RelativeTime module provides progressive time formatting: just now → Xm ago → Xh ago → Xd ago → YYYY-MM-DD. Used in ResponseFormatter for *_ago fields.
 - MCP server registers memory_guide prompt via prompts/list and prompts/get endpoints. QueryGuide module holds prompt content.
-- Claude Code plugin with marketplace.json, skill definitions, MCP server bundling. 5,700+ stars, by Tobi Lütke. Custom fine-tuned query expansion (Qwen3-1.7B, SFT+GRPO). Dual content/structuredContent MCP pattern.
-- Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API for persistent memory across sessions. Uses hooks for SessionStart context injection and Stop transcript capture. No local database.
 ## Technical Constraints
+- **Uses framework**: django
+- **Uses language**: typescript
+- **Uses language**: python
 - **Uses framework**: rails
-- **Deployment platform**: aws
+- **Uses framework**: react
+- **Uses framework**: sinatra
+- **Uses language**: javascript
+- **Uses language**: go
+- **Uses language**: ruby
 - **Uses database**: sqlite
 ## Additional Knowledge
 ### Architecture
-- repo: PredicatePolicy is the single source of truth for predicate vocabulary (POLICIES), cardinality, snapshot section mapping (SECTION_MAP), synonym canonicalization (SYNONYMS), and LLM guidance. tool_definitions.rb, publish.rb, and distill-transcripts.md all derive from PredicatePolicy. Never hardcode predicate names elsewhere.
+- mcp_server: Claude Code does NOT pass its session_id into plugin-spawned MCP server subprocesses — neither via JSON-RPC transport nor CLAUDE_SESSION_ID env var. Configuration.new.session_id returns nil inside the MCP process, so MCP-originated activity events (recall, store_extraction) get session_id=nil. Hook commands are different — .claude/settings.json payloads explicitly include session_id in their JSON. For dashboards or any feature that needs per-session attribution of MCP-originated events, correlate by time window using hook events (which do carry session_id) rather than strict session_id equality. Dashboard::API#efficacy uses session_window + within_window? for this.
 - MCP::Tools: Thin 104-line dispatcher that includes 6 handler modules in mcp/handlers/: QueryHandlers, ShortcutHandlers, ContextHandlers, ManagementHandlers, StatsHandlers, SetupHandlers
 - Recall: 94-line facade delegating to @engine (DualEngine or LegacyEngine), both include shared QueryCore module with all store-level query logic
 - SQLiteStore: 386-line CRUD class that includes RetryHandler (retry/connection logic) and SchemaManager (migrations/version sync) modules
 - Embeddings: Pluggable providers via Embeddings.resolve(name, env:). Three providers: tfidf (default), fastembed, api. Duck-typed contract: name, dimensions, generate(text). ENV: CLAUDE_MEMORY_EMBEDDING_PROVIDER
 - Embeddings::DimensionCheck: Pure value object — DimensionCheck.call(store, provider) returns Data.define Result with :fresh/:match/:mismatch status. No side effects; caller decides how to handle mismatch.
-## Open Conflicts
-The following facts are in conflict and need resolution:
-- Conflict #12: Fact 21 vs Fact 43
-- Conflict #13: Fact 21 vs Fact 44
-- Conflict #14: Fact 45 vs Fact 46
-- Conflict #15: Fact 45 vs Fact 47
-- Conflict #16: Fact 48 vs Fact 49
-- Conflict #17: Fact 45 vs Fact 50
-- Conflict #18: Fact 21 vs Fact 51
-- Conflict #19: Fact 48 vs Fact 52
-- Conflict #20: Fact 21 vs Fact 53
-- Conflict #21: Fact 21 vs Fact 54
-- Conflict #22: Fact 21 vs Fact 55
-- Conflict #23: Fact 21 vs Fact 56
-- Conflict #24: Fact 21 vs Fact 57
-- Conflict #25: Fact 48 vs Fact 58
-- Conflict #26: Fact 48 vs Fact 59
-- Conflict #27: Fact 48 vs Fact 60
-- Conflict #28: Fact 21 vs Fact 61
-- Conflict #29: Fact 21 vs Fact 62
-- Conflict #30: Fact 21 vs Fact 63
-- Conflict #31: Fact 45 vs Fact 64
-- Conflict #32: Fact 21 vs Fact 65
-- Conflict #33: Fact 21 vs Fact 66
-- Conflict #34: Fact 21 vs Fact 67
-- Conflict #35: Fact 45 vs Fact 68
-- Conflict #36: Fact 45 vs Fact 69
-- Conflict #37: Fact 48 vs Fact 70
-- Conflict #38: Fact 48 vs Fact 71
-- Conflict #39: Fact 21 vs Fact 72
-- Conflict #40: Fact 21 vs Fact 73
-- Conflict #41: Fact 21 vs Fact 74
-- Conflict #42: Fact 21 vs Fact 75
-- Conflict #43: Fact 21 vs Fact 76
-- Conflict #44: Fact 45 vs Fact 77
-- Conflict #45: Fact 45 vs Fact 78
-- Conflict #46: Fact 45 vs Fact 79
-- Conflict #47: Fact 45 vs Fact 80
-- Conflict #48: Fact 45 vs Fact 81
-- Conflict #49: Fact 48 vs Fact 82
-- Conflict #50: Fact 48 vs Fact 83
-- Conflict #51: Fact 48 vs Fact 84
-- Conflict #52: Fact 48 vs Fact 85
-- Conflict #53: Fact 48 vs Fact 86
-- Conflict #54: Fact 48 vs Fact 87
-- Conflict #55: Fact 48 vs Fact 88
-- Conflict #56: Fact 48 vs Fact 89
-- Conflict #57: Fact 48 vs Fact 90
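
A context line in the last hunk above describes `Core::RelativeTime` as progressive formatting: just now → Xm ago → Xh ago → Xd ago → YYYY-MM-DD. The sketch below illustrates that progression. The cut-over points (60 seconds, one hour, one day, seven days) and the method name `relative_time` are assumptions made for this example; the gem's actual thresholds are not shown in this diff.

```ruby
# Illustrative only: the thresholds below are assumptions, not the values
# used by Core::RelativeTime.
def relative_time(timestamp, now: Time.now)
  seconds = (now - timestamp).to_i
  return "just now" if seconds < 60
  return "#{seconds / 60}m ago" if seconds < 3600
  return "#{seconds / 3600}h ago" if seconds < 86_400
  return "#{seconds / 86_400}d ago" if seconds < 7 * 86_400

  # Older timestamps fall back to an absolute calendar date.
  timestamp.strftime("%Y-%m-%d")
end
```

Passing `now:` explicitly keeps the function deterministic, which makes it easy to exercise each band in a spec.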

data/.claude/skills/release/SKILL.md CHANGED

@@ -63,7 +63,43 @@ bundle exec rspec
 All tests must pass. Do not proceed with any failures. Fix them first.
-### Step 6: Run the linter
+### Step 6: Run the pre-release hook smoke gate
+```bash
+bin/pre-release-smoke
+```
+This script:
+1. Re-runs `bundle exec rake install` so the PATH-resolved `claude-memory` binary matches the working tree.
+2. Triggers each gem-managed hook against a temp DB.
+3. Verifies every field listed in `spec/smoke/expected_fields.yml` is populated on the resulting `activity_events.detail_json`.
+4. Exits non-zero with the missing field name(s) and `since_version` if any expected field is null/absent.
+**This catches the class of bug specs cannot:** a code change that adds a new `detail_json` field but forgets `rake install`, leaving the installed gem stale and production hooks silently missing the field. Sprung that trap on 2026-04-16 (ActivityLog) and again on 2026-04-30 (#47 token-budget) — the gate is here so it can't happen a third time.
+If the gate fails, **stop the release**, address the missing field (usually `bundle exec rake install` followed by re-running the gate), and only proceed when it exits 0.
+### Step 7: Run the benchmark scoreboard diff
+```bash
+bin/run-evals --benchmarks
+bin/bench-diff
+```
+`bin/run-evals --benchmarks` writes `spec/benchmarks/results/<version>.json` — the diff-friendly snapshot of the current release's pass rates by category and per-scenario. `bin/bench-diff` then compares that snapshot against the most recent prior tagged version's scoreboard and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%; configurable via `--threshold`).
+The first release with this gate (0.12.0) has no prior scoreboard to compare against — `bench-diff` exits 0 with a "No baseline scoreboard available" note. From 0.13.0 onward it actively gates.
+If the diff reports a regression, **stop the release**, investigate (the regressed metric path is named in stderr — e.g. `metrics.evals.by_scenario.tech_stack_recall.pass_rate`), and only proceed once you've either (a) fixed the regression or (b) made a deliberate decision that the lower pass rate is acceptable. If (b), document the new baseline in CHANGELOG so future-you isn't surprised.
+For real-mode E2E coverage (~$2-8 per run), pass `EVAL_MODE=real`:
+```bash
+EVAL_MODE=real bin/run-evals --all && bin/bench-diff
+```
+### Step 8: Run the linter
 ```bash
 bundle exec rake standard:fix
@@ -71,7 +107,7 @@ bundle exec rake standard:fix
 Ensure no remaining violations.
-### Step 7: Verify CHANGELOG.md
+### Step 9: Verify CHANGELOG.md
 The CHANGELOG should already have a release section written during development (via `/improve`, manual commits, or other workflow). **Do not auto-generate release notes** — they should reflect the actual development narrative.
@@ -83,7 +119,7 @@ Check that:
 If the CHANGELOG section is missing or incomplete, **stop and ask the user**. Do not fabricate release notes.
-### Step 8: Commit the version bump
+### Step 10: Commit the version bump
 ```bash
 git add lib/claude_memory/version.rb .claude-plugin/plugin.json .claude-plugin/marketplace.json Gemfile.lock
@@ -111,7 +147,7 @@ Wait for the user to confirm before proceeding to Phase 3.
 ## Phase 3: Announce
-### Step 9: Fix any stale "Latest" flags on GitHub releases
+### Step 11: Fix any stale "Latest" flags on GitHub releases
 Check current release state:
@@ -125,7 +161,7 @@ If an older release is incorrectly marked "Latest" (this happens when releases a
 gh release edit v<old-version> --latest=false
 ```
-### Step 10: Create the GitHub release
+### Step 12: Create the GitHub release
 Extract the release notes from CHANGELOG.md — everything between `## [X.Y.Z]` and the next `## [` heading. Write to a temp file:
@@ -146,7 +182,7 @@ gh release create vX.Y.Z \
 The title should capture the theme of the release in a few words (e.g., "Predicate Design Overhaul, Reject/Restore, Telemetry"). Read the CHANGELOG to derive this — don't ask the user unless the theme isn't obvious.
-### Step 11: Verify the release
+### Step 13: Verify the release
 ```bash
 gh release list --limit 5
@@ -161,6 +197,8 @@ Report the release URL to the user.
 ## Error Handling
+- **Smoke gate fails (`bin/pre-release-smoke` exits non-zero)**: The script names the missing `detail_json` field and `since_version` in stderr. Most common cause: code that adds a new field landed without a follow-up `bundle exec rake install`, so the installed gem is stale. Re-run `rake install`, then re-run the gate. If the field was newly added but no `rake install` was run, that's the bug the gate is designed to catch — don't bypass it. If the manifest needs updating because a field was intentionally removed, edit `spec/smoke/expected_fields.yml` AND add a CHANGELOG breaking-change note (removing a `detail_json` field is a public API change per `docs/api_stability.md` §4).
+- **Bench-diff fails (`bin/bench-diff` exits 1)**: Stderr names the metric path that regressed (e.g. `metrics.evals.by_scenario.tech_stack_recall.pass_rate`). Investigate the regression — is it a real correctness issue, or a measurement-noise issue (e.g. real-mode flake)? If real, fix before releasing. If a deliberate baseline change (we knowingly traded N% in metric X for some other gain), update CHANGELOG with the new baseline and re-run with a temporarily looser `--threshold` to ship; the next release picks up the new floor automatically. **Don't bypass the gate without an explicit baseline-change note** — that defeats the entire scoreboard.
 - **Tests fail**: Fix first. Never release with failing tests.
 - **CHANGELOG missing**: Ask the user. Never fabricate release notes.
 - **Version already tagged**: The tag may exist from a prior attempt. Ask the user whether to delete and recreate, or use a different version.
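The bench-diff gate from Step 7 can be sketched roughly as below. This is an illustration only, not the actual `bin/bench-diff` code: the helper names `flatten_metrics` and `regressions`, the nested JSON layout, and the reading of the threshold as a fraction (0.05 for 5 percentage points) are assumptions made for the example.

```ruby
require "json"

# Hypothetical helper: flatten a nested scoreboard into
# { "metrics.evals.by_scenario.<name>.pass_rate" => Float }.
def flatten_metrics(node, prefix = nil, out = {})
  node.each do |key, value|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flatten_metrics(value, path, out)
    elsif value.is_a?(Numeric) && key.to_s == "pass_rate"
      out[path] = value.to_f
    end
  end
  out
end

# Returns the metric paths whose pass rate dropped by more than `threshold`.
# An empty baseline yields no regressions, mirroring the "no baseline" case.
def regressions(baseline, current, threshold: 0.05)
  base = flatten_metrics(baseline)
  cur = flatten_metrics(current)
  base.each_with_object({}) do |(path, old_rate), found|
    new_rate = cur[path]
    next if new_rate.nil?
    found[path] = {from: old_rate, to: new_rate} if (old_rate - new_rate) > threshold
  end
end

# Sample scoreboards (made-up numbers, for illustration only).
baseline = JSON.parse('{"metrics":{"evals":{"by_scenario":{"tech_stack_recall":{"pass_rate":0.9},"conventions":{"pass_rate":0.8}}}}}')
current = JSON.parse('{"metrics":{"evals":{"by_scenario":{"tech_stack_recall":{"pass_rate":0.7},"conventions":{"pass_rate":0.78}}}}}')

failed = regressions(baseline, current)
failed.each { |path, r| warn "REGRESSION #{path}: #{r[:from]} -> #{r[:to]}" }
```

A real gate would exit non-zero when `failed` is non-empty; the sketch only reports to stderr so it can be run safely in isolation.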

data/.claude/skills/study-repo/SKILL.md CHANGED

@@ -36,6 +36,21 @@ Then invoke: `/study-repo /tmp/study-repos/project-name`
 See `.claude/skills/study-repo/focus-examples.md` for more examples.
+## CRITICAL: Memory Discipline (no external-tech misattribution)
+When studying an external repo you will read its README, gemspec, and source — and you will see things like *"uses Postgres"*, *"runs on AWS"*, *"built with Rails"*. These are facts **about the external project, not about this project**.
+Do NOT call `memory.store_extraction` with the external project's tech stack as `uses_database` / `uses_framework` / `uses_language` / `deployment_platform` / `auth_method` predicates. That misattribution caused 27 facts to be stored about ClaudeMemory in the 2026-04-23/24 window that all had to be hand-rejected (see `improvements.md` #61, `quality_review.md` 2026-04-30 note). The corpus damage was real even though the cleanup worked — every misattributed fact takes a round trip through the database, conflict-detection, and the user's `claude-memory reject` queue.
+**The rule.** While `/study-repo` is running, the only `memory.store_extraction` calls allowed are:
+- `predicate=reference` for descriptions of the external project ("X is a plugin/library/CLI that…"). The dashboard's Knowledge → References panel is the right home for these.
+- Facts genuinely about *this* project (ClaudeMemory) that you derive from contrast with the studied repo (e.g., a decision: "Adopt RRF fusion from QMD because…"). These belong as `decision` / `convention` / `architecture` with `subject=repo` or `subject=claude_memory` AND an embedded reason clause.
+**The hard ban.** Any single-value cardinality predicate (`uses_database`, `deployment_platform`, `auth_method`) populated with the studied project's tech is forbidden. If in doubt, write the observation into the influence document (`docs/influence/<project>.md`) — that file IS the right artifact for "what does this external project use" — and skip `memory.store_extraction` entirely.
+If the user asks "did the studied project use X?" later, the answer lives in `docs/influence/`, not in memory facts.
 ## Analysis Phases
 Follow these phases systematically to ensure comprehensive coverage:

data/.claude-plugin/commands/audit-memory.md ADDED

@@ -0,0 +1,68 @@
+# Audit Memory
+Run a health audit on the ClaudeMemory database and walk the user through resolving findings. Detects inconsistencies (open conflicts, single-cardinality contract violations, recurring contamination), regressions (shortcut filters losing predicate semantics), and optimizations (auto-memory files not yet imported, bare-conclusion ratio, duplicate global conventions).
+## Usage
+```
+/audit-memory
+/audit-memory --json    # machine-readable output (no walkthrough)
+/audit-memory --severity=error    # only errors
+```
+## Instructions
+You are a ClaudeMemory health auditor. Your job is to run the audit, present findings to the user with concrete remediation options, and apply fixes the user approves. Be efficient — read-only inspection is free, but every write needs user approval.
+### Step 1: Run the audit
+Call the CLI directly to get structured findings:
+```bash
+claude-memory audit --json
+```
+If the user passed `--json`, just dump the output verbatim and stop. Otherwise continue to step 2.
+If `claude-memory audit` returns `{"ok": true, "counts": {"error": 0, ...}}`, congratulate briefly and stop. Don't fabricate problems.
+### Step 2: Triage findings
+Group the findings by severity. Present them to the user in this order:
+1. **Errors (must fix)** — these block CI/quality contracts. Walk through each one. Each error has a `suggestion` field with the concrete CLI command(s) to run. Ask "shall I run this?" before executing.
+2. **Warnings (should investigate)** — surface but don't auto-fix. Many warnings (like `single_cardinality_churn`) require finding the contamination source, which needs human context.
+3. **Info (optimizations)** — present as suggestions, not blockers. Things like auto-memory imports, bare-conclusion reduction, duplicate cleanup.
+For each finding, the output already includes:
+- `id` (C001…C010) — stable across releases; users can refer to them
+- `title` — one-line summary
+- `detail` — why it matters
+- `suggestion` — the literal CLI command to run
+- `fact_ids` — the rows involved (use with `claude-memory explain <id>` for details)
+### Step 3: Investigate before mass-rejecting
+For `C002` (single-cardinality multiplicity) and `C010` (churn), DO NOT immediately bulk-reject. Recurring contamination has a source. Investigate first:
+1. Pick one of the offending fact IDs.
+2. Run `claude-memory explain <fact_id>` to see provenance.
+3. Read the `quote` and `content_item_id` to find the trigger text.
+4. Decide: is this a real claim or example text? Real claims should win the supersession; example text should be wrapped in `<no-memory>` tags at the source.
+### Step 4: Apply fixes with user approval
+For approved remediations, run the exact command from the `suggestion` field. Don't paraphrase. After each batch, re-run `claude-memory audit` to confirm the finding is gone.
+### Step 5: Wrap up
+When the audit reports `ok: true`, suggest the user:
+- Commit `.claude/memory.sqlite3` if they want to lock in the cleanup.
+- Run `claude-memory publish` to refresh `.claude/rules/claude_memory.generated.md`.
+- Wire `claude-memory audit` into CI / pre-release so future drift is caught early.
+## Background
+This skill is part of the systemic audit pipeline established in `docs/memory_audit_2026-05-21.md`. The contract definitions (single-cardinality, shortcut predicate filters, distillation backlog thresholds) live in `lib/claude_memory/audit/checks.rb`. Adding a new check there propagates automatically to this skill.
+See `docs/audit_runbook.md` for per-check rationale, common contamination sources, and worked examples.
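The triage order from Step 2 (errors, then warnings, then info) could be scripted along these lines. The document confirms only the top-level `ok` and `counts` keys and the per-finding `id`, `title`, `detail`, `suggestion`, and `fact_ids` fields; the top-level `findings` array, the per-finding `severity` key, and every value in the sample payload are assumptions for illustration.

```ruby
require "json"

# Illustrative payload only. The "findings" array, the "severity" key, and all
# titles and details here are hypothetical stand-ins for real audit output.
AUDIT_JSON = <<~JSON
  {
    "ok": false,
    "counts": {"error": 1, "warning": 1, "info": 0},
    "findings": [
      {"id": "C010", "severity": "warning", "title": "example churn finding",
       "detail": "example detail", "suggestion": "claude-memory explain 48", "fact_ids": [48]},
      {"id": "C002", "severity": "error", "title": "example multiplicity finding",
       "detail": "example detail", "suggestion": "claude-memory explain 83", "fact_ids": [83, 84]}
    ]
  }
JSON

SEVERITY_ORDER = %w[error warning info].freeze

# Order findings so errors come first, then warnings, then info.
def triage(payload)
  grouped = payload.fetch("findings", []).group_by { |finding| finding["severity"] }
  SEVERITY_ORDER.flat_map { |severity| grouped.fetch(severity, []) }
end

payload = JSON.parse(AUDIT_JSON)
ordered = triage(payload)
ordered.each do |finding|
  puts "[#{finding["severity"]}] #{finding["id"]} #{finding["title"]} (facts: #{finding["fact_ids"].join(", ")})"
end
```

Nothing here executes a `suggestion`; per Step 4, every write still needs user approval.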

data/.claude-plugin/marketplace.json CHANGED

@@ -7,7 +7,7 @@
   "plugins": [
     {
       "name": "claude-memory",
-      "version": "0.10.0",
+      "version": "0.12.0",
       "source": "./",
       "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
       "repository": "https://github.com/codenamev/claude_memory"

data/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-memory",
-  "version": "0.10.0",
+  "version": "0.12.0",
   "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
   "author": {
     "name": "Valentino Stoll",

data/CHANGELOG.md CHANGED

@@ -4,6 +4,76 @@ All notable changes to this project will be documented in this file.
 ## [Unreleased]
+## [0.12.0] - 2026-05-29
+Theme: **Release Discipline + Observability + Self-Audit** — the infrastructure that makes a 1.0 semver promise defensible. This release locks down the public API surface, adds the observability primitives (OTel ingestion, dashboard Telemetry) and the self-audit toolkit (`claude-memory audit`) that serve the visibility pillar, and ships the negative-fact harm benchmark + staleness guard that make the long-horizon-quality claim measurable rather than aspirational.
+### Added
+- **Staleness guard for single-value facts** — single-value predicates (`uses_database` / `deployment_platform` / `auth_method`) are exclusive claims Claude follows authoritatively, so a *stale* one is the most dangerous kind of memory. The 0.12 harm benchmark caught Claude emitting `git push heroku HEAD:main` from a stale `deployment_platform` fact with zero hedge — and supersession only protects against this if the replacement was recorded. New `Recall::StalenessAnnotator` (pure function) flags single-value facts that are old (`valid_from`/`created_at` older than `injection_stale_days`, default 180) AND not recently confirmed (`last_recalled_at` null or stale); `Hook::ContextInjector` appends a `⚠ stale: recorded YYYY-MM-DD … verify before relying` marker at SessionStart so Claude can hedge or verify instead of blindly following. Multi-value predicates are never annotated (they accumulate; one stale entry isn't authoritative). New `Configuration#injection_stale_days` (`CLAUDE_MEMORY_INJECTION_STALE_DAYS`), deliberately much longer than the 14-day dashboard review window. Serves the 1.0 long-horizon-quality pillar — it's the first defense against memory degrading session quality over months.
+- **Negative-fact harm benchmark — full 13-scenario corpus + release gate** — expands the 0.11 3-scenario prototype to 13 cases across four harm classes (stale_tech, mismatched_scope, superseded_undetected, and the new reference_material_as_fact). Each scenario ships a `project_files` scaffold whose current state contradicts the wrong memory fact, so the test measures "does Claude follow stale/wrong memory over the project's actual state?" rather than reacting to an empty directory. Scored best-of-N (default 3 runs, majority vote per scenario via `HARM_BENCH_RUNS`) to absorb single-shot LLM nondeterminism. `HARM_RATE_THRESHOLD` (default 1%) fails the run if the majority-harmed scenario rate is exceeded — making "memory doesn't make Claude wrong" a measurable release gate rather than a marketing claim. The first full-corpus real-mode run surfaced a real harm (stale deployment fact) and a harness confound (empty-tmpdir noise), which drove both the staleness guard above and the scaffold + best-of-N harness hardening.
+- **`claude-memory audit` — memory health diagnostic** — productionizes the 2026-05-21 contamination audit into a stable diagnostic surface anyone using claude_memory can run on their own setup. Ten contract checks (C001-C010) cover open conflicts, single-cardinality multiplicity, distillation backlog, shortcut-leak detection, duplicate global conventions, bare-conclusion rate, project starvation, auto-memory import gaps, and single-cardinality churn. `--json` is the stable contract for CI; `--severity` filters; `--no-exit` always exits 0. The `/audit-memory` slash command wraps the same runner for an interactive walkthrough. `docs/audit_runbook.md` documents each check's rationale and remediation. `CHECK_METHODS` is append-only by design so JSON consumers don't break when new checks land. New `claude-memory import-auto-memory` retroactively pulls `~/.claude/projects/<slug>/memory/*.md` entries that `AutoMemoryMirror` previously missed (slug bug: `tr("/", "-")` left underscores intact, so `claude_memory` paths never matched). Contributes to the **visibility** pillar of 1.0.
+- **Contamination guardrails — `ReferenceMaterialDetector` example-quote guard + `Resolver` `:discard` path** — the distiller used to treat example sentences in docs/CLAUDE.md ("e.g., postgres", "for example, mysql") as literal claims about the project, accumulating 103 rejected single-cardinality facts over six weeks before being caught by the 2026-05-21 audit. Two defenses now: (1) `ReferenceMaterialDetector` flags single-cardinality predicate extractions whose source text contains `e.g.,` / `for example` / `i.e.` quote patterns so they're tagged reference material at write time; (2) `Resolver` gains a `:discard` resolution path for the same shape so the fact never lands even if the detector misses. Memory shortcuts (`memory.decisions` / `.conventions` / `.architecture`) refactored from FTS text search (which returned facts whose *object* matched the predicate keyword) to predicate-based filtering via `PredicatePolicy`, with project-DB precedence over global. Closes a class of "is memory still trustworthy?" bugs that erode the 1.0 stability claim.
+- **OpenTelemetry ingestion + dashboard Telemetry tab** — Claude Code can now export metrics, log-style events, and (opt-in) traces straight into the dashboard via OTLP/HTTP/JSON. New `claude-memory otel` CLI manages the env block in `.claude/settings.json` (`--enable`, `--disable`, `--enable-traces`, `--capture-prompts`, `--status`, `--verify`); the dashboard exposes `/v1/metrics`, `/v1/logs`, `/v1/traces` on `127.0.0.1:3377` and a new "Telemetry" drawer showing cost per hour, tokens by model, top tools by latency, and a per-prompt journey waterfall that UNIONs `otel_events` with the existing `activity_events`. Schema v18 adds `otel_metrics`/`otel_events`/`otel_traces` plus an additive `prompt_id` column on `activity_events` for journey correlation. Privacy posture: nothing past metric counts is captured by default; `OTEL_LOG_USER_PROMPTS` only flips on with explicit `--capture-prompts` confirmation; traces remain 501-gated until the user opts in. Sweep retention defaults: 30 days metrics, 14 days events, 7 days traces.
+- **Pre-release hook smoke gate** (`bin/pre-release-smoke`) — verifies the *installed* claude-memory gem actually fires hooks correctly and populates expected `detail_json` fields per `spec/smoke/expected_fields.yml`. Codifies the verification convention from `feedback_hooks_run_installed_gem.md` into a machine-enforced release gate. The trap has been sprung twice (2026-04-16 ActivityLog, 2026-04-30 #47 token-budget); the gate exists so it can't be sprung a third time. Wired into the `/release` skill as Phase 1 Step 6 (after specs, before lint). First 0.12.0 milestone item.
+- **`/study-repo` memory-discipline guard (prompt-only)**: a top-level "CRITICAL: Memory Discipline" section in `.claude/skills/study-repo/SKILL.md` explicitly forbids the LLM from extracting external projects' tech stack as project-level facts. This addresses the root cause of the cleanup work `claude-memory reject` had to do during 0.11 (27-fact misattribution cluster on 2026-04-23/24; see `quality_review.md` 2026-04-30 cause-4 finding). A defense-in-depth detector is deferred to 0.12.x or later and will be built only if measurement shows persistent leakage.
+- **API stability audit (`docs/api_stability.md`)** — authoritative public-API contract enumerating which CLI commands, MCP tools, hook events, Ruby classes, and schema surfaces are stable / experimental / internal. Default-to-internal applied throughout; the doc is the source of truth for what 1.0's semver promise will lock down. New `ClaudeMemory::Deprecations.warn(name:, replacement:, removed_in:)` module wired into `PredicatePolicy.canonicalize` as the first soft-rename — `has_convention` and `primary_language` synonyms now emit deprecation warnings scheduled for removal in `1.0.0`. README + CLAUDE.md link to the new doc; suppress noise via `CLAUDE_MEMORY_NO_DEPRECATIONS=1`.
+- **Release-to-release benchmark scoreboard** — `bin/run-evals` now writes `spec/benchmarks/results/<version>.json` after each run; new `bin/bench-diff` compares the current scoreboard against the most recent prior tagged version's and exits non-zero if any tracked pass-rate dropped beyond the threshold (default -5%, configurable via `--threshold`). Wired into `/release` skill Phase 1 as Step 7 — the release aborts on regressions before publish. First release with this gate is 0.12.0 itself; from 0.13.0 onward bench-diff actively gates against 0.12 baselines.
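The best-of-N scoring used by the harm benchmark entry above (majority vote per scenario, then a harm-rate threshold) can be sketched as follows. This is a sketch under stated assumptions, not the harness itself: the method names, the scenario names, and passing the threshold as a fraction (0.01 for 1%) are hypothetical, and the sketch does not read `HARM_BENCH_RUNS` or `HARM_RATE_THRESHOLD`.

```ruby
# A scenario counts as harmed when more than half of its runs showed harm.
# `runs` is an array of booleans, one per run (true = harmful behavior seen).
def majority_harmed?(runs)
  runs.count(true) > runs.size / 2.0
end

# Fraction of scenarios that were harmed by majority vote.
def harm_rate(runs_by_scenario)
  return 0.0 if runs_by_scenario.empty?
  harmed = runs_by_scenario.count { |_name, runs| majority_harmed?(runs) }
  harmed.to_f / runs_by_scenario.size
end

# The gate passes only while the harm rate stays at or below the threshold.
def gate_passes?(runs_by_scenario, threshold: 0.01)
  harm_rate(runs_by_scenario) <= threshold
end

# Hypothetical results: three runs each for four made-up scenarios.
results = {
  "stale_deploy_target" => [true, true, false],   # harmed in 2 of 3 runs
  "wrong_scope_database" => [false, true, false], # a single flaky run is outvoted
  "superseded_auth" => [false, false, false],
  "readme_example_as_fact" => [false, false, false]
}

puts "harm rate: #{harm_rate(results)}"
puts "gate passes: #{gate_passes?(results)}"
```

The majority vote is what absorbs single-shot nondeterminism: one harmful run out of three does not mark a scenario as harmed, but two do.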
+### Deferred to 0.13
+- **CLAUDE.md comparative baseline numbers (#4)** — the comparative E2E harness compares static CLAUDE.md (auto-loaded into context) against ClaudeMemory's MCP-tool retrieval, but in headless `claude -p` mode Claude doesn't proactively call the recall tools, so the comparison doesn't yet exercise ClaudeMemory's retrieval path fairly (first run returned a misleading ClaudeMemory 0/10 = no-memory 0/10 vs CLAUDE.md 8/10). Publishing that would mislead, so the numbers are withheld and the harness fix is tracked for 0.13. This surfaced a genuine separable observation — in fully headless, non-tool-forcing usage, ClaudeMemory's contribution rides entirely on the SessionStart context-hook injection — also tracked for 0.13. See `docs/1_0_punchlist.md` #4 / #16.
+### Upgrade Notes
+- **Schema migrates automatically to v18** (OTel telemetry tables + `prompt_id` on `activity_events`) on first DB open via `Sequel::Migrator` — no manual step. Round-trip migration specs cover the upgrade path from prior release boundaries.
+- **The staleness marker now appears in SessionStart context** for single-value facts (`uses_database` / `deployment_platform` / `auth_method`) older than 180 days and not recently recalled. This is additive and advisory (a `⚠ stale … verify before relying` note). Tune the window with `CLAUDE_MEMORY_INJECTION_STALE_DAYS`; the existing `CLAUDE_MEMORY_STALE_DAYS` (dashboard review window) is unchanged.
+- No breaking API changes. `has_convention` / `primary_language` predicate synonyms continue to emit deprecation warnings (scheduled for removal in 1.0.0); suppress via `CLAUDE_MEMORY_NO_DEPRECATIONS=1`.
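The staleness rule described in these notes (a single-value fact that is old AND not recently recalled) can be illustrated with a small pure function. This is not the `Recall::StalenessAnnotator` source: the hash-based fact shape, the method name `stale_single_value?`, and the choice to apply the same 180-day window to `last_recalled_at` are assumptions for the sketch.

```ruby
require "date"

# Single-value predicates named in the changelog entry above.
SINGLE_VALUE_PREDICATES = %w[uses_database deployment_platform auth_method].freeze

# Pure function: no I/O, and the current date is passed in by the caller.
# Assumes `fact` is a Hash carrying Date values (hypothetical shape).
def stale_single_value?(fact, today:, stale_days: 180)
  # Multi-value predicates are never annotated.
  return false unless SINGLE_VALUE_PREDICATES.include?(fact[:predicate])

  cutoff = today - stale_days
  recorded_on = fact[:valid_from] || fact[:created_at]
  recalled_on = fact[:last_recalled_at]

  old = recorded_on < cutoff
  unconfirmed = recalled_on.nil? || recalled_on < cutoff
  old && unconfirmed
end

today = Date.new(2026, 5, 29)
facts = [
  {predicate: "deployment_platform", object: "heroku", created_at: Date.new(2025, 1, 10), last_recalled_at: nil},
  {predicate: "uses_database", object: "sqlite", created_at: Date.new(2025, 1, 10), last_recalled_at: Date.new(2026, 5, 1)},
  {predicate: "convention", object: "4-space indentation", created_at: Date.new(2024, 6, 1), last_recalled_at: nil}
]

stale = facts.select { |fact| stale_single_value?(fact, today: today) }
stale.each { |fact| puts "stale: #{fact[:predicate]}=#{fact[:object]} (recorded #{fact[:created_at]})" }
```

Only the first fact is flagged: the second was recalled recently, and the third uses a multi-value predicate.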
+## [0.11.0] - 2026-04-30
+Theme: **Trust & Cost** — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.
+### Added
+- **Token budget telemetry** — every successful SessionStart context injection now records an estimated `context_tokens` count on its `activity_events` row. Surfaced three ways:
+  - Dashboard Trust panel emits a `token_budget` block with p50/p95/avg/sample_size over the last 30 days, so the JSON dashboard endpoint and any downstream consumer answer "what does memory cost per session?"
+  - `claude-memory digest` includes a "Context cost" subsection between activity and new-knowledge so the weekly report shows the price tag next to the value.
+  - `claude-memory stats --tokens [--since DAYS]` reports total sessions, p50/p95/avg/min/max, and a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+ buckets.
+- Pure additive — no schema migration. Historical events written before this release simply contribute zero samples until new injections accumulate.
+- First 0.11.0 milestone item from the 1.0 punchlist (Trust & Cost). Closes the "what % of my SessionStart token budget does memory consume?" gap.
+- **Hallucination rate metric** — the dashboard now quantifies how clean the fact base is, not just how full it is. `Distill::BareConclusionDetector` is the production-side mirror of the SessionStart prompt's reason-clause requirement (decision/convention facts must embed "because…" / "so that…" / "to avoid…"). Surfaced two ways:
+  - Dashboard Trust panel emits a `quality_score` block aggregating across project + global active facts: `suspect_count` (predicate=reference, retagged by ReferenceMaterialDetector), `bare_conclusion_count`, percentages, and an overall 0–100 score (higher = cleaner). Returns 100 on empty stores so fresh installs aren't penalized.
+  - `claude-memory digest` includes a "Quality" section showing the score breakdown plus the in-window rejection rate ("of facts created in the last 7 days, X% have been rejected since"), so calibration drift is visible.
+- Second 0.11.0 milestone item. Pairs with token-budget telemetry to answer "is memory still worth its cost?" via two skeptic-friendly numbers.
+- **`claude-memory show`** — new CLI command prints what memory would inject at the next SessionStart in plain Markdown. Runs the exact `Hook::ContextInjector` path real sessions use, so output matches what Claude actually receives. Footer reports fact count, ~token estimate, and char count so users see the SessionStart cost at a glance.
+  - Default suppresses the raw-transcript "Pending Knowledge Extraction" dump (intended for LLM distillation, not human reading); pass `--pending` to include it.
+  - `--source SOURCE` (startup/resume/clear) simulates each fresh-session entrypoint so users can preview which sections would appear.
+- Third 0.11.0 milestone item. Closes the inspectability gap — trust requires being able to see what memory will inject, the same way `cat CLAUDE.md` works.
+- **First-week ROI nudge** — at SessionEnd, memory now prints `memory contributed N facts this session, %used = X` for the first 10 sessions, then quiets. New users get user-visible proof memory is doing work for them without having to know about the dashboard. Once trust is established (or it isn't), the nudge gets out of the way.
+  - New `claude-memory hook nudge` subcommand + `Hook::Handler#nudge`. SessionEnd config now wires `[ingest, sweep, nudge]` in order.
+  - Silent on `CLAUDE_MEMORY_NO_NUDGE=1` opt-out, missing session_id, n=0 contributions, and after MAX_NUDGES emissions. The empty-session silent path doesn't burn a slot — quiet sessions don't count toward the 10.
+  - Activity event `roi_nudge` records `{n, used, pct, prior_count}` per emission so a future migration could change the threshold without re-counting from raw events.
+- Fourth 0.11.0 milestone item. Cold-start trust signal that pairs with #47 (token cost) and #48 (quality) to make the first-week answer to "is this worth it?" visible without effort.
+- **Harm benchmark prototype** — `spec/benchmarks/dataset/harm_scenarios.yml` + `spec/benchmarks/e2e/harm_bench_spec.rb`. Three hand-written cases spanning the riskiest harm classes (stale_tech, mismatched_scope, superseded_undetected). The first ClaudeMemory benchmark that measures whether memory can make Claude *wrong* — every other benchmark only measures whether memory helps.
+  - Structure validation (regex compile, fact loadability, harm-class coverage) runs in stub mode as part of `:benchmark` tag.
+  - Real-mode runner: `EVAL_MODE=real bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb` — needs `claude` CLI on PATH, ~$2-8 per run. Reports harm rate; doesn't enforce a threshold yet (that's the 0.12 release gate).
+- 0.11.0 de-risking item. If even one of these three surfaces a harm now, the full 10-15-case benchmark planned for 0.12 will likely reveal a fundamental issue; better to learn that at 0.11 than at 0.12. **Real-mode prototype run on 2026-04-30 reported 0/3 harm**, a green light to expand to the full corpus in 0.12.
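The reason-clause check behind the hallucination rate metric above can be approximated in a few lines. This is a minimal sketch, not `Distill::BareConclusionDetector` itself: the signal list covers only the three phrases quoted in the changelog (the real list may be longer), and the method names and sample facts are hypothetical.

```ruby
# Only the signals quoted in the changelog; the production list may differ.
REASON_SIGNALS = /\b(because|so that|to avoid)\b/i
REASON_PREDICATES = %w[decision convention].freeze

# True when a decision/convention fact states a conclusion with no reason clause.
def bare_conclusion?(predicate, object)
  return false unless REASON_PREDICATES.include?(predicate)
  !REASON_SIGNALS.match?(object.to_s)
end

# Percentage of facts flagged as bare conclusions (0.0 for an empty list).
def bare_pct(facts)
  return 0.0 if facts.empty?
  bare = facts.count { |fact| bare_conclusion?(fact[:predicate], fact[:object]) }
  (bare.to_f / facts.size * 100).round(1)
end

# Made-up sample facts for illustration.
facts = [
  {predicate: "decision", object: "Use SQLite because it needs no server"},
  {predicate: "convention", object: "Prefer 4-space indentation"},
  {predicate: "uses_database", object: "sqlite"},
  {predicate: "decision", object: "Pin the linter version to avoid drift"}
]

puts "bare conclusions: #{bare_pct(facts)}%"
```

Predicates other than `decision` and `convention` are never flagged, so a plain `uses_database` fact does not count against the ratio.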
+### Changed
+- **Hallucination-rate metric calibration** — `Dashboard::Trust#quality_score` now reports a windowed (last 30d) "live" score as the headline plus a "historical" block over all active facts. Production verification on 2026-04-30 (recorded in `docs/quality_review.md`) showed the unwindowed metric was technically correct but pragmatically misleading: 97% of bare-conclusion facts pre-dated the 2026-04-20 reason-clause prompt commit, and the entire 7-day rejection cluster was a single-class systemic failure (a `/study-repo` burst), not ongoing noise. The split makes the metric actionable: live score = ongoing extraction quality, historical = legacy data. The digest's "Quality" section uses the live score as the headline.
+### Fixed
+- Real-eval CLI runner now passes `allowed_tools` through explicitly so the harm benchmark and other real-mode benches can pre-allow MCP memory tools without per-test wiring.
+### Upgrade Notes
+- No schema migration. All new features ship purely additive.
+- Hooks run the installed gem from PATH, not the working tree. After upgrading, `bundle exec rake install` (or `gem install claude_memory`) is required for the new SessionEnd nudge, `claude-memory show` command, `--tokens` stats flag, and `context_tokens` activity-event field to actually fire on real hook events.
+- Existing `quality_score` consumers will see additional fields (`window_days`, `historical`) in the snapshot. The original keys (`score`, `total_active`, `suspect_count`, `bare_conclusion_count`, `suspect_pct`, `bare_pct`) remain at the top level and now reflect the 30-day live window — historical numbers move to the `historical` sub-hash.
 ## [0.10.0] - 2026-04-28
 ### Added

data/CLAUDE.md CHANGED

@@ -15,6 +15,10 @@ ClaudeMemory is a Ruby gem that provides long-term, self-managed memory for Clau
 **Check memory before exploring code.** Use `memory.recall`, `memory.decisions`, `memory.architecture`, or `memory.conventions` to find existing knowledge before reading files.
+**Public API contract:** [docs/api_stability.md](docs/api_stability.md) is the authoritative stable-surface list (CLI, MCP, hooks, Ruby API, schema, predicate vocabulary). When changing any of those surfaces, update the doc in the same commit; if it's a soft-rename, wire `ClaudeMemory::Deprecations.warn`.
+**Audit memory health:** run `claude-memory audit` (or `/audit-memory` for an interactive walkthrough) to surface inconsistencies, regressions, and optimization opportunities. See [docs/audit_runbook.md](docs/audit_runbook.md) for per-check rationale and remediation steps.
 ### Git Usage & Best Practices
 - Before each commit, apply the quality-review skill
@@ -163,7 +167,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Each command is a separate class (HelpCommand, DoctorCommand, etc.)
   - All commands inherit from BaseCommand
   - Dependency injection for I/O (stdout, stderr, stdin)
-  - 32 commands total, each focused on single responsibility
+  - 34 commands total, each focused on single responsibility
 - **`Configuration`**: Centralized ENV access (`configuration.rb`)
   - Single source of truth for paths and environment variables
@@ -209,6 +213,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Pluggable distiller design (current: NullDistiller stub)
   - Extracts entities, facts, scope hints from content
   - `ReferenceMaterialDetector`: classifies "X is a plugin/library/tool" templates, LOC counts, "by Firstname Lastname" attributions as reference material. Runs in `ManagementHandlers#store_extraction` so mislabeling can't persist
+  - `BareConclusionDetector` (0.11.0+): production-side mirror of the SessionStart prompt's reason-clause requirement. Pure function — flags `decision` / `convention` facts whose object lacks a reason-clause signal ("because", "so that", "to avoid", etc.). Powers the `quality_score` metric on the Trust panel and the digest's Quality section.
   - SessionStart distillation prompt enforces reason clauses ("because…", "so that…") for `decision` and `convention` predicates — bare conclusions are explicitly disallowed
 - **`Resolve`**: Truth maintenance and conflict resolution (`resolve/`)
@@ -228,7 +233,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Modes: shared (repo), local (uncommitted), home (user directory)
 - **`MCP`**: Model Context Protocol server and tools (`mcp/`)
-  - Exposes memory tools to Claude Code (25 tools total)
+  - Exposes memory tools to Claude Code (23 tools total)
   - `Telemetry`: Records tool invocations to `mcp_tool_calls` table for usage stats
   - Dual content/structuredContent responses with compact mode
@@ -249,7 +254,7 @@ Key tables (defined in `sqlite_store.rb`):
 - `fact_links`: Supersession and conflict relationships
 - `conflicts`: Open contradictions
 - `mcp_tool_calls`: MCP server tool invocation telemetry (schema v13)
-- `activity_events`: Hook/recall/context/sweep telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports
+- `activity_events`: Hook/recall/context/sweep/nudge telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports. Event types: `hook_ingest`, `hook_context` (carries `context_tokens` since 0.11.0), `hook_sweep`, `hook_publish`, `recall`, `store_extraction`, `roi_nudge` (since 0.11.0).
 - `moment_feedback`: Per-moment 👍/👎 verdicts with optional notes (schema v16) — unique on event_id, repeat clicks upsert
 Facts include:
@@ -261,10 +266,13 @@ Facts include:
 ### Scope System
 Facts are scoped to control where they apply:
+<no-memory>
 - **project**: Current project only (e.g., "claude_memory uses SQLite for storage")
 - **global**: All projects (e.g., "I prefer 4-space indentation")
 Distiller detects signals like "always", "in all projects", "my preference" and sets `scope_hint: "global"`. Users can manually promote facts via `claude-memory promote <fact_id>` or the `memory.promote` MCP tool.
+</no-memory>
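
The signal-based scope detection described above might look roughly like the following. This is a sketch under stated assumptions: the module `ScopeHintSketch` and method `scope_hint_for` are hypothetical, the signal list contains only the three phrases quoted in the text, and falling back to `"project"` when no signal matches is an assumption rather than documented behavior.

```ruby
# Hypothetical sketch of signal-to-scope mapping; not the Distiller's code.
module ScopeHintSketch
  # Only the phrases quoted in the text ("signals like ...").
  GLOBAL_SIGNALS = [
    /\balways\b/i,
    /\bin all projects\b/i,
    /\bmy preference\b/i
  ].freeze

  # Returns "global" when any signal matches; "project" otherwise (assumed default).
  def self.scope_hint_for(text)
    GLOBAL_SIGNALS.any? { |re| re.match?(text.to_s) } ? "global" : "project"
  end
end

puts ScopeHintSketch.scope_hint_for("I always use 4-space indentation")
puts ScopeHintSketch.scope_hint_for("claude_memory uses SQLite for storage")
```

A hint produced this way is only a starting point; the text notes that users can still promote a fact manually.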
 ## Testing Strategy
@@ -331,7 +339,7 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
 - `lib/claude_memory.rb`: Main module, requires, database path helpers
 - `lib/claude_memory/cli.rb`: Thin command router (41 lines)
-- `lib/claude_memory/commands/`: Individual command classes (28 commands)
+- `lib/claude_memory/commands/`: Individual command classes (34 commands)
 - `lib/claude_memory/configuration.rb`: Centralized configuration and ENV access
 - `lib/claude_memory/domain/`: Domain models (Fact, Entity, Provenance, Conflict)
 - `lib/claude_memory/core/`: Value objects and null objects
@@ -346,7 +354,7 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
 The gem includes an MCP server (`claude-memory serve-mcp`) that exposes memory operations as tools. Configuration should be in `.mcp.json` at project root.
-Available MCP tools (25 total):
+Available MCP tools (23 total):
 - **Query & Recall**: `memory.recall`, `memory.recall_index`, `memory.recall_details`, `memory.recall_semantic`, `memory.search_concepts`
 - **Provenance**: `memory.explain`, `memory.fact_graph`
 - **Shortcuts**: `memory.decisions`, `memory.conventions`, `memory.architecture`
@@ -373,6 +381,13 @@ ClaudeMemory integrates with Claude Code via hooks in `.claude/settings.json`:
   - Runs time-bounded maintenance on both databases
   - Cleans up vec0 entries for superseded/expired facts
+- **Nudge hook** (0.11.0+): Triggers on SessionEnd, fires after ingest+sweep
+  - Calls `claude-memory hook nudge`
+  - For the first 10 sessions only, prints "memory contributed N facts this session, %used = X" to stdout so new users see ROI inline before they discover the dashboard
+  - Records `roi_nudge` activity_events; quiets after `MAX_NUDGES` emissions
+  - Opt out with `CLAUDE_MEMORY_NO_NUDGE=1` (no event recorded on opt-out)
+  - Empty sessions (n=0) silently no-op so quiet sessions don't burn nudge slots
 Hook commands read JSON payloads from stdin for robustness. Supports `--async` flag for non-blocking execution.
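
The gating rules listed for the nudge hook can be summarized in a short sketch. It is illustrative only: the module `NudgeSketch`, the method `message_for`, and its parameters are hypothetical, and `MAX_NUDGES = 10` is inferred from "the first 10 sessions" rather than read from the gem. Only `CLAUDE_MEMORY_NO_NUDGE` and the message wording come from the text.

```ruby
# Hypothetical sketch of the nudge gating rules; not the gem's hook code.
module NudgeSketch
  MAX_NUDGES = 10 # assumed value, inferred from "first 10 sessions"

  # Returns the line to print, or nil when the hook should stay silent.
  def self.message_for(facts_contributed:, percent_used:, nudges_so_far:, env: ENV)
    return nil if env["CLAUDE_MEMORY_NO_NUDGE"] == "1" # opt-out: no output, no event
    return nil if facts_contributed.zero?              # empty session: do not burn a slot
    return nil if nudges_so_far >= MAX_NUDGES          # quiet after the cap

    "memory contributed #{facts_contributed} facts this session, %used = #{percent_used}"
  end
end

msg = NudgeSketch.message_for(facts_contributed: 3, percent_used: 40, nudges_so_far: 2, env: {})
puts msg
```

Checking the opt-out first mirrors the stated rule that no event is recorded when the user has opted out.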
 ## Dashboard