claude_memory 0.10.0 → 0.12.0 (RubyGems version diff)

Files changed (72)

checksums.yaml +4 -4
data/.claude/memory.sqlite3 +0 -0
data/.claude/rules/claude_memory.generated.md +42 -64
data/.claude/skills/release/SKILL.md +44 -6
data/.claude/skills/study-repo/SKILL.md +15 -0
data/.claude-plugin/commands/audit-memory.md +68 -0
data/.claude-plugin/marketplace.json +1 -1
data/.claude-plugin/plugin.json +1 -1
data/CHANGELOG.md +70 -0
data/CLAUDE.md +20 -5
data/README.md +64 -2
data/db/migrations/018_add_otel_telemetry.rb +81 -0
data/docs/1_0_punchlist.md +522 -89
data/docs/GETTING_STARTED.md +3 -1
data/docs/api_stability.md +341 -0
data/docs/architecture.md +3 -3
data/docs/audit_runbook.md +209 -0
data/docs/claude_monitoring.md +956 -0
data/docs/dashboard.md +23 -3
data/docs/improvements.md +329 -5
data/docs/influence/ai-memory-systems-2026.md +403 -0
data/docs/memory_audit_2026-05-21.md +303 -0
data/docs/plugin.md +1 -1
data/docs/quality_review.md +35 -0
data/lib/claude_memory/audit/checks.rb +239 -0
data/lib/claude_memory/audit/finding.rb +33 -0
data/lib/claude_memory/audit/runner.rb +73 -0
data/lib/claude_memory/commands/audit_command.rb +117 -0
data/lib/claude_memory/commands/dashboard_command.rb +2 -1
data/lib/claude_memory/commands/digest_command.rb +95 -3
data/lib/claude_memory/commands/hook_command.rb +27 -2
data/lib/claude_memory/commands/import_auto_memory_command.rb +180 -0
data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
data/lib/claude_memory/commands/otel_command.rb +240 -0
data/lib/claude_memory/commands/registry.rb +5 -1
data/lib/claude_memory/commands/show_command.rb +90 -0
data/lib/claude_memory/commands/stats_command.rb +94 -2
data/lib/claude_memory/configuration.rb +60 -0
data/lib/claude_memory/core/fact_query_builder.rb +1 -0
data/lib/claude_memory/dashboard/api.rb +8 -0
data/lib/claude_memory/dashboard/index.html +140 -1
data/lib/claude_memory/dashboard/prompt_journey.rb +48 -0
data/lib/claude_memory/dashboard/server.rb +86 -0
data/lib/claude_memory/dashboard/telemetry.rb +156 -0
data/lib/claude_memory/dashboard/trust.rb +180 -11
data/lib/claude_memory/deprecations.rb +106 -0
data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
data/lib/claude_memory/distill/reference_material_detector.rb +37 -4
data/lib/claude_memory/hook/auto_memory_mirror.rb +7 -3
data/lib/claude_memory/hook/context_injector.rb +11 -2
data/lib/claude_memory/hook/handler.rb +142 -1
data/lib/claude_memory/mcp/tool_definitions.rb +3 -3
data/lib/claude_memory/otel/attributes.rb +118 -0
data/lib/claude_memory/otel/constants.rb +32 -0
data/lib/claude_memory/otel/ingestor.rb +54 -0
data/lib/claude_memory/otel/otlp_json_envelope.rb +254 -0
data/lib/claude_memory/otel/prompt_scope.rb +108 -0
data/lib/claude_memory/otel/settings_writer.rb +122 -0
data/lib/claude_memory/otel/status.rb +58 -0
data/lib/claude_memory/recall/staleness_annotator.rb +73 -0
data/lib/claude_memory/resolve/predicate_policy.rb +17 -1
data/lib/claude_memory/resolve/resolver.rb +30 -3
data/lib/claude_memory/shortcuts.rb +61 -18
data/lib/claude_memory/store/prompt_journey_query.rb +87 -0
data/lib/claude_memory/store/schema_manager.rb +1 -1
data/lib/claude_memory/store/sqlite_store.rb +136 -0
data/lib/claude_memory/sweep/maintenance.rb +31 -1
data/lib/claude_memory/sweep/sweeper.rb +6 -0
data/lib/claude_memory/templates/hooks.example.json +5 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +20 -0
metadata +28 -1

data/docs/memory_audit_2026-05-21.md ADDED

@@ -0,0 +1,303 @@
+# Memory Database Audit — 2026-05-21
+<no-memory>
+This document discusses hallucinated facts and example stack names by way of diagnosis. It is wrapped in `<no-memory>` so the distiller does not re-extract the very words being audited. Human readers ignore the tags.
+</no-memory>
+<no-memory>
+**Scope:** Full audit of the ClaudeMemory project's own memory database (global + project) against the actual state of the codebase at v0.11.0. Grades how useful the distilled knowledge is for coding agents working in this repo, and defines a systemic remediation pipeline.
+**Snapshot:** `claude_memory @ main`, schema v18, 23 MCP tools, 9 predicates in `PredicatePolicy::POLICIES`.
+---
+## Executive Summary
+The memory pipeline is structurally sound but **contaminated**. Three independent surfaces (global DB, project DB, generated snapshot) carry mixed signal — the curated snapshot is genuinely useful (~70% signal), but the MCP shortcut tools (`memory.decisions`, `memory.conventions`, `memory.architecture`) are net-misleading in their current form: they surface hallucinated stack diversity and global terminal preferences instead of the rich project knowledge that exists in the DB.
+**Root cause:** CLAUDE.md's scope-system example text ("this app uses PostgreSQL", "I prefer 4-space indentation") is repeatedly re-distilled as fact. This compounds with a 99-item undistilled backlog and inconsistent shortcut-tool scope filters.
+**Verdict:** Stop the bleeding (source-text fix + doc drift fix), then clean the noise floor (bulk-reject hallucinated `uses_*` facts), then fix the shortcut tool filters. The system can become trustworthy with ~1 day of focused work.
+---
+## Part 1 — Ground Truth (from the code)
+ClaudeMemory v0.11.0, "Trust & Cost" release. Authoritative findings from full repo audit:
+### Stack
+- Pure Ruby gem. Ruby ≥ 3.2.
+- Sequel + Extralite over SQLite (`Sequel.connect("extralite:#{path}")` only — never `Sequel.sqlite`).
+- sqlite-vec for KNN; fastembed-rb optional for semantic search.
+- **No Rails, no React, no Django, no Express, no Next.js, no MySQL, no Postgres, no Redis, no AWS/GCP/Azure/Vercel/Docker deployment.** It's a gem you install locally.
+### Architecture (7 layers, verified)
+- **Application:** `CLI` (41-line router) → 35 commands under `lib/claude_memory/commands/`, all inheriting `BaseCommand` with stdin/stdout/stderr DI.
+- **Core Domain:** `domain/` (Fact, Entity, Provenance, Conflict — frozen + validated) + `core/` (21 value/null objects: Result, SessionId, NullFact, FactRanker, RRFusion, etc.).
+- **Infrastructure:** `Store::SQLiteStore` (with `RetryHandler`, `SchemaManager`, `LLMCache`, `MetricsAggregator` mixins), `Store::StoreManager` (dual-DB router), `Infrastructure::FileSystem` / `InMemoryFileSystem`.
+- **Business Logic:** `Ingest`, `Index` (`LexicalFTS` + `VectorIndex`), `Distill` (`NullDistiller`, `ReferenceMaterialDetector`, `BareConclusionDetector`), `Resolve::Resolver` + `Resolve::PredicatePolicy`, `Recall` (facade → `DualEngine`/`LegacyEngine` both including `QueryCore`), `Sweep`, `Publish`, `MCP`, `Hook`.
+- **Dashboard:** 14 panel modules under `lib/claude_memory/dashboard/`.
+### Schema
+Current `SCHEMA_VERSION = 18` in `lib/claude_memory/store/schema_manager.rb:8`. 18 migrations in `db/migrations/`:
+- v13 → `mcp_tool_calls` (telemetry, minimal columns — no query_text/hash by YAGNI).
+- v15 → `activity_events` (hook/recall/context/sweep/store_extraction/roi_nudge events).
+- v16 → `moment_feedback` (per-event 👍/👎 verdicts, upsert on event_id).
+- v17 → `facts.last_recalled_at` (access-based staleness via `Sweep::RecallTimestampRefresher`).
+- v18 → `otel_metrics`/`otel_events`/`otel_traces` + `activity_events.prompt_id` for prompt-journey correlation.
+### Predicates (9, not 8)
+From `lib/claude_memory/resolve/predicate_policy.rb`:
+| Predicate | Cardinality | Section |
+|---|---|---|
+| `convention` | multi | conventions |
+| `decision` | multi | decisions |
+| `architecture` | multi | additional (intentionally unmapped) |
+| `reference` | multi | references *(new in 0.11.0)* |
+| `uses_framework` | multi | constraints |
+| `uses_language` | multi | constraints |
+| `uses_database` | single (exclusive) | constraints |
+| `deployment_platform` | single (exclusive) | constraints |
+| `auth_method` | single (exclusive) | constraints |
+Synonyms (with `Deprecations.warn`, removal in 1.0.0): `has_convention → convention`, `primary_language → uses_language`.
+### MCP (23 tools, not 25)
+Six handler modules under `lib/claude_memory/mcp/handlers/`: `QueryHandlers`, `ShortcutHandlers`, `ContextHandlers`, `ManagementHandlers`, `StatsHandlers`, `SetupHandlers`. `MCP::Telemetry` wraps `Server#handle_tools_call`, records to `mcp_tool_calls`, swallows DB errors so telemetry never breaks a tool response.
+### Hooks
+Five events: **ingest**, **context**, **sweep**, **publish**, **nudge** (new in 0.11.0, `MAX_NUDGES=10`, silently no-ops on empty sessions or `CLAUDE_MEMORY_NO_NUDGE=1`). `AutoMemoryMirror` runs on fresh `SessionStart`, surfaces up to 5 candidates × 1500 chars from `~/.claude/projects/<slug>/memory/*.md`.
+### Distillation (three layers, all wired)
+- **Layer 1 — NullDistiller:** regex, P95 < 5ms, runs every hook.
+- **Layer 2 — SessionStart context injection:** `hookSpecificOutput.additionalContext` with reason-clause-required extraction prompt. Claude Code itself acts as distiller at no extra API cost.
+- **Layer 3 — `/distill-transcripts` skill:** manual, deep extraction with depth-aware prompts.
+- `ReferenceMaterialDetector` is wired at `mcp/handlers/management_handlers.rb:37` so external-project descriptions can't persist as `convention` facts.
+- `BareConclusionDetector` is wired into the Trust panel's `quality_score` and the digest's Quality section.
+### Documentation drift in CLAUDE.md / generated rules
+- "8 predicates" → actually **9** (CLAUDE.md predates the `reference` predicate added in 0.11.0).
+- "25 MCP tools total" → actually **23**.
+---
+## Part 2 — What Memory Actually Believes
+### Database state (`memory.stats`)
+| Scope | Total | Active | Superseded | Open Conflicts | Pending Distillation |
+|---|---|---|---|---|---|
+| Global | 12 | 7 | 2 | 0 | — |
+| Project | 201 | 46 | 37 | **10** | **99** |
+### Project predicate distribution (46 active)
+- `convention`: 28
+- `architecture`: 6
+- `uses_language`: 3 *(ruby, go, python — repo is Ruby-only)*
+- `decision`: 3
+- `uses_framework`: 2 *(rails, react — neither present in code)*
+- `reference`: 2
+- `uses_database`: 1
+- `deployment_platform`: 1
+### Global memory (7 active)
+All `convention` predicate, all user-level workflow prefs: Docker, tmux, iTerm2, VS Code + Ruby LSP. Several near-duplicates (ids 1↔8, 6↔11, 7↔12, 5↔10).
+### Open conflicts (10, all cluster around hallucination loop)
+- Fact #21 `uses_database=sqlite` vs #139 postgresql, #148 postgres, #154, #155 redis.
+- Fact #45 `uses_framework=rails` vs sinatra/express/django/next.js/react.
+- Fact #48 `deployment_platform=aws` vs gcp/vercel/docker/azure.
+All caused by CLAUDE.md scope-system example text being repeatedly re-extracted; resolver dutifully creates a new conflict each pass because none of the values can be authoritatively chosen.
+---
+## Part 3 — Tool-by-Tool Grading
+### `memory.decisions` — Grade: D
+Returns 23 results, only **3** are actual `decision`-predicate facts. The rest are `uses_*`/`deployment_platform`/`reference` rows. Output mixes contradictory single-cardinality predicates as if they all hold simultaneously. **Net effect: agent concludes this gem talks to MySQL + Postgres + Redis + SQLite at once.** It doesn't.
+### `memory.conventions` — Grade: F (for project work)
+Returns 10 results, **all global scope**. The 28 high-value project conventions (Configuration instance-only, Sequel/extralite adapter rule, version-in-3-places, EXPECTED_HOOKS sync, block-style rule, A/B testing methodology, `/release` workflow, etc.) are **not returned**. Worse, the global list is half-duplicates.
+### `memory.architecture` — Grade: D
+Returns 31 results. ~25 are hallucinated `uses_*` / `deployment_platform` facts; ~6 are global user-preference noise. The real architectural facts (PredicatePolicy SoT, MCP::Tools 6-handler split, Recall facade structure, SQLiteStore mixin pattern, Embeddings::DimensionCheck) **don't appear in the top 31**.
+### Generated snapshot (`.claude/rules/claude_memory.generated.md`) — Grade: B+
+Auto-loads into every Claude Code session. Genuinely useful:
+- 3 real decisions (MCP telemetry minimal columns, QMD restudy, claude-supermemory study), all with reason clauses.
+- ~25 real project conventions covering the genuine gotchas.
+- 6 architecture facts (PredicatePolicy SoT, MCP::Tools dispatcher, Recall facade, SQLiteStore mixins, pluggable Embeddings, DimensionCheck value object).
+**But:** the "Technical Constraints" block says `uses_framework=rails`, `deployment_platform=aws` (both wrong), and the Open Conflicts list at the bottom is a 47-row tail that pushes useful content down.
+### Auto-memory files (`~/.claude/projects/-Users-…/memory/*.md`) — Grade: A
+Separately maintained, never made it into the SQLite DB. Highest-quality knowledge in the system: SchemaVersion-in-tests, `upsert_content_item` requires `text_hash`+`byte_len`, FTS indexing pattern, hook context test isolation, WAL stale-cache phantom corruption, FTS5 rank corruption after `.recover`, scope_hint vs scope routing, round-trip migration specs. **Invisible to `memory.recall`.** Only surfaced transiently via `AutoMemoryMirror` at SessionStart.
+---
+## Part 4 — Root Causes
+1. **CLAUDE.md scope-system example is a hallucination factory.** It contains literal phrases ("this app uses PostgreSQL", "I prefer 4-space indentation") that Layer 1 and Layer 2 distillers extract as ground truth. Known open product gap (`feedback_hallucination_source_vs_cleanup.md`).
+2. **99-item distillation backlog.** SessionStart prompts for deep distillation but `mark_distilled` isn't being called for most items. Same text gets re-extracted across sessions, conflicts re-open.
+3. **Shortcut tools have inconsistent filters.** `memory.conventions` is hardcoded to global scope; `memory.decisions` aggregates across `decision`/`uses_*`/`reference` predicates; `memory.architecture` mixes scopes and predicates differently again. None correctly answer "what does this project actually look like?"
+4. **Single-cardinality predicates can't self-heal.** `uses_database`, `deployment_platform` keep flipping as new hallucinated facts arrive; resolver creates a new conflict each time and never closes the old ones.
+5. **Documentation drift propagates.** CLAUDE.md says "8 predicates", "25 MCP tools" — if a coding agent re-extracts from CLAUDE.md, these incorrect counts become persisted facts.
+6. **Auto-memory bypass.** The highest-quality knowledge (gotcha files) lives in markdown files outside the DB, so retrieval tools can't reach it.
+---
+## Part 5 — Remediation Pipeline
+Four phases, lowest-risk highest-leverage first. Each phase has explicit done criteria and a verification step. The whole pipeline should be re-runnable: see Phase 4 for the systemic check.
+### Phase 1 — Stop the bleeding *(blocks further drift, ~30 min)*
+| # | Action | Verification |
+|---|---|---|
+| 1.1 | Wrap CLAUDE.md scope-system example text in `<no-memory>` tags (around the "this app uses PostgreSQL" / "I prefer 4-space indentation" example block in the Scope System section). | Re-ingest CLAUDE.md, confirm no new `uses_database=postgresql` or `convention=4-space indentation` facts appear. |
+| 1.2 | Update CLAUDE.md predicate count: "8 entries" → "9 entries (includes `reference`)". | grep "8 entries" returns 0 matches in CLAUDE.md. |
+| 1.3 | Update CLAUDE.md MCP tool count: "25 tools total" → "23 tools total". | grep "25 tools" returns 0 matches in CLAUDE.md. |
+| 1.4 | Update `.claude/rules/claude_memory.generated.md` regeneration path: run `claude-memory publish` after Phase 2 cleanup to refresh. | Generated file matches active facts. |
+**Done criteria:** Source-text fix in place. No new hallucinations will be created from these specific triggers.
+### Phase 2 — Clean the noise floor *(reclaims trust in the DB, ~45 min)*
+Bulk-reject the hallucinated facts. Order matters — reject leaves before clearing conflicts.
+| # | Action | Command | Verification |
+|---|---|---|---|
+| 2.1 | Reject hallucinated `uses_framework` facts. Keep: none in production code (this gem has no framework dependency in the runtime sense). | `claude-memory reject 45 46 47 50 51 53 54 55 56 57 61 65 66 67 72 73 74 134 196 197 198` (rails, sinatra, react, express, next.js, django + tail of synonyms) | `memory.recall "uses_framework" scope=project` returns 0. |
+| 2.2 | Reject hallucinated `uses_language` facts. Keep: `ruby` (id 76). Reject the rest. | `claude-memory reject 75 78 91 149 195` (javascript, go, python, typescript, rust) | `memory.recall "uses_language" scope=project` returns only ruby. |
+| 2.3 | Reject hallucinated `uses_database` facts. Keep: `sqlite` (id 21). Reject the rest. | `claude-memory reject 62 63 139 148 154 155` (mysql, postgres, postgresql, redis) | `memory.recall "uses_database" scope=project` returns only sqlite. |
+| 2.4 | Reject hallucinated `deployment_platform` facts. The gem has none — reject all. | `claude-memory reject 27 48 49 52 70` (azure, aws, gcp, vercel, docker) | `memory.recall "deployment_platform" scope=project` returns 0. |
+| 2.5 | Dedupe global `convention` facts (Docker, tmux, iTerm2, VS Code duplicates). | `claude-memory reject 8 11 12 10` (keep ids 1, 6, 7, 5 from each duplicate pair) | `memory.conventions scope=global` returns ≤ 7 distinct facts. |
+| 2.6 | Triage the 99-item distillation backlog. Either `/distill-transcripts` to deeply extract or bulk-call `memory.mark_distilled` to clear. | Use `memory.undistilled` to inspect, then run `/distill-transcripts` for items with potential, mark-distilled for the rest. | `memory.stats` pending_distillation < 10. |
+| 2.7 | Conflicts close as a side effect of rejection (reject resolves the open conflict in the same transaction per `claude-memory reject` convention). | — | `memory.conflicts` returns 0. |
+**Done criteria:** Project DB has ~25 high-signal active facts. Conflicts: 0. Pending distillation: < 10.
+**Note:** The exact fact IDs above are from the 2026-05-21 snapshot. Re-query `memory.stats` and `memory.recall` before executing to confirm the IDs haven't shifted.
+### Phase 3 — Fix the system *(prevents recurrence, ~2-3 hours)*
+| # | Action | Where | Verification |
+|---|---|---|---|
+| 3.1 | **Fix `memory.conventions` scope filter.** Default should return project conventions (with optional global merge), not global-only. | `lib/claude_memory/mcp/handlers/shortcut_handlers.rb` (convention shortcut) + `lib/claude_memory/mcp/tool_definitions.rb` (tool description). | New spec: `memory.conventions` on this repo returns the 28 project conventions, not just the 7 global ones. |
+| 3.2 | **Fix `memory.decisions` predicate filter.** Should return only `decision`-predicate facts (with reason clauses), not `uses_*` rows. | `lib/claude_memory/mcp/handlers/shortcut_handlers.rb` (decision shortcut). | `memory.decisions` returns only facts where `predicate = 'decision'`. |
+| 3.3 | **Fix `memory.architecture` predicate filter.** Should return only `architecture`-predicate facts, not `uses_*` aggregates. | `lib/claude_memory/mcp/handlers/shortcut_handlers.rb`. | `memory.architecture` returns only facts where `predicate = 'architecture'`. |
+| 3.4 | **Migrate auto-memory `~/.claude/projects/.../memory/*.md` content into the project DB.** Each gotcha becomes a `convention` or `architecture` fact with full reason clause. Today it's only surfaced transiently via `AutoMemoryMirror`. | One-off script or a new `claude-memory import-auto-memory` command. | Top auto-memory gotchas (Configuration-instance-only, schema-version-in-tests, FTS indexing pattern, etc.) are reachable via `memory.recall`. |
+| 3.5 | **Strengthen `ReferenceMaterialDetector`** so it catches the specific scope-system example phrasing if `<no-memory>` is ever removed. Add unit test: extracting from "this app uses PostgreSQL" inside a paragraph titled "Scope System" should be rejected as reference material. | `lib/claude_memory/distill/reference_material_detector.rb`. | New spec passes. |
+| 3.6 | **Add `Resolve::Resolver` conflict auto-resolution heuristic** for single-cardinality predicates when one fact has reference-material signals and the other doesn't. | `lib/claude_memory/resolve/resolver.rb`. | New spec: conflict between authoritative SQLite fact and example-text Postgres fact auto-resolves in SQLite's favor. |
+**Done criteria:** Shortcut tools return signal-only. Auto-memory gotchas are first-class facts. Future hallucinations are caught earlier.
+### Phase 4 — Verify & make systemic *(prevents regression, ~1 hour)*
+| # | Action | Where | Verification |
+|---|---|---|---|
+| 4.1 | **Create `bin/memory-audit`** — script that runs `memory.stats`, lists active facts by predicate, flags `uses_*` outliers (e.g. multiple `uses_database` values active), reports open conflicts, reports pending distillation count, and exits non-zero if thresholds are exceeded. | `bin/memory-audit`. | Script run on a clean DB exits 0. Inject a hallucinated fact and confirm exit 1. |
+| 4.2 | **Add a benchmark `spec/benchmarks/health/database_signal_spec.rb`** that codifies expected signal-to-noise ratio for this project: minimum conventions reachable via `memory.conventions`, maximum `uses_*` cardinality for single-cardinality predicates, zero open conflicts on main. | `spec/benchmarks/health/`. | `bundle exec rspec spec/benchmarks/health/` passes on a clean DB. |
+| 4.3 | **Add a quarterly recurring task** (or schedule via the `schedule` skill) to re-run this audit and produce a dated `docs/memory_audit_<date>.md`. | `bin/run-evals --health` or a cron entry. | An audit is generated on schedule, diffed against the prior. |
+| 4.4 | **Update `docs/api_stability.md`** to mention the audit script and the signal-health benchmark as part of pre-release verification. | `docs/api_stability.md`. | Mention is present. |
+**Done criteria:** Future drift will be caught by a script, not by accident.
+---
+## Appendix A — Fact IDs to Reject (2026-05-21 snapshot)
+Re-verify before executing; IDs may shift if other writes occur.
+**`uses_framework` (reject all):** 45, 46, 47, 50, 51, 53, 54, 55, 56, 57, 61, 65, 66, 67, 72, 73, 74, 134, 196, 197, 198.
+**`uses_language` (keep 76 ruby; reject):** 75, 78, 91, 149, 195.
+**`uses_database` (keep 21 sqlite; reject):** 62, 63, 139, 148, 154, 155.
+**`deployment_platform` (reject all):** 27, 48, 49, 52, 70.
+**Global convention duplicates (reject):** 8, 10, 11, 12.
+**Estimated cleanup result:** project active facts drop from 46 → ~25, all open conflicts close, generated snapshot's "Technical Constraints" block becomes accurate.
+## Appendix B — Out of Scope (intentional)
+- Migrating to a different vector store. Not the bottleneck.
+- Re-architecting the distillation pipeline. The three-layer design is sound; only inputs and shortcut filters are broken.
+- Changing the dual-DB model. Working as intended.
+## Appendix C — Audit Methodology
+1. Repo audit via Explore subagent covering 16 areas (layout, layers, dual-DB, distillation, predicates, hooks, recall engine, embeddings, MCP, dashboard, telemetry, tests, conventions, recent changes, public API, contradictions).
+2. Memory database query via MCP tools: `memory.stats`, `memory.status`, `memory.decisions`, `memory.conventions`, `memory.architecture`, `memory.conflicts`, `memory.changes`, `memory.list_projects`.
+3. Cross-reference: for each documented claim, verify against code and grade tool output usefulness for a hypothetical coding agent dropped into the repo.
+---
+*End of audit. Next snapshot: 2026-08-21 (quarterly) or after Phase 3 ships, whichever comes first.*
+</no-memory>
+<no-memory>
+## Pipeline Execution Results — 2026-05-21
+All four phases executed in a single session. Final state:
+| Metric | Before | After |
+|---|---|---|
+| Global active facts | 7 | 4 |
+| Project active facts | 46 | 68 *(↑ via auto-memory import)* |
+| Open conflicts | 10 | 0 |
+| Pending distillation | 99 | 0 |
+| Single-cardinality violations | 1+ | 0 |
+| `uses_database` active | 4 contradictory | 1 (sqlite) |
+| `deployment_platform` active | 1 hallucinated | 0 |
+| `uses_framework` active | 2 hallucinated | 0 |
+| `uses_language` active | 3 (1 real, 2 halluc.) | 1 (ruby) |
+### Phase 1 — Stop the bleeding (DONE)
+- Wrapped CLAUDE.md scope-system example in `<no-memory>` tags.
+- Fixed "25 tools" → "23 tools" drift in `CLAUDE.md`, `docs/api_stability.md`, `docs/plugin.md`.
+- Wrapped this audit doc in `<no-memory>` so it doesn't self-contaminate on re-ingestion.
+### Phase 2 — Clean the noise floor (DONE)
+- Rejected 9 hallucinated project facts plus 3 global duplicates (Docker / iTerm2 / tmux variants).
+- Bulk-marked 99 backlog content items as distilled.
+- All 10 open conflicts closed.
+### Phase 3 — Fix the system (DONE)
+- **Shortcuts refactored** (`lib/claude_memory/shortcuts.rb`): switched from FTS text search to predicate-based filtering. `memory.conventions` now returns both project and global facts (was global-only); `memory.decisions` now returns only `decision`-predicate facts (was returning `uses_*` too); `memory.architecture` returns only architecture + stack-shaping predicates.
+- **Tool descriptions updated** in `lib/claude_memory/mcp/tool_definitions.rb` to reflect new behavior.
+- **AutoMemoryMirror slug bug** fixed (`tr("/", "-")` → `tr("/_", "-")`) — previously silently missed auto-memory for any project name with underscores, including `claude_memory` itself.
+- **New CLI command** `claude-memory import-auto-memory [--dry-run]` migrates `~/.claude/projects/<slug>/memory/*.md` into the project DB as durable facts. Imported 27 gotchas/feedback/reference files.
+- **ReferenceMaterialDetector strengthened** with `QUOTE_GUARDED_PREDICATES` and `EXAMPLE_QUOTE_PATTERNS`: stack predicates extracted from example-text quotes ('e.g., ...PostgreSQL') now reroute to `reference`.
+- **Resolver discard heuristic** added: single-cardinality predicates with example-text quotes get silently dropped instead of creating disputed-fact + conflict rows. Catches the case where Layer-1 NullDistiller bypasses ReferenceMaterialDetector.
+- Specs added/updated for each (50 examples across `shortcuts_spec.rb`, `reference_material_detector_spec.rb`, `resolver_spec.rb`). Full suite: **2121 passing, 0 failures**.
+### Phase 4 — Verify & make systemic (DONE)
+- **`bin/memory-audit`** script writes a stable JSON shape (`--json`) and exits non-zero on threshold breach. Hard fails on: any open conflict, >1 active fact per single-cardinality predicate, ≥ 100 pending distillation. Warns on ≥ 25 pending.
+- **`spec/benchmarks/health/database_signal_spec.rb`** codifies the contracts as a `:benchmark`-tagged RSpec suite. Runs against the live DB. 10 examples passing.
+- **`docs/api_stability.md` Section 7** documents the audit script's stable surface and the benchmark spec; corrected schema version from v17 → v18 (8 → 18 migrations).
+### Re-running this pipeline
+If this audit goes stale (drift creeps back in), re-run from the top:
+```bash
+bin/memory-audit                                                  # see current state
+bundle exec rspec spec/benchmarks/health/ --tag benchmark        # confirm contracts
+./exe/claude-memory import-auto-memory --dry-run                 # preview new auto-memory
+```
+Failure modes the audit will catch:
+- A new contamination source landing in CLAUDE.md or another auto-ingested file.
+- A shortcut handler losing predicate-filter semantics in a refactor.
+- A resolver bug re-introducing single-cardinality conflicts.
+- An undistilled-content backlog from a high-traffic week.
+</no-memory>
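
The hard-fail and warning thresholds listed for `bin/memory-audit` in the hunk above can be sketched as a pure function over a stats snapshot. This is a minimal illustration, not the script's actual code: the snapshot shape, the constant names, and `evaluate_snapshot` are all assumptions, and the sample values only approximate the "Before" column of the results table.

```ruby
require "json"

# Thresholds come from the Phase 4 notes above; the constant and method
# names are hypothetical, not the script's real internals.
PENDING_WARN = 25
PENDING_FAIL = 100
SINGLE_CARDINALITY = %w[uses_database deployment_platform auth_method].freeze

# snapshot keys (assumed shape):
#   :open_conflicts       Integer
#   :pending_distillation Integer
#   :active_by_predicate  Hash of predicate name => Array of active values
def evaluate_snapshot(snapshot)
  errors = []
  warnings = []

  conflicts = snapshot.fetch(:open_conflicts)
  errors << "#{conflicts} open conflict(s)" if conflicts.positive?

  active = snapshot.fetch(:active_by_predicate)
  SINGLE_CARDINALITY.each do |predicate|
    values = active.fetch(predicate, [])
    next unless values.size > 1
    errors << "#{predicate} has #{values.size} active values: #{values.join(", ")}"
  end

  pending = snapshot.fetch(:pending_distillation)
  if pending >= PENDING_FAIL
    errors << "#{pending} items pending distillation"
  elsif pending >= PENDING_WARN
    warnings << "#{pending} items pending distillation"
  end

  {ok: errors.empty?, errors: errors, warnings: warnings}
end

# Illustrative values, roughly the "Before" column of the results table.
before = {
  open_conflicts: 10,
  pending_distillation: 99,
  active_by_predicate: {"uses_database" => %w[sqlite postgresql postgres redis]}
}
puts JSON.pretty_generate(evaluate_snapshot(before))
```

A wrapper script could then end with `exit(report[:ok] ? 0 : 1)` so that CI treats any error as a failure while warnings stay informational.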

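
The `AutoMemoryMirror` slug fix in the hunk above (`tr("/", "-")` → `tr("/_", "-")`) relies on how `String#tr` pads a short replacement set with its last character. The sketch below shows the difference on a sample path. The helper names and the path are hypothetical, and it assumes the on-disk auto-memory directory name replaces both slashes and underscores with hyphens, which is what the fix implies.

```ruby
# Hypothetical helpers: the names and the sample path are illustrative,
# not taken from the gem's source.
def slug_before_fix(project_path)
  # Only "/" is translated, so underscores survive into the slug.
  project_path.tr("/", "-")
end

def slug_after_fix(project_path)
  # The one-character replacement set is padded to match the two-character
  # source set, so "/" and "_" both become "-".
  project_path.tr("/_", "-")
end

path = "/Users/dev/src/claude_memory"
puts slug_before_fix(path) # -Users-dev-src-claude_memory
puts slug_after_fix(path)  # -Users-dev-src-claude-memory
```

Under that assumption, the pre-fix slug points at a directory that does not exist for any project path containing an underscore, which matches the "silently missed" behaviour the changelog entry describes.
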
data/docs/plugin.md CHANGED

@@ -133,7 +133,7 @@ Unlike traditional approaches that require a separate API key, ClaudeMemory uses
 ### MCP Server
-The plugin exposes 25 tools to Claude. Highlights:
+The plugin exposes 23 tools to Claude. Highlights:
 | Tool | Description |
 |------|-------------|

data/docs/quality_review.md CHANGED

@@ -9,6 +9,41 @@
 ---
+## Post-0.11 Investigation: Hallucination Rate Metric Calibration (2026-04-30)
+When #48 (hallucination-rate metric) was first run against this project's real DB, it surfaced numbers that *looked* alarming:
+- Quality score: 39/100
+- Bare conclusions: 34 / 59 active facts (57.6%)
+- 7-day rejection rate: 27 of 32 facts (84.4%)
+The first read was that the LLM extractor was producing noise faster than usable knowledge. Per `improvements.md` #60, four causes were proposed; diagnostics ran 2026-04-30:
+| Cause | Verdict | Evidence |
+|---|---|---|
+| Prompt drift in `distill-transcripts.md` | **Confirmed dominant** | 34/35 (97%) bare-conclusion facts pre-date the reason-clause prompt commit `f22d12f` (2026-04-20). Only 1 was created post-commit (and that one is a meta-convention added during this session). |
+| Auto-memory mirror regurgitation | Rejected | 0/35 substring matches in `~/.claude/projects/.../memory/*.md`. Auto-memory mirror only landed in 0.10.0 (2026-04-28), after the bare-fact creation window — temporally impossible to be the source. |
+| `ReferenceMaterialDetector` predicate scope too narrow | Not material | Only 3/35 bare facts are `decision`-predicate; 0 of those match the strong reference-material patterns. Expanding `GUARDED_PREDICATES` would not move the needle on the bare-conclusion count. |
+| Junky corpus / rejection cluster | **Confirmed in single class** | All 27 rejected facts in the 7-day window are `uses_database` (18) or `deployment_platform` (9), all with `session_id=nil` (MCP-originated, almost certainly `/study-repo` runs misattributing external-project tech to this project), all from 2026-04-23 to 04-24. Systemic single-class failure, correctly cleaned up after detection — not ongoing extraction noise. |
+**What this means for #48 as currently shipped:**
+The metric is *technically correct* but *pragmatically misleading*. It bakes historical noise (pre-prompt-commit bare conclusions) into a signal that users will read as "ongoing extraction quality." A 57.6% bare-conclusion rate looks like the LLM is broken; in reality the live extraction rate (post-2026-04-20) is ~3% (1 bare fact out of ~30+ created since the prompt commit landed).
+The 84% rejection rate has a similar structural issue: it counts cleanup of a bursty `/study-repo` regression against the active-facts denominator, not against the actual extraction quality of the live window.
+**Quick fix shipping now (this session):** restrict `quality_score` and the digest's "Quality" section to facts created within the same 30-day window already used by `token_budget`. Surface a separate "historical" line so users can see both numbers, but the headline is the live one. This makes the metric actionable: a high live bare-conclusion rate = live LLM calibration drift; a high historical rate = legacy data, not a current alarm.
+**Deferred to 0.12 / 1.x:**
+1. The systemic `/study-repo` misattribution failure mode (cause 4) deserves its own guard. External-project READMEs being studied should land in `reference` predicates, not as `uses_database`/`deployment_platform`. Track this as a follow-up entry.
+2. A backfill/cleanup pass on the 34 historical bare-conclusion facts: either retroactive rejection, or a one-shot reclassification that moves them to a `legacy_observation` predicate that the prompt's reason-clause requirement doesn't apply to.
+3. The metric's calibration assumes "bare conclusion = bad", but spot-checking shows several flagged facts are perfectly informative ("MCP tools return dual content + structuredContent via TextSummary module") — they describe mechanics implicitly. The vocabulary may itself be too strict; revisit during 1.0 soak with real usage data.
+**Process win:** the metric did its job — it surfaced a real signal that would otherwise have stayed invisible, and the investigation distinguished historical noise from live calibration. Without #48 we'd have no way to know.
+---
 ## Executive Summary
 Six days, +2,011 LOC. The headline finding: **the watch-list item from 2026-04-22 (#28: extract per-endpoint helpers from `Dashboard::API`) was not just deferred, it actively regressed.** `dashboard/api.rb` grew from 627 → 807 LOC (+180, +29%), is now the only file in `lib/` over 750 lines, and gained four new methods, all exceeding 15 lines. Method-size pressure increased: the previous worst case (`recall` at 39 lines) has been overtaken by `timeline` at 52 lines, and the file still has 11 methods over 15 lines (the same count as last review), but with a higher mean length.
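The "quick fix" described in the #48 follow-up above (restrict the quality metric to facts created within a 30-day window, and report a separate historical number) can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `Fact` struct, `bare_rate`, `quality_summary`, and `WINDOW_DAYS` are hypothetical names, and the sample facts are made up.

```ruby
# Hypothetical stand-in for a fact row; the real schema is not shown in this diff.
Fact = Struct.new(:id, :created_at, :bare, keyword_init: true)

WINDOW_DAYS = 30 # assumed to match the token_budget window mentioned in the review

# Bare-conclusion rate (0.0..1.0) for a list of facts, or nil when the list is empty.
def bare_rate(facts)
  return nil if facts.empty?
  facts.count(&:bare).to_f / facts.size
end

# Splits facts at the window cutoff and reports the live and historical rates separately.
def quality_summary(facts, now: Time.now, window_days: WINDOW_DAYS)
  cutoff = now - (window_days * 86_400)
  live, historical = facts.partition { |f| f.created_at >= cutoff }
  {live: bare_rate(live), historical: bare_rate(historical)}
end

now = Time.utc(2026, 5, 21)
facts = [
  Fact.new(id: 1, created_at: Time.utc(2026, 3, 1), bare: true),
  Fact.new(id: 2, created_at: Time.utc(2026, 3, 2), bare: true),
  Fact.new(id: 3, created_at: Time.utc(2026, 5, 10), bare: false),
  Fact.new(id: 4, created_at: Time.utc(2026, 5, 15), bare: false)
]
summary = quality_summary(facts, now: now)
```

With this sample data the two old bare facts only affect `summary[:historical]`, while `summary[:live]` reflects the recent window, which is the separation the review argues makes the headline number actionable.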

data/lib/claude_memory/audit/checks.rb ADDED

@@ -0,0 +1,239 @@
+# frozen_string_literal: true
+module ClaudeMemory
+  module Audit
+    # Individual audit checks. Each method takes a Store::StoreManager and
+    # returns an Array<Finding>. Checks must be read-only — write
+    # operations belong in dedicated commands the user opts into.
+    #
+    # Adding a new check:
+    #   1. Define a method here with an explicit C### id assignment.
+    #   2. Append the method name to Runner::CHECK_METHODS.
+    #   3. Document it in docs/audit_runbook.md.
+    module Checks
+      module_function
+      # C001 — Open conflicts in either DB.
+      def open_conflicts(manager)
+        findings = []
+        {project: manager.store_if_exists("project"), global: manager.store_if_exists("global")}.each do |scope, store|
+          next unless store
+          conflicts = store.open_conflicts
+          next if conflicts.empty?
+          findings << Finding.new(
+            id: "C001",
+            severity: :error,
+            title: "#{conflicts.size} open conflict(s) in #{scope} DB",
+            detail: "Open conflicts indicate unresolved single-cardinality disputes. Each will keep re-firing until the losing fact is rejected.",
+            suggestion: "claude-memory conflicts && claude-memory reject <fact_id>",
+            fact_ids: conflicts.flat_map { |c| [c[:fact_a_id], c[:fact_b_id]] }.uniq
+          )
+        end
+        findings
+      end
+      # C002 — Single-cardinality predicates with > 1 active fact.
+      SINGLE_CARDINALITY_PREDICATES = %w[uses_database deployment_platform auth_method].freeze
+      def single_cardinality_multiplicity(manager)
+        store = manager.store_if_exists("project")
+        return [] unless store
+        SINGLE_CARDINALITY_PREDICATES.flat_map do |predicate|
+          rows = store.facts.where(status: "active", predicate: predicate).all
+          next [] if rows.size <= 1
+          [Finding.new(
+            id: "C002",
+            severity: :error,
+            title: "predicate=#{predicate} has #{rows.size} active facts (single-cardinality)",
+            detail: "Single-cardinality predicates must have at most one active value. Multiple actives mean resolver dropped a supersession or distillation produced contradictory claims.",
+            suggestion: "Inspect with: claude-memory explain <fact_id>. Reject the wrong ones: claude-memory reject <fact_id>",
+            fact_ids: rows.map { |r| r[:id] }
+          )]
+        end
+      end
+      # C003 — Distillation backlog (warn ≥ 25, error ≥ 100).
+      def distillation_backlog(manager)
+        store = manager.store_if_exists("project")
+        return [] unless store
+        distilled_ids = store.ingestion_metrics.select(:content_item_id).distinct
+        pending = store.content_items.exclude(id: distilled_ids).count
+        return [] if pending < 25
+        severity = (pending >= 100) ? :error : :warn
+        [Finding.new(
+          id: "C003",
+          severity: severity,
+          title: "#{pending} content items not yet deeply distilled",
+          detail: "Backlog grows when SessionStart distillation prompts aren't acknowledged with memory.mark_distilled. A large backlog means the same text gets re-extracted across sessions, increasing hallucination rate.",
+          suggestion: "Triage with /distill-transcripts (interactive) OR mark all distilled if you accept the backlog is noise: claude-memory sweep --mark-all-distilled",
+          fact_ids: []
+        )]
+      end
+      # C004 — memory.decisions leaking non-decision predicates.
+      def shortcut_decision_leak(manager)
+        results = Shortcuts.decisions(manager, limit: 50)
+        leaked = results.map { |r| r[:fact][:predicate] }.uniq - ["decision"]
+        return [] if leaked.empty?
+        [Finding.new(
+          id: "C004",
+          severity: :error,
+          title: "memory.decisions returns non-decision predicates: #{leaked.inspect}",
+          detail: "memory.decisions should return only `decision`-predicate facts. Predicate leakage suggests the shortcut implementation has regressed back to text-search filtering (pre-2026-05-21 audit).",
+          suggestion: "Inspect lib/claude_memory/shortcuts.rb — SHORTCUTS[:decisions][:predicates] should equal ['decision']. Run `bundle exec rspec spec/claude_memory/shortcuts_spec.rb`.",
+          fact_ids: results.select { |r| leaked.include?(r[:fact][:predicate]) }.map { |r| r[:fact][:id] }
+        )]
+      end
+      # C005 — memory.conventions returns no project facts despite project conventions existing.
+      def shortcut_convention_scope(manager)
+        project_store = manager.store_if_exists("project")
+        return [] unless project_store
+        project_count = project_store.facts.where(status: "active", predicate: "convention").count
+        return [] if project_count.zero?
+        results = Shortcuts.conventions(manager, limit: 50)
+        project_returned = results.count { |r| r[:source] == "project" }
+        return [] if project_returned > 0
+        [Finding.new(
+          id: "C005",
+          severity: :warn,
+          title: "memory.conventions returned 0 project facts despite #{project_count} project conventions existing",
+          detail: "Pre-2026-05-21 audit, memory.conventions was hardcoded to scope=global. If you're seeing 0 project facts in a project with conventions, the shortcut has regressed.",
+          suggestion: "Check Shortcuts.collect_facts in lib/claude_memory/shortcuts.rb. Re-run `bundle exec rspec spec/claude_memory/shortcuts_spec.rb`.",
+          fact_ids: []
+        )]
+      end
+      # C006 — Duplicate global convention candidates (near-identical text).
+      def duplicate_global_conventions(manager)
+        store = manager.store_if_exists("global")
+        return [] unless store
+        rows = store.facts.where(status: "active", predicate: "convention").select(:id, :object_literal).all
+        return [] if rows.size < 2
+        # Group by normalized object text (lowercased, with the filler words
+        # "uses"/"prefers"/"always"/"never" and all punctuation removed).
+        # Rows sharing a normalized key are likely near-duplicates.
+        groups = rows.group_by { |r| normalize_convention(r[:object_literal]) }
+        dupe_groups = groups.select { |_, list| list.size > 1 }
+        return [] if dupe_groups.empty?
+        [Finding.new(
+          id: "C006",
+          severity: :info,
+          title: "#{dupe_groups.size} near-duplicate global convention group(s)",
+          detail: "Multiple global conventions normalize to the same phrasing. Pick the cleanest and reject the rest to keep memory.conventions output tight.",
+          suggestion: "Review with: claude-memory recall <concept> --scope=global. Reject duplicates: claude-memory reject <fact_id>",
+          fact_ids: dupe_groups.values.flatten.map { |r| r[:id] }
+        )]
+      end
+      # C007 — Bare-conclusion decisions/conventions (no reason clause).
+      def bare_conclusion_rate(manager)
+        store = manager.store_if_exists("project")
+        return [] unless store
+        detector = Distill::BareConclusionDetector.new
+        rows = store.facts.where(status: "active", predicate: %w[decision convention]).select(:id, :predicate, :object_literal).all
+        bare = rows.select { |r| detector.bare_conclusion?(predicate: r[:predicate], object_literal: r[:object_literal]) }
+        return [] if bare.empty?
+        ratio = bare.size.to_f / rows.size
+        return [] if ratio < 0.3
+        [Finding.new(
+          id: "C007",
+          severity: :info,
+          title: "#{(ratio * 100).round}% of decisions/conventions lack reason clauses (#{bare.size}/#{rows.size})",
+          detail: "Facts without 'because/so that/to avoid/...' lose their justification once context fades. Bare conclusions are dead weight when the team grows or you revisit a year later.",
+          suggestion: "Inspect with: claude-memory explain <fact_id>. Reject low-value bare facts or rewrite with reason clauses via memory.store_extraction.",
+          fact_ids: bare.map { |r| r[:id] }
+        )]
+      end
+      # C008 — Project DB starvation (< 5 active facts may indicate broken ingest).
+      def project_starvation(manager)
+        store = manager.store_if_exists("project")
+        return [] unless store
+        count = store.facts.where(status: "active").count
+        return [] if count >= 5
+        [Finding.new(
+          id: "C008",
+          severity: :warn,
+          title: "Only #{count} active project fact(s)",
+          detail: "A nearly-empty project DB suggests either a fresh install (ignore) OR a broken ingest pipeline / overzealous rejection. Verify hooks are firing: claude-memory doctor.",
+          suggestion: "claude-memory doctor; claude-memory stats; check .claude/settings.json hook configuration.",
+          fact_ids: []
+        )]
+      end
+      # C009 — Auto-memory drift (markdown files newer than project DB facts).
+      def auto_memory_unimported(manager)
+        config = Configuration.new
+        dir = Hook::AutoMemoryMirror.default_dir(config.project_dir, config.claude_config_dir)
+        return [] unless Dir.exist?(dir)
+        md_files = Dir.glob(File.join(dir, "*.md")).reject { |f| File.basename(f) == "MEMORY.md" }
+        return [] if md_files.empty?
+        store = manager.store_if_exists("project")
+        return [] unless store
+        # Count auto_memory_import content items as evidence of prior imports,
+        # then estimate the unimported files as a plain count difference.
+        # This is a heuristic: files are not matched to items individually.
+        imported_count = store.content_items.where(source: "auto_memory_import").count
+        net_new = md_files.size - imported_count
+        return [] if net_new <= 0
+        [Finding.new(
+          id: "C009",
+          severity: :info,
+          title: "#{net_new} auto-memory file(s) not yet imported",
+          detail: "~/.claude/projects/<slug>/memory/*.md files contain durable knowledge that isn't reachable via memory.recall until imported. AutoMemoryMirror only surfaces them transiently at SessionStart.",
+          suggestion: "Preview: claude-memory import-auto-memory --dry-run. Import: claude-memory import-auto-memory.",
+          fact_ids: []
+        )]
+      end
+      # C010 — Recurring single-cardinality churn (history shows the same
+      # predicate has accumulated many superseded/disputed/rejected facts,
+      # a sign of a persistent contamination source).
+      CHURN_THRESHOLD = 5
+      def single_cardinality_churn(manager)
+        store = manager.store_if_exists("project")
+        return [] unless store
+        SINGLE_CARDINALITY_PREDICATES.flat_map do |predicate|
+          non_active = store.facts
+            .where(predicate: predicate, status: %w[superseded disputed rejected])
+            .count
+          next [] if non_active < CHURN_THRESHOLD
+          [Finding.new(
+            id: "C010",
+            severity: :warn,
+            title: "predicate=#{predicate} shows churn: #{non_active} historical non-active facts",
+            detail: "Repeated supersession/dispute on a single-cardinality predicate usually means a contamination source (e.g., example text in CLAUDE.md or docs) keeps re-introducing the same hallucination.",
+            suggestion: "Find the contamination source: claude-memory recall <bad_value> --scope=project. Wrap the trigger text in <no-memory> tags. See docs/audit_runbook.md.",
+            fact_ids: []
+          )]
+        end
+      end
+      def normalize_convention(text)
+        text.to_s
+          .downcase
+          .gsub(/\b(?:uses|prefers|always|never)\b/, "")
+          .gsub(/[[:punct:]]/, "")
+          .gsub(/\s+/, " ")
+          .strip
+      end
+    end
+  end
+end
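To make the C006 grouping concrete, the sketch below copies `normalize_convention` from the file above and runs the same `group_by` / `select` steps over a handful of rows. The rows are hypothetical and exist only for illustration; the store lookup is replaced by a plain array.

```ruby
# Copied from Checks.normalize_convention above so the sketch runs standalone.
def normalize_convention(text)
  text.to_s
    .downcase
    .gsub(/\b(?:uses|prefers|always|never)\b/, "")
    .gsub(/[[:punct:]]/, "")
    .gsub(/\s+/, " ")
    .strip
end

# Hypothetical rows; ids and texts are made up.
rows = [
  {id: 1, object_literal: "Uses RSpec for tests."},
  {id: 2, object_literal: "prefers rspec for tests"},
  {id: 3, object_literal: "Always run the linter"},
  {id: 4, object_literal: "Never run the linter"},
  {id: 5, object_literal: "Frozen string literals everywhere"}
]

# Same grouping steps as duplicate_global_conventions.
groups = rows.group_by { |r| normalize_convention(r[:object_literal]) }
dupe_groups = groups.select { |_, list| list.size > 1 }
dupe_ids = dupe_groups.values.flatten.map { |r| r[:id] }
```

Rows 1 and 2 collapse to the key `"rspec for tests"`, as intended. Rows 3 and 4 also share a key (`"run the linter"`) because "always" and "never" are both stripped, so opposite conventions can land in one group. Since C006 is an `:info` finding that asks for manual review before anything is rejected, this looks like an acceptable trade-off, but it is worth keeping in mind when reading the output.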

data/lib/claude_memory/audit/finding.rb ADDED

@@ -0,0 +1,33 @@
+# frozen_string_literal: true
+module ClaudeMemory
+  module Audit
+    # A single audit finding. Immutable value object emitted by checks
+    # (see Audit::Checks) and aggregated by Audit::Runner.
+    #
+    # Severity levels:
+    #   - :error — a contract violation; CI/automation should fail
+    #   - :warn  — likely problem requiring attention but not blocking
+    #   - :info  — observation; suggests an optimization or cleanup
+    #
+    # Each finding embeds the suggested remediation command(s) as plain
+    # strings so the audit output is directly actionable. The skill
+    # `/audit-memory` reads these and offers to run them for the user.
+    Finding = Data.define(:id, :severity, :title, :detail, :suggestion, :fact_ids) do
+      def error? = severity == :error
+      def warn? = severity == :warn
+      def info? = severity == :info
+      def to_h
+        {
+          id: id,
+          severity: severity,
+          title: title,
+          detail: detail,
+          suggestion: suggestion,
+          fact_ids: fact_ids
+        }
+      end
+    end
+  end
+end
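A short usage sketch for `Finding`, assuming Ruby 3.2 or later (required for `Data.define`). The struct is re-declared in trimmed form so the example runs standalone; the sample findings and the exit-code rule are assumptions for illustration, since `Audit::Runner` is not shown in this section.

```ruby
# Re-declared from finding.rb above (without the to_h override) so the sketch runs standalone.
Finding = Data.define(:id, :severity, :title, :detail, :suggestion, :fact_ids) do
  def error? = severity == :error
  def warn? = severity == :warn
  def info? = severity == :info
end

# Hypothetical findings; the values are illustrative only.
findings = [
  Finding.new(id: "C001", severity: :error, title: "1 open conflict(s) in project DB",
    detail: "Unresolved single-cardinality dispute.",
    suggestion: "claude-memory conflicts", fact_ids: [12, 15]),
  Finding.new(id: "C008", severity: :warn, title: "Only 3 active project fact(s)",
    detail: "Possibly a fresh install.",
    suggestion: "claude-memory doctor", fact_ids: [])
]

# Bucket findings by severity for display.
by_severity = findings.group_by(&:severity)

# Assumed policy, following the ":error means CI/automation should fail" note above.
exit_code = findings.any?(&:error?) ? 1 : 0

# Data instances expose no setters; `with` returns a modified copy instead.
downgraded = findings.first.with(severity: :warn)
```

Because `Data` already provides keyword construction, equality, and `to_h`, a value object like this stays small; the explicit `to_h` in the original file mainly documents the serialized shape.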