RubyGems - claude_memory - Versions diffs - 0.9.1 → 0.10.0 - Mend

claude_memory 0.9.1 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73)

checksums.yaml +4 -4
data/.claude/memory.sqlite3 +0 -0
data/.claude/skills/dashboard/SKILL.md +42 -0
data/.claude-plugin/marketplace.json +1 -1
data/.claude-plugin/plugin.json +1 -1
data/CHANGELOG.md +86 -0
data/CLAUDE.md +21 -5
data/README.md +32 -2
data/db/migrations/015_add_activity_events.rb +26 -0
data/db/migrations/016_add_moment_feedback.rb +22 -0
data/db/migrations/017_add_last_recalled_at.rb +15 -0
data/docs/1_0_punchlist.md +190 -0
data/docs/EXAMPLES.md +41 -2
data/docs/GETTING_STARTED.md +31 -4
data/docs/architecture.md +22 -7
data/docs/audit-queries.md +131 -0
data/docs/dashboard.md +172 -0
data/docs/improvements.md +465 -9
data/docs/influence/cq.md +187 -0
data/docs/plugin.md +13 -6
data/docs/quality_review.md +489 -172
data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
data/lib/claude_memory/activity_log.rb +86 -0
data/lib/claude_memory/commands/census_command.rb +210 -0
data/lib/claude_memory/commands/completion_command.rb +3 -0
data/lib/claude_memory/commands/dashboard_command.rb +54 -0
data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
data/lib/claude_memory/commands/digest_command.rb +181 -0
data/lib/claude_memory/commands/hook_command.rb +34 -0
data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
data/lib/claude_memory/commands/registry.rb +6 -1
data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
data/lib/claude_memory/commands/stats_command.rb +38 -1
data/lib/claude_memory/commands/sweep_command.rb +2 -0
data/lib/claude_memory/configuration.rb +16 -0
data/lib/claude_memory/core/relative_time.rb +9 -0
data/lib/claude_memory/dashboard/api.rb +610 -0
data/lib/claude_memory/dashboard/conflicts.rb +279 -0
data/lib/claude_memory/dashboard/efficacy.rb +127 -0
data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
data/lib/claude_memory/dashboard/health.rb +175 -0
data/lib/claude_memory/dashboard/index.html +2707 -0
data/lib/claude_memory/dashboard/knowledge.rb +136 -0
data/lib/claude_memory/dashboard/moments.rb +244 -0
data/lib/claude_memory/dashboard/reuse.rb +97 -0
data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
data/lib/claude_memory/dashboard/server.rb +211 -0
data/lib/claude_memory/dashboard/timeline.rb +68 -0
data/lib/claude_memory/dashboard/trust.rb +285 -0
data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
data/lib/claude_memory/hook/context_injector.rb +97 -3
data/lib/claude_memory/hook/handler.rb +50 -3
data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
data/lib/claude_memory/mcp/query_guide.rb +11 -0
data/lib/claude_memory/mcp/text_summary.rb +29 -0
data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
data/lib/claude_memory/mcp/tools.rb +148 -0
data/lib/claude_memory/publish.rb +13 -21
data/lib/claude_memory/recall/stale_detector.rb +67 -0
data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
data/lib/claude_memory/resolve/resolver.rb +41 -11
data/lib/claude_memory/store/llm_cache.rb +68 -0
data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
data/lib/claude_memory/store/schema_manager.rb +1 -1
data/lib/claude_memory/store/sqlite_store.rb +47 -143
data/lib/claude_memory/store/store_manager.rb +29 -0
data/lib/claude_memory/sweep/maintenance.rb +216 -0
data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
data/lib/claude_memory/sweep/sweeper.rb +2 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +22 -0
metadata +49 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: b6df0a3f58a88c1bbec82ec20e26789d51ad2712408d058337a196c5eac90654
-  data.tar.gz: beb9c2ef59ef6a45430eeb03466e37f6b1f741ef1745b5303a1443b02a7c84b4
+  metadata.gz: a299c6ab2aeb95123dcb61f5c87a06b93d15a00a2ed9ff2c8343e7fde6b369cb
+  data.tar.gz: d09c02a2f5dcd4bd0dfcb625793505bd2218c7df04230411e813a7543e7e7382
 SHA512:
-  metadata.gz: '06905bca1f77df5642caf0846cde7394ba9a1baf3c954138383ac39927fcaae2ef097ff79dd3c866e6930fa0eac0d0fb958366bded54a0616d8e356a316e616c'
-  data.tar.gz: 9a8e3c455c20ae616bc239b766e1d4e2aa4c6e5448f494294d9c6a646a8a613428e9b63218624c5cae7e30f389704dd3bee6b788e97a369cb719b115abffddd7
+  metadata.gz: 87fd7dab40cb2e5b190de071f99bcc1394e98e5f426951eedaff09b190fa66591b40f49580bca45f75819170ba939a0d3d9239f4825825b431fd4a83d388bb7d
+  data.tar.gz: ffb4ab50ba94a8f3c7bfb8129f01ea96fd27b981dd614f0addd7d65a9fc2b4b8562b9d23148bb5ea4ee90b5ae5a9fc183d1c82e68d3b009557967a00b96bfec1
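
This hunk records SHA256 and SHA512 digests for the two archive members of the .gem: metadata.gz and data.tar.gz. The sketch below checks an unpacked gem against those recorded digests using Ruby's standard `digest` and `yaml` libraries. It assumes the .gem file has already been unpacked with `tar -xf` and that checksums.yaml.gz has been gunzipped into the same directory. The helper name `verify_gem_checksums` is hypothetical.

```ruby
require "digest"
require "yaml"

# Hypothetical helper. Assumes `dir` holds metadata.gz, data.tar.gz and a
# decompressed checksums.yaml taken from the same .gem archive.
# Returns { member => { "SHA256" => true/false, "SHA512" => true/false } }.
def verify_gem_checksums(dir)
  checksums = YAML.safe_load(File.read(File.join(dir, "checksums.yaml")))
  %w[metadata.gz data.tar.gz].to_h do |member|
    bytes = File.binread(File.join(dir, member))
    [member, {
      "SHA256" => Digest::SHA256.hexdigest(bytes) == checksums.dig("SHA256", member),
      "SHA512" => Digest::SHA512.hexdigest(bytes) == checksums.dig("SHA512", member)
    }]
  end
end
```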

data/.claude/memory.sqlite3 CHANGED

Binary file

data/.claude/skills/dashboard/SKILL.md ADDED

@@ -0,0 +1,42 @@
+---
+name: dashboard
+description: Launch a local web dashboard for ClaudeMemory debugging and observability
+---
+# Dashboard
+Launch the ClaudeMemory debugging dashboard to visualize memory system health, activity, and efficacy.
+## Task
+Start the dashboard web server so the user can inspect what's happening behind the scenes.
+## Steps
+1. Run the dashboard command:
+```bash
+claude-memory dashboard
+```
+This starts a local web server (default port 3377) and opens it in the browser.
+## What the Dashboard Shows
+- **Health Status**: Database health, hook configuration, vector index status
+- **Overview**: Fact/entity/content counts, top predicates, entity type distribution, 30-day activity timeline
+- **Activity**: Live event log of hook executions (ingest, context, sweep), memory recalls, and store extractions with timing and details
+- **Facts**: Searchable fact explorer with status filtering, predicate/object search
+- **Efficacy**: Recall hit rate, total results served, average results per query, top queries by result count
+## Options
+- `--port PORT` - Use a different port (default: 3377)
+- `--no-open` - Don't auto-open the browser
+## Notes
+- Dashboard auto-refreshes every 30 seconds
+- Activity events are recorded by hooks and MCP tools into the `activity_events` table
+- The dashboard reads from both global and project databases
+- Press Ctrl+C to stop the server
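
The Efficacy panel in this SKILL.md reports four figures: recall hit rate, total results served, average results per query, and top queries. The sketch below shows how such figures could be computed from `recall` rows in `activity_events`. It rests on two hypothetical assumptions. First, each row's `detail_json` carries `query` and `result_count` keys. Second, a "hit" means at least one result. Neither assumption comes from the package, and the shipped `Dashboard::Efficacy` module may work differently.

```ruby
require "json"

# Hypothetical sketch. Assumes recall rows carry detail_json such as
# {"query": "...", "result_count": 3}; those keys are illustrative only.
def efficacy_summary(events, top_n: 3)
  recalls = events
    .select { |e| e[:event_type] == "recall" && e[:status] == "success" } # errored recalls excluded (assumption)
    .map { |e| JSON.parse(e[:detail_json] || "{}") }
  return nil if recalls.empty? # no recalls yet: avoid a misleading 0% hit rate

  counts = recalls.map { |d| d.fetch("result_count", 0).to_i }
  {
    hit_rate: counts.count(&:positive?).fdiv(recalls.size), # share of recalls with >= 1 result
    total_results: counts.sum,
    avg_results_per_query: counts.sum.fdiv(recalls.size),
    top_queries: recalls.sort_by { |d| -d.fetch("result_count", 0).to_i }
                        .first(top_n)
                        .map { |d| d["query"] }
  }
end
```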

data/.claude-plugin/marketplace.json CHANGED

@@ -7,7 +7,7 @@
   "plugins": [
     {
       "name": "claude-memory",
-      "version": "0.9.1",
+      "version": "0.10.0",
       "source": "./",
       "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
       "repository": "https://github.com/codenamev/claude_memory"

data/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "claude-memory",
-  "version": "0.9.1",
+  "version": "0.10.0",
   "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
   "author": {
     "name": "Valentino Stoll",

data/CHANGELOG.md CHANGED

@@ -4,6 +4,92 @@ All notable changes to this project will be documented in this file.
 ## [Unreleased]
+## [0.10.0] - 2026-04-28
+### Added
+**Dashboard — feed-first redesign with observability built in**
+- New feed-first dashboard UI with scope-aware moments, fact detail modal, query tester, and activity drilldown. Reuse, Trust, Knowledge, Conflicts, and Moments panels each backed by a dedicated module (`Dashboard::{Reuse, Trust, Knowledge, Conflicts, Moments}`) under unit tests, replacing the prior all-in-API-class layout.
+- 👍/👎 feedback on individual moments with persisted verdicts (schema v16, `moment_feedback` table). Trust panel surfaces a 30-day up/down ratio so the dashboard can answer "when memory surfaces something, are users marking it useful?".
+- Utilization ratio panel — of facts extracted in the last 30 days, how many has Claude actually used in a recall or context injection? Color-coded (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs to avoid misleading zeros.
+- Conflict deduping at the display layer: identical (subject, predicate, object_pair) detections collapse into one row with a `×N` badge. Sidebar "Needs review" count now reflects distinct contradictions, not raw row count.
+- Activity events drilldown: each moment opens a payload modal with prettified JSONL, recall trigger correlation (which user prompt motivated this lookup), and linked-fact resolution scoped per database.
+- Vector index health threshold and clickable remediation hints in the health dashboard.
+**CLI — observability surfaces and one-shot cleanups**
+- `claude-memory digest [--since DAYS] [--output FILE]` — weekly markdown report. Sections: Activity, New knowledge by predicate, Utilization (extracted vs used), Conflicts, Feedback. No new schema; renders from existing aggregates.
+- `claude-memory census [--root DIR]` — privacy-safe cross-project vocabulary scan. Aggregates per-DB predicate × status counts, novel predicates, synonym candidates. Suppresses object literals, entity names, and paths; per-DB IDs are SHA256-prefixed.
+- `claude-memory dedupe-conflicts [--scope SCOPE] [--dry-run]` — one-shot cleanup for historical conflict-row duplication that predates the Resolver dedup fix (commit f571ba4). Groups by (subject, predicate, normalized object pair), keeps the earliest, migrates provenance to the keeper.
+- `claude-memory reclassify-references [--scope SCOPE] [--dry-run]` — retags active convention facts that the new `Distill::ReferenceMaterialDetector` flags as reference material (LOC counts, star counts, "X is a plugin..." templates, "by Firstname Lastname" attributions).
+**Memory quality**
+- Access-based staleness scoring (improvements.md #35). Schema v17 adds `last_recalled_at` to facts. `Sweep::RecallTimestampRefresher` derives the field periodically from activity_events; `claude-memory stats --stale [--stale-days N]` lists facts that haven't been recalled inside the threshold. Replaces the prior "active facts minus seen-in-recalls" approximation.
+- Auto-memory mirror (improvements.md #36). On fresh sessions, the SessionStart context hook scans `~/.claude/projects/<slug>/memory/*.md` and surfaces new or changed entries as extraction candidates so users can promote auto-memory observations into claude_memory without manual copy-paste.
+- Reasoning requirement enforced in distillation (improvements.md #34). The SessionStart prompt and the `/distill-transcripts` skill now require a why clause for `decision` and `convention` predicates ("because…", "so that…", etc.). Audit found ~75% of facts were bare conclusions before this change.
+- `Distill::ReferenceMaterialDetector` reclassifies convention facts whose object text matches reference patterns. New `reference` predicate registered in `PredicatePolicy` with its own `:references` snapshot section. Detector runs at write time in `ManagementHandlers#store_extraction` so mislabeling can't persist.
+- Predicate census command (#30) for cross-project vocabulary audits — see CLI section above.
+**Benchmarks and observability**
+- Repeat-correction benchmark harness (improvements.md #32). `spec/benchmarks/e2e/repeat_correction_spec.rb` pre-loads a past correction as a memory fact, runs the prompt through real Claude under `EVAL_MODE=real`, and reports pass rate (no violation patterns matched). Starter set of 2 scenarios drawn from this project's recurring gotchas.
+- Relevance ratio metric (improvements.md #31). `Hook::ContextInjector#emitted_subjects` exposes the subjects injected at SessionStart; `BenchmarkHelpers::RelevanceMetrics` measures whether they appear in Claude's response. Trend signal for memory-application quality, integrated into `devmemeval_spec.rb`.
+- MCP server embeds the V=R/C ("Verify before Recommend / Correct") mental model in agent instructions so memory recommendations come with built-in verification cues.
+**Schema v15 → v17 (additive only, automatic on first run)**
+- Migration 015: adds `activity_events` table for hook/recall/context/sweep telemetry. Powers the dashboard timeline, moments feed, and efficacy reports.
+- Migration 016: adds `moment_feedback` table (unique on event_id) for the dashboard 👍/👎 surface.
+- Migration 017: adds nullable `facts.last_recalled_at` for access-based staleness scoring.
+**1.0 readiness track**
+- New `docs/1_0_punchlist.md` opens the path to 1.0: token-budget telemetry, hallucination-rate metric, negative-fact harm benchmark, CLAUDE.md baseline publication, `claude-memory show`, benchmark scoreboard. Ten entries (#47-56) added to `docs/improvements.md` with concrete file:line plumbing notes.
+### Changed
+- `Resolver#apply_conflict` no longer creates a duplicate disputed fact + conflict row when the same contradicting value is re-extracted. Looks up disputed facts in the same (subject, predicate) slot and reinforces with provenance instead.
+- `Resolver` no longer treats the distiller's `scope_hint` as a scope override. `scope_hint` is advisory metadata; `fact.scope` must match the DB the row lives in. Earlier behavior caused scope leakage where global-hinted distillations landed in the project DB.
+- `Hook::ContextInjector` adds `emitted_fact_ids` and `emitted_subjects` accessors so benchmark harnesses can attribute injection contributions per session.
+- `SQLiteStore` decomposed via module inclusion: `LLMCache` and `MetricsAggregator` extracted into `lib/claude_memory/store/`. SQLiteStore back under 600 LOC.
+- `Dashboard::API` decomposed: `FactPresenter`, `Conflicts`, `Efficacy::Reporter`, `Timeline`, `Health` extracted into dedicated classes following the boundary pattern. API now routes/delegates rather than aggregating.
+- Dashboard releases DB connections after each HTTP request (was holding connections open for the lifetime of the WEBrick session).
+- `Sweep::Maintenance` gains `dedupe_open_conflicts` and `reclassify_references` for the one-shot CLI commands above.
+- Round-trip migration specs from v12, v13, v14 → v17 (per-version migrations covered by `spec/claude_memory/store/migrations/`). Codifies the release-blocker convention: any schema bump must round-trip from each prior major-release boundary back ~3 releases.
+### Fixed
+- Dashboard surfaces an actionable hint when Recall hits FTS5 corruption (run `claude-memory compact` rather than a generic error).
+- Dashboard query tester unwraps the nested Recall result shape rather than printing the raw envelope.
+- Dashboard health checks correctly detect the claude-memory hook installation across the two-level Claude Code hooks structure (was reporting false negatives when hooks were installed under a matcher block).
+- Dashboard Efficacy "this session" correlation falls back to a time window when the recall event has no `session_id` (MCP tool calls don't thread session_id).
+- Bulk-reject in the Conflicts modal now retries with an actionable message when the server-side state is stale.
+### Upgrade Notes
+**Schema bump v14 → v17.** Three migrations run automatically on first launch after upgrade. All three are additive (no existing data is rewritten):
+1. Migration 015 creates `activity_events` (hook/recall telemetry).
+2. Migration 016 creates `moment_feedback` (dashboard verdicts).
+3. Migration 017 adds `facts.last_recalled_at` (NULL by default; `Sweep::RecallTimestampRefresher` populates it on the next sweep cycle from existing activity_events).
+The migration delta has round-trip spec coverage in `spec/claude_memory/store/migrations/`. Forward-compatibility: 0.10.0 databases cannot be opened by 0.9.x or earlier. Downgrade is destructive — back up `~/.claude/memory.sqlite3` and `.claude/memory.sqlite3` before downgrading.
+**Optional historical cleanups.** Two new admin commands address data tails left by earlier bugs that have since been fixed at the source:
+```bash
+claude-memory dedupe-conflicts --dry-run   # preview duplicate conflict rows
+claude-memory dedupe-conflicts             # consolidate them
+claude-memory reclassify-references --dry-run   # preview reference-material mislabels
+claude-memory reclassify-references             # retag them
+```
+Both are opt-in. Neither runs in the regular sweep cycle. Use `--scope global` to clean the global DB.
+**Telemetry footprint.** The `activity_events` table grows with hook activity. The dashboard surfaces this by default and powers the timeline/moments/efficacy panels. Retention pruning is not yet automatic (planned for a follow-up); manual cleanup via `DELETE FROM activity_events WHERE occurred_at < ?` is safe — the dashboard tolerates missing history.
 ## [0.9.1] - 2026-04-16
 ### Fixed
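
The `dedupe-conflicts` entry in this CHANGELOG summarizes its algorithm in one line: group by (subject, predicate, normalized object pair), keep the earliest row, and move provenance onto the keeper. The plain-Ruby sketch below illustrates that grouping. Several details are assumptions for illustration and may not match what `Sweep::Maintenance#dedupe_open_conflicts` actually uses:

- the `Conflict` struct and its `detected_at` and `provenance` fields;
- the normalization rule, which trims, downcases, and treats the object pair as order-insensitive.

```ruby
# Hypothetical row shape; not ClaudeMemory's actual conflicts schema.
Conflict = Struct.new(:id, :subject, :predicate, :object_a, :object_b,
                      :detected_at, :provenance, keyword_init: true)

# Assumed normalization: trim, downcase, and ignore which side is "a" vs "b".
def normalize_pair(a, b)
  [a, b].map { |o| o.to_s.strip.downcase }.sort
end

# Returns [keepers, removed_ids]. Each keeper is the earliest row in its group
# and absorbs the provenance of its duplicates; :count mirrors the xN badge.
def dedupe_conflicts(rows)
  keepers = []
  removed = []
  rows.group_by { |r| [r.subject, r.predicate, normalize_pair(r.object_a, r.object_b)] }
      .each_value do |group|
        sorted = group.sort_by { |r| [r.detected_at, r.id] } # ISO 8601 strings sort chronologically
        keeper, *dupes = sorted
        merged = sorted.flat_map(&:provenance).uniq
        keepers << { conflict: keeper.to_h.merge(provenance: merged), count: group.size }
        removed.concat(dupes.map(&:id))
      end
  [keepers, removed]
end
```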

data/CLAUDE.md CHANGED

@@ -163,7 +163,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Each command is a separate class (HelpCommand, DoctorCommand, etc.)
   - All commands inherit from BaseCommand
   - Dependency injection for I/O (stdout, stderr, stdin)
-  - 28 commands total, each focused on single responsibility
+  - 32 commands total, each focused on single responsibility
 - **`Configuration`**: Centralized ENV access (`configuration.rb`)
   - Single source of truth for paths and environment variables
@@ -208,6 +208,8 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
 - **`Distill`**: Fact extraction interface (`distill/`)
   - Pluggable distiller design (current: NullDistiller stub)
   - Extracts entities, facts, scope hints from content
+  - `ReferenceMaterialDetector`: classifies "X is a plugin/library/tool" templates, LOC counts, "by Firstname Lastname" attributions as reference material. Runs in `ManagementHandlers#store_extraction` so mislabeling can't persist
+  - SessionStart distillation prompt enforces reason clauses ("because…", "so that…") for `decision` and `convention` predicates — bare conclusions are explicitly disallowed
 - **`Resolve`**: Truth maintenance and conflict resolution (`resolve/`)
   - Determines equivalence, supersession, or conflicts
@@ -226,7 +228,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Modes: shared (repo), local (uncommitted), home (user directory)
 - **`MCP`**: Model Context Protocol server and tools (`mcp/`)
-  - Exposes memory tools to Claude Code (24 tools total)
+  - Exposes memory tools to Claude Code (25 tools total)
   - `Telemetry`: Records tool invocations to `mcp_tool_calls` table for usage stats
   - Dual content/structuredContent responses with compact mode
@@ -234,6 +236,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
   - Reads stdin JSON from Claude Code hooks
   - Routes to ingest/sweep/publish commands
   - `DistillationRunner`: Manages context hook injection with undistilled content for LLM extraction
+  - `AutoMemoryMirror` (0.10.0): On fresh sessions, scans `~/.claude/projects/<slug>/memory/*.md` for new/changed entries and surfaces them as extraction candidates in the SessionStart context. State diffed by md5 in `.claude/auto_memory_mirror.json`; bounded to 5 candidates per session, 1500 chars each.
 ### Database Schema
@@ -246,16 +249,19 @@ Key tables (defined in `sqlite_store.rb`):
 - `fact_links`: Supersession and conflict relationships
 - `conflicts`: Open contradictions
 - `mcp_tool_calls`: MCP server tool invocation telemetry (schema v13)
+- `activity_events`: Hook/recall/context/sweep telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports
+- `moment_feedback`: Per-moment 👍/👎 verdicts with optional notes (schema v16) — unique on event_id, repeat clicks upsert
 Facts include:
 - `scope`: "global" or "project" (determines applicability)
 - `project_path`: Set for project-scoped facts
 - `valid_from`/`valid_to`: Temporal validity window
+- `last_recalled_at` (schema v17): Set by `Sweep::RecallTimestampRefresher` from activity_events; powers `claude-memory stats --stale` and the dashboard's "stale" needs-review count
 ### Scope System
 Facts are scoped to control where they apply:
-- **project**: Current project only (e.g., "this app uses PostgreSQL")
+- **project**: Current project only (e.g., "claude_memory uses SQLite for storage")
 - **global**: All projects (e.g., "I prefer 4-space indentation")
 Distiller detects signals like "always", "in all projects", "my preference" and sets `scope_hint: "global"`. Users can manually promote facts via `claude-memory promote <fact_id>` or the `memory.promote` MCP tool.
@@ -340,14 +346,14 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
 The gem includes an MCP server (`claude-memory serve-mcp`) that exposes memory operations as tools. Configuration should be in `.mcp.json` at project root.
-Available MCP tools (24 total):
+Available MCP tools (25 total):
 - **Query & Recall**: `memory.recall`, `memory.recall_index`, `memory.recall_details`, `memory.recall_semantic`, `memory.search_concepts`
 - **Provenance**: `memory.explain`, `memory.fact_graph`
 - **Shortcuts**: `memory.decisions`, `memory.conventions`, `memory.architecture`
 - **Context**: `memory.facts_by_tool`, `memory.facts_by_context`
 - **Management**: `memory.promote`, `memory.reject_fact`, `memory.store_extraction`
 - **Distillation**: `memory.undistilled`, `memory.mark_distilled`
-- **Monitoring**: `memory.status`, `memory.stats`, `memory.changes`, `memory.conflicts`
+- **Monitoring**: `memory.status`, `memory.stats`, `memory.changes`, `memory.conflicts`, `memory.activity`
 - **Maintenance**: `memory.sweep_now`
 - **Discovery**: `memory.check_setup`, `memory.list_projects`
@@ -369,6 +375,16 @@ ClaudeMemory integrates with Claude Code via hooks in `.claude/settings.json`:
 Hook commands read JSON payloads from stdin for robustness. Supports `--async` flag for non-blocking execution.
+## Dashboard
+Local web UI for inspecting memory state. Started via `claude-memory dashboard` (default port 3377). Reads from both global and project databases; no write side effects from page loads.
+The dashboard is a thin web layer over the same `Recall`/`Conflicts`/`Trust`/`Moments`/`Knowledge`/`Reuse`/`Health`/`Timeline` classes the MCP server uses. Each panel is backed by a dedicated module under `lib/claude_memory/dashboard/`; `Dashboard::API` holds HTTP-shape glue and per-endpoint formatting (delegating non-trivial logic to the panel classes).
+Connections are released after each request — never holds a WAL writer lock open across page loads.
+See [docs/dashboard.md](docs/dashboard.md) for the user-facing guide (panels, common workflows, related CLI commands).
 ## Code Style
 This project uses [Standard Ruby](https://github.com/standardrb/standard) for linting. Run `bundle exec rake standard:fix` before committing.

data/README.md CHANGED

@@ -140,6 +140,35 @@ File-searchable questions ("what version is this?") and one-shot code generation
 - **Claude-Powered**: Uses Claude's intelligence to extract facts (no API key needed)
 - **Token Efficient**: 10x reduction in memory queries with progressive disclosure
 - **Database Maintenance**: Compact, export, and backup commands
+- **Built-in Observability** (0.10.0+): `claude-memory dashboard` opens a local web UI with a moments feed, trust panel, conflicts dedup, knowledge index, 👍/👎 feedback, and a 30-day utilization ratio. See **[Dashboard guide →](docs/dashboard.md)**. `claude-memory digest` writes a weekly markdown report; `claude-memory census` audits the predicate vocabulary across projects.
+## What's New in 0.10.0
+Three behavior changes worth knowing about — they affect what you'll see in
+extracted facts and SessionStart context, even if you don't change anything:
+- **Auto-memory mirror** — On fresh sessions, the SessionStart context hook
+  scans `~/.claude/projects/<slug>/memory/*.md` and surfaces new or changed
+  entries as candidates for extraction into ClaudeMemory. You'll see a
+  "Pending Knowledge Extraction" section in Claude's startup context citing
+  files from your auto-memory directory. Claude reviews these and calls
+  `memory.store_extraction` for the high-signal ones; you don't need to
+  copy-paste manually anymore.
+- **Why-clause enforcement** — When Claude distills `decision` and
+  `convention` facts, it's now required to embed a reason ("…because…",
+  "…so that…", "…to avoid…"). A bare conclusion is dead weight; a fact with
+  a reason stays useful when the situation changes. You'll see this
+  reflected in fact text being longer and more justified.
+- **Reference predicate** — Active facts that look like reference material
+  (LOC counts, "X is a plugin/library/tool" templates, "by Firstname
+  Lastname" attributions) are auto-tagged `predicate=reference` instead of
+  `convention`. Keeps the conventions list signal-rich. Browse them in the
+  dashboard's Knowledge → References section, or run
+  `claude-memory reclassify-references --dry-run` to see candidates.
+Plus: **staleness detection** (`claude-memory stats --stale`) lists active
+facts that haven't been recalled in N days, so you can prune dead weight
+explicitly. The dashboard's Trust → Needs review panel surfaces the count.
 ## Privacy Control
@@ -241,7 +270,8 @@ The uninstall command removes:
 - 📖 [Getting Started](docs/GETTING_STARTED.md) - Step-by-step onboarding
 - 💡 [Examples](docs/EXAMPLES.md) - Use cases and workflows
-- 🔧 [Plugin Setup](docs/PLUGIN.md) - Claude Code integration
+- 📊 [Dashboard](docs/dashboard.md) - Local web UI for inspection and trust signals (0.10.0+)
+- 🔧 [Plugin Setup](docs/plugin.md) - Claude Code integration
 - 🏗️ [Architecture](docs/architecture.md) - Technical deep dive
 - 📝 [Changelog](CHANGELOG.md) - Release notes
@@ -292,7 +322,7 @@ The benchmark dataset draws from real CLAUDE.md patterns and is designed specifi
 - **Language:** Ruby 3.2+
 - **Storage:** SQLite3 (no external services)
-- **Testing:** 1477 examples (1375 unit/integration + 102 benchmarks/evals), 100% core coverage
+- **Testing:** 1964 examples (~1700 unit/integration + ~250 benchmarks/evals), 100% core coverage
 - **Code Style:** Standard Ruby
 ```bash

data/db/migrations/015_add_activity_events.rb ADDED

@@ -0,0 +1,26 @@
+# frozen_string_literal: true
+# Migration v15: Add activity_events table for debugging and observability
+# Tracks hook executions, memory recalls, context injections, and sweep operations.
+# Powers the dashboard timeline and efficacy reports.
+Sequel.migration do
+  up do
+    create_table?(:activity_events) do
+      primary_key :id
+      String :event_type, null: false    # "hook_ingest", "hook_context", "hook_sweep", "recall", "store_extraction"
+      String :session_id                 # Claude session that triggered the event
+      String :status, null: false        # "success", "skipped", "error"
+      Integer :duration_ms               # How long the operation took
+      String :detail_json, text: true    # Event-specific details (JSON)
+      String :occurred_at, null: false   # ISO 8601 timestamp
+    end
+    run "CREATE INDEX IF NOT EXISTS idx_activity_events_type ON activity_events(event_type)"
+    run "CREATE INDEX IF NOT EXISTS idx_activity_events_occurred_at ON activity_events(occurred_at)"
+    run "CREATE INDEX IF NOT EXISTS idx_activity_events_session ON activity_events(session_id)"
+  end
+  down do
+    drop_table?(:activity_events)
+  end
+end
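
The 0.10.0 upgrade notes suggest pruning old events with `DELETE FROM activity_events WHERE occurred_at < ?`. Because `occurred_at` is a `String` column holding ISO 8601 text, that comparison is lexicographic. It matches chronological order only if every stored timestamp uses the same format and offset. The minimal sketch below computes the bind value, assuming timestamps are stored as UTC with a trailing `Z`. The method names are hypothetical.

```ruby
require "time"

# Statement quoted from the 0.10.0 upgrade notes.
RETENTION_SQL = "DELETE FROM activity_events WHERE occurred_at < ?"

# Cutoff as UTC ISO 8601 text (format like "2026-03-29T12:00:00Z").
# getutc returns a copy, so the caller's Time is not mutated.
def retention_cutoff(days, now: Time.now)
  (now.getutc - days * 86_400).iso8601
end

# Pure-Ruby mirror of the SQL predicate; valid only under the
# same-format-and-offset assumption described above.
def expired?(occurred_at, cutoff)
  occurred_at < cutoff
end
```

With the `sqlite3` gem, the cutoff would be passed as a bind variable, for example `db.execute(RETENTION_SQL, [retention_cutoff(90)])`. The 90-day window here is an arbitrary example.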

data/db/migrations/016_add_moment_feedback.rb ADDED

@@ -0,0 +1,22 @@
+# frozen_string_literal: true
+# Migration v16: Per-moment feedback (improvements.md #43).
+# Tracks a single thumbs-up/down verdict (+ optional note) per activity_event
+# so the dashboard can surface a trust-calibration signal. Unique on event_id
+# so a given moment has at most one current verdict; repeat clicks upsert.
+Sequel.migration do
+  up do
+    create_table?(:moment_feedback) do
+      primary_key :id
+      Integer :event_id, null: false
+      String :verdict, null: false  # "up" | "down"
+      String :note, text: true      # optional freeform note
+      String :recorded_at, null: false
+      index :event_id, unique: true
+    end
+  end
+  down do
+    drop_table?(:moment_feedback)
+  end
+end
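
Migration 016's unique index on `event_id` means each moment holds at most one current verdict. According to the CHANGELOG, the Trust panel turns these verdicts into a 30-day up/down ratio. The in-memory model below illustrates those semantics. `FeedbackStore` is a hypothetical class. Computing the ratio as up / (up + down) over verdicts recorded inside the window is an assumption about how the panel counts. In SQLite itself, the upsert would typically be written as `INSERT ... ON CONFLICT(event_id) DO UPDATE`.

```ruby
require "time"

# Hypothetical in-memory model of moment_feedback: one current verdict per
# event_id, so a repeat click overwrites instead of appending a row.
class FeedbackStore
  VERDICTS = %w[up down].freeze

  def initialize
    @rows = {} # event_id => { verdict:, note:, recorded_at: }
  end

  def record(event_id, verdict, note: nil, at: Time.now)
    raise ArgumentError, "verdict must be up or down" unless VERDICTS.include?(verdict)
    @rows[event_id] = { verdict: verdict, note: note, recorded_at: at.getutc.iso8601 }
  end

  # Share of "up" verdicts recorded within the window; nil when there are none,
  # so an empty panel does not read as a 0% trust score.
  def up_ratio(days: 30, now: Time.now)
    cutoff = (now.getutc - days * 86_400).iso8601
    recent = @rows.values.select { |r| r[:recorded_at] >= cutoff }
    return nil if recent.empty?
    recent.count { |r| r[:verdict] == "up" }.fdiv(recent.size)
  end
end
```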

data/db/migrations/017_add_last_recalled_at.rb ADDED

@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+# Migration v17: Access-based staleness scoring (improvements.md #35).
+# Records the last time a fact was surfaced via memory.recall or context
+# injection, derived periodically from activity_events. Sweep-derived rather
+# than per-call so we avoid WAL write contention on the recall hot path.
+Sequel.migration do
+  up do
+    add_column :facts, :last_recalled_at, String
+  end
+  down do
+    drop_column :facts, :last_recalled_at
+  end
+end
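
Migration 017 only adds the column. Per the CHANGELOG, `Sweep::RecallTimestampRefresher` fills it from `activity_events`, and `claude-memory stats --stale` lists facts past a threshold. The sketch below covers both steps under three assumptions:

- `recall` and `hook_context` events list the surfaced facts under a hypothetical `fact_ids` key in `detail_json`;
- `occurred_at` is UTC ISO 8601;
- facts that were never recalled count as stale.

```ruby
require "json"
require "time"

# Event types assumed to "surface" a fact (recall and context injection).
SURFACING_EVENTS = %w[recall hook_context].freeze

# Latest surfacing timestamp per fact id. With UTC ISO 8601 text,
# string comparison doubles as time comparison.
def derive_last_recalled(events)
  latest = {}
  events.each do |e|
    next unless SURFACING_EVENTS.include?(e[:event_type])
    ids = JSON.parse(e[:detail_json] || "{}").fetch("fact_ids", []) # hypothetical payload key
    ids.each do |id|
      latest[id] = e[:occurred_at] if latest[id].nil? || e[:occurred_at] > latest[id]
    end
  end
  latest
end

# Facts never recalled (nil) are treated as stale here; that is an assumption.
def stale_fact_ids(facts, stale_days:, now: Time.now)
  cutoff = (now.getutc - stale_days * 86_400).iso8601
  facts.select { |f| f[:last_recalled_at].nil? || f[:last_recalled_at] < cutoff }
       .map { |f| f[:id] }
end
```

Deriving the timestamp in a batch, rather than updating `facts` on every recall, is the trade-off the migration comment describes. It keeps writes off the recall hot path, at the cost of the value lagging by up to one sweep cycle.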

data/docs/1_0_punchlist.md ADDED

@@ -0,0 +1,190 @@
+# 1.0 Punchlist
+*Created: 2026-04-28*
+The remaining work for a stable 1.0 release. Distinct from `improvements.md` —
+that file tracks the long tail of inbound study/idea entries; this file tracks
+**what blocks 1.0 confidence**.
+Guiding question: *a skeptical Ruby developer should be able to look at one
+screen and say "yes, this is helping, here's the evidence" without trusting our
+marketing.* Today the dashboard tells that story in pieces but not as a
+headline. Each item below closes a specific gap that prevents that headline
+from existing.
+Items are cross-linked to the canonical entry in `improvements.md` where the
+implementation detail and acceptance criteria live. This file is the
+prioritization view; that file is the work view.
+---
+## Must-have for 1.0
+### 1. Token budget telemetry — *what does memory cost?*
+**Gap.** `Core::TokenEstimator` exists and is unused outside one helper. We
+have no idea what % of the SessionStart token budget memory consumes per
+session, how it scales with DB size, or whether it's growing.
+**Acceptance.** Trust panel + `claude-memory digest` show p50/p95 injected
+tokens per session over the last 30 days. Per-session count rides on every
+`hook_context` activity event so the data is queryable post-hoc.
+**Why must-have.** "Costs you tokens forever" is the strongest critique of any
+context-injection memory system; if we can't answer it numerically, we can't
+defend the trade.
+→ improvements.md entry: *Token Budget Telemetry*
+### 2. Hallucination rate as a first-class trust metric
+**Gap.** `ReferenceMaterialDetector` already classifies suspect facts and we
+know from the #34 audit that ~25% of facts had embedded reasoning (i.e.
+~75% were bare conclusions at audit time). Neither signal is exposed on the
+dashboard. We display clean numbers; we should display stained ones.
+**Acceptance.** Trust panel surfaces a `quality_score` derived from
+suspect-fact ratio + bare-conclusion ratio over active facts in both stores.
+Digest includes a 30-day rejection rate ("how much of what we extracted got
+rejected within a week?") so calibration drift is visible.
+**Why must-have.** We can't claim "memory is helping" if we can't show "memory
+isn't poisoning the well."
+→ improvements.md entry: *Hallucination Rate Metric*
+### 3. Negative-fact harm benchmark
+**Gap.** Every benchmark we run today measures whether memory **helps**.
+Nothing measures whether memory **harms** — i.e. injects a wrong fact and
+Claude follows it. Without this, "memory helps" is unfalsifiable.
+**Acceptance.** New `spec/benchmarks/dataset/harm_scenarios.yml` with 10–15
+cases where memory holds a stale or wrong fact. Each case scores `harm` if
+Claude's response follows the wrong fact, `safe` otherwise. Wired into
+`bin/run-evals`. >1% harm rate blocks release.
+**Why must-have.** A retrieval system that occasionally makes Claude *wrong*
+is strictly worse than no memory; we need a release gate that proves we're
+not in that regime.
+→ improvements.md entry: *Negative-Fact Harm Benchmark*
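A sketch of the release-gate arithmetic, assuming each scenario in `harm_scenarios.yml` has already been scored `:harm` or `:safe` (the scoring step and the names below are hypothetical). With 10–15 cases, a single `harm` result is already a 6.7–10% rate, so in practice the >1% threshold means any harm case blocks the release.

```ruby
HARM_THRESHOLD = 0.01 # >1% harm rate blocks release

# scores: Array of :harm / :safe, one per scenario (hypothetical shape).
def harm_gate(scores)
  raise ArgumentError, "no scenarios scored" if scores.empty?

  unknown = scores - %i[harm safe]
  raise ArgumentError, "unexpected scores: #{unknown.uniq.inspect}" unless unknown.empty?

  rate = scores.count(:harm).fdiv(scores.size)
  { harm_rate: rate, blocked: rate > HARM_THRESHOLD }
end

results = Array.new(14, :safe) + [:harm]
gate = harm_gate(results)
puts format("harm rate %.1f%%, blocked: %s", gate[:harm_rate] * 100, gate[:blocked])
```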
+### 4. Publish the CLAUDE.md baseline in headline E2E results
+**Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
+and supports E2E. The adapter is wired into `comparative_helper.rb` but the
+README's headline comparative table doesn't include it. The single most
+important question for adoption — *"is this better than a hand-written
+CLAUDE.md?"* — is currently unanswered in our published numbers.
+**Acceptance.** The comparative E2E report includes a `CLAUDE.md baseline` row in
+`spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary
+output. README explicitly states the win/loss versus the static baseline.
+**Why must-have.** Cheapest item on the list — adapter already built, just
+surface the number. If we can't beat a static CLAUDE.md on developer
+scenarios, that's the loudest possible signal that the rest of the system
+needs work; if we can, that's the headline 1.0 brag.
+→ improvements.md entry: *CLAUDE.md Baseline in Headline Results*
+### 5. `claude-memory show` — human-readable "what would be injected"
+**Gap.** Inspecting memory state today requires the dashboard or several CLI
+commands (`recall`, `stats`, `census`). The CLAUDE.md alternative is
+`cat CLAUDE.md` — instant, plain-English, no tool. We need the same one-line
+inspect surface.
+**Acceptance.** `claude-memory show` runs the same `Hook::ContextInjector`
+path real sessions use, prints what would be injected next session in plain
+English (not JSON), sized to fit a terminal, with predicate-grouped sections
+matching the snapshot format.
+**Why must-have.** Trust requires inspectability. A user who can't see what
+memory will inject can't develop confidence in it.
+→ improvements.md entry: *claude-memory show*
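A sketch of the plain-English rendering step only. It assumes facts arrive as Hashes with hypothetical `:predicate`, `:subject`, and `:object` keys; fetching them through `Hook::ContextInjector` is left out because that class's interface isn't described here. Width is passed in rather than read from the terminal so the output is deterministic.

```ruby
# Group facts by predicate into readable sections and truncate each line
# to `width` characters. Fact shape is hypothetical.
def render_show(facts, width: 80)
  lines = []
  facts.group_by { |f| f[:predicate] }.sort_by { |pred, _| pred }.each do |predicate, group|
    lines << "#{predicate.tr('_', ' ').capitalize}:"
    group.each do |f|
      line = "  - #{f[:subject]}: #{f[:object]}"
      line = line[0, width - 3] + "..." if line.length > width
      lines << line
    end
    lines << "" # blank line between sections
  end
  lines.join("\n")
end

facts = [
  { predicate: "uses_database", subject: "repo", object: "PostgreSQL 16" },
  { predicate: "convention", subject: "tests", object: "RSpec with one expectation per example" }
]
puts render_show(facts, width: 40)
```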
+### 6. Release-to-release benchmark scoreboard
+**Gap.** Benchmark output is plain text today, so nothing is diffable across versions.
+Regressions land silently — the only reason we caught the FTS5/RRF
+normalization bug was a manual run.
+**Acceptance.** Each `bin/run-evals` run writes
+`spec/benchmarks/results/<version>.json`. New `bin/bench-diff` (or rake task)
+compares against the last tagged version's JSON and reports deltas. Release
+script (`/release` skill) reads it and refuses to ship on regressions over a
+configurable threshold.
+**Why must-have.** Without longitudinal tracking, every benchmark we run is a
+snapshot. 1.0 is the moment we commit to *not regressing* what we ship.
+→ improvements.md entry: *Benchmark Scoreboard Diff*
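A sketch of what a `bin/bench-diff` comparison could do, assuming each `<version>.json` is a flat JSON object of metric name to score where higher is better (the real results schema is not specified). It flags any metric that dropped by more than an absolute threshold.

```ruby
require "json"

# Returns metrics that dropped by more than `threshold` (absolute).
# Assumes higher is better for every metric (hypothetical schema).
def regressions(previous, current, threshold:)
  previous.filter_map do |metric, old_score|
    new_score = current[metric]
    next if new_score.nil? # missing from the new run; not treated as a regression here

    delta = new_score - old_score
    { metric: metric, from: old_score, to: new_score, delta: delta } if delta < -threshold
  end
end

# In practice these would be read from spec/benchmarks/results/<version>.json.
previous = JSON.parse('{"recall_at_5": 0.82, "e2e_pass_rate": 0.91}')
current  = JSON.parse('{"recall_at_5": 0.78, "e2e_pass_rate": 0.92}')

regs = regressions(previous, current, threshold: 0.02)
regs.each { |r| puts format("%s: %.2f -> %.2f (%+.2f)", r[:metric], r[:from], r[:to], r[:delta]) }
puts regs.empty? ? "no regressions" : "release blocked: #{regs.size} regression(s)"
```

Metrics where lower is better (latency, token cost) would need their sign flipped before comparison; a release script would exit non-zero when the list is non-empty.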
+---
+## Strong post-1.0
+These shouldn't block 1.0 but should land in the next release window.
+### 7. First-week ROI nudge
+SessionEnd hook prints `memory contributed N facts this session, %used = X`
+inline for the first ~10 sessions. Closes the cold-start gap where new users
+don't see value because they don't think to look.
+→ improvements.md entry: *First-Week ROI Nudge*
+### 8. Real-session repeat-correction detector
+The repeat-correction benchmark (#32) is synthetic; production has no
+equivalent signal. Analyze `activity_events` to detect "this fact was injected
+last session, the user re-stated it this session" — that's where memory is
+silently failing.
+→ improvements.md entry: *Real-Session Repeat-Correction Detection*
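A sketch of the detection join over `activity_events`, assuming hypothetical event Hashes with `:session`, `:type` (`:hook_context` or `:extraction`), and a normalized `:fact_key`. It also assumes extraction events reflect user statements; neither the event shape nor that attribution is confirmed by the gem.

```ruby
require "set"

# Flag facts injected in one session and then restated in the next.
def repeat_corrections(events, session_order)
  injected = Hash.new { |h, k| h[k] = Set.new }
  restated = Hash.new { |h, k| h[k] = Set.new }
  events.each do |e|
    case e[:type]
    when :hook_context then injected[e[:session]] << e[:fact_key]
    when :extraction   then restated[e[:session]] << e[:fact_key]
    end
  end

  # Compare each session with the one immediately before it.
  session_order.each_cons(2).flat_map do |prev_session, this_session|
    (injected[prev_session] & restated[this_session]).map do |key|
      { fact_key: key, injected_in: prev_session, restated_in: this_session }
    end
  end
end

events = [
  { session: "s1", type: :hook_context, fact_key: "db|uses|postgres" },
  { session: "s1", type: :hook_context, fact_key: "ui|framework|react" },
  { session: "s2", type: :extraction,   fact_key: "db|uses|postgres" }
]
repeat_corrections(events, %w[s1 s2]).each do |hit|
  puts "#{hit[:fact_key]} injected in #{hit[:injected_in]}, restated in #{hit[:restated_in]}"
end
```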
+### 9. Token-cost growth tracking
+Builds on #1. Weekly digest reports "context cost grew X% over 30d" as an
+anomaly signal that the DB is bloating or context injection is going wide.
+→ improvements.md entry: *Token-Cost Growth Tracking*
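One way to compute "context cost grew X% over 30d", assuming per-session samples of `[days_ago, injected_tokens]` (a hypothetical shape): compare mean tokens per session in the oldest week of the window against the newest week. Averaging a week at each end is a choice made here to damp single-session noise; the punchlist doesn't specify the method.

```ruby
# Percent change in mean tokens/session between the oldest and newest
# `bucket_days` of the window. Returns nil if either bucket is empty.
def token_cost_growth(samples, window_days: 30, bucket_days: 7)
  oldest = samples.select { |days_ago, _| days_ago < window_days && days_ago >= window_days - bucket_days }
  newest = samples.select { |days_ago, _| days_ago < bucket_days }
  return nil if oldest.empty? || newest.empty?

  old_mean = oldest.sum(&:last).fdiv(oldest.size)
  new_mean = newest.sum(&:last).fdiv(newest.size)
  return nil if old_mean.zero?

  (new_mean - old_mean) / old_mean * 100
end

samples = [[29, 1000], [25, 1200], [12, 1300], [3, 1400], [1, 1600]]
growth = token_cost_growth(samples)
puts format("context cost grew %.1f%% over 30d", growth) if growth
```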
+### 10. Drift dashboard
+Snapshot `census` weekly, surface predicate distribution shifts on the
+dashboard. Answers "is my fact base going off?" without a manual audit.
+→ improvements.md entry: *Drift Dashboard*
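A sketch of a single drift number between two weekly `census` snapshots, assuming each snapshot reduces to a Hash of predicate to active-fact count (hypothetical). It uses total variation distance, a measure swapped in here because the punchlist doesn't name one: 0.0 means an identical predicate mix, 1.0 means no predicates in common.

```ruby
# Convert counts to shares of the total.
def shares(counts)
  total = counts.values.sum
  return {} if total.zero?

  counts.transform_values { |c| c.fdiv(total) }
end

# Total variation distance between two predicate distributions.
def predicate_drift(before, after)
  p_share = shares(before)
  q_share = shares(after)
  (p_share.keys | q_share.keys).sum { |k| (p_share.fetch(k, 0.0) - q_share.fetch(k, 0.0)).abs } / 2.0
end

last_week = { "uses_database" => 40, "convention" => 40, "decision" => 20 }
this_week = { "uses_database" => 40, "convention" => 20, "decision" => 20, "reference" => 20 }
puts format("predicate drift: %.2f", predicate_drift(last_week, this_week))
```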
+---
+## Defer / skip for 1.0
+- **#44 Universal search box** — cosmetic given the gaps above. Knowledge tab
+  drawers cover the primary need.
+- **#45 Live SSE/WebSocket feed** — polling is adequate; dashboard polish, not
+  a confidence gap.
+---
+## Sequencing recommendation
+Smallest set that materially shifts 1.0 confidence (~2 days):
+1. **Token budget telemetry** (#1) — closes the loudest critique.
+2. **CLAUDE.md baseline publish** (#4) — adapter already built, one report change.
+3. **Hallucination rate** (#2) — reuses ReferenceMaterialDetector.
+Then in roughly priority order: `claude-memory show` (#5), harm benchmark
+(#3), scoreboard (#6). Post-1.0 items follow naturally once the must-haves
+land.
+---
+*Last updated: 2026-04-28 — initial punchlist drawn from session-end critique
+of observability/outcome gaps. Each entry will be elaborated with concrete
+file:line refs in improvements.md as it's worked.*

data/docs/EXAMPLES.md CHANGED

@@ -428,9 +428,48 @@ Claude: "You're using Context API for state management. You previously used Redu
 ---
+## Inspecting What Memory Knows (0.10.0+)
+When you want to see what's actually in memory — what's been extracted, which
+facts Claude has been reaching for, what's stale, what's contradicting — open
+the dashboard:
+```bash
+claude-memory dashboard
+```
+It listens on `http://localhost:3377` by default and surfaces:
+- A **moments feed** — every recall, context injection, and extraction event,
+  with the facts each one touched. Click any moment for the full payload.
+- A **Trust sidebar** — week-over-week activity, your global "fingerprint",
+  utilization ratio (% of recently extracted facts Claude actually used), and
+  your 👍/👎 feedback ratio.
+- **Conflicts** with display-layer dedup so you don't have to triage 11 rows
+  of the same contradiction one at a time.
+- **Knowledge** — facts grouped by predicate, with a separate References
+  section for auto-detected reference material.
+For a Markdown summary you can email or commit:
+```bash
+claude-memory digest --since 7
+```
+For a privacy-safe cross-project audit:
+```bash
+claude-memory census
+```
+See **[Dashboard guide →](dashboard.md)** for the full panel reference.
+---
 ## Next Steps
-- 📖 [Read the Getting Started Guide](GETTING_STARTED.md) *(coming soon)*
-- 🔧 [Set up the Claude Code Plugin](PLUGIN.md)
+- 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
+- 📊 [Inspect with the Dashboard](dashboard.md)
+- 🔧 [Set up the Claude Code Plugin](plugin.md)
 - 🏗️ [Understand the Architecture](architecture.md)
 - 📝 [Check the Changelog](../CHANGELOG.md)