claude_memory 0.9.1 → 0.10.0

Files changed (73)
  1. checksums.yaml +4 -4
  2. data/.claude/memory.sqlite3 +0 -0
  3. data/.claude/skills/dashboard/SKILL.md +42 -0
  4. data/.claude-plugin/marketplace.json +1 -1
  5. data/.claude-plugin/plugin.json +1 -1
  6. data/CHANGELOG.md +86 -0
  7. data/CLAUDE.md +21 -5
  8. data/README.md +32 -2
  9. data/db/migrations/015_add_activity_events.rb +26 -0
  10. data/db/migrations/016_add_moment_feedback.rb +22 -0
  11. data/db/migrations/017_add_last_recalled_at.rb +15 -0
  12. data/docs/1_0_punchlist.md +190 -0
  13. data/docs/EXAMPLES.md +41 -2
  14. data/docs/GETTING_STARTED.md +31 -4
  15. data/docs/architecture.md +22 -7
  16. data/docs/audit-queries.md +131 -0
  17. data/docs/dashboard.md +172 -0
  18. data/docs/improvements.md +465 -9
  19. data/docs/influence/cq.md +187 -0
  20. data/docs/plugin.md +13 -6
  21. data/docs/quality_review.md +489 -172
  22. data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
  23. data/lib/claude_memory/activity_log.rb +86 -0
  24. data/lib/claude_memory/commands/census_command.rb +210 -0
  25. data/lib/claude_memory/commands/completion_command.rb +3 -0
  26. data/lib/claude_memory/commands/dashboard_command.rb +54 -0
  27. data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
  28. data/lib/claude_memory/commands/digest_command.rb +181 -0
  29. data/lib/claude_memory/commands/hook_command.rb +34 -0
  30. data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
  31. data/lib/claude_memory/commands/registry.rb +6 -1
  32. data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
  33. data/lib/claude_memory/commands/stats_command.rb +38 -1
  34. data/lib/claude_memory/commands/sweep_command.rb +2 -0
  35. data/lib/claude_memory/configuration.rb +16 -0
  36. data/lib/claude_memory/core/relative_time.rb +9 -0
  37. data/lib/claude_memory/dashboard/api.rb +610 -0
  38. data/lib/claude_memory/dashboard/conflicts.rb +279 -0
  39. data/lib/claude_memory/dashboard/efficacy.rb +127 -0
  40. data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
  41. data/lib/claude_memory/dashboard/health.rb +175 -0
  42. data/lib/claude_memory/dashboard/index.html +2707 -0
  43. data/lib/claude_memory/dashboard/knowledge.rb +136 -0
  44. data/lib/claude_memory/dashboard/moments.rb +244 -0
  45. data/lib/claude_memory/dashboard/reuse.rb +97 -0
  46. data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
  47. data/lib/claude_memory/dashboard/server.rb +211 -0
  48. data/lib/claude_memory/dashboard/timeline.rb +68 -0
  49. data/lib/claude_memory/dashboard/trust.rb +285 -0
  50. data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
  51. data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
  52. data/lib/claude_memory/hook/context_injector.rb +97 -3
  53. data/lib/claude_memory/hook/handler.rb +50 -3
  54. data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
  55. data/lib/claude_memory/mcp/query_guide.rb +11 -0
  56. data/lib/claude_memory/mcp/text_summary.rb +29 -0
  57. data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
  58. data/lib/claude_memory/mcp/tools.rb +148 -0
  59. data/lib/claude_memory/publish.rb +13 -21
  60. data/lib/claude_memory/recall/stale_detector.rb +67 -0
  61. data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
  62. data/lib/claude_memory/resolve/resolver.rb +41 -11
  63. data/lib/claude_memory/store/llm_cache.rb +68 -0
  64. data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
  65. data/lib/claude_memory/store/schema_manager.rb +1 -1
  66. data/lib/claude_memory/store/sqlite_store.rb +47 -143
  67. data/lib/claude_memory/store/store_manager.rb +29 -0
  68. data/lib/claude_memory/sweep/maintenance.rb +216 -0
  69. data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
  70. data/lib/claude_memory/sweep/sweeper.rb +2 -0
  71. data/lib/claude_memory/version.rb +1 -1
  72. data/lib/claude_memory.rb +22 -0
  73. metadata +49 -1
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b6df0a3f58a88c1bbec82ec20e26789d51ad2712408d058337a196c5eac90654
- data.tar.gz: beb9c2ef59ef6a45430eeb03466e37f6b1f741ef1745b5303a1443b02a7c84b4
+ metadata.gz: a299c6ab2aeb95123dcb61f5c87a06b93d15a00a2ed9ff2c8343e7fde6b369cb
+ data.tar.gz: d09c02a2f5dcd4bd0dfcb625793505bd2218c7df04230411e813a7543e7e7382
  SHA512:
- metadata.gz: '06905bca1f77df5642caf0846cde7394ba9a1baf3c954138383ac39927fcaae2ef097ff79dd3c866e6930fa0eac0d0fb958366bded54a0616d8e356a316e616c'
- data.tar.gz: 9a8e3c455c20ae616bc239b766e1d4e2aa4c6e5448f494294d9c6a646a8a613428e9b63218624c5cae7e30f389704dd3bee6b788e97a369cb719b115abffddd7
+ metadata.gz: 87fd7dab40cb2e5b190de071f99bcc1394e98e5f426951eedaff09b190fa66591b40f49580bca45f75819170ba939a0d3d9239f4825825b431fd4a83d388bb7d
+ data.tar.gz: ffb4ab50ba94a8f3c7bfb8129f01ea96fd27b981dd614f0addd7d65a9fc2b4b8562b9d23148bb5ea4ee90b5ae5a9fc183d1c82e68d3b009557967a00b96bfec1
data/.claude/memory.sqlite3 CHANGED
Binary file
data/.claude/skills/dashboard/SKILL.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ name: dashboard
+ description: Launch a local web dashboard for ClaudeMemory debugging and observability
+ ---
+
+ # Dashboard
+
+ Launch the ClaudeMemory debugging dashboard to visualize memory system health, activity, and efficacy.
+
+ ## Task
+
+ Start the dashboard web server so the user can inspect what's happening behind the scenes.
+
+ ## Steps
+
+ 1. Run the dashboard command:
+
+ ```bash
+ claude-memory dashboard
+ ```
+
+ This starts a local web server (default port 3377) and opens it in the browser.
+
+ ## What the Dashboard Shows
+
+ - **Health Status**: Database health, hook configuration, vector index status
+ - **Overview**: Fact/entity/content counts, top predicates, entity type distribution, 30-day activity timeline
+ - **Activity**: Live event log of hook executions (ingest, context, sweep), memory recalls, and store extractions with timing and details
+ - **Facts**: Searchable fact explorer with status filtering, predicate/object search
+ - **Efficacy**: Recall hit rate, total results served, average results per query, top queries by result count
+
+ ## Options
+
+ - `--port PORT` - Use a different port (default: 3377)
+ - `--no-open` - Don't auto-open the browser
+
+ ## Notes
+
+ - Dashboard auto-refreshes every 30 seconds
+ - Activity events are recorded by hooks and MCP tools into the `activity_events` table
+ - The dashboard reads from both global and project databases
+ - Press Ctrl+C to stop the server
data/.claude-plugin/marketplace.json CHANGED
@@ -7,7 +7,7 @@
  "plugins": [
  {
  "name": "claude-memory",
- "version": "0.9.1",
+ "version": "0.10.0",
  "source": "./",
  "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
  "repository": "https://github.com/codenamev/claude_memory"
data/.claude-plugin/plugin.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "claude-memory",
- "version": "0.9.1",
+ "version": "0.10.0",
  "description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
  "author": {
  "name": "Valentino Stoll",
data/CHANGELOG.md CHANGED
@@ -4,6 +4,92 @@ All notable changes to this project will be documented in this file.
4
4
 
5
5
  ## [Unreleased]
6
6
 
7
+ ## [0.10.0] - 2026-04-28
8
+
9
+ ### Added
10
+
11
+ **Dashboard — feed-first redesign with observability built in**
12
+
13
+ - New feed-first dashboard UI with scope-aware moments, fact detail modal, query tester, and activity drilldown. Reuse, Trust, Knowledge, Conflicts, and Moments panels each backed by a dedicated module (`Dashboard::{Reuse, Trust, Knowledge, Conflicts, Moments}`) under unit tests, replacing the prior all-in-API-class layout.
14
+ - 👍/👎 feedback on individual moments with persisted verdicts (schema v16, `moment_feedback` table). Trust panel surfaces a 30-day up/down ratio so the dashboard can answer "when memory surfaces something, are users marking it useful?".
15
+ - Utilization ratio panel — of facts extracted in the last 30 days, how many has Claude actually used in a recall or context injection? Color-coded (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs to avoid misleading zeros.
16
+ - Conflict deduping at the display layer: identical (subject, predicate, object_pair) detections collapse into one row with a `×N` badge. Sidebar "Needs review" count now reflects distinct contradictions, not raw row count.
17
+ - Activity events drilldown: each moment opens a payload modal with prettified JSONL, recall trigger correlation (which user prompt motivated this lookup), and linked-fact resolution scoped per database.
18
+ - Vector index health threshold and clickable remediation hints in the health dashboard.
19
+
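The display-layer dedup described in the list above can be sketched in a few lines of Ruby. This is a minimal illustration, not the code in `Dashboard::Conflicts`: the `ConflictRow` struct, the `conflict_key` and `collapse_conflicts` helpers, and the normalization (case-folding plus sorting the pair) are all assumptions made for the example.

```ruby
# Hypothetical row shape; the real conflicts table may carry different columns.
ConflictRow = Struct.new(:id, :subject, :predicate, :object_a, :object_b, keyword_init: true)

# Normalize the object pair so (a, b) and (b, a) fall into the same group.
# Case-folding and sorting are assumptions about what "normalized" means.
def conflict_key(row)
  pair = [row.object_a, row.object_b].map { |o| o.to_s.strip.downcase }.sort
  [row.subject, row.predicate, pair]
end

# Collapse identical detections into one display row with a count badge.
def collapse_conflicts(rows)
  rows.group_by { |row| conflict_key(row) }.map do |_key, group|
    keeper = group.min_by(&:id) # show the earliest detection
    {
      keeper_id: keeper.id,
      subject: keeper.subject,
      predicate: keeper.predicate,
      badge: (group.size > 1 ? "×#{group.size}" : nil)
    }
  end
end

rows = [
  ConflictRow.new(id: 1, subject: "repo", predicate: "uses_database", object_a: "SQLite", object_b: "PostgreSQL"),
  ConflictRow.new(id: 2, subject: "repo", predicate: "uses_database", object_a: "postgresql", object_b: "sqlite"),
  ConflictRow.new(id: 3, subject: "repo", predicate: "uses_framework", object_a: "Rails", object_b: "Sinatra")
]
collapsed = collapse_conflicts(rows)
```

With this grouping, a "Needs review" count would be `collapsed.size` (distinct contradictions) rather than `rows.size` (raw rows).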
20
+ **CLI — observability surfaces and one-shot cleanups**
21
+
22
+ - `claude-memory digest [--since DAYS] [--output FILE]` — weekly markdown report. Sections: Activity, New knowledge by predicate, Utilization (extracted vs used), Conflicts, Feedback. No new schema; renders from existing aggregates.
23
+ - `claude-memory census [--root DIR]` — privacy-safe cross-project vocabulary scan. Aggregates per-DB predicate × status counts, novel predicates, synonym candidates. Suppresses object literals, entity names, and paths; per-DB IDs are SHA256-prefixed.
24
+ - `claude-memory dedupe-conflicts [--scope SCOPE] [--dry-run]` — one-shot cleanup for historical conflict-row duplication that predates the Resolver dedup fix (commit f571ba4). Groups by (subject, predicate, normalized object pair), keeps the earliest, migrates provenance to the keeper.
25
+ - `claude-memory reclassify-references [--scope SCOPE] [--dry-run]` — retags active convention facts that the new `Distill::ReferenceMaterialDetector` flags as reference material (LOC counts, star counts, "X is a plugin..." templates, "by Firstname Lastname" attributions).
26
+
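As an illustration of the privacy-safe aggregation the `census` entry describes, the sketch below derives a SHA256-prefixed identifier for a database path and counts predicate × status pairs without reading object text. The helper names, the 12-character prefix length, and the fact hash shape are hypothetical; the changelog does not state how the real command computes them.

```ruby
require "digest"

# Hypothetical helper: a stable label for a database that does not reveal its path.
# The prefix length of 12 is an assumption for this example.
def anonymized_db_id(path, length: 12)
  Digest::SHA256.hexdigest(path)[0, length]
end

# Count facts per (predicate, status) pair. Object literals and entity names
# are never read, so they cannot leak into the report.
def predicate_census(facts)
  facts.each_with_object(Hash.new(0)) do |fact, counts|
    counts[[fact[:predicate], fact[:status]]] += 1
  end
end

facts = [
  { predicate: "convention", status: "active", object: "secret detail" },
  { predicate: "convention", status: "active", object: "another detail" },
  { predicate: "decision", status: "superseded", object: "old choice" }
]
db_id = anonymized_db_id("/home/alice/projects/app/.claude/memory.sqlite3")
census = predicate_census(facts)
```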
27
+ **Memory quality**
28
+
29
+ - Access-based staleness scoring (improvements.md #35). Schema v17 adds `last_recalled_at` to facts. `Sweep::RecallTimestampRefresher` derives the field periodically from activity_events; `claude-memory stats --stale [--stale-days N]` lists facts that haven't been recalled inside the threshold. Replaces the prior "active facts minus seen-in-recalls" approximation.
30
+ - Auto-memory mirror (improvements.md #36). On fresh sessions, the SessionStart context hook scans `~/.claude/projects/<slug>/memory/*.md` and surfaces new or changed entries as extraction candidates so users can promote auto-memory observations into claude_memory without manual copy-paste.
31
+ - Reasoning requirement enforced in distillation (improvements.md #34). The SessionStart prompt and the `/distill-transcripts` skill now require a why clause for `decision` and `convention` predicates ("because…", "so that…", etc.). Audit found ~75% of facts were bare conclusions before this change.
32
+ - `Distill::ReferenceMaterialDetector` reclassifies convention facts whose object text matches reference patterns. New `reference` predicate registered in `PredicatePolicy` with its own `:references` snapshot section. Detector runs at write time in `ManagementHandlers#store_extraction` so mislabeling can't persist.
33
+ - Predicate census command (#30) for cross-project vocabulary audits — see CLI section above.
34
+
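The reasoning requirement above is enforced through the distillation prompt, but the same rule is easy to express as a check. The following is a sketch only: the marker list is limited to phrases the release notes mention ("because", "so that", "to avoid"), and `bare_conclusion?` is a hypothetical helper, not part of the gem.

```ruby
# Phrases that signal an embedded reason. Hypothetical and deliberately small.
REASON_MARKERS = /\b(because|so that|to avoid)\b/i

# Predicates that must carry a why clause, per the changelog entry.
REASONED_PREDICATES = %w[decision convention].freeze

# True when a fact needs a reason but its text does not contain one.
def bare_conclusion?(predicate, object_text)
  REASONED_PREDICATES.include?(predicate) && !object_text.match?(REASON_MARKERS)
end

with_reason = bare_conclusion?("decision", "Use SQLite because it needs no external service")
without_reason = bare_conclusion?("decision", "Use SQLite")
other_predicate = bare_conclusion?("uses_database", "SQLite")
```

A check like this could drive an audit similar to the one that found most facts were bare conclusions, although the audit's actual method is not described here.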
35
+ **Benchmarks and observability**
36
+
37
+ - Repeat-correction benchmark harness (improvements.md #32). `spec/benchmarks/e2e/repeat_correction_spec.rb` pre-loads a past correction as a memory fact, runs the prompt through real Claude under `EVAL_MODE=real`, and reports pass rate (no violation patterns matched). Starter set of 2 scenarios drawn from this project's recurring gotchas.
38
+ - Relevance ratio metric (improvements.md #31). `Hook::ContextInjector#emitted_subjects` exposes the subjects injected at SessionStart; `BenchmarkHelpers::RelevanceMetrics` measures whether they appear in Claude's response. Trend signal for memory-application quality, integrated into `devmemeval_spec.rb`.
39
+ - MCP server embeds the V=R/C ("Verify before Recommend / Correct") mental model in agent instructions so memory recommendations come with built-in verification cues.
40
+
41
+ **Schema v14 → v17 (additive only, automatic on first run)**
42
+
43
+ - Migration 015: adds `activity_events` table for hook/recall/context/sweep telemetry. Powers the dashboard timeline, moments feed, and efficacy reports.
44
+ - Migration 016: adds `moment_feedback` table (unique on event_id) for the dashboard 👍/👎 surface.
45
+ - Migration 017: adds nullable `facts.last_recalled_at` for access-based staleness scoring.
46
+
47
+ **1.0 readiness track**
48
+
49
+ - New `docs/1_0_punchlist.md` opens the path to 1.0: token-budget telemetry, hallucination-rate metric, negative-fact harm benchmark, CLAUDE.md baseline publication, `claude-memory show`, benchmark scoreboard. Ten entries (#47-56) added to `docs/improvements.md` with concrete file:line plumbing notes.
50
+
51
+ ### Changed
52
+
53
+ - `Resolver#apply_conflict` no longer creates a duplicate disputed fact + conflict row when the same contradicting value is re-extracted. Looks up disputed facts in the same (subject, predicate) slot and reinforces with provenance instead.
54
+ - `Resolver` no longer treats the distiller's `scope_hint` as a scope override. `scope_hint` is advisory metadata; `fact.scope` must match the DB the row lives in. Earlier behavior caused scope leakage where global-hinted distillations landed in the project DB.
55
+ - `Hook::ContextInjector` adds `emitted_fact_ids` and `emitted_subjects` accessors so benchmark harnesses can attribute injection contributions per session.
56
+ - `SQLiteStore` decomposed via module inclusion: `LLMCache` and `MetricsAggregator` extracted into `lib/claude_memory/store/`. SQLiteStore back under 600 LOC.
57
+ - `Dashboard::API` decomposed: `FactPresenter`, `Conflicts`, `Efficacy::Reporter`, `Timeline`, `Health` extracted into dedicated classes following the boundary pattern. API now routes/delegates rather than aggregating.
58
+ - Dashboard releases DB connections after each HTTP request (was holding connections open for the lifetime of the WEBrick session).
59
+ - `Sweep::Maintenance` gains `dedupe_open_conflicts` and `reclassify_references` for the one-shot CLI commands above.
60
+ - Round-trip migration specs from v12, v13, v14 → v17 (per-version migrations covered by `spec/claude_memory/store/migrations/`). Codifies the release-blocker convention: any schema bump must round-trip from each prior major-release boundary back ~3 releases.
61
+
62
+ ### Fixed
63
+
64
+ - Dashboard surfaces an actionable hint when Recall hits FTS5 corruption (run `claude-memory compact` rather than a generic error).
65
+ - Dashboard query tester unwraps the nested Recall result shape rather than printing the raw envelope.
66
+ - Dashboard health checks correctly detect the claude-memory hook installation across the two-level Claude Code hooks structure (was reporting false negatives when hooks were installed under a matcher block).
67
+ - Dashboard Efficacy "this session" correlation falls back to a time window when the recall event has no `session_id` (MCP tool calls don't thread session_id).
68
+ - Bulk-reject in the Conflicts modal now retries with an actionable message when the server-side state is stale.
69
+
70
+ ### Upgrade Notes
71
+
72
+ **Schema bump v14 → v17.** Three migrations run automatically on first launch after upgrade. All three are additive (no existing data is rewritten):
73
+
74
+ 1. Migration 015 creates `activity_events` (hook/recall telemetry).
75
+ 2. Migration 016 creates `moment_feedback` (dashboard verdicts).
76
+ 3. Migration 017 adds `facts.last_recalled_at` (NULL by default; `Sweep::RecallTimestampRefresher` populates it on the next sweep cycle from existing activity_events).
77
+
78
+ The migration delta has round-trip spec coverage in `spec/claude_memory/store/migrations/`. Forward-compatibility: 0.10.0 databases cannot be opened by 0.9.x or earlier. Downgrade is destructive — back up `~/.claude/memory.sqlite3` and `.claude/memory.sqlite3` before downgrading.
79
+
80
+ **Optional historical cleanups.** Two new admin commands address data tails left by earlier bugs that have since been fixed at the source:
81
+
82
+ ```bash
83
+ claude-memory dedupe-conflicts --dry-run # preview duplicate conflict rows
84
+ claude-memory dedupe-conflicts # consolidate them
85
+ claude-memory reclassify-references --dry-run # preview reference-material mislabels
86
+ claude-memory reclassify-references # retag them
87
+ ```
88
+
89
+ Both are opt-in. Neither runs in the regular sweep cycle. Use `--scope global` to clean the global DB.
90
+
91
+ **Telemetry footprint.** The `activity_events` table grows with hook activity. The dashboard surfaces this by default and powers the timeline/moments/efficacy panels. Retention pruning is not yet automatic (planned for a follow-up); manual cleanup via `DELETE FROM activity_events WHERE occurred_at < ?` is safe — the dashboard tolerates missing history.
92
+
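For the manual cleanup mentioned under Telemetry footprint, the bind value for `occurred_at < ?` has to be in the same format as the stored column. The sketch below assumes the column holds UTC ISO 8601 strings with a trailing `Z` (the migration comment says only "ISO 8601 timestamp"), in which case plain string comparison matches chronological order. `retention_cutoff` and the 90-day window are hypothetical, and the in-memory array stands in for the table.

```ruby
require "time"

# Build the cutoff string for `DELETE FROM activity_events WHERE occurred_at < ?`.
# Assumes stored values are UTC ISO 8601 strings of identical precision.
def retention_cutoff(days, now: Time.now.utc)
  (now - days * 86_400).utc.iso8601
end

events = [
  { id: 1, occurred_at: "2026-01-10T08:00:00Z" },
  { id: 2, occurred_at: "2026-04-20T08:00:00Z" }
]
cutoff = retention_cutoff(90, now: Time.utc(2026, 4, 28))

# Same predicate the SQL statement would apply, evaluated in memory.
expired = events.select { |event| event[:occurred_at] < cutoff }
```

If the stored timestamps carry mixed offsets or fractional seconds, string comparison is no longer safe and the values should be parsed before comparing.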
7
93
  ## [0.9.1] - 2026-04-16
8
94
 
9
95
  ### Fixed
data/CLAUDE.md CHANGED
@@ -163,7 +163,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
163
163
  - Each command is a separate class (HelpCommand, DoctorCommand, etc.)
164
164
  - All commands inherit from BaseCommand
165
165
  - Dependency injection for I/O (stdout, stderr, stdin)
166
- - 28 commands total, each focused on single responsibility
166
+ - 32 commands total, each focused on single responsibility
167
167
 
168
168
  - **`Configuration`**: Centralized ENV access (`configuration.rb`)
169
169
  - Single source of truth for paths and environment variables
@@ -208,6 +208,8 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
208
208
  - **`Distill`**: Fact extraction interface (`distill/`)
209
209
  - Pluggable distiller design (current: NullDistiller stub)
210
210
  - Extracts entities, facts, scope hints from content
211
+ - `ReferenceMaterialDetector`: classifies "X is a plugin/library/tool" templates, LOC counts, "by Firstname Lastname" attributions as reference material. Runs in `ManagementHandlers#store_extraction` so mislabeling can't persist
212
+ - SessionStart distillation prompt enforces reason clauses ("because…", "so that…") for `decision` and `convention` predicates — bare conclusions are explicitly disallowed
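A rough sketch of how a detector like `ReferenceMaterialDetector` might classify text follows. The regular expressions are hypothetical stand-ins written for this example from the categories named above (LOC counts, star counts, "X is a plugin/library/tool" templates, "by Firstname Lastname" attributions); the gem's actual patterns are not shown in this diff.

```ruby
# Hypothetical patterns, one per category named in the notes above.
REFERENCE_PATTERNS = [
  /\b\d[\d,]*\s*(LOC|lines of code)\b/i,  # LOC counts
  /\b\d[\d,.]*k?\s*stars\b/i,             # star counts
  /\bis an? (plugin|library|tool)\b/i,    # "X is a plugin..." templates
  /\bby [A-Z][a-z]+ [A-Z][a-z]+\b/        # "by Firstname Lastname"
].freeze

def reference_material?(text)
  REFERENCE_PATTERNS.any? { |pattern| text.match?(pattern) }
end

# Retag only convention facts; every other predicate passes through unchanged.
def classify_predicate(predicate, object_text)
  return "reference" if predicate == "convention" && reference_material?(object_text)

  predicate
end

retagged = classify_predicate("convention", "standardrb is a tool for linting Ruby")
kept = classify_predicate("convention", "Run the linter before committing because CI rejects style errors")
untouched = classify_predicate("decision", "Adopted a library written by Jane Doe")
```

Running the check at write time, as the note describes, means a mislabeled fact is retagged before it is ever stored.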
211
213
 
212
214
  - **`Resolve`**: Truth maintenance and conflict resolution (`resolve/`)
213
215
  - Determines equivalence, supersession, or conflicts
@@ -226,7 +228,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
226
228
  - Modes: shared (repo), local (uncommitted), home (user directory)
227
229
 
228
230
  - **`MCP`**: Model Context Protocol server and tools (`mcp/`)
229
- - Exposes memory tools to Claude Code (24 tools total)
231
+ - Exposes memory tools to Claude Code (25 tools total)
230
232
  - `Telemetry`: Records tool invocations to `mcp_tool_calls` table for usage stats
231
233
  - Dual content/structuredContent responses with compact mode
232
234
 
@@ -234,6 +236,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
234
236
  - Reads stdin JSON from Claude Code hooks
235
237
  - Routes to ingest/sweep/publish commands
236
238
  - `DistillationRunner`: Manages context hook injection with undistilled content for LLM extraction
239
+ - `AutoMemoryMirror` (0.10.0): On fresh sessions, scans `~/.claude/projects/<slug>/memory/*.md` for new/changed entries and surfaces them as extraction candidates in the SessionStart context. State diffed by md5 in `.claude/auto_memory_mirror.json`; bounded to 5 candidates per session, 1500 chars each.
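The md5-based diffing and the per-session bounds described for `AutoMemoryMirror` can be illustrated as follows. This is a sketch under stated assumptions: files are passed in as a name-to-content hash instead of being read from `~/.claude/projects/<slug>/memory/`, the state is a name-to-digest hash rather than the JSON state file, and `mirror_candidates` is a hypothetical name.

```ruby
require "digest"

MAX_CANDIDATES = 5  # candidates per session, per the note above
MAX_CHARS = 1500    # characters per candidate, per the note above

# files:          { "name.md" => content }
# previous_state: { "name.md" => md5 hex digest recorded last session }
def mirror_candidates(files, previous_state)
  changed = files.reject do |name, body|
    previous_state[name] == Digest::MD5.hexdigest(body)
  end

  changed.first(MAX_CANDIDATES).map do |name, body|
    { file: name, excerpt: body[0, MAX_CHARS], md5: Digest::MD5.hexdigest(body) }
  end
end

files = { "style.md" => "Prefer guard clauses", "notes.md" => "x" * 2000 }
state = { "style.md" => Digest::MD5.hexdigest("Prefer guard clauses") }
candidates = mirror_candidates(files, state)
```

Persisting each returned `md5` back into the state after a session is what would keep unchanged files from resurfacing next time.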
237
240
 
238
241
  ### Database Schema
239
242
 
@@ -246,16 +249,19 @@ Key tables (defined in `sqlite_store.rb`):
246
249
  - `fact_links`: Supersession and conflict relationships
247
250
  - `conflicts`: Open contradictions
248
251
  - `mcp_tool_calls`: MCP server tool invocation telemetry (schema v13)
252
+ - `activity_events`: Hook/recall/context/sweep telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports
253
+ - `moment_feedback`: Per-moment 👍/👎 verdicts with optional notes (schema v16) — unique on event_id, repeat clicks upsert
249
254
 
250
255
  Facts include:
251
256
  - `scope`: "global" or "project" (determines applicability)
252
257
  - `project_path`: Set for project-scoped facts
253
258
  - `valid_from`/`valid_to`: Temporal validity window
259
+ - `last_recalled_at` (schema v17): Set by `Sweep::RecallTimestampRefresher` from activity_events; powers `claude-memory stats --stale` and the dashboard's "stale" needs-review count
254
260
 
255
261
  ### Scope System
256
262
 
257
263
  Facts are scoped to control where they apply:
258
- - **project**: Current project only (e.g., "this app uses PostgreSQL")
264
+ - **project**: Current project only (e.g., "claude_memory uses SQLite for storage")
259
265
  - **global**: All projects (e.g., "I prefer 4-space indentation")
260
266
 
261
267
  Distiller detects signals like "always", "in all projects", "my preference" and sets `scope_hint: "global"`. Users can manually promote facts via `claude-memory promote <fact_id>` or the `memory.promote` MCP tool.
@@ -340,14 +346,14 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
340
346
 
341
347
  The gem includes an MCP server (`claude-memory serve-mcp`) that exposes memory operations as tools. Configuration should be in `.mcp.json` at project root.
342
348
 
343
- Available MCP tools (24 total):
349
+ Available MCP tools (25 total):
344
350
  - **Query & Recall**: `memory.recall`, `memory.recall_index`, `memory.recall_details`, `memory.recall_semantic`, `memory.search_concepts`
345
351
  - **Provenance**: `memory.explain`, `memory.fact_graph`
346
352
  - **Shortcuts**: `memory.decisions`, `memory.conventions`, `memory.architecture`
347
353
  - **Context**: `memory.facts_by_tool`, `memory.facts_by_context`
348
354
  - **Management**: `memory.promote`, `memory.reject_fact`, `memory.store_extraction`
349
355
  - **Distillation**: `memory.undistilled`, `memory.mark_distilled`
350
- - **Monitoring**: `memory.status`, `memory.stats`, `memory.changes`, `memory.conflicts`
356
+ - **Monitoring**: `memory.status`, `memory.stats`, `memory.changes`, `memory.conflicts`, `memory.activity`
351
357
  - **Maintenance**: `memory.sweep_now`
352
358
  - **Discovery**: `memory.check_setup`, `memory.list_projects`
353
359
 
@@ -369,6 +375,16 @@ ClaudeMemory integrates with Claude Code via hooks in `.claude/settings.json`:
369
375
 
370
376
  Hook commands read JSON payloads from stdin for robustness. Supports `--async` flag for non-blocking execution.
371
377
 
378
+ ## Dashboard
379
+
380
+ Local web UI for inspecting memory state. Started via `claude-memory dashboard` (default port 3377). Reads from both global and project databases; no write side effects from page loads.
381
+
382
+ The dashboard is a thin web layer over the same `Recall`/`Conflicts`/`Trust`/`Moments`/`Knowledge`/`Reuse`/`Health`/`Timeline` classes the MCP server uses. Each panel is backed by a dedicated module under `lib/claude_memory/dashboard/`; `Dashboard::API` holds HTTP-shape glue and per-endpoint formatting (delegating non-trivial logic to the panel classes).
383
+
384
+ Connections are released after each request, so the dashboard never holds a WAL writer lock open across page loads.
385
+
386
+ See [docs/dashboard.md](docs/dashboard.md) for the user-facing guide (panels, common workflows, related CLI commands).
387
+
372
388
  ## Code Style
373
389
 
374
390
  This project uses [Standard Ruby](https://github.com/standardrb/standard) for linting. Run `bundle exec rake standard:fix` before committing.
data/README.md CHANGED
@@ -140,6 +140,35 @@ File-searchable questions ("what version is this?") and one-shot code generation
140
140
  - **Claude-Powered**: Uses Claude's intelligence to extract facts (no API key needed)
141
141
  - **Token Efficient**: 10x reduction in memory queries with progressive disclosure
142
142
  - **Database Maintenance**: Compact, export, and backup commands
143
+ - **Built-in Observability** (0.10.0+): `claude-memory dashboard` opens a local web UI with a moments feed, trust panel, conflicts dedup, knowledge index, 👍/👎 feedback, and a 30-day utilization ratio. See **[Dashboard guide →](docs/dashboard.md)**. `claude-memory digest` writes a weekly markdown report; `claude-memory census` audits the predicate vocabulary across projects.
144
+
145
+ ## What's New in 0.10.0
146
+
147
+ Three behavior changes worth knowing about — they affect what you'll see in
148
+ extracted facts and SessionStart context, even if you don't change anything:
149
+
150
+ - **Auto-memory mirror** — On fresh sessions, the SessionStart context hook
151
+ scans `~/.claude/projects/<slug>/memory/*.md` and surfaces new or changed
152
+ entries as candidates for extraction into ClaudeMemory. You'll see a
153
+ "Pending Knowledge Extraction" section in Claude's startup context citing
154
+ files from your auto-memory directory. Claude reviews these and calls
155
+ `memory.store_extraction` for the high-signal ones; you don't need to
156
+ copy-paste manually anymore.
157
+ - **Why-clause enforcement** — When Claude distills `decision` and
158
+ `convention` facts, it's now required to embed a reason ("…because…",
159
+ "…so that…", "…to avoid…"). A bare conclusion is dead weight; a fact with
160
+ a reason stays useful when the situation changes. You'll see this
161
+ reflected in fact text being longer and more justified.
162
+ - **Reference predicate** — Active facts that look like reference material
163
+ (LOC counts, "X is a plugin/library/tool" templates, "by Firstname
164
+ Lastname" attributions) are auto-tagged `predicate=reference` instead of
165
+ `convention`. Keeps the conventions list signal-rich. Browse them in the
166
+ dashboard's Knowledge → References section, or run
167
+ `claude-memory reclassify-references --dry-run` to see candidates.
168
+
169
+ Plus: **staleness detection** (`claude-memory stats --stale`) lists active
170
+ facts that haven't been recalled in N days, so you can prune dead weight
171
+ explicitly. The dashboard's Trust → Needs review panel surfaces the count.
143
172
 
144
173
  ## Privacy Control
145
174
 
@@ -241,7 +270,8 @@ The uninstall command removes:
241
270
 
242
271
  - 📖 [Getting Started](docs/GETTING_STARTED.md) - Step-by-step onboarding
243
272
  - 💡 [Examples](docs/EXAMPLES.md) - Use cases and workflows
244
- - 🔧 [Plugin Setup](docs/PLUGIN.md) - Claude Code integration
273
+ - 📊 [Dashboard](docs/dashboard.md) - Local web UI for inspection and trust signals (0.10.0+)
274
+ - 🔧 [Plugin Setup](docs/plugin.md) - Claude Code integration
245
275
  - 🏗️ [Architecture](docs/architecture.md) - Technical deep dive
246
276
  - 📝 [Changelog](CHANGELOG.md) - Release notes
247
277
 
@@ -292,7 +322,7 @@ The benchmark dataset draws from real CLAUDE.md patterns and is designed specifi
292
322
 
293
323
  - **Language:** Ruby 3.2+
294
324
  - **Storage:** SQLite3 (no external services)
295
- - **Testing:** 1477 examples (1375 unit/integration + 102 benchmarks/evals), 100% core coverage
325
+ - **Testing:** 1964 examples (~1700 unit/integration + ~250 benchmarks/evals), 100% core coverage
296
326
  - **Code Style:** Standard Ruby
297
327
 
298
328
  ```bash
data/db/migrations/015_add_activity_events.rb ADDED
@@ -0,0 +1,26 @@
+ # frozen_string_literal: true
+
+ # Migration v15: Add activity_events table for debugging and observability
+ # Tracks hook executions, memory recalls, context injections, and sweep operations.
+ # Powers the dashboard timeline and efficacy reports.
+ Sequel.migration do
+   up do
+     create_table?(:activity_events) do
+       primary_key :id
+       String :event_type, null: false # "hook_ingest", "hook_context", "hook_sweep", "recall", "store_extraction"
+       String :session_id # Claude session that triggered the event
+       String :status, null: false # "success", "skipped", "error"
+       Integer :duration_ms # How long the operation took
+       String :detail_json, text: true # Event-specific details (JSON)
+       String :occurred_at, null: false # ISO 8601 timestamp
+     end
+
+     run "CREATE INDEX IF NOT EXISTS idx_activity_events_type ON activity_events(event_type)"
+     run "CREATE INDEX IF NOT EXISTS idx_activity_events_occurred_at ON activity_events(occurred_at)"
+     run "CREATE INDEX IF NOT EXISTS idx_activity_events_session ON activity_events(session_id)"
+   end
+
+   down do
+     drop_table?(:activity_events)
+   end
+ end
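To make the column layout above concrete, here is a sketch of building one row for `activity_events` around a timed block. `build_activity_event` is a hypothetical helper written for illustration; the gem's own `ActivityLog` class is not shown in this part of the diff and may work differently. Only the column names and the status values come from the migration.

```ruby
require "json"
require "time"

# Hypothetical helper: run a block, time it, and return a hash whose keys
# match the activity_events columns defined in the migration.
def build_activity_event(event_type, session_id: nil, details: {})
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  status = "success"
  begin
    yield if block_given?
  rescue StandardError => e
    status = "error"
    details = details.merge(error: e.message)
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  {
    event_type: event_type,
    session_id: session_id,            # nullable, e.g. for MCP tool calls
    status: status,
    duration_ms: (elapsed * 1000).round,
    detail_json: JSON.generate(details),
    occurred_at: Time.now.utc.iso8601  # UTC is an assumption of this sketch
  }
end

ok_event = build_activity_event("recall", details: { query: "database choice" }) { 1 + 1 }
failed_event = build_activity_event("hook_ingest", session_id: "abc") { raise "disk full" }
```

Catching the error and recording it as a row, rather than re-raising, keeps a telemetry failure from breaking the hook itself. Whether the gem makes the same trade-off is not stated in the diff.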
data/db/migrations/016_add_moment_feedback.rb ADDED
@@ -0,0 +1,22 @@
+ # frozen_string_literal: true
+
+ # Migration v16: Per-moment feedback (improvements.md #43).
+ # Tracks a single thumbs-up/down verdict (+ optional note) per activity_event
+ # so the dashboard can surface a trust-calibration signal. Unique on event_id
+ # so a given moment has at most one current verdict; repeat clicks upsert.
+ Sequel.migration do
+   up do
+     create_table?(:moment_feedback) do
+       primary_key :id
+       Integer :event_id, null: false
+       String :verdict, null: false # "up" | "down"
+       String :note, text: true # optional freeform note
+       String :recorded_at, null: false
+       index :event_id, unique: true
+     end
+   end
+
+   down do
+     drop_table?(:moment_feedback)
+   end
+ end
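The "unique on event_id, repeat clicks upsert" behaviour can be modelled without a database. The class below is an in-memory stand-in written for illustration: `FeedbackStore` and its methods are hypothetical, and a hash keyed by `event_id` plays the role of the unique index.

```ruby
# In-memory stand-in for moment_feedback. Keying by event_id mimics the
# unique index: a second verdict for the same event replaces the first.
class FeedbackStore
  VERDICTS = %w[up down].freeze

  def initialize
    @by_event = {}
  end

  def record(event_id, verdict, note: nil, at: Time.now.utc)
    raise ArgumentError, "verdict must be up or down" unless VERDICTS.include?(verdict)

    @by_event[event_id] = { verdict: verdict, note: note, recorded_at: at }
  end

  # Up/down counts for verdicts recorded at or after `since`.
  def ratio(since:)
    recent = @by_event.values.select { |row| row[:recorded_at] >= since }
    ups = recent.count { |row| row[:verdict] == "up" }
    { up: ups, down: recent.size - ups }
  end
end

now = Time.utc(2026, 4, 28)
store = FeedbackStore.new
store.record(10, "up", at: now - 86_400)
store.record(10, "down", at: now)            # repeat click replaces the verdict
store.record(11, "up", at: now - 40 * 86_400) # outside a 30-day window
window = store.ratio(since: now - 30 * 86_400)
```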
data/db/migrations/017_add_last_recalled_at.rb ADDED
@@ -0,0 +1,15 @@
+ # frozen_string_literal: true
+
+ # Migration v17: Access-based staleness scoring (improvements.md #35).
+ # Records the last time a fact was surfaced via memory.recall or context
+ # injection, derived periodically from activity_events. Sweep-derived rather
+ # than per-call so we avoid WAL write contention on the recall hot path.
+ Sequel.migration do
+   up do
+     add_column :facts, :last_recalled_at, String
+   end
+
+   down do
+     drop_column :facts, :last_recalled_at
+   end
+ end
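A minimal sketch of how the new column could feed a staleness check like `claude-memory stats --stale --stale-days N`. Two things are assumptions here: that `last_recalled_at` holds an ISO 8601 string, and that a fact that has never been recalled (NULL) counts as stale. `stale_facts` is a hypothetical helper, not the gem's `Recall::StaleDetector`.

```ruby
require "time"

# Select active facts whose last recall is missing or older than the threshold.
def stale_facts(facts, stale_days:, now: Time.now.utc)
  cutoff = now - stale_days * 86_400
  facts.select do |fact|
    next false unless fact[:status] == "active"

    recalled = fact[:last_recalled_at]
    recalled.nil? || Time.iso8601(recalled) < cutoff
  end
end

now = Time.utc(2026, 4, 28)
facts = [
  { id: 1, status: "active", last_recalled_at: "2026-04-20T12:00:00Z" },
  { id: 2, status: "active", last_recalled_at: "2026-01-05T12:00:00Z" },
  { id: 3, status: "active", last_recalled_at: nil },
  { id: 4, status: "superseded", last_recalled_at: nil }
]
stale_ids = stale_facts(facts, stale_days: 30, now: now).map { |fact| fact[:id] }
```

Because the column is refreshed by the sweep rather than on each recall, a result like this is only as fresh as the last sweep cycle.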
data/docs/1_0_punchlist.md ADDED
@@ -0,0 +1,190 @@
1
+ # 1.0 Punchlist
2
+
3
+ *Created: 2026-04-28*
4
+
5
+ The remaining work for a stable 1.0 release. Distinct from `improvements.md` —
6
+ that file tracks the long tail of inbound study/idea entries; this file tracks
7
+ **what blocks 1.0 confidence**.
8
+
9
+ Guiding question: *a skeptical Ruby developer should be able to look at one
10
+ screen and say "yes, this is helping, here's the evidence" without trusting our
11
+ marketing.* Today the dashboard tells that story in pieces but not as a
12
+ headline. Each item below closes a specific gap that prevents that headline
13
+ from existing.
14
+
15
+ Items are cross-linked to the canonical entry in `improvements.md` where the
16
+ implementation detail and acceptance criteria live. This file is the
17
+ prioritization view; that file is the work view.
18
+
19
+ ---
20
+
+ ## Must-have for 1.0
+
+ ### 1. Token budget telemetry — *what does memory cost?*
+
+ **Gap.** `Core::TokenEstimator` exists and is unused outside one helper. We
+ have no idea what % of the SessionStart token budget memory consumes per
+ session, how it scales with DB size, or whether it's growing.
+
+ **Acceptance.** Trust panel + `claude-memory digest` show p50/p95 injected
+ tokens per session over the last 30 days. Per-session count rides on every
+ `hook_context` activity event so the data is queryable post-hoc.
+
+ **Why must-have.** "Costs you tokens forever" is the strongest critique of any
+ context-injection memory system; if we can't answer it numerically, we can't
+ defend the trade.
+
+ → improvements.md entry: *Token Budget Telemetry*
+
+ ### 2. Hallucination rate as a first-class trust metric
+
+ **Gap.** `ReferenceMaterialDetector` already classifies suspect facts and we
+ know from the #34 audit that ~25% of facts had embedded reasoning (i.e.
+ ~75% were bare conclusions at audit time). Neither signal is exposed on the
+ dashboard. We display clean numbers; we should display stained ones.
+
+ **Acceptance.** Trust panel surfaces a `quality_score` derived from
+ suspect-fact ratio + bare-conclusion ratio over active facts in both stores.
+ Digest includes a 30-day rejection rate ("how much of what we extracted got
+ rejected within a week?") so calibration drift is visible.
+
+ **Why must-have.** We can't claim "memory is helping" if we can't show "memory
+ isn't poisoning the well."
+
+ → improvements.md entry: *Hallucination Rate Metric*
+
+ ### 3. Negative-fact harm benchmark
+
+ **Gap.** Every benchmark we run today measures whether memory **helps**.
+ Nothing measures whether memory **harms** — i.e. injects a wrong fact and
+ Claude follows it. Without this, "memory helps" is unfalsifiable.
+
+ **Acceptance.** New `spec/benchmarks/dataset/harm_scenarios.yml` with 10–15
+ cases where memory holds a stale or wrong fact. Each case scores `harm` if
+ Claude's response follows the wrong fact, `safe` otherwise. Wired into
+ `bin/run-evals`. >1% harm rate blocks release.
+
+ **Why must-have.** A retrieval system that occasionally makes Claude *wrong*
+ is strictly worse than no memory; we need a release gate that proves we're
+ not in that regime.
+
+ → improvements.md entry: *Negative-Fact Harm Benchmark*
+
+ ### 4. Publish the CLAUDE.md baseline in headline E2E results
+
+ **Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
+ and supports E2E. The adapter is wired into `comparative_helper.rb` but the
+ README's headline comparative table doesn't include it. The single most
+ important question for adoption — *"is this better than a hand-written
+ CLAUDE.md?"* — is currently unanswered in our published numbers.
+
+ **Acceptance.** Comparative E2E report includes `CLAUDE.md baseline` row in
+ `spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary
+ output. README explicitly states the win/loss versus the static baseline.
+
+ **Why must-have.** Cheapest item on the list — adapter already built, just
+ surface the number. If we can't beat a static CLAUDE.md on developer
+ scenarios, that's the loudest possible signal that the rest of the system
+ needs work; if we can, that's the headline 1.0 brag.
+
+ → improvements.md entry: *CLAUDE.md Baseline in Headline Results*
+
+ ### 5. `claude-memory show` — human-readable "what would be injected"
+
+ **Gap.** Inspecting memory state today requires the dashboard or several CLI
+ commands (`recall`, `stats`, `census`). The CLAUDE.md alternative is
+ `cat CLAUDE.md` — instant, plain-English, no tool. We need the same one-line
+ inspect surface.
+
+ **Acceptance.** `claude-memory show` runs the same `Hook::ContextInjector`
+ path real sessions use, prints what would be injected next session in plain
+ English (not JSON), sized to fit a terminal, with predicate-grouped sections
+ matching the snapshot format.
+
+ **Why must-have.** Trust requires inspectability. A user who can't see what
+ memory will inject can't develop confidence in it.
+
+ → improvements.md entry: *claude-memory show*
+
+ ### 6. Release-to-release benchmark scoreboard
+
+ **Gap.** Benchmark output is textual today. Nothing diff-able across versions.
+ Regressions land silently — the only reason we caught the FTS5/RRF
+ normalization bug was a manual run.
+
+ **Acceptance.** Each `bin/run-evals` run writes
+ `spec/benchmarks/results/<version>.json`. New `bin/bench-diff` (or rake task)
+ compares against the last tagged version's JSON and reports deltas. Release
+ script (`/release` skill) reads it and refuses to ship on regressions over a
+ configurable threshold.
+
+ **Why must-have.** Without longitudinal tracking, every benchmark we run is a
+ snapshot. 1.0 is the moment we commit to *not regressing* what we ship.
+
+ → improvements.md entry: *Benchmark Scoreboard Diff*
+
+ ---
+
+ ## Strong post-1.0
+
+ These shouldn't block 1.0 but should land in the next release window.
+
+ ### 7. First-week ROI nudge
+
+ SessionEnd hook prints `memory contributed N facts this session, %used = X`
+ inline for the first ~10 sessions. Closes the cold-start gap where new users
+ don't see value because they don't think to look.
+
+ → improvements.md entry: *First-Week ROI Nudge*
+
+ ### 8. Real-session repeat-correction detector
+
+ The repeat-correction benchmark (#32) is synthetic; production has no
+ equivalent signal. Analyze `activity_events` to detect "this fact was injected
+ last session, the user re-stated it this session" — that's where memory is
+ silently failing.
+
+ → improvements.md entry: *Real-Session Repeat-Correction Detection*
+
+ ### 9. Token-cost growth tracking
+
+ Builds on #1. Weekly digest reports "context cost grew X% over 30d" as an
+ anomaly signal that the DB is bloating or context injection is going wide.
+
+ → improvements.md entry: *Token-Cost Growth Tracking*
+
+ ### 10. Drift dashboard
+
+ Snapshot `census` weekly, surface predicate distribution shifts on the
+ dashboard. Answers "is my fact base going off?" without a manual audit.
+
+ → improvements.md entry: *Drift Dashboard*
+
+ ---
+
+ ## Defer / skip for 1.0
+
+ - **#44 Universal search box** — cosmetic given the gaps above. Knowledge tab
+ drawers cover the primary need.
+ - **#45 Live SSE/WebSocket feed** — polling is adequate; dashboard polish, not
+ a confidence gap.
+
+ ---
+
+ ## Sequencing recommendation
+
+ Smallest set that materially shifts 1.0 confidence (~2 days):
+
+ 1. **Token budget telemetry** (#1) — closes the loudest critique.
+ 2. **CLAUDE.md baseline publish** (#4) — adapter already built, one report change.
+ 3. **Hallucination rate** (#2) — reuses ReferenceMaterialDetector.
+
+ Then in roughly priority order: `claude-memory show` (#5), harm benchmark
+ (#3), scoreboard (#6). Post-1.0 items follow naturally once the must-haves
+ land.
+
+ ---
+
+ *Last updated: 2026-04-28 — initial punchlist drawn from session-end critique
+ of observability/outcome gaps. Each entry will be elaborated with concrete
+ file:line refs in improvements.md as it's worked.*
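Reviewer note: punchlist item 1 asks for p50/p95 injected tokens per session. The sketch below shows one way those two figures could be derived, using the nearest-rank percentile method. The `percentile` and `token_summary` helpers and the sample counts are hypothetical and are not taken from the package. How `Core::TokenEstimator` would supply the per-session counts is not shown.

```ruby
# Nearest-rank percentile over per-session injected-token counts.
# `samples` is a hypothetical Array of Integer token counts, one per session;
# `pct` is an Integer in 1..100.
def percentile(samples, pct)
  raise ArgumentError, "pct must be an Integer in 1..100" unless pct.is_a?(Integer) && (1..100).cover?(pct)
  return nil if samples.empty?

  sorted = samples.sort
  # Rational arithmetic avoids float rounding pushing the rank up by one.
  rank = Rational(pct * sorted.length, 100).ceil
  sorted[rank - 1]
end

# Summary in the shape the acceptance criterion describes (p50 and p95).
def token_summary(samples)
  { p50: percentile(samples, 50), p95: percentile(samples, 95) }
end

# Made-up sample: ten sessions, one of them unusually expensive.
session_tokens = [120, 80, 400, 95, 150, 110, 90, 2000, 130, 100]
summary = token_summary(session_tokens)
puts "p50=#{summary[:p50]} p95=#{summary[:p95]}"
```

Reporting p95 alongside p50 matters here because a single very large injection barely moves the median but dominates the tail, which is the case a token-cost critique tends to focus on.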
data/docs/EXAMPLES.md CHANGED
@@ -428,9 +428,48 @@ Claude: "You're using Context API for state management. You previously used Redu
 
  ---
 
+ ## Inspecting What Memory Knows (0.10.0+)
+
+ When you want to see what's actually in memory — what's been extracted, which
+ facts Claude has been reaching for, what's stale, what's contradicting — open
+ the dashboard:
+
+ ```bash
+ claude-memory dashboard
+ ```
+
+ By default it serves at `http://localhost:3377`. It surfaces:
+
+ - A **moments feed** — every recall, context injection, and extraction event,
+ with the facts each one touched. Click any moment for the full payload.
+ - A **Trust sidebar** — week-over-week activity, your global "fingerprint",
+ utilization ratio (% of recently extracted facts Claude actually used), and
+ your 👍/👎 feedback ratio.
+ - **Conflicts** with display-layer dedup so you don't have to triage 11 rows
+ of the same contradiction one at a time.
+ - **Knowledge** — facts grouped by predicate, with a separate References
+ section for auto-detected reference material.
+
+ For a markdown summary you can email or commit:
+
+ ```bash
+ claude-memory digest --since 7
+ ```
+
+ For a privacy-safe cross-project audit:
+
+ ```bash
+ claude-memory census
+ ```
+
+ See **[Dashboard guide →](dashboard.md)** for the full panel reference.
+
+ ---
+
  ## Next Steps
 
- - 📖 [Read the Getting Started Guide](GETTING_STARTED.md) *(coming soon)*
- - 🔧 [Set up the Claude Code Plugin](PLUGIN.md)
+ - 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
+ - 📊 [Inspect with the Dashboard](dashboard.md)
+ - 🔧 [Set up the Claude Code Plugin](plugin.md)
  - 🏗️ [Understand the Architecture](architecture.md)
  - 📝 [Check the Changelog](../CHANGELOG.md)