claude_memory 0.9.1 → 0.11.0

Files changed (77)
  1. checksums.yaml +4 -4
  2. data/.claude/memory.sqlite3 +0 -0
  3. data/.claude/skills/dashboard/SKILL.md +42 -0
  4. data/.claude-plugin/marketplace.json +1 -1
  5. data/.claude-plugin/plugin.json +1 -1
  6. data/CHANGELOG.md +130 -0
  7. data/CLAUDE.md +30 -6
  8. data/README.md +66 -2
  9. data/db/migrations/015_add_activity_events.rb +26 -0
  10. data/db/migrations/016_add_moment_feedback.rb +22 -0
  11. data/db/migrations/017_add_last_recalled_at.rb +15 -0
  12. data/docs/1_0_punchlist.md +371 -0
  13. data/docs/EXAMPLES.md +41 -2
  14. data/docs/GETTING_STARTED.md +33 -4
  15. data/docs/architecture.md +22 -7
  16. data/docs/audit-queries.md +131 -0
  17. data/docs/dashboard.md +192 -0
  18. data/docs/improvements.md +650 -9
  19. data/docs/influence/cq.md +187 -0
  20. data/docs/plugin.md +13 -6
  21. data/docs/quality_review.md +524 -172
  22. data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
  23. data/lib/claude_memory/activity_log.rb +86 -0
  24. data/lib/claude_memory/commands/census_command.rb +210 -0
  25. data/lib/claude_memory/commands/completion_command.rb +3 -0
  26. data/lib/claude_memory/commands/dashboard_command.rb +54 -0
  27. data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
  28. data/lib/claude_memory/commands/digest_command.rb +273 -0
  29. data/lib/claude_memory/commands/hook_command.rb +61 -2
  30. data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
  31. data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
  32. data/lib/claude_memory/commands/registry.rb +7 -1
  33. data/lib/claude_memory/commands/show_command.rb +90 -0
  34. data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
  35. data/lib/claude_memory/commands/stats_command.rb +131 -2
  36. data/lib/claude_memory/commands/sweep_command.rb +2 -0
  37. data/lib/claude_memory/configuration.rb +16 -0
  38. data/lib/claude_memory/core/relative_time.rb +9 -0
  39. data/lib/claude_memory/dashboard/api.rb +610 -0
  40. data/lib/claude_memory/dashboard/conflicts.rb +279 -0
  41. data/lib/claude_memory/dashboard/efficacy.rb +127 -0
  42. data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
  43. data/lib/claude_memory/dashboard/health.rb +175 -0
  44. data/lib/claude_memory/dashboard/index.html +2707 -0
  45. data/lib/claude_memory/dashboard/knowledge.rb +136 -0
  46. data/lib/claude_memory/dashboard/moments.rb +244 -0
  47. data/lib/claude_memory/dashboard/reuse.rb +97 -0
  48. data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
  49. data/lib/claude_memory/dashboard/server.rb +211 -0
  50. data/lib/claude_memory/dashboard/timeline.rb +68 -0
  51. data/lib/claude_memory/dashboard/trust.rb +454 -0
  52. data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
  53. data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
  54. data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
  55. data/lib/claude_memory/hook/context_injector.rb +97 -3
  56. data/lib/claude_memory/hook/handler.rb +191 -3
  57. data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
  58. data/lib/claude_memory/mcp/query_guide.rb +11 -0
  59. data/lib/claude_memory/mcp/text_summary.rb +29 -0
  60. data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
  61. data/lib/claude_memory/mcp/tools.rb +148 -0
  62. data/lib/claude_memory/publish.rb +13 -21
  63. data/lib/claude_memory/recall/stale_detector.rb +67 -0
  64. data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
  65. data/lib/claude_memory/resolve/resolver.rb +41 -11
  66. data/lib/claude_memory/store/llm_cache.rb +68 -0
  67. data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
  68. data/lib/claude_memory/store/schema_manager.rb +1 -1
  69. data/lib/claude_memory/store/sqlite_store.rb +47 -143
  70. data/lib/claude_memory/store/store_manager.rb +29 -0
  71. data/lib/claude_memory/sweep/maintenance.rb +216 -0
  72. data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
  73. data/lib/claude_memory/sweep/sweeper.rb +2 -0
  74. data/lib/claude_memory/templates/hooks.example.json +5 -0
  75. data/lib/claude_memory/version.rb +1 -1
  76. data/lib/claude_memory.rb +24 -0
  77. metadata +51 -1
data/docs/audit-queries.md ADDED
@@ -0,0 +1,131 @@
1
+ # Audit Queries
2
+
3
+ Pre-written SQL for validating that the ClaudeMemory plugin is invoked when it should be. Run the queries via [cq](https://github.com/technicalpickles/cq); install it with `cargo install --git https://github.com/technicalpickles/cq`.
4
+
5
+ These query Claude Code's raw transcripts (in `~/.claude/projects/`), not ClaudeMemory's own SQLite databases. That's deliberate: cq sees *all* tool calls including ones that bypassed the MCP server entirely, which is exactly the angle needed to spot activation gaps.
6
+
7
+ For server-side telemetry (counts, latencies of MCP calls that *did* land), use `claude-memory stats --tools` against ClaudeMemory's `mcp_tool_calls` table instead.
8
+
9
+ ## Query 1 — Memory plugin activation rate
10
+
11
+ How often is any `mcp__memory__*` tool being called, normalized by total sessions?
12
+
13
+ ```bash
14
+ cq sql "
15
+ WITH session_window AS (
16
+ SELECT DISTINCT session_id FROM messages
17
+ ),
18
+ memory_sessions AS (
19
+ SELECT DISTINCT session_id FROM tool_calls
20
+ WHERE name LIKE 'mcp__memory__%'
21
+ )
22
+ SELECT
23
+ (SELECT count(*) FROM session_window) AS total_sessions,
24
+ (SELECT count(*) FROM memory_sessions) AS sessions_with_memory_call,
25
+ ROUND(100.0 * (SELECT count(*) FROM memory_sessions)
26
+ / NULLIF((SELECT count(*) FROM session_window), 0), 1) AS pct
27
+ " --since 30d --table
28
+ ```
29
+
30
+ **Why it matters**: a low percentage doesn't mean the plugin is broken — many sessions don't need memory. It's a denominator for the next two queries.
31
+
32
+ ## Query 2 — Sessions that asked memory-shaped questions but never called memory
33
+
34
+ The most useful query. Surfaces user prompts where memory *should* have been the obvious tool, but Claude went elsewhere (Read, Grep, Bash) instead.
35
+
36
+ ```bash
37
+ cq sql "
38
+ WITH memory_sessions AS (
39
+ SELECT DISTINCT session_id FROM tool_calls
40
+ WHERE name LIKE 'mcp__memory__%'
41
+ )
42
+ SELECT
43
+ m.session_id,
44
+ m.timestamp,
45
+ left(m.text, 200) AS user_prompt
46
+ FROM messages m
47
+ LEFT JOIN memory_sessions ms ON m.session_id = ms.session_id
48
+ WHERE m.type = 'user'
49
+ AND ms.session_id IS NULL
50
+ AND (
51
+ m.text ILIKE '%why did we%'
52
+ OR m.text ILIKE '%what convention%'
53
+ OR m.text ILIKE '%how do we usually%'
54
+ OR m.text ILIKE '%what did we decide%'
55
+ OR m.text ILIKE '%architecture%'
56
+ OR m.text ILIKE '%what''s the pattern%'
57
+ )
58
+ ORDER BY m.timestamp DESC
59
+ " --since 30d --table --limit 30
60
+ ```
61
+
62
+ **What to do with results**: each row is a candidate for either (a) a tightening of MCP server instructions / skill descriptions, or (b) confirmation that the question genuinely didn't need memory and the keyword filter is too loose.
63
+
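To judge whether the keyword filter in Query 2 is too loose before changing it, the same phrases can be tried against sample prompts offline. The following is a minimal Ruby sketch, not part of ClaudeMemory or cq: the helper name `memory_shaped_matches` and the sample prompts are hypothetical, and the phrase list is copied from the `ILIKE` clauses above.

```ruby
# Hypothetical helper mirroring the ILIKE patterns in Query 2.
# ILIKE '%phrase%' is a case-insensitive substring match, so the prompt is
# downcased and checked with include?.
MEMORY_PHRASES = [
  "why did we",
  "what convention",
  "how do we usually",
  "what did we decide",
  "architecture",
  "what's the pattern"
].freeze

# Returns the phrases that make a prompt "memory-shaped", or [] if none match.
def memory_shaped_matches(prompt)
  text = prompt.downcase
  MEMORY_PHRASES.select { |phrase| text.include?(phrase) }
end

# Made-up prompts: one genuine hit, one likely false positive, one miss.
samples = [
  "Why did we pick SQLite over Postgres?",
  "Draw an ASCII architecture diagram of a CPU",
  "Rename this variable"
]

samples.each do |prompt|
  hits = memory_shaped_matches(prompt)
  puts format("%-45s %s", prompt, hits.empty? ? "(no match)" : hits.join(", "))
end
```

The second sample shows why a bare `architecture` match may be the loosest term in the filter: it fires on prompts that have nothing to do with project history.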
64
+ ## Query 3 — Which memory tools actually get called?
65
+
66
+ ```bash
67
+ cq sql "
68
+ SELECT
69
+ name AS tool,
70
+ count(*) AS invocations,
71
+ count(DISTINCT session_id) AS sessions
72
+ FROM tool_calls
73
+ WHERE name LIKE 'mcp__memory__%'
74
+ GROUP BY name
75
+ ORDER BY invocations DESC
76
+ " --since 30d --table
77
+ ```
78
+
79
+ **Expected shape**: `mcp__memory__recall`, `mcp__memory__conventions`, `mcp__memory__decisions` should dominate. Tools that never fire (`memory_fact_graph`, `memory_explain`, `memory_search_concepts`, `memory_facts_by_*`) might have description/triggering issues — same pattern as cq's "skill audit" use case.
80
+
81
+ ## Query 4 — Error rate per memory tool
82
+
83
+ ```bash
84
+ cq sql "
85
+ SELECT
86
+ tc.name AS tool,
87
+ count(*) AS calls,
88
+ sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END) AS errors,
89
+ ROUND(100.0 * sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END)
90
+ / count(*), 1) AS pct_errors
91
+ FROM tool_calls tc
92
+ JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
93
+ WHERE tc.name LIKE 'mcp__memory__%'
94
+ GROUP BY tc.name
95
+ ORDER BY errors DESC
96
+ " --since 30d --table
97
+ ```
98
+
99
+ **Why it matters**: a memory tool returning errors is much worse than not firing — Claude sees the failure and learns to avoid that tool. Triage anything above ~5%.
100
+
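The triage rule above can be applied mechanically to Query 4's output. This is a small Ruby sketch under stated assumptions: the `ToolStats` struct, the `needs_triage` helper, and the sample numbers are all hypothetical, and the 5% cutoff is the rule of thumb from the text rather than a value enforced by ClaudeMemory.

```ruby
# Hypothetical rows shaped like Query 4's output; the numbers are made up.
ToolStats = Struct.new(:tool, :calls, :errors) do
  # Error percentage rounded to one decimal, matching ROUND(..., 1) in the SQL.
  def pct_errors
    return 0.0 if calls.zero?

    (100.0 * errors / calls).round(1)
  end
end

TRIAGE_THRESHOLD_PCT = 5.0 # the "~5%" rule of thumb from the text

# Tools whose error rate exceeds the threshold, worst first.
def needs_triage(rows, threshold: TRIAGE_THRESHOLD_PCT)
  rows.select { |row| row.pct_errors > threshold }
      .sort_by { |row| -row.pct_errors }
end

rows = [
  ToolStats.new("mcp__memory__recall", 200, 4),
  ToolStats.new("mcp__memory__decisions", 40, 6),
  ToolStats.new("mcp__memory__conventions", 50, 0)
]

needs_triage(rows).each do |row|
  puts "#{row.tool}: #{row.pct_errors}% errors over #{row.calls} calls"
end
```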
101
+ ## Query 5 — Result-size distribution (context budget hygiene)
102
+
103
+ ```bash
104
+ cq sql "
105
+ SELECT
106
+ tc.name AS tool,
107
+ count(*) AS calls,
108
+ MIN(length(tr.content)) AS min_chars,
109
+ ROUND(AVG(length(tr.content))) AS avg_chars,
110
+ MAX(length(tr.content)) AS max_chars
111
+ FROM tool_calls tc
112
+ JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
113
+ WHERE tc.name LIKE 'mcp__memory__%'
114
+ GROUP BY tc.name
115
+ ORDER BY avg_chars DESC
116
+ " --since 30d --table
117
+ ```
118
+
119
+ **Why it matters**: ClaudeMemory exposes a `compact: true` option that drops receipts for ~60% smaller responses. If averages are large, either the compact flag isn't being passed by callers or the tools that don't accept it are dumping too much.
120
+
121
+ ## When to re-run
122
+
123
+ - Before each release — does the new version improve activation rate or reduce errors?
124
+ - After meaningful changes to MCP server instructions / skill descriptions
125
+ - If a user reports "the memory plugin doesn't seem to do anything" — Query 2 will usually surface the gap concretely
126
+
127
+ ## Related
128
+
129
+ - Source for the methodology: `docs/influence/cq.md`
130
+ - Server-side telemetry alternative: `claude-memory stats --tools --since 30`
131
+ - cq schema reference: `cq schema --examples`
data/docs/dashboard.md ADDED
@@ -0,0 +1,192 @@
1
+ # ClaudeMemory Dashboard
2
+
3
+ Local web UI for inspecting what memory knows, what it's been doing, and where
4
+ it's going wrong. Headline feature of v0.10.0.
5
+
6
+ ## Quick start
7
+
8
+ ```bash
9
+ claude-memory dashboard
10
+ ```
11
+
12
+ Opens `http://localhost:3377` in your default browser. Reads from both the
13
+ global (`~/.claude/memory.sqlite3`) and project (`.claude/memory.sqlite3`)
14
+ databases. No write side effects from page loads — the UI is a viewer with
15
+ explicit action buttons (reject / promote / feedback) where applicable.
16
+
17
+ ```bash
18
+ # Custom port
19
+ claude-memory dashboard --port 4000
20
+
21
+ # Don't auto-open the browser (e.g. running in tmux/headless)
22
+ claude-memory dashboard --no-open
23
+ ```
24
+
25
+ Press `Ctrl+C` in the terminal to stop the server.
26
+
27
+ ## What each panel shows
28
+
29
+ The dashboard is **feed-first**: the main view is a chronological stream of
30
+ *moments* (memory activity events), with sidebar panels giving aggregate signals.
31
+
32
+ ### Sidebar — Trust
33
+
34
+ At-a-glance signals so you can answer "is memory helping?" — and "what does
35
+ it cost?" — in one look:
36
+
37
+ - **This week's moments** — count of value-producing events (recall hits,
38
+ context injections, extractions). Includes a week-over-week delta.
39
+ - **What memory knows about you** — up to 5 global facts rendered as plain
40
+ English. The "fingerprint" of your cross-project preferences.
41
+ - **Needs review** — open conflicts (deduped to distinct contradictions) +
42
+ stale facts (active but not recalled in the configured window) + empty
43
+ recalls (queries that returned nothing).
44
+ - **Token budget (30d)** *(0.11.0+)* — p50/p95/avg `context_tokens` injected
45
+ per SessionStart over the last 30 days, with sample size. Answers "what
46
+ does memory cost per session?" — pairs with the digest's "Context cost"
47
+ section and `claude-memory stats --tokens`.
48
+ - **Quality score (live, 30d)** *(0.11.0+)* — 0–100 hallucination-rate
49
+ proxy. `score = 100 - (suspect_pct + bare_pct)` where suspect = facts
50
+ retagged as `predicate=reference` and bare = decision/convention facts
51
+ whose object skipped the prompt-mandated reason clause. Headline is the
52
+ live 30-day window; the underlying snapshot also exposes a `historical`
53
+ block over all active facts for context. Returns 100 on empty stores.
54
+ - **Utilization (30d)** — of facts extracted in the last 30 days, what % has
55
+ Claude actually surfaced via recall or context injection. Color-coded
56
+ (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs.
57
+ - **Feedback (30d)** — thumbs-up/down ratio from moments you've rated.
58
+
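The Quality score formula and the Utilization color bands in the list above can be sketched in a few lines of Ruby. This is an illustration, not the code in `Dashboard::Trust`: the method names are hypothetical, and two details are assumptions that the text does not state, namely that both percentages share the same fact count as denominator and that the result is clamped to 0..100.

```ruby
# Quality score as described: 100 minus the suspect and bare percentages.
# Assumptions: both percentages are taken over the same fact count, and the
# result is rounded and clamped to 0..100.
def quality_score(total_facts:, suspect:, bare:)
  return 100 if total_facts.zero? # "Returns 100 on empty stores"

  suspect_pct = 100.0 * suspect / total_facts
  bare_pct = 100.0 * bare / total_facts
  (100 - (suspect_pct + bare_pct)).round.clamp(0, 100)
end

# Utilization color bands from the text: green >= 40%, yellow >= 15%, red below.
def utilization_color(pct)
  if pct >= 40 then :green
  elsif pct >= 15 then :yellow
  else :red
  end
end

# Made-up example: 200 facts, 10 retagged as reference, 20 missing a reason.
puts quality_score(total_facts: 200, suspect: 10, bare: 20)
puts utilization_color(22.5)
```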
59
+ ### Feed — Moments
60
+
61
+ Real-time stream of memory activity, classified by kind:
62
+
63
+ - `recall_hit` / `recall_empty` — `memory.recall*` calls, with top fact IDs
64
+ resolved to their sentences. Click for full payload.
65
+ - `context_injection` — SessionStart context emitted into Claude's prompt,
66
+ with a preview and the facts it carried. `context_skipped` when injection
67
+ was empty.
68
+ - `extraction` — facts/entities created via `memory.store_extraction`,
69
+ inlined with content preview.
70
+ - `hook_ingest` / `hook_sweep` — pipeline activity.
71
+
72
+ Each moment has 👍/👎 buttons. Use them deliberately: the ratio feeds the
73
+ Trust panel and is a calibration signal we read against retrieval quality
74
+ benchmarks.
75
+
76
+ Filter via query params: `kinds=recall_hit`, `before=<ISO timestamp>`.
77
+
78
+ ### Knowledge
79
+
80
+ Active facts grouped by predicate. Sections include:
81
+
82
+ - Decisions, Conventions, Architecture (multi-value)
83
+ - Tech stack (uses_database, uses_framework, uses_language, deployment_platform, auth_method)
84
+ - **References** (added 0.10.0) — facts auto-tagged as reference material
85
+ by `Distill::ReferenceMaterialDetector` (LOC counts, "X is a plugin…"
86
+ templates, author attributions). Separated from conventions to keep the
87
+ signal-to-noise ratio of the conventions section high.
88
+
89
+ ### Conflicts
90
+
91
+ Open contradictions, deduped at the display layer: identical
92
+ `(subject, predicate, object_pair)` detections collapse into one row with a
93
+ `×N` badge. The "Needs review" sidebar count uses the deduped count, not
94
+ raw rows.
95
+
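The display-layer dedup described above amounts to grouping rows by their `(subject, predicate, object_pair)` key. A rough Ruby sketch follows; the `Conflict` struct, its field names, and the sample rows are hypothetical, and treating the object pair as order-insensitive is an assumption rather than something the text confirms.

```ruby
# Hypothetical conflict rows; the field names are illustrative only.
Conflict = Struct.new(:id, :subject, :predicate, :object_a, :object_b)

# Collapse identical (subject, predicate, object_pair) detections into one
# group per distinct contradiction.
# Assumption: the object pair is order-insensitive, so it is sorted first.
def dedupe_conflicts(rows)
  rows.group_by { |c| [c.subject, c.predicate, [c.object_a, c.object_b].sort] }
      .map do |(subject, predicate, pair), group|
        { subject: subject, predicate: predicate, objects: pair,
          count: group.size, ids: group.map(&:id) }
      end
end

rows = [
  Conflict.new(1, "repo", "uses_database", "postgresql", "sqlite"),
  Conflict.new(2, "repo", "uses_database", "sqlite", "postgresql"),
  Conflict.new(3, "repo", "auth_method", "jwt", "session cookies")
]

groups = dedupe_conflicts(rows)
groups.each do |g|
  badge = g[:count] > 1 ? " ×#{g[:count]}" : ""
  puts "#{g[:subject]} #{g[:predicate]}: #{g[:objects].join(' vs ')}#{badge}"
end

# The sidebar count would use the number of groups, not the raw row count.
puts "Needs review (conflicts): #{groups.size}"
```

Keeping the member IDs on each group is what would make a bulk action over "all rows that match this exact contradiction" straightforward.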
96
+ Each row links to:
97
+ - Both sides of the conflict with provenance
98
+ - A bulk-reject action ("reject all rows that match this exact contradiction")
99
+ - The originating activity event
100
+
101
+ ### Reuse
102
+
103
+ Most-used facts in the time window. Useful for answering "which facts are
104
+ actually doing the work?" when you suspect memory is accumulating dead weight.
105
+
106
+ ### Activity (timeline)
107
+
108
+ Daily rollup of facts created, content ingested, hook events fired, and
109
+ recalls performed over the last 30 days. Click any day to drill into the
110
+ underlying events.
111
+
112
+ ### Health
113
+
114
+ Four checks: global database, project database, hooks installation,
115
+ sqlite-vec coverage. Each surfaces an actionable fix string (e.g.,
116
+ "Run `claude-memory init` to install the standard hook set"). Status
117
+ escalates to the worst individual check (error > warning > healthy).
118
+
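The escalation rule can be pictured as taking the maximum over a severity ordering. The Ruby sketch below is illustrative only: the hash layout of a check, the constant `HEALTH_SEVERITY`, and the method name are hypothetical, and treating an empty check list as healthy is an assumption.

```ruby
# Severity order from the text: error > warning > healthy.
HEALTH_SEVERITY = { healthy: 0, warning: 1, error: 2 }.freeze

# Overall status is the worst status among the individual checks.
# Assumption: an empty check list counts as healthy.
def overall_status(checks)
  checks.map { |check| check[:status] }
        .max_by { |status| HEALTH_SEVERITY.fetch(status) } || :healthy
end

checks = [
  { name: "global database", status: :healthy },
  { name: "project database", status: :healthy },
  { name: "hooks installation", status: :warning,
    fix: "Run `claude-memory init` to install the standard hook set" },
  { name: "sqlite-vec coverage", status: :healthy }
]

puts "overall: #{overall_status(checks)}"
checks.select { |check| check[:fix] }.each do |check|
  puts "#{check[:name]}: #{check[:fix]}"
end
```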
119
+ ### Activity drill-down
120
+
121
+ Clicking any moment opens a modal with the parsed payload, prettified JSON,
122
+ and — for recall events — a "what triggered this?" correlation showing the
123
+ preceding ingest and the user prompt that motivated the recall.
124
+
125
+ ### Query tester
126
+
127
+ Run `memory.recall*` queries inline and see scored results, with optional
128
+ score traces (`vec_rank`, `fts_rank`, `rrf_final`) for hybrid retrieval
129
+ debugging. Surfaces an actionable hint if FTS5 corruption is detected
130
+ (suggests `claude-memory compact`).
131
+
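For readers unfamiliar with the trace fields, the names suggest reciprocal rank fusion (RRF) over a vector ranking and an FTS ranking. The sketch below shows the textbook RRF formula in Ruby as a reading aid. It is an assumption that ClaudeMemory computes `rrf_final` this way, and the constant `k = 60` is the conventional default, not a value taken from the source.

```ruby
# Conventional RRF constant; the value ClaudeMemory uses is not stated.
RRF_K = 60

# Textbook reciprocal rank fusion: sum of 1 / (k + rank) over the rankings
# in which the item appears. Ranks are 1-based; nil means the retriever
# did not return the item at all.
def rrf_score(vec_rank:, fts_rank:, k: RRF_K)
  [vec_rank, fts_rank].compact.sum { |rank| 1.0 / (k + rank) }
end

# Made-up candidates with hypothetical fact IDs.
candidates = [
  { fact_id: 12, vec_rank: 1, fts_rank: 3 },
  { fact_id: 7, vec_rank: 2, fts_rank: nil },
  { fact_id: 31, vec_rank: nil, fts_rank: 1 }
]

ranked = candidates.sort_by do |c|
  -rrf_score(vec_rank: c[:vec_rank], fts_rank: c[:fts_rank])
end

ranked.each do |c|
  score = rrf_score(vec_rank: c[:vec_rank], fts_rank: c[:fts_rank])
  puts format("fact %-3d vec_rank=%-4s fts_rank=%-4s rrf=%.5f",
              c[:fact_id], c[:vec_rank].inspect, c[:fts_rank].inspect, score)
end
```

Under this formula a fact found by both retrievers outranks one found by only a single retriever at a similar position, which is the behavior a score trace helps confirm or rule out.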
132
+ ## When to use it
133
+
134
+ - **After any session that surprised you** — was the recall actually firing?
135
+ Did the fact you taught get extracted? The Moments feed answers both.
136
+ - **Before promoting a fact to global** — see what's already there in the
137
+ Knowledge panel, including dedupe siblings.
138
+ - **When `claude-memory doctor` warns about conflicts** — the Conflicts
139
+ panel groups duplicates so you don't have to handle them one row at a time.
140
+ - **When deciding what to keep** — the Reuse panel shows which facts have
141
+ earned their spot; everything else is a staleness candidate per the
142
+ `claude-memory stats --stale` listing.
143
+
144
+ ## What it's not
145
+
146
+ - Not an editor for fact text (use `claude-memory promote` / `reject`).
147
+ - Not a replacement for the CLI — for headless / scripted use, prefer
148
+ `claude-memory stats`, `claude-memory digest`, `claude-memory census`.
149
+ - Not a multi-user surface — bound to localhost, single-process WEBrick.
150
+ - Not a long-running service — runs in the foreground; close when done.
151
+
152
+ ## Architecture
153
+
154
+ The dashboard is a thin web layer over the same `Recall`, `Conflicts`,
155
+ `Trust`, `Moments`, etc. classes the MCP server uses. Each panel is backed by
156
+ a dedicated module under `lib/claude_memory/dashboard/`:
157
+
158
+ | Panel / endpoint | Module | Responsibility |
159
+ |---|---|---|
160
+ | Trust sidebar | `Dashboard::Trust` | Weekly moments, fingerprint, utilization, feedback |
161
+ | Feed | `Dashboard::Moments` | Activity-event classification + presenter |
162
+ | Knowledge | `Dashboard::Knowledge` | Predicate-grouped fact summary |
163
+ | Conflicts | `Dashboard::Conflicts` | Dedup grouping, bulk-reject helper |
164
+ | Reuse | `Dashboard::Reuse` | Most-used-fact ranking |
165
+ | Health | `Dashboard::Health` | Four system checks with fix strings |
166
+ | Timeline | `Dashboard::Timeline` | 30-day daily rollup |
167
+ | Routing | `Dashboard::API` | HTTP-shape glue + per-endpoint formatting |
168
+
169
+ Connections are released after each request so the dashboard never holds a
170
+ WAL writer lock open across page loads.
171
+
172
+ ## Related CLI
173
+
174
+ - `claude-memory digest [--since DAYS] [--output FILE]` — markdown report of
175
+ the same Trust + Knowledge + Conflicts + Feedback signals plus
176
+ **Context cost** (token-budget p50/p95) and **Quality** (score + rejection
177
+ rate) sections. Suitable for emailing or committing to the repo.
178
+ - `claude-memory show [--pending] [--source SOURCE]` *(0.11.0+)* — print
179
+ what memory would inject at the next SessionStart in plain Markdown.
180
+ Same `Hook::ContextInjector` path real sessions use, so the output
181
+ matches what Claude actually receives. Footer reports fact count, ~token
182
+ estimate, and char count.
183
+ - `claude-memory stats --tokens [--since DAYS]` *(0.11.0+)* — token budget
184
+ histogram (p50/p95/avg/min/max + bucketed distribution) for SessionStart
185
+ context injections. Same data the Trust panel's Token budget block aggregates.
186
+ - `claude-memory census [--root DIR]` — privacy-safe cross-project
187
+ predicate vocabulary scan; pairs with the Knowledge panel for "what
188
+ predicates does my whole tree use?".
189
+ - `claude-memory stats --stale [--stale-days N]` — list facts the dashboard
190
+ flags as stale.
191
+ - `claude-memory dedupe-conflicts` / `reclassify-references` — one-shot
192
+ cleanups for what the Conflicts and Knowledge → References panels surface.
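The p50/p95/avg figures reported by `stats --tokens` and the Trust panel's Token budget block are ordinary summary statistics over per-session `context_tokens` samples. A minimal Ruby sketch follows, assuming the nearest-rank percentile method; the method ClaudeMemory actually uses is not stated, and the helper names and sample values are hypothetical.

```ruby
# Nearest-rank percentile over an already sorted array.
# Assumption: ClaudeMemory's exact percentile method is not documented here.
def percentile(sorted, pct)
  return nil if sorted.empty?

  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Summary in the same shape as the text: p50/p95/avg/min/max plus sample size.
def token_budget(samples)
  sorted = samples.sort
  return { n: 0 } if sorted.empty?

  {
    n: sorted.size,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    avg: (sorted.sum.to_f / sorted.size).round,
    min: sorted.first,
    max: sorted.last
  }
end

# Made-up context_tokens values, one per SessionStart injection.
summary = token_budget([800, 1200, 950, 4000, 1100])
summary.each { |key, value| puts "#{key}: #{value}" }
```

With small samples a single large injection dominates p95 and the average, which is one reason the sample size is worth reading alongside the percentiles.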