claude_memory 0.9.0 → 0.10.0
- checksums.yaml +4 -4
- data/.claude/memory.sqlite3 +0 -0
- data/.claude/rules/claude_memory.generated.md +63 -1
- data/.claude/skills/dashboard/SKILL.md +42 -0
- data/.claude/skills/release/SKILL.md +168 -0
- data/.claude-plugin/marketplace.json +1 -1
- data/.claude-plugin/plugin.json +1 -1
- data/CHANGELOG.md +92 -0
- data/CLAUDE.md +21 -5
- data/README.md +32 -2
- data/db/migrations/015_add_activity_events.rb +26 -0
- data/db/migrations/016_add_moment_feedback.rb +22 -0
- data/db/migrations/017_add_last_recalled_at.rb +15 -0
- data/docs/1_0_punchlist.md +190 -0
- data/docs/EXAMPLES.md +41 -2
- data/docs/GETTING_STARTED.md +31 -4
- data/docs/architecture.md +22 -7
- data/docs/audit-queries.md +131 -0
- data/docs/dashboard.md +172 -0
- data/docs/improvements.md +465 -9
- data/docs/influence/cq.md +187 -0
- data/docs/plugin.md +13 -6
- data/docs/quality_review.md +489 -172
- data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
- data/lib/claude_memory/activity_log.rb +86 -0
- data/lib/claude_memory/commands/census_command.rb +210 -0
- data/lib/claude_memory/commands/completion_command.rb +3 -0
- data/lib/claude_memory/commands/dashboard_command.rb +54 -0
- data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
- data/lib/claude_memory/commands/digest_command.rb +181 -0
- data/lib/claude_memory/commands/hook_command.rb +34 -0
- data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
- data/lib/claude_memory/commands/registry.rb +6 -1
- data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
- data/lib/claude_memory/commands/stats_command.rb +38 -1
- data/lib/claude_memory/commands/sweep_command.rb +2 -0
- data/lib/claude_memory/configuration.rb +16 -0
- data/lib/claude_memory/core/relative_time.rb +9 -0
- data/lib/claude_memory/dashboard/api.rb +610 -0
- data/lib/claude_memory/dashboard/conflicts.rb +279 -0
- data/lib/claude_memory/dashboard/efficacy.rb +127 -0
- data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
- data/lib/claude_memory/dashboard/health.rb +175 -0
- data/lib/claude_memory/dashboard/index.html +2707 -0
- data/lib/claude_memory/dashboard/knowledge.rb +136 -0
- data/lib/claude_memory/dashboard/moments.rb +244 -0
- data/lib/claude_memory/dashboard/reuse.rb +97 -0
- data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
- data/lib/claude_memory/dashboard/server.rb +211 -0
- data/lib/claude_memory/dashboard/timeline.rb +68 -0
- data/lib/claude_memory/dashboard/trust.rb +285 -0
- data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
- data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
- data/lib/claude_memory/hook/context_injector.rb +97 -3
- data/lib/claude_memory/hook/handler.rb +50 -3
- data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
- data/lib/claude_memory/mcp/query_guide.rb +11 -0
- data/lib/claude_memory/mcp/server.rb +8 -2
- data/lib/claude_memory/mcp/text_summary.rb +29 -0
- data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
- data/lib/claude_memory/mcp/tools.rb +148 -0
- data/lib/claude_memory/publish.rb +13 -21
- data/lib/claude_memory/recall/stale_detector.rb +67 -0
- data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
- data/lib/claude_memory/resolve/resolver.rb +41 -11
- data/lib/claude_memory/store/llm_cache.rb +68 -0
- data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
- data/lib/claude_memory/store/schema_manager.rb +1 -1
- data/lib/claude_memory/store/sqlite_store.rb +47 -143
- data/lib/claude_memory/store/store_manager.rb +29 -0
- data/lib/claude_memory/sweep/maintenance.rb +216 -0
- data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
- data/lib/claude_memory/sweep/sweeper.rb +2 -0
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +22 -0
- metadata +50 -1
|
@@ -0,0 +1,190 @@
|
|
|
1
|
+
# 1.0 Punchlist
|
|
2
|
+
|
|
3
|
+
*Created: 2026-04-28*
|
|
4
|
+
|
|
5
|
+
The remaining work for a stable 1.0 release. Distinct from `improvements.md` —
|
|
6
|
+
that file tracks the long tail of inbound study/idea entries; this file tracks
|
|
7
|
+
**what blocks 1.0 confidence**.
|
|
8
|
+
|
|
9
|
+
Guiding question: *a skeptical Ruby developer should be able to look at one
|
|
10
|
+
screen and say "yes, this is helping, here's the evidence" without trusting our
|
|
11
|
+
marketing.* Today the dashboard tells that story in pieces but not as a
|
|
12
|
+
headline. Each item below closes a specific gap that prevents that headline
|
|
13
|
+
from existing.
|
|
14
|
+
|
|
15
|
+
Items are cross-linked to the canonical entry in `improvements.md` where the
|
|
16
|
+
implementation detail and acceptance criteria live. This file is the
|
|
17
|
+
prioritization view; that file is the work view.
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Must-have for 1.0
|
|
22
|
+
|
|
23
|
+
### 1. Token budget telemetry — *what does memory cost?*
|
|
24
|
+
|
|
25
|
+
**Gap.** `Core::TokenEstimator` exists and is unused outside one helper. We
|
|
26
|
+
have no idea what % of the SessionStart token budget memory consumes per
|
|
27
|
+
session, how it scales with DB size, or whether it's growing.
|
|
28
|
+
|
|
29
|
+
**Acceptance.** Trust panel + `claude-memory digest` show p50/p95 injected
|
|
30
|
+
tokens per session over the last 30 days. Per-session count rides on every
|
|
31
|
+
`hook_context` activity event so the data is queryable post-hoc.
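
The p50/p95 figures in the acceptance criteria could be computed roughly as below. This is a minimal sketch, not the gem's implementation: it assumes each `hook_context` event is a hash with an integer token count under a hypothetical `:injected_tokens` key, and it uses the nearest-rank percentile method as one reasonable choice among several.

```ruby
# Hypothetical sketch: nearest-rank percentiles over per-session token counts.
# The :injected_tokens key is an assumed field name, not confirmed by the gem.

def percentile(values, pct)
  return nil if values.empty?

  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil # nearest-rank method, 1-based
  sorted[[rank, 1].max - 1]
end

def token_budget_summary(events)
  # Events without a recorded count are skipped rather than treated as zero.
  counts = events.map { |event| event[:injected_tokens] }.compact
  { sessions: counts.size, p50: percentile(counts, 50), p95: percentile(counts, 95) }
end

events = [
  { session_id: "a", injected_tokens: 420 },
  { session_id: "b", injected_tokens: 610 },
  { session_id: "c", injected_tokens: nil }
]
p token_budget_summary(events)
```

Nearest-rank always returns a value that was actually observed, which keeps the reported numbers easy to trace back to a specific session.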
|
|
32
|
+
|
|
33
|
+
**Why must-have.** "Costs you tokens forever" is the strongest critique of any
|
|
34
|
+
context-injection memory system; if we can't answer it numerically, we can't
|
|
35
|
+
defend the trade.
|
|
36
|
+
|
|
37
|
+
→ improvements.md entry: *Token Budget Telemetry*
|
|
38
|
+
|
|
39
|
+
### 2. Hallucination rate as a first-class trust metric
|
|
40
|
+
|
|
41
|
+
**Gap.** `ReferenceMaterialDetector` already classifies suspect facts and we
|
|
42
|
+
know from the #34 audit that ~25% of facts had embedded reasoning (i.e.
|
|
43
|
+
~75% were bare conclusions at audit time). Neither signal is exposed on the
|
|
44
|
+
dashboard. We display clean numbers; we should display stained ones.
|
|
45
|
+
|
|
46
|
+
**Acceptance.** Trust panel surfaces a `quality_score` derived from
|
|
47
|
+
suspect-fact ratio + bare-conclusion ratio over active facts in both stores.
|
|
48
|
+
Digest includes a 30-day rejection rate ("how much of what we extracted got
|
|
49
|
+
rejected within a week?") so calibration drift is visible.
|
|
50
|
+
|
|
51
|
+
**Why must-have.** We can't claim "memory is helping" if we can't show "memory
|
|
52
|
+
isn't poisoning the well."
|
|
53
|
+
|
|
54
|
+
→ improvements.md entry: *Hallucination Rate Metric*
|
|
55
|
+
|
|
56
|
+
### 3. Negative-fact harm benchmark
|
|
57
|
+
|
|
58
|
+
**Gap.** Every benchmark we run today measures whether memory **helps**.
|
|
59
|
+
Nothing measures whether memory **harms** — i.e. injects a wrong fact and
|
|
60
|
+
Claude follows it. Without this, "memory helps" is unfalsifiable.
|
|
61
|
+
|
|
62
|
+
**Acceptance.** New `spec/benchmarks/dataset/harm_scenarios.yml` with 10–15
|
|
63
|
+
cases where memory holds a stale or wrong fact. Each case scores `harm` if
|
|
64
|
+
Claude's response follows the wrong fact, `safe` otherwise. Wired into
|
|
65
|
+
`bin/run-evals`. >1% harm rate blocks release.
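
The release gate described above can be sketched as follows. The `:harm` / `:safe` labels come from the acceptance text; the constant and method names are hypothetical, and how `bin/run-evals` would actually collect the outcomes is not specified here.

```ruby
# Hypothetical sketch of a release gate over harm-scenario outcomes.
# HARM_THRESHOLD and both method names are assumptions for illustration.

HARM_THRESHOLD = 0.01 # "more than 1% harm rate blocks release"

def harm_rate(outcomes)
  return 0.0 if outcomes.empty?

  outcomes.count(:harm).to_f / outcomes.size
end

def release_blocked?(outcomes, threshold: HARM_THRESHOLD)
  harm_rate(outcomes) > threshold
end

outcomes = Array.new(14, :safe) + [:harm]
puts format("harm rate: %.1f%%", harm_rate(outcomes) * 100)
puts "release blocked: #{release_blocked?(outcomes)}"
```

With 10–15 scenarios, a single `harm` outcome already gives a rate of at least 1/15 (about 6.7%), so a 1% threshold behaves as zero tolerance at this dataset size.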
|
|
66
|
+
|
|
67
|
+
**Why must-have.** A retrieval system that occasionally makes Claude *wrong*
|
|
68
|
+
is strictly worse than no memory; we need a release gate that proves we're
|
|
69
|
+
not in that regime.
|
|
70
|
+
|
|
71
|
+
→ improvements.md entry: *Negative-Fact Harm Benchmark*
|
|
72
|
+
|
|
73
|
+
### 4. Publish the CLAUDE.md baseline in headline E2E results
|
|
74
|
+
|
|
75
|
+
**Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
|
|
76
|
+
and supports E2E. The adapter is wired into `comparative_helper.rb` but the
|
|
77
|
+
README's headline comparative table doesn't include it. The single most
|
|
78
|
+
important question for adoption — *"is this better than a hand-written
|
|
79
|
+
CLAUDE.md?"* — is currently unanswered in our published numbers.
|
|
80
|
+
|
|
81
|
+
**Acceptance.** Comparative E2E report includes `CLAUDE.md baseline` row in
|
|
82
|
+
`spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary
|
|
83
|
+
output. README explicitly states the win/loss versus the static baseline.
|
|
84
|
+
|
|
85
|
+
**Why must-have.** Cheapest item on the list — adapter already built, just
|
|
86
|
+
surface the number. If we can't beat a static CLAUDE.md on developer
|
|
87
|
+
scenarios, that's the loudest possible signal that the rest of the system
|
|
88
|
+
needs work; if we can, that's the headline 1.0 brag.
|
|
89
|
+
|
|
90
|
+
→ improvements.md entry: *CLAUDE.md Baseline in Headline Results*
|
|
91
|
+
|
|
92
|
+
### 5. `claude-memory show` — human-readable "what would be injected"
|
|
93
|
+
|
|
94
|
+
**Gap.** Inspecting memory state today requires the dashboard or several CLI
|
|
95
|
+
commands (`recall`, `stats`, `census`). The CLAUDE.md alternative is
|
|
96
|
+
`cat CLAUDE.md` — instant, plain-English, no tool. We need the same one-line
|
|
97
|
+
inspect surface.
|
|
98
|
+
|
|
99
|
+
**Acceptance.** `claude-memory show` runs the same `Hook::ContextInjector`
|
|
100
|
+
path real sessions use, prints what would be injected next session in plain
|
|
101
|
+
English (not JSON), sized to fit a terminal, with predicate-grouped sections
|
|
102
|
+
matching the snapshot format.
|
|
103
|
+
|
|
104
|
+
**Why must-have.** Trust requires inspectability. A user who can't see what
|
|
105
|
+
memory will inject can't develop confidence in it.
|
|
106
|
+
|
|
107
|
+
→ improvements.md entry: *claude-memory show*
|
|
108
|
+
|
|
109
|
+
### 6. Release-to-release benchmark scoreboard
|
|
110
|
+
|
|
111
|
+
**Gap.** Benchmark output is textual today. Nothing diff-able across versions.
|
|
112
|
+
Regressions land silently — the only reason we caught the FTS5/RRF
|
|
113
|
+
normalization bug was a manual run.
|
|
114
|
+
|
|
115
|
+
**Acceptance.** Each `bin/run-evals` run writes
|
|
116
|
+
`spec/benchmarks/results/<version>.json`. New `bin/bench-diff` (or rake task)
|
|
117
|
+
compares against the last tagged version's JSON and reports deltas. Release
|
|
118
|
+
script (`/release` skill) reads it and refuses to ship on regressions over a
|
|
119
|
+
configurable threshold.
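
A rough sketch of the comparison step follows. It assumes each results file is a flat JSON object mapping metric names to numbers where higher is better; the metric names, the sample values, and the default threshold are all hypothetical, since the real results schema is not defined in this punchlist.

```ruby
require "json"

# Hypothetical sketch of the `bin/bench-diff` comparison.
# Assumes flat { "metric" => number } results where higher is better.

def bench_deltas(previous, current)
  # Only metrics present in both versions can be compared.
  (previous.keys & current.keys).to_h do |metric|
    [metric, (current[metric] - previous[metric]).round(4)]
  end
end

def regressions(previous, current, threshold: 0.02)
  bench_deltas(previous, current).select { |_metric, delta| delta < -threshold }
end

# Illustrative values only, not measured results.
previous = JSON.parse('{"recall_at_5": 0.82, "mrr": 0.71}')
current  = JSON.parse('{"recall_at_5": 0.78, "mrr": 0.72}')

regressions(previous, current).each do |metric, delta|
  puts "REGRESSION #{metric}: #{delta}"
end
```

A release script could exit non-zero whenever `regressions` returns a non-empty hash, which is one way to get the "refuses to ship" behavior.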
|
|
120
|
+
|
|
121
|
+
**Why must-have.** Without longitudinal tracking, every benchmark we run is a
|
|
122
|
+
snapshot. 1.0 is the moment we commit to *not regressing* what we ship.
|
|
123
|
+
|
|
124
|
+
→ improvements.md entry: *Benchmark Scoreboard Diff*
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## Strong post-1.0
|
|
129
|
+
|
|
130
|
+
These shouldn't block 1.0 but should land in the next release window.
|
|
131
|
+
|
|
132
|
+
### 7. First-week ROI nudge
|
|
133
|
+
|
|
134
|
+
SessionEnd hook prints `memory contributed N facts this session, %used = X`
|
|
135
|
+
inline for the first ~10 sessions. Closes the cold-start gap where new users
|
|
136
|
+
don't see value because they don't think to look.
|
|
137
|
+
|
|
138
|
+
→ improvements.md entry: *First-Week ROI Nudge*
|
|
139
|
+
|
|
140
|
+
### 8. Real-session repeat-correction detector
|
|
141
|
+
|
|
142
|
+
The repeat-correction benchmark (#32) is synthetic; production has no
|
|
143
|
+
equivalent signal. Analyze `activity_events` to detect "this fact was injected
|
|
144
|
+
last session, the user re-stated it this session" — that's where memory is
|
|
145
|
+
silently failing.
|
|
146
|
+
|
|
147
|
+
→ improvements.md entry: *Real-Session Repeat-Correction Detection*
|
|
148
|
+
|
|
149
|
+
### 9. Token-cost growth tracking
|
|
150
|
+
|
|
151
|
+
Builds on #1. Weekly digest reports "context cost grew X% over 30d" as an
|
|
152
|
+
anomaly signal that the DB is bloating or context injection is going wide.
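
The "grew X% over 30d" figure is a plain percent change between two measurements. A minimal sketch, assuming the digest already has a baseline and a latest value for injected tokens (for example the p50 from 30 days ago and today); the method name and inputs are hypothetical.

```ruby
# Hypothetical sketch: percent change between a baseline and a latest value.
# Returns nil when there is no usable baseline to compare against.

def percent_growth(baseline, latest)
  return nil if baseline.nil? || baseline.zero?

  ((latest - baseline) / baseline.to_f * 100).round(1)
end

growth = percent_growth(400, 460)
puts "context cost grew #{growth}% over 30d" if growth
```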
|
|
153
|
+
|
|
154
|
+
→ improvements.md entry: *Token-Cost Growth Tracking*
|
|
155
|
+
|
|
156
|
+
### 10. Drift dashboard
|
|
157
|
+
|
|
158
|
+
Snapshot `census` weekly, surface predicate distribution shifts on the
|
|
159
|
+
dashboard. Answers "is my fact base going off?" without a manual audit.
|
|
160
|
+
|
|
161
|
+
→ improvements.md entry: *Drift Dashboard*
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
## Defer / skip for 1.0
|
|
166
|
+
|
|
167
|
+
- **#44 Universal search box** — cosmetic given the gaps above. Knowledge tab
|
|
168
|
+
drawers cover the primary need.
|
|
169
|
+
- **#45 Live SSE/WebSocket feed** — polling is adequate; dashboard polish, not
|
|
170
|
+
a confidence gap.
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## Sequencing recommendation
|
|
175
|
+
|
|
176
|
+
Smallest set that materially shifts 1.0 confidence (~2 days):
|
|
177
|
+
|
|
178
|
+
1. **Token budget telemetry** (#1) — closes the loudest critique.
|
|
179
|
+
2. **CLAUDE.md baseline publish** (#4) — adapter already built, one report change.
|
|
180
|
+
3. **Hallucination rate** (#2) — reuses ReferenceMaterialDetector.
|
|
181
|
+
|
|
182
|
+
Then in roughly priority order: `claude-memory show` (#5), harm benchmark
|
|
183
|
+
(#3), scoreboard (#6). Post-1.0 items follow naturally once the must-haves
|
|
184
|
+
land.
|
|
185
|
+
|
|
186
|
+
---
|
|
187
|
+
|
|
188
|
+
*Last updated: 2026-04-28 — initial punchlist drawn from session-end critique
|
|
189
|
+
of observability/outcome gaps. Each entry will be elaborated with concrete
|
|
190
|
+
file:line refs in improvements.md as it's worked.*
|
data/docs/EXAMPLES.md
CHANGED
|
@@ -428,9 +428,48 @@ Claude: "You're using Context API for state management. You previously used Redu
|
|
|
428
428
|
|
|
429
429
|
---
|
|
430
430
|
|
|
431
|
+
## Inspecting What Memory Knows (0.10.0+)
|
|
432
|
+
|
|
433
|
+
When you want to see what's actually in memory — what's been extracted, which
|
|
434
|
+
facts Claude has been reaching for, what's stale, what's contradicting — open
|
|
435
|
+
the dashboard:
|
|
436
|
+
|
|
437
|
+
```bash
|
|
438
|
+
claude-memory dashboard
|
|
439
|
+
```
|
|
440
|
+
|
|
441
|
+
The default address is `http://localhost:3377`. The dashboard surfaces:
|
|
442
|
+
|
|
443
|
+
- A **moments feed** — every recall, context injection, extraction event with
|
|
444
|
+
the facts they touched. Click any moment for the full payload.
|
|
445
|
+
- A **Trust sidebar** — week-over-week activity, your global "fingerprint",
|
|
446
|
+
utilization ratio (% of recently extracted facts Claude actually used), and
|
|
447
|
+
your 👍/👎 feedback ratio.
|
|
448
|
+
- **Conflicts** with display-layer dedup so you don't have to triage 11 rows
|
|
449
|
+
of the same contradiction one at a time.
|
|
450
|
+
- **Knowledge** — facts grouped by predicate, with a separate References
|
|
451
|
+
section for auto-detected reference material.
|
|
452
|
+
|
|
453
|
+
For a markdown summary you can email or commit:
|
|
454
|
+
|
|
455
|
+
```bash
|
|
456
|
+
claude-memory digest --since 7
|
|
457
|
+
```
|
|
458
|
+
|
|
459
|
+
For a privacy-safe cross-project audit:
|
|
460
|
+
|
|
461
|
+
```bash
|
|
462
|
+
claude-memory census
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
See **[Dashboard guide →](dashboard.md)** for the full panel reference.
|
|
466
|
+
|
|
467
|
+
---
|
|
468
|
+
|
|
431
469
|
## Next Steps
|
|
432
470
|
|
|
433
|
-
- 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
|
|
434
|
-
-
|
|
471
|
+
- 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
|
|
472
|
+
- 📊 [Inspect with the Dashboard](dashboard.md)
|
|
473
|
+
- 🔧 [Set up the Claude Code Plugin](plugin.md)
|
|
435
474
|
- 🏗️ [Understand the Architecture](architecture.md)
|
|
436
475
|
- 📝 [Check the Changelog](../CHANGELOG.md)
|
data/docs/GETTING_STARTED.md
CHANGED
|
@@ -19,7 +19,7 @@ gem install claude_memory
|
|
|
19
19
|
Verify installation:
|
|
20
20
|
```bash
|
|
21
21
|
claude-memory --version
|
|
22
|
-
# => claude_memory 0.
|
|
22
|
+
# => claude_memory 0.10.0
|
|
23
23
|
```
|
|
24
24
|
|
|
25
25
|
### Step 2: Install the Plugin
|
|
@@ -283,13 +283,13 @@ ClaudeMemory Doctor Report
|
|
|
283
283
|
==========================
|
|
284
284
|
|
|
285
285
|
✓ Global database: ~/.claude/memory.sqlite3
|
|
286
|
-
- Schema version:
|
|
286
|
+
- Schema version: 17
|
|
287
287
|
- Facts: 12
|
|
288
288
|
- Entities: 8
|
|
289
289
|
- Status: Healthy
|
|
290
290
|
|
|
291
291
|
✓ Project database: .claude/memory.sqlite3
|
|
292
|
-
- Schema version:
|
|
292
|
+
- Schema version: 17
|
|
293
293
|
- Facts: 23
|
|
294
294
|
- Entities: 15
|
|
295
295
|
- Status: Healthy
|
|
@@ -314,6 +314,22 @@ ls -lh .claude/memory.sqlite3
|
|
|
314
314
|
# => -rw-r--r-- 1 user staff 64K Jan 26 10:35 .claude/memory.sqlite3
|
|
315
315
|
```
|
|
316
316
|
|
|
317
|
+
### Open the Dashboard (0.10.0+)
|
|
318
|
+
|
|
319
|
+
Once you have a few sessions' worth of memory, the dashboard is the fastest
|
|
320
|
+
way to see what's actually in there:
|
|
321
|
+
|
|
322
|
+
```bash
|
|
323
|
+
claude-memory dashboard
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
Opens `http://localhost:3377` with a moments feed (every recall, context
|
|
327
|
+
injection, and extraction event), a Trust sidebar showing your global
|
|
328
|
+
"fingerprint" and 30-day utilization ratio, a deduped Conflicts panel, and a
|
|
329
|
+
Knowledge panel grouping facts by predicate.
|
|
330
|
+
|
|
331
|
+
See **[docs/dashboard.md](dashboard.md)** for the full panel guide.
|
|
332
|
+
|
|
317
333
|
### Test Memory Recall
|
|
318
334
|
|
|
319
335
|
Have a conversation with Claude to test:
|
|
@@ -560,7 +576,8 @@ sqlite3 .claude/memory.sqlite3 "SELECT * FROM facts LIMIT 5;"
|
|
|
560
576
|
Now that you're up and running:
|
|
561
577
|
|
|
562
578
|
- 📖 Read [Examples](EXAMPLES.md) for common use cases
|
|
563
|
-
-
|
|
579
|
+
- 📊 Open the [Dashboard](dashboard.md) for live inspection (0.10.0+)
|
|
580
|
+
- 🔧 Explore [Plugin Documentation](plugin.md) for advanced configuration
|
|
564
581
|
- 🏗️ Review [Architecture](architecture.md) for technical details
|
|
565
582
|
- 💬 Join [Discussions](https://github.com/codenamev/claude_memory/discussions) to share feedback
|
|
566
583
|
|
|
@@ -572,8 +589,18 @@ Now that you're up and running:
|
|
|
572
589
|
| `claude-memory doctor` | Check system health |
|
|
573
590
|
| `claude-memory recall <query>` | Search for facts |
|
|
574
591
|
| `claude-memory promote <fact_id>` | Make fact global |
|
|
592
|
+
| `claude-memory reject <id_or_docid>` | Mark a fact as rejected |
|
|
575
593
|
| `claude-memory changes` | Recent updates |
|
|
576
594
|
| `claude-memory conflicts` | Show contradictions |
|
|
595
|
+
| `claude-memory dashboard` | Open the local web UI (0.10.0+) |
|
|
596
|
+
| `claude-memory digest --since 7` | Markdown report of the last 7 days (0.10.0+) |
|
|
597
|
+
| `claude-memory stats --stale` | List facts not recalled recently (0.10.0+) |
|
|
598
|
+
| `claude-memory stats --tools` | MCP tool-call telemetry (0.9.0+) |
|
|
599
|
+
| `claude-memory census` | Privacy-safe predicate audit across projects (0.10.0+) |
|
|
600
|
+
| `claude-memory dedupe-conflicts --dry-run` | Preview historical conflict-row dedup (0.10.0+) |
|
|
601
|
+
| `claude-memory reclassify-references --dry-run` | Preview reference-material retag (0.10.0+) |
|
|
602
|
+
| `claude-memory compact` | VACUUM databases |
|
|
603
|
+
| `claude-memory export` | Dump facts to JSON |
|
|
577
604
|
| `/claude-memory:analyze` | Bootstrap project knowledge |
|
|
578
605
|
|
|
579
606
|
## Support
|
data/docs/architecture.md
CHANGED
|
@@ -9,7 +9,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
9
9
|
```
|
|
10
10
|
┌─────────────────────────────────────────────────────────────┐
|
|
11
11
|
│ Application Layer │
|
|
12
|
-
│ CLI (Router) → Commands (
|
|
12
|
+
│ CLI (Router) → Commands (32 classes) → Configuration │
|
|
13
13
|
└──────────────────────┬──────────────────────────────────────┘
|
|
14
14
|
│
|
|
15
15
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
@@ -27,7 +27,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
27
27
|
│
|
|
28
28
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
29
29
|
│ Infrastructure Layer │
|
|
30
|
-
│ Store (SQLite
|
|
30
|
+
│ Store (SQLite v17 + WAL) → FileSystem → Index (FTS5+Vector)│
|
|
31
31
|
│ Templates │
|
|
32
32
|
└─────────────────────────────────────────────────────────────┘
|
|
33
33
|
```
|
|
@@ -40,7 +40,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
40
40
|
|
|
41
41
|
**Components:**
|
|
42
42
|
- **CLI** (`cli.rb`): Thin router that dispatches to command classes
|
|
43
|
-
- **Commands** (`commands/`):
|
|
43
|
+
- **Commands** (`commands/`): 32 command classes, each handling one CLI command
|
|
44
44
|
- **Configuration** (`configuration.rb`): Centralized ENV access and path calculation
|
|
45
45
|
|
|
46
46
|
**Key Principles:**
|
|
@@ -179,7 +179,7 @@ end
|
|
|
179
179
|
**Components:**
|
|
180
180
|
|
|
181
181
|
#### Store (`store/`)
|
|
182
|
-
- **SQLiteStore**: Direct database access via Sequel (schema
|
|
182
|
+
- **SQLiteStore**: Direct database access via Sequel (schema v17)
|
|
183
183
|
- **StoreManager**: Manages dual databases (global + project)
|
|
184
184
|
- **Transaction safety**: Atomic multi-step operations
|
|
185
185
|
- **WAL mode**: Write-Ahead Logging for better concurrency
|
|
@@ -201,6 +201,21 @@ end
|
|
|
201
201
|
- Output style templates (`output-styles/memory-aware.md`)
|
|
202
202
|
- Setup and configuration scaffolding
|
|
203
203
|
|
|
204
|
+
#### Dashboard (`dashboard/`)
|
|
205
|
+
- **Server**: WEBrick HTTP server (default port 3377), starts via `claude-memory dashboard`
|
|
206
|
+
- **API**: HTTP-shape glue + per-endpoint formatting; routes/delegates to panel classes
|
|
207
|
+
- **Panels** (each backed by a dedicated class with focused responsibility):
|
|
208
|
+
- `Trust`: weekly moments, fingerprint, utilization, feedback ratio, needs-review
|
|
209
|
+
- `Moments`: feed-first activity stream with kind classification
|
|
210
|
+
- `Knowledge`: predicate-grouped fact summary (incl. References section)
|
|
211
|
+
- `Conflicts`: display-layer dedup with bulk-reject helper
|
|
212
|
+
- `Reuse`: most-used facts within window
|
|
213
|
+
- `Health`: db / hooks / vec checks with actionable fix strings
|
|
214
|
+
- `Timeline`: 30-day daily rollup
|
|
215
|
+
- `FactPresenter`, `ScopedFactResolver`: shared rendering / scope-aware ID resolution
|
|
216
|
+
- Connections released after every request — no held WAL writer locks across page loads
|
|
217
|
+
- See [docs/dashboard.md](dashboard.md) for the user-facing guide
|
|
218
|
+
|
|
204
219
|
**Key Principles:**
|
|
205
220
|
- Ports and Adapters: Clear interfaces for external systems
|
|
206
221
|
- Dependency Injection: Real vs. test implementations
|
|
@@ -346,10 +361,10 @@ FileSystem (write)
|
|
|
346
361
|
- Value objects (SessionId, TranscriptPath, FactId)
|
|
347
362
|
- Centralized Configuration
|
|
348
363
|
- 4 domain models with business logic
|
|
349
|
-
-
|
|
350
|
-
-
|
|
364
|
+
- 32 command classes
|
|
365
|
+
- 25 MCP tools
|
|
351
366
|
- Semantic search with local embeddings (FastEmbed + TF-IDF fallback)
|
|
352
|
-
- Schema
|
|
367
|
+
- Schema v17 with WAL mode
|
|
353
368
|
|
|
354
369
|
## Future Improvements
|
|
355
370
|
|
|
@@ -0,0 +1,131 @@
|
|
|
1
|
+
# Audit Queries
|
|
2
|
+
|
|
3
|
+
Pre-written SQL for validating that the ClaudeMemory plugin is being invoked when it should be. Run via [cq](https://github.com/technicalpickles/cq) — install with `cargo install --git https://github.com/technicalpickles/cq`.
|
|
4
|
+
|
|
5
|
+
These query Claude Code's raw transcripts (in `~/.claude/projects/`), not ClaudeMemory's own SQLite databases. That's deliberate: cq sees *all* tool calls including ones that bypassed the MCP server entirely, which is exactly the angle needed to spot activation gaps.
|
|
6
|
+
|
|
7
|
+
For server-side telemetry (counts, latencies of MCP calls that *did* land), use `claude-memory stats --tools` against ClaudeMemory's `mcp_tool_calls` table instead.
|
|
8
|
+
|
|
9
|
+
## Query 1 — Memory plugin activation rate
|
|
10
|
+
|
|
11
|
+
How often is any `mcp__memory__*` tool being called, normalized by total sessions?
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
cq sql "
|
|
15
|
+
WITH session_window AS (
|
|
16
|
+
SELECT DISTINCT session_id FROM messages
|
|
17
|
+
),
|
|
18
|
+
memory_sessions AS (
|
|
19
|
+
SELECT DISTINCT session_id FROM tool_calls
|
|
20
|
+
WHERE name LIKE 'mcp__memory__%'
|
|
21
|
+
)
|
|
22
|
+
SELECT
|
|
23
|
+
(SELECT count(*) FROM session_window) AS total_sessions,
|
|
24
|
+
(SELECT count(*) FROM memory_sessions) AS sessions_with_memory_call,
|
|
25
|
+
ROUND(100.0 * (SELECT count(*) FROM memory_sessions)
|
|
26
|
+
/ NULLIF((SELECT count(*) FROM session_window), 0), 1) AS pct
|
|
27
|
+
" --since 30d --table
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
**Why it matters**: a low percentage doesn't mean the plugin is broken — many sessions don't need memory. It's a denominator for the next two queries.
|
|
31
|
+
|
|
32
|
+
## Query 2 — Sessions that asked memory-shaped questions but never called memory
|
|
33
|
+
|
|
34
|
+
The most useful query. Surfaces user prompts where memory *should* have been the obvious tool, but Claude went elsewhere (Read, Grep, Bash) instead.
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
cq sql "
|
|
38
|
+
WITH memory_sessions AS (
|
|
39
|
+
SELECT DISTINCT session_id FROM tool_calls
|
|
40
|
+
WHERE name LIKE 'mcp__memory__%'
|
|
41
|
+
)
|
|
42
|
+
SELECT
|
|
43
|
+
m.session_id,
|
|
44
|
+
m.timestamp,
|
|
45
|
+
left(m.text, 200) AS user_prompt
|
|
46
|
+
FROM messages m
|
|
47
|
+
LEFT JOIN memory_sessions ms ON m.session_id = ms.session_id
|
|
48
|
+
WHERE m.type = 'user'
|
|
49
|
+
AND ms.session_id IS NULL
|
|
50
|
+
AND (
|
|
51
|
+
m.text ILIKE '%why did we%'
|
|
52
|
+
OR m.text ILIKE '%what convention%'
|
|
53
|
+
OR m.text ILIKE '%how do we usually%'
|
|
54
|
+
OR m.text ILIKE '%what did we decide%'
|
|
55
|
+
OR m.text ILIKE '%architecture%'
|
|
56
|
+
OR m.text ILIKE '%what''s the pattern%'
|
|
57
|
+
)
|
|
58
|
+
ORDER BY m.timestamp DESC
|
|
59
|
+
" --since 30d --table --limit 30
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
**What to do with results**: each row is a candidate for either (a) a tightening of MCP server instructions / skill descriptions, or (b) confirmation that the question genuinely didn't need memory and the keyword filter is too loose.
|
|
63
|
+
|
|
64
|
+
## Query 3 — Which memory tools actually get called?
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
cq sql "
|
|
68
|
+
SELECT
|
|
69
|
+
name AS tool,
|
|
70
|
+
count(*) AS invocations,
|
|
71
|
+
count(DISTINCT session_id) AS sessions
|
|
72
|
+
FROM tool_calls
|
|
73
|
+
WHERE name LIKE 'mcp__memory__%'
|
|
74
|
+
GROUP BY name
|
|
75
|
+
ORDER BY invocations DESC
|
|
76
|
+
" --since 30d --table
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
**Expected shape**: `mcp__memory__recall`, `mcp__memory__conventions`, `mcp__memory__decisions` should dominate. Tools that never fire (`memory_fact_graph`, `memory_explain`, `memory_search_concepts`, `memory_facts_by_*`) might have description/triggering issues — same pattern as cq's "skill audit" use case.
|
|
80
|
+
|
|
81
|
+
## Query 4 — Error rate per memory tool
|
|
82
|
+
|
|
83
|
+
```bash
|
|
84
|
+
cq sql "
|
|
85
|
+
SELECT
|
|
86
|
+
tc.name AS tool,
|
|
87
|
+
count(*) AS calls,
|
|
88
|
+
sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END) AS errors,
|
|
89
|
+
ROUND(100.0 * sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END)
|
|
90
|
+
/ count(*), 1) AS pct_errors
|
|
91
|
+
FROM tool_calls tc
|
|
92
|
+
JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
|
|
93
|
+
WHERE tc.name LIKE 'mcp__memory__%'
|
|
94
|
+
GROUP BY tc.name
|
|
95
|
+
ORDER BY errors DESC
|
|
96
|
+
" --since 30d --table
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
**Why it matters**: a memory tool returning errors is much worse than not firing — Claude sees the failure and learns to avoid that tool. Triage anything above ~5%.
|
|
100
|
+
|
|
101
|
+
## Query 5 — Result-size distribution (context budget hygiene)
|
|
102
|
+
|
|
103
|
+
```bash
|
|
104
|
+
cq sql "
|
|
105
|
+
SELECT
|
|
106
|
+
tc.name AS tool,
|
|
107
|
+
count(*) AS calls,
|
|
108
|
+
MIN(length(tr.content)) AS min_chars,
|
|
109
|
+
ROUND(AVG(length(tr.content))) AS avg_chars,
|
|
110
|
+
MAX(length(tr.content)) AS max_chars
|
|
111
|
+
FROM tool_calls tc
|
|
112
|
+
JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
|
|
113
|
+
WHERE tc.name LIKE 'mcp__memory__%'
|
|
114
|
+
GROUP BY tc.name
|
|
115
|
+
ORDER BY avg_chars DESC
|
|
116
|
+
" --since 30d --table
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
**Why it matters**: ClaudeMemory exposes a `compact: true` option that drops receipts for ~60% smaller responses. If averages are large, either the compact flag isn't being passed by callers or the tools that don't accept it are dumping too much.
|
|
120
|
+
|
|
121
|
+
## When to re-run
|
|
122
|
+
|
|
123
|
+
- Before each release — does the new version improve activation rate or reduce errors?
|
|
124
|
+
- After meaningful changes to MCP server instructions / skill descriptions
|
|
125
|
+
- If a user reports "the memory plugin doesn't seem to do anything" — Query 2 will usually surface the gap concretely
|
|
126
|
+
|
|
127
|
+
## Related
|
|
128
|
+
|
|
129
|
+
- Source for the methodology: `docs/influence/cq.md`
|
|
130
|
+
- Server-side telemetry alternative: `claude-memory stats --tools --since 30`
|
|
131
|
+
- cq schema reference: `cq schema --examples`
|