claude_memory 0.9.1 → 0.11.0
- checksums.yaml +4 -4
- data/.claude/memory.sqlite3 +0 -0
- data/.claude/skills/dashboard/SKILL.md +42 -0
- data/.claude-plugin/marketplace.json +1 -1
- data/.claude-plugin/plugin.json +1 -1
- data/CHANGELOG.md +130 -0
- data/CLAUDE.md +30 -6
- data/README.md +66 -2
- data/db/migrations/015_add_activity_events.rb +26 -0
- data/db/migrations/016_add_moment_feedback.rb +22 -0
- data/db/migrations/017_add_last_recalled_at.rb +15 -0
- data/docs/1_0_punchlist.md +371 -0
- data/docs/EXAMPLES.md +41 -2
- data/docs/GETTING_STARTED.md +33 -4
- data/docs/architecture.md +22 -7
- data/docs/audit-queries.md +131 -0
- data/docs/dashboard.md +192 -0
- data/docs/improvements.md +650 -9
- data/docs/influence/cq.md +187 -0
- data/docs/plugin.md +13 -6
- data/docs/quality_review.md +524 -172
- data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
- data/lib/claude_memory/activity_log.rb +86 -0
- data/lib/claude_memory/commands/census_command.rb +210 -0
- data/lib/claude_memory/commands/completion_command.rb +3 -0
- data/lib/claude_memory/commands/dashboard_command.rb +54 -0
- data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
- data/lib/claude_memory/commands/digest_command.rb +273 -0
- data/lib/claude_memory/commands/hook_command.rb +61 -2
- data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
- data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
- data/lib/claude_memory/commands/registry.rb +7 -1
- data/lib/claude_memory/commands/show_command.rb +90 -0
- data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
- data/lib/claude_memory/commands/stats_command.rb +131 -2
- data/lib/claude_memory/commands/sweep_command.rb +2 -0
- data/lib/claude_memory/configuration.rb +16 -0
- data/lib/claude_memory/core/relative_time.rb +9 -0
- data/lib/claude_memory/dashboard/api.rb +610 -0
- data/lib/claude_memory/dashboard/conflicts.rb +279 -0
- data/lib/claude_memory/dashboard/efficacy.rb +127 -0
- data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
- data/lib/claude_memory/dashboard/health.rb +175 -0
- data/lib/claude_memory/dashboard/index.html +2707 -0
- data/lib/claude_memory/dashboard/knowledge.rb +136 -0
- data/lib/claude_memory/dashboard/moments.rb +244 -0
- data/lib/claude_memory/dashboard/reuse.rb +97 -0
- data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
- data/lib/claude_memory/dashboard/server.rb +211 -0
- data/lib/claude_memory/dashboard/timeline.rb +68 -0
- data/lib/claude_memory/dashboard/trust.rb +454 -0
- data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
- data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
- data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
- data/lib/claude_memory/hook/context_injector.rb +97 -3
- data/lib/claude_memory/hook/handler.rb +191 -3
- data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
- data/lib/claude_memory/mcp/query_guide.rb +11 -0
- data/lib/claude_memory/mcp/text_summary.rb +29 -0
- data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
- data/lib/claude_memory/mcp/tools.rb +148 -0
- data/lib/claude_memory/publish.rb +13 -21
- data/lib/claude_memory/recall/stale_detector.rb +67 -0
- data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
- data/lib/claude_memory/resolve/resolver.rb +41 -11
- data/lib/claude_memory/store/llm_cache.rb +68 -0
- data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
- data/lib/claude_memory/store/schema_manager.rb +1 -1
- data/lib/claude_memory/store/sqlite_store.rb +47 -143
- data/lib/claude_memory/store/store_manager.rb +29 -0
- data/lib/claude_memory/sweep/maintenance.rb +216 -0
- data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
- data/lib/claude_memory/sweep/sweeper.rb +2 -0
- data/lib/claude_memory/templates/hooks.example.json +5 -0
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +24 -0
- metadata +51 -1
|
@@ -0,0 +1,371 @@
|
|
|
1
|
+
# 1.0 Punchlist
|
|
2
|
+
|
|
3
|
+
*Created: 2026-04-28. Restructured 2026-04-28 (post-0.10.0 release) around
|
|
4
|
+
milestone versions per the path-to-1.0 plan.*
|
|
5
|
+
|
|
6
|
+
The remaining work for a stable 1.0 release. Distinct from `improvements.md` —
|
|
7
|
+
that file tracks the long tail of inbound study/idea entries; this file tracks
|
|
8
|
+
**what blocks 1.0 confidence and which release each item ships in**.
|
|
9
|
+
|
|
10
|
+
Guiding question: *a skeptical Ruby developer should be able to look at one
|
|
11
|
+
screen and say "yes, this is helping, here's the evidence" without trusting our
|
|
12
|
+
marketing.* Today the dashboard tells that story in pieces but not as a
|
|
13
|
+
headline. Each item below closes a specific gap that prevents that headline
|
|
14
|
+
from existing.
|
|
15
|
+
|
|
16
|
+
## What 1.0 commits to
|
|
17
|
+
|
|
18
|
+
Not "feature complete" — semver commitment. Once we ship 1.0:
|
|
19
|
+
|
|
20
|
+
- Public APIs (CLI surface, MCP tool schemas, hook payload shapes) lock to semver
|
|
21
|
+
- Schema migrations stay forward-compatible per the round-trip-spec convention
|
|
22
|
+
- The trust signals we ship have a baseline measurement other releases must beat
|
|
23
|
+
|
|
24
|
+
So 1.0 isn't gated by features. It's gated by **the measurement infrastructure
|
|
25
|
+
being trustworthy enough to defend a 1.0 claim.** That's why this punchlist is
|
|
26
|
+
mostly observability, not capability.
|
|
27
|
+
|
|
28
|
+
Items are cross-linked to the canonical entry in `improvements.md` where the
|
|
29
|
+
implementation detail and acceptance criteria live. This file is the
|
|
30
|
+
prioritization view; that file is the work view.
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
## 0.10.x — patch as needed (now)
|
|
35
|
+
|
|
36
|
+
Reactive only. Real usage will surface issues; cut a patch when one shows up.
|
|
37
|
+
No proactive minor work here.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## 0.11.0 — "Trust & Cost" (~1 week of work)
|
|
42
|
+
|
|
43
|
+
Theme: *users can see what memory costs and whether it's helping.* Each item
|
|
44
|
+
adds a number a skeptical user can read.
|
|
45
|
+
|
|
46
|
+
### #1 Token budget telemetry — *what does memory cost?* ✅ landed 2026-04-29
|
|
47
|
+
|
|
48
|
+
**Gap.** `Core::TokenEstimator` exists and is unused outside one helper. We
|
|
49
|
+
have no idea what % of the SessionStart token budget memory consumes per
|
|
50
|
+
session, how it scales with DB size, or whether it's growing.
|
|
51
|
+
|
|
52
|
+
**Acceptance.** Trust panel + `claude-memory digest` show p50/p95 injected
|
|
53
|
+
tokens per session over the last 30 days. Per-session count rides on every
|
|
54
|
+
`hook_context` activity event so the data is queryable post-hoc.
|
|
55
|
+
|
|
56
|
+
**Why this release.** Loudest critique of any context-injection memory
|
|
57
|
+
system; if we can't answer it numerically, we can't defend the trade.
|
|
58
|
+
|
|
59
|
+
**Status.** Landed in 4 atomic commits on 2026-04-29 (15cb5f5, 35ae8d2,
|
|
60
|
+
d9601ca, 5bfd7c8). `context_tokens` recorded on every successful
|
|
61
|
+
`hook_context` event, surfaced via `Dashboard::Trust#token_budget`,
|
|
62
|
+
`claude-memory digest` "Context cost" section, and
|
|
63
|
+
`claude-memory stats --tokens [--since DAYS]` with histogram.
|
|
64
|
+
|
|
65
|
+
→ improvements.md entry: *#47 Token Budget Telemetry*. Effort: 4-6h.
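The p50/p95 figures named in the acceptance criteria can be illustrated with a nearest-rank percentile over per-session `context_tokens` values. This is a minimal sketch under stated assumptions, not the gem's implementation: the `percentile` helper and the sample token counts are hypothetical, and `Dashboard::Trust#token_budget` may well use a different percentile method.

```ruby
# Hypothetical helper (not part of claude_memory): nearest-rank percentile.
# Returns nil for an empty sample so callers can render "no data yet".
def percentile(values, pct)
  return nil if values.empty?

  sorted = values.sort
  # Nearest-rank: the smallest rank whose cumulative share is >= pct.
  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Made-up per-session token counts, standing in for the context_tokens
# values recorded on hook_context activity events.
context_tokens = [500, 100, 300, 200, 400]

p50 = percentile(context_tokens, 50)
p95 = percentile(context_tokens, 95)
puts "p50=#{p50} p95=#{p95}"
```

Nearest-rank always returns a value that actually occurred in a session, which keeps the number easy to cross-check against individual events.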
|
|
66
|
+
|
|
67
|
+
### #2 Hallucination rate as a first-class trust metric ✅ landed 2026-04-29
|
|
68
|
+
|
|
69
|
+
**Gap.** `ReferenceMaterialDetector` already classifies suspect facts and we
|
|
70
|
+
know from the #34 audit that ~25% of facts had embedded reasoning (i.e.
|
|
71
|
+
~75% were bare conclusions at audit time). Neither signal is exposed on the
|
|
72
|
+
dashboard. We display clean numbers; we should display stained ones.
|
|
73
|
+
|
|
74
|
+
**Acceptance.** Trust panel surfaces a `quality_score` derived from
|
|
75
|
+
suspect-fact ratio + bare-conclusion ratio over active facts in both stores.
|
|
76
|
+
Digest includes a 30-day rejection rate ("how much of what we extracted got
|
|
77
|
+
rejected within a week?") so calibration drift is visible.
|
|
78
|
+
|
|
79
|
+
**Why this release.** Pollution rate matters as much as recall rate. Pairs
|
|
80
|
+
with #1 — together they answer the "is this still worth it?" question.
|
|
81
|
+
|
|
82
|
+
**Status.** Landed in 3 atomic commits on 2026-04-29 (27fa6af, 4d1c5bf,
|
|
83
|
+
0b72fa4). New `Distill::BareConclusionDetector` + `Dashboard::Trust#quality_score`
|
|
84
|
+
+ `claude-memory digest` Quality section with rejection rate.
|
|
85
|
+
|
|
86
|
+
→ improvements.md entry: *#48 Hallucination Rate Metric*. Effort: 1d.
|
|
87
|
+
|
|
88
|
+
### #5 `claude-memory show` — human-readable "what would be injected" ✅ landed 2026-04-29
|
|
89
|
+
|
|
90
|
+
**Gap.** Inspecting memory state today requires the dashboard or several CLI
|
|
91
|
+
commands (`recall`, `stats`, `census`). The CLAUDE.md alternative is
|
|
92
|
+
`cat CLAUDE.md` — instant, plain-English, no tool. We need the same one-line
|
|
93
|
+
inspect surface.
|
|
94
|
+
|
|
95
|
+
**Acceptance.** `claude-memory show` runs the same `Hook::ContextInjector`
|
|
96
|
+
path real sessions use, prints what would be injected next session in plain
|
|
97
|
+
English (not JSON), sized to fit a terminal, with predicate-grouped sections
|
|
98
|
+
matching the snapshot format.
|
|
99
|
+
|
|
100
|
+
**Why this release.** Trust requires inspectability. A user who can't see what
|
|
101
|
+
memory will inject can't develop confidence in it.
|
|
102
|
+
|
|
103
|
+
**Status.** Landed 2026-04-29 (commit 2586bb3). New `Commands::ShowCommand`
|
|
104
|
+
runs `Hook::ContextInjector` and prints the would-be-injected Markdown.
|
|
105
|
+
Default suppresses the raw-transcript pending-knowledge dump for
|
|
106
|
+
readability (`--pending` opts in). Footer reports fact count, token
|
|
107
|
+
estimate, char count.
|
|
108
|
+
|
|
109
|
+
→ improvements.md entry: *#51 claude-memory show*. Effort: ½d.
|
|
110
|
+
|
|
111
|
+
### #7 First-week ROI nudge — *moved up from post-1.0* ✅ landed 2026-04-30
|
|
112
|
+
|
|
113
|
+
**Gap.** New users install, run a few sessions, don't know whether memory is
|
|
114
|
+
working. The dashboard exists but they have to know to look.
|
|
115
|
+
|
|
116
|
+
**Acceptance.** SessionEnd hook prints `memory contributed N facts this
|
|
117
|
+
session, %used = X` inline for the first ~10 sessions, then quiets. Opt-out
|
|
118
|
+
via `CLAUDE_MEMORY_NO_NUDGE=1`.
|
|
119
|
+
|
|
120
|
+
**Why this release.** Belongs with the trust theme — it's the user-visible
|
|
121
|
+
proof that memory is doing work for them. Originally listed as post-1.0;
|
|
122
|
+
elevating because cold-start trust deserves to land before 1.0.
|
|
123
|
+
|
|
124
|
+
**Status.** Landed in 2 atomic commits on 2026-04-30 (f450ed9, 3acce93)
|
|
125
|
+
plus production smoke-test against this project's DB (event #229
|
|
126
|
+
recorded with n=11, used=0, pct=0 for a real session_id). New
|
|
127
|
+
`Hook::Handler#nudge` + `claude-memory hook nudge`; SessionEnd config
|
|
128
|
+
appends nudge after ingest+sweep. Silent on opt-out, missing
|
|
129
|
+
session_id, n=0, or first-week-complete (so empty sessions don't burn
|
|
130
|
+
slots).
|
|
131
|
+
|
|
132
|
+
→ improvements.md entry: *#53 First-Week ROI Nudge*. Effort: ½d.
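The silence rules listed in the status note (opt-out, missing session_id, n=0, first week complete) can be sketched as a single guard-clause function. Everything named here is an assumption for illustration: `nudge_line`, its keyword arguments, and the fixed limit of 10 sessions (the source only says "~10") are hypothetical and do not describe `Hook::Handler#nudge` itself.

```ruby
# Assumption: the source says "first ~10 sessions"; 10 is used as a stand-in.
NUDGE_SESSION_LIMIT = 10

# Hypothetical sketch: returns the nudge text, or nil when it should stay silent.
def nudge_line(env:, session_id:, contributed:, used:, sessions_nudged:)
  return nil if env["CLAUDE_MEMORY_NO_NUDGE"] == "1"      # explicit opt-out
  return nil if session_id.nil? || session_id.empty?      # no session to attribute
  return nil if contributed.zero?                         # empty sessions don't burn slots
  return nil if sessions_nudged >= NUDGE_SESSION_LIMIT    # first week complete

  pct = (used * 100.0 / contributed).round
  "memory contributed #{contributed} facts this session, %used = #{pct}"
end

puts nudge_line(env: {}, session_id: "abc", contributed: 11, used: 0, sessions_nudged: 3)
```

Checking `contributed.zero?` before the session counter is what lets empty sessions pass through without consuming one of the limited nudge slots.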
|
|
133
|
+
|
|
134
|
+
### Risk-de-risking — 3-scenario harm prototype ✅ landed 2026-04-30
|
|
135
|
+
|
|
136
|
+
Before 0.12 builds the full 10-15-scenario harm benchmark (see #3), run a
|
|
137
|
+
3-scenario prototype against the 0.10.0 codebase to confirm whether harm is
|
|
138
|
+
actually low. If the prototype surfaces a >0% harm rate on simple cases, the
|
|
139
|
+
full benchmark in 0.12 will reveal a fundamental issue — better to know at
|
|
140
|
+
0.11 than discover at 0.12.
|
|
141
|
+
|
|
142
|
+
**Acceptance.** Three hand-written `harm_scenarios.yml` cases (one stale-tech,
|
|
143
|
+
one mismatched-scope, one superseded-but-undetected) run against real Claude
|
|
144
|
+
under `EVAL_MODE=real`. Reports go/no-go on the larger benchmark in 0.12.
|
|
145
|
+
|
|
146
|
+
**Status.** Landed 2026-04-30 (commit 35b368e). Three cases written:
|
|
147
|
+
`harm_stale_tech` (MySQL fact vs SQLite reality), `harm_mismatched_scope`
|
|
148
|
+
(global TS/Tailwind preference applied to a Ruby gem),
|
|
149
|
+
`harm_superseded_undetected` (two contradicting auth_method facts both
|
|
150
|
+
active). Structure validation passes in stub mode. Real-mode is gated
|
|
151
|
+
behind `EVAL_MODE=real` (~$2-8 per run) so the operator decides when to
|
|
152
|
+
spend; this prototype reports harm rate but doesn't enforce a threshold
|
|
153
|
+
yet — that's the 0.12 release-gate work.
|
|
154
|
+
|
|
155
|
+
→ improvements.md entry: *#49 Negative-Fact Harm Benchmark* (prototype phase).
|
|
156
|
+
Effort: ½d.
|
|
157
|
+
|
|
158
|
+
**Ship target:** ~2 weeks from 0.10.0 (mid-May 2026 at current velocity).
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## 0.12.0 — "Release Discipline" (~1 week of work)
|
|
163
|
+
|
|
164
|
+
Theme: *we can't ship a regression without noticing.* Internal infrastructure
|
|
165
|
+
that prevents future regressions. Not flashy but the actual prerequisite for
|
|
166
|
+
1.0's semver commitment.
|
|
167
|
+
|
|
168
|
+
### #3 Negative-fact harm benchmark (full 10-15 scenarios)
|
|
169
|
+
|
|
170
|
+
**Gap.** Every benchmark today measures whether memory **helps**. Nothing
|
|
171
|
+
measures whether memory **harms** — i.e. injects a wrong fact and Claude
|
|
172
|
+
follows it. Without this, "memory helps" is unfalsifiable.
|
|
173
|
+
|
|
174
|
+
**Acceptance.** `spec/benchmarks/dataset/harm_scenarios.yml` with 10-15 cases
|
|
175
|
+
spanning four harm classes (stale-tech, mismatched-scope, superseded-but-
|
|
176
|
+
undetected, reference-material-as-fact). Each scores `harm` if Claude follows
|
|
177
|
+
the wrong fact, `safe` otherwise. Wired into `bin/run-evals`. **>1% harm
|
|
178
|
+
rate blocks release** (configurable via `HARM_RATE_THRESHOLD`).
|
|
179
|
+
|
|
180
|
+
**Why this release.** A retrieval system that occasionally makes Claude
|
|
181
|
+
*wrong* is strictly worse than no memory; the release gate proves we're not
|
|
182
|
+
in that regime.
|
|
183
|
+
|
|
184
|
+
→ improvements.md entry: *#49 Negative-Fact Harm Benchmark* (full corpus).
|
|
185
|
+
Effort: 2d.
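The release gate described above reduces to a ratio and a comparison. The sketch below is illustrative only: `harm_rate` and `release_blocked?` are hypothetical names, and it assumes `HARM_RATE_THRESHOLD` is expressed as a fraction (0.01 for 1%), which the source does not specify.

```ruby
# Assumption: threshold is a fraction, so 0.01 mirrors ">1% harm rate blocks release".
DEFAULT_HARM_RATE_THRESHOLD = 0.01

# outcomes is an array of "harm" / "safe" strings, one per scenario.
def harm_rate(outcomes)
  return 0.0 if outcomes.empty?

  outcomes.count("harm").to_f / outcomes.length
end

# env defaults to the process environment; a plain Hash works for testing.
def release_blocked?(outcomes, env: ENV)
  threshold = Float(env.fetch("HARM_RATE_THRESHOLD", DEFAULT_HARM_RATE_THRESHOLD.to_s))
  harm_rate(outcomes) > threshold
end

outcomes = ["safe"] * 11 + ["harm"]
puts "harm rate: #{(harm_rate(outcomes) * 100).round(1)}%"
puts "blocked: #{release_blocked?(outcomes, env: {})}"
```

With only 10-15 scenarios, a single `harm` outcome already exceeds a 1% threshold, so under this reading the gate effectively requires zero harmful outcomes.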
|
|
186
|
+
|
|
187
|
+
### #4 Publish the CLAUDE.md baseline in headline E2E results
|
|
188
|
+
|
|
189
|
+
**Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
|
|
190
|
+
and is wired into `comparative_helper.rb`. The README's headline comparative
|
|
191
|
+
table doesn't include it. The single most important question for adoption —
|
|
192
|
+
*"is this better than a hand-written CLAUDE.md?"* — is unanswered in our
|
|
193
|
+
published numbers.
|
|
194
|
+
|
|
195
|
+
**Acceptance.** Comparative E2E report includes `CLAUDE.md baseline` row in
|
|
196
|
+
`spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary.
|
|
197
|
+
README explicitly states the win/loss versus the static baseline.
|
|
198
|
+
|
|
199
|
+
**Why this release.** Cheapest item on the list — adapter built, just
|
|
200
|
+
surface the number. Pairs with #6 because it materializes once the
|
|
201
|
+
scoreboard infrastructure is there.
|
|
202
|
+
|
|
203
|
+
→ improvements.md entry: *#50 CLAUDE.md Baseline in Headline Results*.
|
|
204
|
+
Effort: 30min code + one $2-8 real-mode run.
|
|
205
|
+
|
|
206
|
+
### #6 Release-to-release benchmark scoreboard
|
|
207
|
+
|
|
208
|
+
**Gap.** Benchmark output is textual today. Nothing diff-able across versions.
|
|
209
|
+
Regressions land silently — the only reason we caught the BM25 normalization
|
|
210
|
+
bug was a manual run.
|
|
211
|
+
|
|
212
|
+
**Acceptance.** Each `bin/run-evals` run writes
|
|
213
|
+
`spec/benchmarks/results/<version>.json`. New `bin/bench-diff` compares
|
|
214
|
+
against the last tagged version's JSON and reports deltas. `/release` skill
|
|
215
|
+
reads it and refuses to ship on regressions over threshold.
|
|
216
|
+
|
|
217
|
+
**Why this release.** The semver commitment in 1.0 *requires* this — we
|
|
218
|
+
can't promise non-regression without the infrastructure to detect it.
|
|
219
|
+
|
|
220
|
+
→ improvements.md entry: *#52 Benchmark Scoreboard Diff*. Effort: 1d.
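A diff between two result files could look roughly like the following. This is a sketch under assumptions, not a description of the planned `bin/bench-diff`: it assumes each results JSON is a flat object mapping metric names to numbers where higher is better, and the metric names and the 0.05 threshold are invented for the example.

```ruby
require "json"

# Hypothetical: per-metric delta (current minus previous) for metrics
# present in both result files.
def bench_deltas(previous_json, current_json)
  previous = JSON.parse(previous_json)
  current = JSON.parse(current_json)

  (previous.keys & current.keys).sort.to_h do |metric|
    [metric, (current[metric] - previous[metric]).round(4)]
  end
end

# Metrics that dropped by more than the allowed threshold.
def regressions(deltas, threshold)
  deltas.select { |_metric, delta| delta < -threshold }.keys
end

previous = '{"recall_at_5": 0.80, "mrr": 0.60}'
current  = '{"recall_at_5": 0.74, "mrr": 0.61}'

deltas = bench_deltas(previous, current)
deltas.each { |metric, delta| puts format("%-14s %+.4f", metric, delta) }
puts "regressions: #{regressions(deltas, 0.05).inspect}"
```

Comparing only the metrics present in both files keeps a newly added benchmark from being reported as a regression on its first run.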
|
|
221
|
+
|
|
222
|
+
**Ship target:** ~4 weeks from 0.10.0 (end of May 2026).
|
|
223
|
+
|
|
224
|
+
---
|
|
225
|
+
|
|
226
|
+
## 0.12.x → 1.0 — soak period (2-3 weeks)
|
|
227
|
+
|
|
228
|
+
Critical phase. Run 0.12 against real usage. Watch:
|
|
229
|
+
|
|
230
|
+
- **Harm rate stays at 0%** — release gate from #3
|
|
231
|
+
- **Hallucination rate trend** — from #2
|
|
232
|
+
- **Token budget growth** — from #1, #9
|
|
233
|
+
- **Utilization ratio** — across multiple projects
|
|
234
|
+
|
|
235
|
+
If any signal shifts unfavorably during soak, fix in 0.12.x. **Don't ship 1.0
|
|
236
|
+
from a release that hasn't observed itself for ≥2 weeks.**
|
|
237
|
+
|
|
238
|
+
This soak period is also where the relevance ratio metric (#31 from 0.10.0)
|
|
239
|
+
materializes its first real-mode measurement, and where the 0.11 trust
|
|
240
|
+
signals get a chance to be real numbers vs. theory.
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
## 1.0.0 — "Stable Memory"
|
|
245
|
+
|
|
246
|
+
Theme: *ready for daily use, ready to recommend.*
|
|
247
|
+
|
|
248
|
+
### Post-1.0-punchlist polish (if landed during soak)
|
|
249
|
+
|
|
250
|
+
These were originally post-1.0 in the punchlist; if soak time permits, they
|
|
251
|
+
land in 1.0. Otherwise they ship in 1.1.
|
|
252
|
+
|
|
253
|
+
### #8 Real-session repeat-correction detection
|
|
254
|
+
|
|
255
|
+
The repeat-correction benchmark (#32 from 0.10.0) is synthetic; production
|
|
256
|
+
has no equivalent signal. Analyze `activity_events` for "this fact was
|
|
257
|
+
injected last session, the user re-stated it this session" — that's where
|
|
258
|
+
memory is silently failing.
|
|
259
|
+
|
|
260
|
+
→ improvements.md entry: *#54 Real-Session Repeat-Correction Detection*.
|
|
261
|
+
Effort: 2d.
|
|
262
|
+
|
|
263
|
+
### #9 Token-cost growth tracking
|
|
264
|
+
|
|
265
|
+
Builds on #1. Weekly digest reports "context cost grew X% over 30d" as an
|
|
266
|
+
anomaly signal that the DB is bloating or context injection is going wide.
|
|
267
|
+
|
|
268
|
+
→ improvements.md entry: *#55 Token-Cost Growth Tracking*. Effort: 3h after
|
|
269
|
+
#1 lands.
|
|
270
|
+
|
|
271
|
+
### #10 Drift dashboard
|
|
272
|
+
|
|
273
|
+
Snapshot `census` weekly, surface predicate distribution shifts on the
|
|
274
|
+
dashboard. Answers "is my fact base going off?" without a manual audit.
|
|
275
|
+
|
|
276
|
+
→ improvements.md entry: *#56 Drift Dashboard*. Effort: 1.5d.
|
|
277
|
+
|
|
278
|
+
### #11 API stability audit (NEW — added 2026-04-28)
|
|
279
|
+
|
|
280
|
+
**Gap.** "1.0 commits to semver" is meaningless without an explicit
|
|
281
|
+
public/internal split. Many of the surfaces touched in 0.9.0 / 0.10.0
|
|
282
|
+
(MCP tool schemas, hook payload shapes, CLI flags, dashboard endpoints)
|
|
283
|
+
have evolved organically and aren't formally documented as stable vs.
|
|
284
|
+
internal.
|
|
285
|
+
|
|
286
|
+
**Acceptance.**
|
|
287
|
+
|
|
288
|
+
- New `docs/api_stability.md` enumerating:
|
|
289
|
+
- **Public CLI**: every `claude-memory <subcommand>` and its flags, with stability tier
|
|
290
|
+
- **Public MCP tools**: every tool's schema, return shape, and tool-annotation hints
|
|
291
|
+
- **Public hook contract**: payload fields, return shapes, exit codes
|
|
292
|
+
- **Public Ruby API**: which classes/modules under `lib/claude_memory/` are external-facing (`Recall`, `Configuration`, `Store::StoreManager`?) vs. internal-only
|
|
293
|
+
- **Schema**: stability of column names, table names, predicate vocabulary
|
|
294
|
+
- A deprecation policy: "we'll mark X deprecated in N.x.0 and remove no earlier than (N+1).0.0"
|
|
295
|
+
- README + CLAUDE.md link to the new doc as the authoritative source
|
|
296
|
+
|
|
297
|
+
**Why this release.** Without this, the 1.0 semver promise is vibes, not a
|
|
298
|
+
contract. Future regressions in non-listed areas can be argued away; future
|
|
299
|
+
regressions in listed areas are bugs. Forces us to be honest about what
|
|
300
|
+
we're committing to.
|
|
301
|
+
|
|
302
|
+
→ improvements.md entry: *#59 API Stability Audit* (added 2026-04-28; renumbered
|
|
303
|
+
from #57 after rebase brought in Mercury-article entries #57/#58). Effort:
|
|
304
|
+
2d including the doc + deprecation-warning instrumentation for any
|
|
305
|
+
soon-to-be-removed surface.
|
|
306
|
+
|
|
307
|
+
### Release framing
|
|
308
|
+
|
|
309
|
+
README + CHANGELOG framing for 1.0 explicitly states:
|
|
310
|
+
|
|
311
|
+
- "We measured X harm rate, Y utilization, Z hallucination rate across N
|
|
312
|
+
projects over W weeks before tagging this."
|
|
313
|
+
- The public API surface is documented at `docs/api_stability.md`
|
|
314
|
+
- Deprecation policy explicit
|
|
315
|
+
|
|
316
|
+
**Ship target:** 6-8 weeks from 0.10.0 (mid-June 2026 at current velocity).
|
|
317
|
+
|
|
318
|
+
---
|
|
319
|
+
|
|
320
|
+
## Defer / skip for 1.0
|
|
321
|
+
|
|
322
|
+
- **#44 Universal search box** — cosmetic given the gaps above. Knowledge tab
|
|
323
|
+
drawers cover the primary need.
|
|
324
|
+
- **#45 Live SSE/WebSocket feed** — polling is adequate; dashboard polish, not
|
|
325
|
+
a confidence gap.
|
|
326
|
+
- **#23 REST API endpoint** — MCP covers primary use case; defer to 1.x.
|
|
327
|
+
- **#25 HTTP MCP transport** — no startup-latency complaint to motivate it yet.
|
|
328
|
+
|
|
329
|
+
---
|
|
330
|
+
|
|
331
|
+
## Risk to flag now
|
|
332
|
+
|
|
333
|
+
The biggest hidden risk in this plan is **the harm benchmark (#3) finds
|
|
334
|
+
something.** If 10-15 scenarios with intentionally wrong facts produce >1%
|
|
335
|
+
harm rate, that's a fundamental retrieval-discipline issue that could push
|
|
336
|
+
1.0 by months. The 3-scenario prototype in 0.11 (above) is specifically
|
|
337
|
+
designed to surface this risk earlier.
|
|
338
|
+
|
|
339
|
+
---
|
|
340
|
+
|
|
341
|
+
## Velocity assumptions
|
|
342
|
+
|
|
343
|
+
Based on actual release cadence Mar-Apr 2026:
|
|
344
|
+
| Pair | Days |
|---|---|
| 0.7.0 → 0.7.1 | minor patch, days |
| 0.7.1 → 0.8.0 | 17 |
| 0.8.0 → 0.9.0 | 17 |
| 0.9.0 → 0.9.1 | same day (patch) |
| 0.9.1 → 0.10.0 | 12 |
|
|
352
|
+
|
|
353
|
+
Average ~2 weeks per minor with substantial work landing each cycle.
|
|
354
|
+
| Milestone | Estimated work | Calendar target |
|---|---|---|
| 0.10.x patches | reactive | as-needed |
| 0.11.0 | ~1 week | ~2026-05-12 |
| 0.12.0 | ~1 week | ~2026-05-26 |
| Soak | 2-3 weeks | through ~2026-06-16 |
| 1.0.0 | 1-2 days release prep + #11 | ~2026-06-16 to 2026-06-23 |
|
|
362
|
+
|
|
363
|
+
These are calendar estimates assuming roughly the same focus level as the
|
|
364
|
+
0.10.0 cycle. Real cadence will adjust based on what surfaces during soak.
|
|
365
|
+
|
|
366
|
+
---
|
|
367
|
+
|
|
368
|
+
*Last updated: 2026-04-28 (post-0.10.0). Restructured around milestone
|
|
369
|
+
versions per the path-to-1.0 plan. #7 moved up from post-1.0 to 0.11; #11
|
|
370
|
+
API stability audit added as a new 1.0 must-have; 3-scenario harm prototype
|
|
371
|
+
added to 0.11 as risk-de-risking work for the full 0.12 benchmark.*
|
data/docs/EXAMPLES.md
CHANGED
|
@@ -428,9 +428,48 @@ Claude: "You're using Context API for state management. You previously used Redu
|
|
|
428
428
|
|
|
429
429
|
---
|
|
430
430
|
|
|
431
|
+
## Inspecting What Memory Knows (0.10.0+)
|
|
432
|
+
|
|
433
|
+
When you want to see what's actually in memory — what's been extracted, which
|
|
434
|
+
facts Claude has been reaching for, what's stale, what's contradicting — open
|
|
435
|
+
the dashboard:
|
|
436
|
+
|
|
437
|
+
```bash
|
|
438
|
+
claude-memory dashboard
|
|
439
|
+
```
|
|
440
|
+
|
|
441
|
+
Default address `http://localhost:3377` (port 3377). Surfaces:
|
|
442
|
+
|
|
443
|
+
- A **moments feed** — every recall, context injection, extraction event with
|
|
444
|
+
the facts they touched. Click any moment for the full payload.
|
|
445
|
+
- A **Trust sidebar** — week-over-week activity, your global "fingerprint",
|
|
446
|
+
utilization ratio (% of recently extracted facts Claude actually used), and
|
|
447
|
+
your 👍/👎 feedback ratio.
|
|
448
|
+
- **Conflicts** with display-layer dedup so you don't have to triage 11 rows
|
|
449
|
+
of the same contradiction one at a time.
|
|
450
|
+
- **Knowledge** — facts grouped by predicate, with a separate References
|
|
451
|
+
section for auto-detected reference material.
|
|
452
|
+
|
|
453
|
+
For a markdown summary you can email or commit:
|
|
454
|
+
|
|
455
|
+
```bash
|
|
456
|
+
claude-memory digest --since 7
|
|
457
|
+
```
|
|
458
|
+
|
|
459
|
+
For a privacy-safe cross-project audit:
|
|
460
|
+
|
|
461
|
+
```bash
|
|
462
|
+
claude-memory census
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
See **[Dashboard guide →](dashboard.md)** for the full panel reference.
|
|
466
|
+
|
|
467
|
+
---
|
|
468
|
+
|
|
431
469
|
## Next Steps
|
|
432
470
|
|
|
433
|
-
- 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
|
|
434
|
-
-
|
|
471
|
+
- 📖 [Read the Getting Started Guide](GETTING_STARTED.md)
|
|
472
|
+
- 📊 [Inspect with the Dashboard](dashboard.md)
|
|
473
|
+
- 🔧 [Set up the Claude Code Plugin](plugin.md)
|
|
435
474
|
- 🏗️ [Understand the Architecture](architecture.md)
|
|
436
475
|
- 📝 [Check the Changelog](../CHANGELOG.md)
|
data/docs/GETTING_STARTED.md
CHANGED
|
@@ -19,7 +19,7 @@ gem install claude_memory
|
|
|
19
19
|
Verify installation:
|
|
20
20
|
```bash
|
|
21
21
|
claude-memory --version
|
|
22
|
-
# => claude_memory 0.
|
|
22
|
+
# => claude_memory 0.10.0
|
|
23
23
|
```
|
|
24
24
|
|
|
25
25
|
### Step 2: Install the Plugin
|
|
@@ -283,13 +283,13 @@ ClaudeMemory Doctor Report
|
|
|
283
283
|
==========================
|
|
284
284
|
|
|
285
285
|
✓ Global database: ~/.claude/memory.sqlite3
|
|
286
|
-
- Schema version:
|
|
286
|
+
- Schema version: 17
|
|
287
287
|
- Facts: 12
|
|
288
288
|
- Entities: 8
|
|
289
289
|
- Status: Healthy
|
|
290
290
|
|
|
291
291
|
✓ Project database: .claude/memory.sqlite3
|
|
292
|
-
- Schema version:
|
|
292
|
+
- Schema version: 17
|
|
293
293
|
- Facts: 23
|
|
294
294
|
- Entities: 15
|
|
295
295
|
- Status: Healthy
|
|
@@ -314,6 +314,22 @@ ls -lh .claude/memory.sqlite3
|
|
|
314
314
|
# => -rw-r--r-- 1 user staff 64K Jan 26 10:35 .claude/memory.sqlite3
|
|
315
315
|
```
|
|
316
316
|
|
|
317
|
+
### Open the Dashboard (0.10.0+)
|
|
318
|
+
|
|
319
|
+
Once you have a few sessions worth of memory, the dashboard is the fastest
|
|
320
|
+
way to see what's actually in there:
|
|
321
|
+
|
|
322
|
+
```bash
|
|
323
|
+
claude-memory dashboard
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
Opens `http://localhost:3377` with a moments feed (every recall, context
|
|
327
|
+
injection, and extraction event), a Trust sidebar showing your global
|
|
328
|
+
"fingerprint" and 30-day utilization ratio, a deduped Conflicts panel, and a
|
|
329
|
+
Knowledge panel grouping facts by predicate.
|
|
330
|
+
|
|
331
|
+
See **[docs/dashboard.md](dashboard.md)** for the full panel guide.
|
|
332
|
+
|
|
317
333
|
### Test Memory Recall
|
|
318
334
|
|
|
319
335
|
Have a conversation with Claude to test:
|
|
@@ -560,7 +576,8 @@ sqlite3 .claude/memory.sqlite3 "SELECT * FROM facts LIMIT 5;"
|
|
|
560
576
|
Now that you're up and running:
|
|
561
577
|
|
|
562
578
|
- 📖 Read [Examples](EXAMPLES.md) for common use cases
|
|
563
|
-
-
|
|
579
|
+
- 📊 Open the [Dashboard](dashboard.md) for live inspection (0.10.0+)
|
|
580
|
+
- 🔧 Explore [Plugin Documentation](plugin.md) for advanced configuration
|
|
564
581
|
- 🏗️ Review [Architecture](architecture.md) for technical details
|
|
565
582
|
- 💬 Join [Discussions](https://github.com/codenamev/claude_memory/discussions) to share feedback
|
|
566
583
|
|
|
@@ -572,8 +589,20 @@ Now that you're up and running:
|
|
|
572
589
|
| `claude-memory doctor` | Check system health |
|
|
573
590
|
| `claude-memory recall <query>` | Search for facts |
|
|
574
591
|
| `claude-memory promote <fact_id>` | Make fact global |
|
|
592
|
+
| `claude-memory reject <id_or_docid>` | Mark a fact as rejected |
|
|
575
593
|
| `claude-memory changes` | Recent updates |
|
|
576
594
|
| `claude-memory conflicts` | Show contradictions |
|
|
595
|
+
| `claude-memory dashboard` | Open the local web UI (0.10.0+) |
|
|
596
|
+
| `claude-memory digest --since 7` | Markdown report of the last 7 days (0.10.0+; gains Context cost + Quality sections in 0.11.0) |
|
|
597
|
+
| `claude-memory show [--pending] [--source]` | Print what memory would inject at next SessionStart (0.11.0+) |
|
|
598
|
+
| `claude-memory stats --stale` | List facts not recalled recently (0.10.0+) |
|
|
599
|
+
| `claude-memory stats --tokens [--since DAYS]` | SessionStart context-token budget histogram (0.11.0+) |
|
|
600
|
+
| `claude-memory stats --tools` | MCP tool-call telemetry (0.9.0+) |
|
|
601
|
+
| `claude-memory census` | Privacy-safe predicate audit across projects (0.10.0+) |
|
|
602
|
+
| `claude-memory dedupe-conflicts --dry-run` | Preview historical conflict-row dedup (0.10.0+) |
|
|
603
|
+
| `claude-memory reclassify-references --dry-run` | Preview reference-material retag (0.10.0+) |
|
|
604
|
+
| `claude-memory compact` | VACUUM databases |
|
|
605
|
+
| `claude-memory export` | Dump facts to JSON |
|
|
577
606
|
| `/claude-memory:analyze` | Bootstrap project knowledge |
|
|
578
607
|
|
|
579
608
|
## Support
|
data/docs/architecture.md
CHANGED
|
@@ -9,7 +9,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
9
9
|
```
|
|
10
10
|
┌─────────────────────────────────────────────────────────────┐
|
|
11
11
|
│ Application Layer │
|
|
12
|
-
│ CLI (Router) → Commands (
|
|
12
|
+
│ CLI (Router) → Commands (32 classes) → Configuration │
|
|
13
13
|
└──────────────────────┬──────────────────────────────────────┘
|
|
14
14
|
│
|
|
15
15
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
@@ -27,7 +27,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
27
27
|
│
|
|
28
28
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
29
29
|
│ Infrastructure Layer │
|
|
30
|
-
│ Store (SQLite
|
|
30
|
+
│ Store (SQLite v17 + WAL) → FileSystem → Index (FTS5+Vector)│
|
|
31
31
|
│ Templates │
|
|
32
32
|
└─────────────────────────────────────────────────────────────┘
|
|
33
33
|
```
|
|
@@ -40,7 +40,7 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
40
40
|
|
|
41
41
|
**Components:**
|
|
42
42
|
- **CLI** (`cli.rb`): Thin router that dispatches to command classes
|
|
43
|
-
- **Commands** (`commands/`):
|
|
43
|
+
- **Commands** (`commands/`): 34 command classes, each handling one CLI command
|
|
44
44
|
- **Configuration** (`configuration.rb`): Centralized ENV access and path calculation
|
|
45
45
|
|
|
46
46
|
**Key Principles:**
|
|
@@ -179,7 +179,7 @@ end
|
|
|
179
179
|
**Components:**
|
|
180
180
|
|
|
181
181
|
#### Store (`store/`)
|
|
182
|
-
- **SQLiteStore**: Direct database access via Sequel (schema
|
|
182
|
+
- **SQLiteStore**: Direct database access via Sequel (schema v17)
|
|
183
183
|
- **StoreManager**: Manages dual databases (global + project)
|
|
184
184
|
- **Transaction safety**: Atomic multi-step operations
|
|
185
185
|
- **WAL mode**: Write-Ahead Logging for better concurrency
|
|
@@ -201,6 +201,21 @@ end
|
|
|
201
201
|
- Output style templates (`output-styles/memory-aware.md`)
|
|
202
202
|
- Setup and configuration scaffolding
|
|
203
203
|
|
|
204
|
+
#### Dashboard (`dashboard/`)
|
|
205
|
+
- **Server**: WEBrick HTTP server (default port 3377), starts via `claude-memory dashboard`
|
|
206
|
+
- **API**: HTTP-shape glue + per-endpoint formatting; routes/delegates to panel classes
|
|
207
|
+
- **Panels** (each backed by a dedicated class with focused responsibility):
|
|
208
|
+
- `Trust`: weekly moments, fingerprint, utilization, feedback ratio, needs-review, **token_budget** (p50/p95/avg over 30d, 0.11.0+), **quality_score** (live 30-day window + historical baseline, 0.11.0+)
|
|
209
|
+
- `Moments`: feed-first activity stream with kind classification
|
|
210
|
+
- `Knowledge`: predicate-grouped fact summary (incl. References section)
|
|
211
|
+
- `Conflicts`: display-layer dedup with bulk-reject helper
|
|
212
|
+
- `Reuse`: most-used facts within window
|
|
213
|
+
- `Health`: db / hooks / vec checks with actionable fix strings
|
|
214
|
+
- `Timeline`: 30-day daily rollup
|
|
215
|
+
- `FactPresenter`, `ScopedFactResolver`: shared rendering / scope-aware ID resolution
|
|
216
|
+
- Connections released after every request — no held WAL writer locks across page loads
|
|
217
|
+
- See [docs/dashboard.md](dashboard.md) for the user-facing guide
|
|
218
|
+
|
|
204
219
|
**Key Principles:**
|
|
205
220
|
- Ports and Adapters: Clear interfaces for external systems
|
|
206
221
|
- Dependency Injection: Real vs. test implementations
|
|
@@ -346,10 +361,10 @@ FileSystem (write)
|
|
|
346
361
|
- Value objects (SessionId, TranscriptPath, FactId)
|
|
347
362
|
- Centralized Configuration
|
|
348
363
|
- 4 domain models with business logic
|
|
349
|
-
-
|
|
350
|
-
-
|
|
364
|
+
- 34 command classes
|
|
365
|
+
- 25 MCP tools
|
|
351
366
|
- Semantic search with local embeddings (FastEmbed + TF-IDF fallback)
|
|
352
|
-
- Schema
|
|
367
|
+
- Schema v17 with WAL mode
|
|
353
368
|
|
|
354
369
|
## Future Improvements
|
|
355
370
|
|