claude_memory 0.10.0 → 0.11.0
- checksums.yaml +4 -4
- data/.claude/memory.sqlite3 +0 -0
- data/.claude-plugin/marketplace.json +1 -1
- data/.claude-plugin/plugin.json +1 -1
- data/CHANGELOG.md +44 -0
- data/CLAUDE.md +11 -3
- data/README.md +35 -1
- data/docs/1_0_punchlist.md +269 -88
- data/docs/GETTING_STARTED.md +3 -1
- data/docs/architecture.md +3 -3
- data/docs/dashboard.md +23 -3
- data/docs/improvements.md +190 -5
- data/docs/quality_review.md +35 -0
- data/lib/claude_memory/commands/digest_command.rb +95 -3
- data/lib/claude_memory/commands/hook_command.rb +27 -2
- data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
- data/lib/claude_memory/commands/registry.rb +2 -1
- data/lib/claude_memory/commands/show_command.rb +90 -0
- data/lib/claude_memory/commands/stats_command.rb +94 -2
- data/lib/claude_memory/dashboard/trust.rb +180 -11
- data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
- data/lib/claude_memory/hook/handler.rb +142 -1
- data/lib/claude_memory/templates/hooks.example.json +5 -0
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +2 -0
- metadata +3 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: c2164011e2c50c7fdb0bcad468a25814f372384c3a49fa4c9414313ab3975e00
|
|
4
|
+
data.tar.gz: 3e2843979d9b9e0d4a21bfa3650f6cd6843ce18d2a95af884e303572259bca62
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 6c074b607c1e4f13743de36bb2074495d0ad24d0c826b62b49e0e827f311e3424bc881f42236db22a92dd2d5281e6ef13ca450966b1d9438ac1b36ceaa3ab2ce
|
|
7
|
+
data.tar.gz: 4e06c8fed9c323974ee4d7e5b41386ee4682ba5ba88b67797ac6864bbdf03663e5b74cb33497a371a8263eff4c24d405708a80e4862c4f578b221286bf40b236
|
data/.claude/memory.sqlite3
CHANGED
|
Binary file
|
data/.claude-plugin/marketplace.json
CHANGED
|
@@ -7,7 +7,7 @@
|
|
|
7
7
|
"plugins": [
|
|
8
8
|
{
|
|
9
9
|
"name": "claude-memory",
|
|
10
|
-
"version": "0.
|
|
10
|
+
"version": "0.11.0",
|
|
11
11
|
"source": "./",
|
|
12
12
|
"description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
|
|
13
13
|
"repository": "https://github.com/codenamev/claude_memory"
|
data/.claude-plugin/plugin.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "claude-memory",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.11.0",
|
|
4
4
|
"description": "Long-term memory for Claude Code. Recalls architecture, conventions, and decisions across sessions — so Claude explains your codebase without file traversal, follows your patterns, and never re-asks what it already learned.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Valentino Stoll",
|
data/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,50 @@ All notable changes to this project will be documented in this file.
|
|
|
4
4
|
|
|
5
5
|
## [Unreleased]
|
|
6
6
|
|
|
7
|
+
## [0.11.0] - 2026-04-30
|
|
8
|
+
|
|
9
|
+
Theme: **Trust & Cost** — five user-visible signals that answer "is memory still worth it?" with numbers a skeptical user can read in <30 seconds.
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- **Token budget telemetry** — every successful SessionStart context injection now records an estimated `context_tokens` count on its `activity_events` row. Surfaced three ways:
|
|
14
|
+
- Dashboard Trust panel emits a `token_budget` block with p50/p95/avg/sample_size over the last 30 days, so the JSON dashboard endpoint and any downstream consumer answer "what does memory cost per session?"
|
|
15
|
+
- `claude-memory digest` includes a "Context cost" subsection between activity and new-knowledge so the weekly report shows the price tag next to the value.
|
|
16
|
+
- `claude-memory stats --tokens [--since DAYS]` reports total sessions, p50/p95/avg/min/max, and a histogram across <500 / 500-1k / 1-2k / 2-5k / 5k+ buckets.
|
|
17
|
+
- Pure additive — no schema migration. Historical events written before this release simply contribute zero samples until new injections accumulate.
|
|
18
|
+
- First 0.11.0 milestone item from the 1.0 punchlist (Trust & Cost). Closes the "what % of my SessionStart token budget does memory consume?" gap.
|
|
19
|
+
- **Hallucination rate metric** — the dashboard now quantifies how clean the fact base is, not just how full it is. `Distill::BareConclusionDetector` is the production-side mirror of the SessionStart prompt's reason-clause requirement (decision/convention facts must embed "because…" / "so that…" / "to avoid…"). Surfaced two ways:
|
|
20
|
+
- Dashboard Trust panel emits a `quality_score` block aggregating across project + global active facts: `suspect_count` (predicate=reference, retagged by ReferenceMaterialDetector), `bare_conclusion_count`, percentages, and an overall 0–100 score (higher = cleaner). Returns 100 on empty stores so fresh installs aren't penalized.
|
|
21
|
+
- `claude-memory digest` includes a "Quality" section showing the score breakdown plus the in-window rejection rate ("of facts created in the last 7 days, X% have been rejected since"), so calibration drift is visible.
|
|
22
|
+
- Second 0.11.0 milestone item. Pairs with token-budget telemetry to answer "is memory still worth its cost?" via two skeptic-friendly numbers.
|
|
23
|
+
- **`claude-memory show`** — new CLI command prints what memory would inject at the next SessionStart in plain Markdown. Runs the exact `Hook::ContextInjector` path real sessions use, so output matches what Claude actually receives. Footer reports fact count, ~token estimate, and char count so users see the SessionStart cost at a glance.
|
|
24
|
+
- Default suppresses the raw-transcript "Pending Knowledge Extraction" dump (intended for LLM distillation, not human reading); pass `--pending` to include it.
|
|
25
|
+
- `--source SOURCE` (startup/resume/clear) simulates each fresh-session entrypoint so users can preview which sections would appear.
|
|
26
|
+
- Third 0.11.0 milestone item. Closes the inspectability gap — trust requires being able to see what memory will inject, the same way `cat CLAUDE.md` works.
|
|
27
|
+
- **First-week ROI nudge** — at SessionEnd, memory now prints `memory contributed N facts this session, %used = X` for the first 10 sessions, then quiets. New users get user-visible proof memory is doing work for them without having to know about the dashboard. Once trust is established (or it isn't), the nudge gets out of the way.
|
|
28
|
+
- New `claude-memory hook nudge` subcommand + `Hook::Handler#nudge`. SessionEnd config now wires `[ingest, sweep, nudge]` in order.
|
|
29
|
+
- Silent on `CLAUDE_MEMORY_NO_NUDGE=1` opt-out, missing session_id, n=0 contributions, and after MAX_NUDGES emissions. The empty-session silent path doesn't burn a slot — quiet sessions don't count toward the 10.
|
|
30
|
+
- Activity event `roi_nudge` records `{n, used, pct, prior_count}` per emission so a future migration could change the threshold without re-counting from raw events.
|
|
31
|
+
- Fourth 0.11.0 milestone item. Cold-start trust signal that pairs with #47 (token cost) and #48 (quality) to make the first-week answer to "is this worth it?" visible without effort.
|
|
32
|
+
- **Harm benchmark prototype** — `spec/benchmarks/dataset/harm_scenarios.yml` + `spec/benchmarks/e2e/harm_bench_spec.rb`. Three hand-written cases spanning the riskiest harm classes (stale_tech, mismatched_scope, superseded_undetected). The first ClaudeMemory benchmark that measures whether memory can make Claude *wrong* — every other benchmark only measures whether memory helps.
|
|
33
|
+
- Structure validation (regex compile, fact loadability, harm-class coverage) runs in stub mode as part of `:benchmark` tag.
|
|
34
|
+
- Real-mode runner: `EVAL_MODE=real bundle exec rspec spec/benchmarks/e2e/harm_bench_spec.rb` — needs `claude` CLI on PATH, ~$2-8 per run. Reports harm rate; doesn't enforce a threshold yet (that's the 0.12 release gate).
|
|
35
|
+
- 0.11.0 risk-de-risking item. If even one of these three surfaces a harm now, the full 10-15-case benchmark planned for 0.12 will likely reveal a fundamental issue — better to learn that at 0.11 than at 0.12. **Real-mode prototype run on 2026-04-30 reported 0/3 harm** — green light to expand to the full corpus in 0.12.
|
|
36
|
+
|
|
37
|
+
### Changed
|
|
38
|
+
|
|
39
|
+
- **Hallucination-rate metric calibration** — `Dashboard::Trust#quality_score` now reports a windowed (last 30d) "live" score as the headline plus a "historical" block over all active facts. Production verification on 2026-04-30 (recorded in `docs/quality_review.md`) showed the unwindowed metric was technically correct but pragmatically misleading: 97% of bare-conclusion facts pre-dated the 2026-04-20 reason-clause prompt commit, and the entire 7-day rejection cluster was a single-class systemic failure (a `/study-repo` burst), not ongoing noise. The split makes the metric actionable: live score = ongoing extraction quality, historical = legacy data. The digest's "Quality" section uses the live score as the headline.
|
|
40
|
+
|
|
41
|
+
### Fixed
|
|
42
|
+
|
|
43
|
+
- Real-eval CLI runner now passes `allowed_tools` through explicitly so the harm benchmark and other real-mode benches can pre-allow MCP memory tools without per-test wiring.
|
|
44
|
+
|
|
45
|
+
### Upgrade Notes
|
|
46
|
+
|
|
47
|
+
- No schema migration. All new features ship purely additive.
|
|
48
|
+
- Hooks run the installed gem from PATH, not the working tree. After upgrading, `bundle exec rake install` (or `gem install claude_memory`) is required for the new SessionEnd nudge, `claude-memory show` command, `--tokens` stats flag, and `context_tokens` activity-event field to actually fire on real hook events.
|
|
49
|
+
- Existing `quality_score` consumers will see additional fields (`window_days`, `historical`) in the snapshot. The original keys (`score`, `total_active`, `suspect_count`, `bare_conclusion_count`, `suspect_pct`, `bare_pct`) remain at the top level and now reflect the 30-day live window — historical numbers move to the `historical` sub-hash.
|
|
50
|
+
|
|
7
51
|
## [0.10.0] - 2026-04-28
|
|
8
52
|
|
|
9
53
|
### Added
|
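The Upgrade Notes in the CHANGELOG hunk above describe the new `quality_score` shape. Below is a minimal Ruby sketch of a downstream consumer that tolerates both the old and new shapes. It assumes the snapshot arrives as a Hash with symbol keys. The top-level key names come from the notes; the keys inside `historical`, the sample values, and the `summarize_quality` helper are hypothetical.

```ruby
# Hypothetical consumer of the 0.11.0 quality_score snapshot.
# Top-level key names follow the Upgrade Notes; all values are made up,
# and the keys inside :historical are an assumption.
def summarize_quality(snapshot)
  live = "live score #{snapshot.fetch(:score)}"
  # window_days is new in 0.11.0; older snapshots will not carry it.
  live += " (last #{snapshot[:window_days]}d)" if snapshot.key?(:window_days)

  historical = snapshot[:historical]
  return live unless historical # pre-0.11.0 shape: no historical sub-hash

  "#{live}, historical score #{historical.fetch(:score)}"
end

sample = {
  score: 92, total_active: 50, suspect_count: 1, bare_conclusion_count: 3,
  suspect_pct: 2.0, bare_pct: 6.0, window_days: 30,
  historical: { score: 71, total_active: 400 }
}

puts summarize_quality(sample)
```

Reading the new fields defensively keeps a consumer working against both 0.10.0 and 0.11.0 snapshots.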
data/CLAUDE.md
CHANGED
|
@@ -163,7 +163,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
|
|
|
163
163
|
- Each command is a separate class (HelpCommand, DoctorCommand, etc.)
|
|
164
164
|
- All commands inherit from BaseCommand
|
|
165
165
|
- Dependency injection for I/O (stdout, stderr, stdin)
|
|
166
|
-
-
|
|
166
|
+
- 34 commands total, each focused on single responsibility
|
|
167
167
|
|
|
168
168
|
- **`Configuration`**: Centralized ENV access (`configuration.rb`)
|
|
169
169
|
- Single source of truth for paths and environment variables
|
|
@@ -209,6 +209,7 @@ New MCP tools `memory.undistilled` and `memory.mark_distilled` support the pipel
|
|
|
209
209
|
- Pluggable distiller design (current: NullDistiller stub)
|
|
210
210
|
- Extracts entities, facts, scope hints from content
|
|
211
211
|
- `ReferenceMaterialDetector`: classifies "X is a plugin/library/tool" templates, LOC counts, "by Firstname Lastname" attributions as reference material. Runs in `ManagementHandlers#store_extraction` so mislabeling can't persist
|
|
212
|
+
- `BareConclusionDetector` (0.11.0+): production-side mirror of the SessionStart prompt's reason-clause requirement. Pure function — flags `decision` / `convention` facts whose object lacks a reason-clause signal ("because", "so that", "to avoid", etc.). Powers the `quality_score` metric on the Trust panel and the digest's Quality section.
|
|
212
213
|
- SessionStart distillation prompt enforces reason clauses ("because…", "so that…") for `decision` and `convention` predicates — bare conclusions are explicitly disallowed
|
|
213
214
|
|
|
214
215
|
- **`Resolve`**: Truth maintenance and conflict resolution (`resolve/`)
|
|
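The `BareConclusionDetector` entry in the hunk above describes a pure function that flags `decision` / `convention` facts lacking a reason clause. The sketch below shows one way such a check could look. It is not the gem's implementation: the method name, the regex, and the choice of exactly three signal phrases are assumptions drawn from the examples quoted in the entry.

```ruby
# Illustrative only: not the gem's Distill::BareConclusionDetector.
# The signal list mirrors the three phrases quoted in the docs ("etc." omitted).
REASON_SIGNALS = /\b(because|so that|to avoid)\b/i
CHECKED_PREDICATES = %w[decision convention].freeze

# Returns true when a decision/convention fact states a conclusion
# without any of the reason-clause signals listed above.
def bare_conclusion?(predicate, object)
  return false unless CHECKED_PREDICATES.include?(predicate)

  !REASON_SIGNALS.match?(object.to_s)
end

puts bare_conclusion?("decision", "Use SQLite")
puts bare_conclusion?("decision", "Use SQLite because it needs no server")
```

Keeping the check free of I/O is what lets the same predicate back both a dashboard metric and a digest section.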
@@ -249,7 +250,7 @@ Key tables (defined in `sqlite_store.rb`):
|
|
|
249
250
|
- `fact_links`: Supersession and conflict relationships
|
|
250
251
|
- `conflicts`: Open contradictions
|
|
251
252
|
- `mcp_tool_calls`: MCP server tool invocation telemetry (schema v13)
|
|
252
|
-
- `activity_events`: Hook/recall/context/sweep telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports
|
|
253
|
+
- `activity_events`: Hook/recall/context/sweep/nudge telemetry (schema v15) — powers the dashboard timeline, moments feed, efficacy reports. Event types: `hook_ingest`, `hook_context` (carries `context_tokens` since 0.11.0), `hook_sweep`, `hook_publish`, `recall`, `store_extraction`, `roi_nudge` (since 0.11.0).
|
|
253
254
|
- `moment_feedback`: Per-moment 👍/👎 verdicts with optional notes (schema v16) — unique on event_id, repeat clicks upsert
|
|
254
255
|
|
|
255
256
|
Facts include:
|
|
@@ -331,7 +332,7 @@ Also update `SECTION_MAP` if the predicate should appear in a specific snapshot
|
|
|
331
332
|
|
|
332
333
|
- `lib/claude_memory.rb`: Main module, requires, database path helpers
|
|
333
334
|
- `lib/claude_memory/cli.rb`: Thin command router (41 lines)
|
|
334
|
-
- `lib/claude_memory/commands/`: Individual command classes (
|
|
335
|
+
- `lib/claude_memory/commands/`: Individual command classes (34 commands)
|
|
335
336
|
- `lib/claude_memory/configuration.rb`: Centralized configuration and ENV access
|
|
336
337
|
- `lib/claude_memory/domain/`: Domain models (Fact, Entity, Provenance, Conflict)
|
|
337
338
|
- `lib/claude_memory/core/`: Value objects and null objects
|
|
@@ -373,6 +374,13 @@ ClaudeMemory integrates with Claude Code via hooks in `.claude/settings.json`:
|
|
|
373
374
|
- Runs time-bounded maintenance on both databases
|
|
374
375
|
- Cleans up vec0 entries for superseded/expired facts
|
|
375
376
|
|
|
377
|
+
- **Nudge hook** (0.11.0+): Triggers on SessionEnd, fires after ingest+sweep
|
|
378
|
+
- Calls `claude-memory hook nudge`
|
|
379
|
+
- For the first 10 sessions only, prints "memory contributed N facts this session, %used = X" to stdout so new users see ROI inline before they discover the dashboard
|
|
380
|
+
- Records `roi_nudge` activity_events; quiets after `MAX_NUDGES` emissions
|
|
381
|
+
- Opt out with `CLAUDE_MEMORY_NO_NUDGE=1` (no event recorded on opt-out)
|
|
382
|
+
- Empty sessions (n=0) silently no-op so quiet sessions don't burn nudge slots
|
|
383
|
+
|
|
376
384
|
Hook commands read JSON payloads from stdin for robustness. Supports `--async` flag for non-blocking execution.
|
|
377
385
|
|
|
378
386
|
## Dashboard
|
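The Nudge hook bullets in the hunk above list four silent paths: opt-out, missing session_id, n=0, and the cap being reached. The sketch below lays out that gating order. It is a hedged illustration, not `Hook::Handler#nudge` itself: the method name, its keyword parameters, the value of `MAX_NUDGES`, and the percentage formula are all assumptions.

```ruby
MAX_NUDGES = 10 # docs say "first 10 sessions"; the constant's real value is assumed

# Returns the nudge line, or nil when the hook should stay silent.
# prior_count is the number of nudges already emitted (hypothetical parameter).
def nudge_message(session_id:, contributed:, used:, prior_count:, env: ENV)
  return nil if env["CLAUDE_MEMORY_NO_NUDGE"] == "1" # explicit opt-out
  return nil if session_id.nil? || session_id.empty?
  return nil if contributed.zero?                     # quiet session: no slot burned
  return nil if prior_count >= MAX_NUDGES             # first-week window is over

  pct = (used * 100.0 / contributed).round            # assumed formula for %used
  "memory contributed #{contributed} facts this session, %used = #{pct}"
end

puts nudge_message(session_id: "abc", contributed: 11, used: 0, prior_count: 0, env: {})
```

Checking `contributed.zero?` before the cap is what keeps empty sessions from counting toward the 10, provided the caller only increments `prior_count` when a message is actually returned.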
data/README.md
CHANGED
|
@@ -140,7 +140,41 @@ File-searchable questions ("what version is this?") and one-shot code generation
|
|
|
140
140
|
- **Claude-Powered**: Uses Claude's intelligence to extract facts (no API key needed)
|
|
141
141
|
- **Token Efficient**: 10x reduction in memory queries with progressive disclosure
|
|
142
142
|
- **Database Maintenance**: Compact, export, and backup commands
|
|
143
|
-
- **Built-in Observability** (0.10.0+): `claude-memory dashboard` opens a local web UI with a moments feed, trust panel, conflicts dedup, knowledge index, 👍/👎 feedback
|
|
143
|
+
- **Built-in Observability** (0.10.0+): `claude-memory dashboard` opens a local web UI with a moments feed, trust panel (token budget, quality score, utilization, feedback), conflicts dedup, knowledge index, and 👍/👎 feedback. See **[Dashboard guide →](docs/dashboard.md)**. `claude-memory digest` writes a weekly markdown report (Activity, Context cost, Quality, New knowledge, Utilization, Conflicts, Feedback); `claude-memory show` prints what would be injected next SessionStart; `claude-memory census` audits the predicate vocabulary across projects.
|
|
144
|
+
|
|
145
|
+
## What's New in 0.11.0
|
|
146
|
+
|
|
147
|
+
Five user-visible signals so you can answer "is memory still worth it?" with
|
|
148
|
+
numbers, not vibes:
|
|
149
|
+
|
|
150
|
+
- **Token budget telemetry** — every SessionStart context injection now
|
|
151
|
+
records its estimated `context_tokens`. `claude-memory stats --tokens
|
|
152
|
+
[--since DAYS]` reports p50/p95/avg/min/max plus a histogram across
|
|
153
|
+
<500 / 500-1k / 1-2k / 2-5k / 5k+ buckets so you can see the per-session
|
|
154
|
+
cost at a glance. The dashboard's Trust panel and `claude-memory digest`
|
|
155
|
+
surface the same numbers.
|
|
156
|
+
- **Hallucination-rate metric** — the dashboard now scores how *clean* the
|
|
157
|
+
fact base is, not just how full it is. `Distill::BareConclusionDetector`
|
|
158
|
+
flags `decision` / `convention` facts that skipped the reason-clause
|
|
159
|
+
requirement. Trust panel shows `quality_score` (live 30-day window with
|
|
160
|
+
historical baseline beneath). `claude-memory digest` adds a Quality
|
|
161
|
+
section with rejection rate.
|
|
162
|
+
- **`claude-memory show`** — new command prints what memory *would* inject
|
|
163
|
+
at the next SessionStart in plain Markdown. Footer reports fact count,
|
|
164
|
+
~token estimate, and char count so you see the cost at a glance. Default
|
|
165
|
+
hides the raw-transcript "Pending Knowledge" dump for readability;
|
|
166
|
+
`--pending` opts in. `--source startup|resume|clear` simulates each
|
|
167
|
+
fresh-session entrypoint.
|
|
168
|
+
- **First-week ROI nudge** — at SessionEnd, memory now prints
|
|
169
|
+
`memory contributed N facts this session, %used = X` for the first 10
|
|
170
|
+
sessions, then quiets. Cold-start trust signal — you don't have to know
|
|
171
|
+
about the dashboard. Opt out with `CLAUDE_MEMORY_NO_NUDGE=1`.
|
|
172
|
+
- **Harm benchmark prototype** — first ClaudeMemory benchmark that
|
|
173
|
+
measures whether memory can make Claude *wrong*. Three hand-written
|
|
174
|
+
cases (stale-tech, mismatched-scope, superseded-but-undetected) under
|
|
175
|
+
`spec/benchmarks/e2e/harm_bench_spec.rb`. Real-mode run on the 0.11
|
|
176
|
+
release reported 0/3 harm; the full 10-15-case corpus + release gate
|
|
177
|
+
lands in 0.12.
|
|
144
178
|
|
|
145
179
|
## What's New in 0.10.0
|
|
146
180
|
|
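The README hunk above says `claude-memory stats --tokens` reports p50/p95/avg/min/max plus a five-bucket histogram. The sketch below shows how those numbers could be derived from a list of per-session `context_tokens` values. It is not the gem's code: the helper names, the nearest-rank percentile method, and whether each bucket edge is inclusive are assumptions.

```ruby
# Bucket labels follow the README; edge inclusivity is an assumption.
BUCKETS = {
  "<500"   => 0...500,
  "500-1k" => 500...1_000,
  "1-2k"   => 1_000...2_000,
  "2-5k"   => 2_000...5_000,
  "5k+"    => 5_000..Float::INFINITY
}.freeze

# Nearest-rank percentile over a non-empty, sorted array.
def percentile(sorted, pct)
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Summarizes per-session token counts; empty input yields zero samples.
def token_stats(samples)
  return { sample_size: 0 } if samples.empty?

  sorted = samples.sort
  {
    sample_size: sorted.size,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    avg: (sorted.sum.to_f / sorted.size).round,
    min: sorted.first,
    max: sorted.last,
    histogram: BUCKETS.transform_values { |range| sorted.count { |n| range.cover?(n) } }
  }
end

stats = token_stats([320, 480, 900, 1_500, 6_200])
puts stats
```

Returning only `sample_size: 0` for empty input matches the changelog's note that historical events contribute zero samples until new injections accumulate.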
data/docs/1_0_punchlist.md
CHANGED
|
@@ -1,10 +1,11 @@
|
|
|
1
1
|
# 1.0 Punchlist
|
|
2
2
|
|
|
3
|
-
*Created: 2026-04-28
|
|
3
|
+
*Created: 2026-04-28. Restructured 2026-04-28 (post-0.10.0 release) around
|
|
4
|
+
milestone versions per the path-to-1.0 plan.*
|
|
4
5
|
|
|
5
6
|
The remaining work for a stable 1.0 release. Distinct from `improvements.md` —
|
|
6
7
|
that file tracks the long tail of inbound study/idea entries; this file tracks
|
|
7
|
-
**what blocks 1.0 confidence**.
|
|
8
|
+
**what blocks 1.0 confidence and which release each item ships in**.
|
|
8
9
|
|
|
9
10
|
Guiding question: *a skeptical Ruby developer should be able to look at one
|
|
10
11
|
screen and say "yes, this is helping, here's the evidence" without trusting our
|
|
@@ -12,15 +13,37 @@ marketing.* Today the dashboard tells that story in pieces but not as a
|
|
|
12
13
|
headline. Each item below closes a specific gap that prevents that headline
|
|
13
14
|
from existing.
|
|
14
15
|
|
|
16
|
+
## What 1.0 commits to
|
|
17
|
+
|
|
18
|
+
Not "feature complete" — semver commitment. Once we ship 1.0:
|
|
19
|
+
|
|
20
|
+
- Public APIs (CLI surface, MCP tool schemas, hook payload shapes) lock to semver
|
|
21
|
+
- Schema migrations stay forward-compatible per the round-trip-spec convention
|
|
22
|
+
- The trust signals we ship have a baseline measurement other releases must beat
|
|
23
|
+
|
|
24
|
+
So 1.0 isn't gated by features. It's gated by **the measurement infrastructure
|
|
25
|
+
being trustworthy enough to defend a 1.0 claim.** That's why this punchlist is
|
|
26
|
+
mostly observability, not capability.
|
|
27
|
+
|
|
15
28
|
Items are cross-linked to the canonical entry in `improvements.md` where the
|
|
16
29
|
implementation detail and acceptance criteria live. This file is the
|
|
17
30
|
prioritization view; that file is the work view.
|
|
18
31
|
|
|
19
32
|
---
|
|
20
33
|
|
|
21
|
-
##
|
|
34
|
+
## 0.10.x — patch as needed (now)
|
|
35
|
+
|
|
36
|
+
Reactive only. Real usage will surface issues; cut a patch when one shows up.
|
|
37
|
+
No proactive minor work here.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## 0.11.0 — "Trust & Cost" (~1 week of work)
|
|
22
42
|
|
|
23
|
-
|
|
43
|
+
Theme: *users can see what memory costs and whether it's helping.* Each item
|
|
44
|
+
adds a number a skeptical user can read.
|
|
45
|
+
|
|
46
|
+
### #1 Token budget telemetry — *what does memory cost?* ✅ landed 2026-04-29
|
|
24
47
|
|
|
25
48
|
**Gap.** `Core::TokenEstimator` exists and is unused outside one helper. We
|
|
26
49
|
have no idea what % of the SessionStart token budget memory consumes per
|
|
@@ -30,13 +53,18 @@ session, how it scales with DB size, or whether it's growing.
|
|
|
30
53
|
tokens per session over the last 30 days. Per-session count rides on every
|
|
31
54
|
`hook_context` activity event so the data is queryable post-hoc.
|
|
32
55
|
|
|
33
|
-
**Why
|
|
34
|
-
|
|
35
|
-
|
|
56
|
+
**Why this release.** Loudest critique of any context-injection memory
|
|
57
|
+
system; if we can't answer it numerically, we can't defend the trade.
|
|
58
|
+
|
|
59
|
+
**Status.** Landed in 4 atomic commits on 2026-04-29 (15cb5f5, 35ae8d2,
|
|
60
|
+
d9601ca, 5bfd7c8). `context_tokens` recorded on every successful
|
|
61
|
+
`hook_context` event, surfaced via `Dashboard::Trust#token_budget`,
|
|
62
|
+
`claude-memory digest` "Context cost" section, and
|
|
63
|
+
`claude-memory stats --tokens [--since DAYS]` with histogram.
|
|
36
64
|
|
|
37
|
-
→ improvements.md entry:
|
|
65
|
+
→ improvements.md entry: *#47 Token Budget Telemetry*. Effort: 4-6h.
|
|
38
66
|
|
|
39
|
-
### 2
|
|
67
|
+
### #2 Hallucination rate as a first-class trust metric ✅ landed 2026-04-29
|
|
40
68
|
|
|
41
69
|
**Gap.** `ReferenceMaterialDetector` already classifies suspect facts and we
|
|
42
70
|
know from the #34 audit that ~25% of facts had embedded reasoning (i.e.
|
|
@@ -48,48 +76,16 @@ suspect-fact ratio + bare-conclusion ratio over active facts in both stores.
|
|
|
48
76
|
Digest includes a 30-day rejection rate ("how much of what we extracted got
|
|
49
77
|
rejected within a week?") so calibration drift is visible.
|
|
50
78
|
|
|
51
|
-
**Why
|
|
52
|
-
|
|
79
|
+
**Why this release.** Pollution rate matters as much as recall rate. Pairs
|
|
80
|
+
with #1 — together they answer the "is this still worth it?" question.
|
|
53
81
|
|
|
54
|
-
|
|
82
|
+
**Status.** Landed in 3 atomic commits on 2026-04-29 (27fa6af, 4d1c5bf,
|
|
83
|
+
0b72fa4). New `Distill::BareConclusionDetector` + `Dashboard::Trust#quality_score`
|
|
84
|
+
+ `claude-memory digest` Quality section with rejection rate.
|
|
55
85
|
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
**Gap.** Every benchmark we run today measures whether memory **helps**.
|
|
59
|
-
Nothing measures whether memory **harms** — i.e. injects a wrong fact and
|
|
60
|
-
Claude follows it. Without this, "memory helps" is unfalsifiable.
|
|
61
|
-
|
|
62
|
-
**Acceptance.** New `spec/benchmarks/dataset/harm_scenarios.yml` with 10–15
|
|
63
|
-
cases where memory holds a stale or wrong fact. Each case scores `harm` if
|
|
64
|
-
Claude's response follows the wrong fact, `safe` otherwise. Wired into
|
|
65
|
-
`bin/run-evals`. >1% harm rate blocks release.
|
|
66
|
-
|
|
67
|
-
**Why must-have.** A retrieval system that occasionally makes Claude *wrong*
|
|
68
|
-
is strictly worse than no memory; we need a release gate that proves we're
|
|
69
|
-
not in that regime.
|
|
70
|
-
|
|
71
|
-
→ improvements.md entry: *Negative-Fact Harm Benchmark*
|
|
72
|
-
|
|
73
|
-
### 4. Publish the CLAUDE.md baseline in headline E2E results
|
|
74
|
-
|
|
75
|
-
**Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
|
|
76
|
-
and supports E2E. The adapter is wired into `comparative_helper.rb` but the
|
|
77
|
-
README's headline comparative table doesn't include it. The single most
|
|
78
|
-
important question for adoption — *"is this better than a hand-written
|
|
79
|
-
CLAUDE.md?"* — is currently unanswered in our published numbers.
|
|
80
|
-
|
|
81
|
-
**Acceptance.** Comparative E2E report includes `CLAUDE.md baseline` row in
|
|
82
|
-
`spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary
|
|
83
|
-
output. README explicitly states the win/loss versus the static baseline.
|
|
86
|
+
→ improvements.md entry: *#48 Hallucination Rate Metric*. Effort: 1d.
|
|
84
87
|
|
|
85
|
-
|
|
86
|
-
surface the number. If we can't beat a static CLAUDE.md on developer
|
|
87
|
-
scenarios, that's the loudest possible signal that the rest of the system
|
|
88
|
-
needs work; if we can, that's the headline 1.0 brag.
|
|
89
|
-
|
|
90
|
-
→ improvements.md entry: *CLAUDE.md Baseline in Headline Results*
|
|
91
|
-
|
|
92
|
-
### 5. `claude-memory show` — human-readable "what would be injected"
|
|
88
|
+
### #5 `claude-memory show` — human-readable "what would be injected" ✅ landed 2026-04-29
|
|
93
89
|
|
|
94
90
|
**Gap.** Inspecting memory state today requires the dashboard or several CLI
|
|
95
91
|
commands (`recall`, `stats`, `census`). The CLAUDE.md alternative is
|
|
@@ -101,64 +97,223 @@ path real sessions use, prints what would be injected next session in plain
|
|
|
101
97
|
English (not JSON), sized to fit a terminal, with predicate-grouped sections
|
|
102
98
|
matching the snapshot format.
|
|
103
99
|
|
|
104
|
-
**Why
|
|
100
|
+
**Why this release.** Trust requires inspectability. A user who can't see what
|
|
105
101
|
memory will inject can't develop confidence in it.
|
|
106
102
|
|
|
107
|
-
|
|
103
|
+
**Status.** Landed 2026-04-29 (commit 2586bb3). New `Commands::ShowCommand`
|
|
104
|
+
runs `Hook::ContextInjector` and prints the would-be-injected Markdown.
|
|
105
|
+
Default suppresses the raw-transcript pending-knowledge dump for
|
|
106
|
+
readability (`--pending` opts in). Footer reports fact count, token
|
|
107
|
+
estimate, char count.
|
|
108
|
+
|
|
109
|
+
→ improvements.md entry: *#51 claude-memory show*. Effort: ½d.
|
|
110
|
+
|
|
111
|
+
### #7 First-week ROI nudge — *moved up from post-1.0* ✅ landed 2026-04-30
|
|
112
|
+
|
|
113
|
+
**Gap.** New users install, run a few sessions, don't know whether memory is
|
|
114
|
+
working. The dashboard exists but they have to know to look.
|
|
115
|
+
|
|
116
|
+
**Acceptance.** SessionEnd hook prints `memory contributed N facts this
|
|
117
|
+
session, %used = X` inline for the first ~10 sessions, then quiets. Opt-out
|
|
118
|
+
via `CLAUDE_MEMORY_NO_NUDGE=1`.
|
|
119
|
+
|
|
120
|
+
**Why this release.** Belongs with the trust theme — it's the user-visible
|
|
121
|
+
proof that memory is doing work for them. Originally listed as post-1.0;
|
|
122
|
+
elevating because cold-start trust deserves to land before 1.0.
|
|
123
|
+
|
|
124
|
+
**Status.** Landed in 2 atomic commits on 2026-04-30 (f450ed9, 3acce93)
|
|
125
|
+
plus production smoke-test against this project's DB (event #229
|
|
126
|
+
recorded with n=11, used=0, pct=0 for a real session_id). New
|
|
127
|
+
`Hook::Handler#nudge` + `claude-memory hook nudge`; SessionEnd config
|
|
128
|
+
appends nudge after ingest+sweep. Silent on opt-out, missing
|
|
129
|
+
session_id, n=0, or first-week-complete (so empty sessions don't burn
|
|
130
|
+
slots).
|
|
131
|
+
|
|
132
|
+
→ improvements.md entry: *#53 First-Week ROI Nudge*. Effort: ½d.
|
|
133
|
+
|
|
134
|
+
### Risk-de-risking — 3-scenario harm prototype ✅ landed 2026-04-30
|
|
135
|
+
|
|
136
|
+
Before 0.12 builds the full 10-15-scenario harm benchmark (see #3), run a
|
|
137
|
+
3-scenario prototype against the 0.10.0 codebase to confirm whether harm is
|
|
138
|
+
actually low. If the prototype surfaces a >0% harm rate on simple cases, the
|
|
139
|
+
full benchmark in 0.12 will reveal a fundamental issue — better to know at
|
|
140
|
+
0.11 than discover at 0.12.
|
|
141
|
+
|
|
142
|
+
**Acceptance.** Three hand-written `harm_scenarios.yml` cases (one stale-tech,
|
|
143
|
+
one mismatched-scope, one superseded-but-undetected) run against real Claude
|
|
144
|
+
under `EVAL_MODE=real`. Reports go/no-go on the larger benchmark in 0.12.
|
|
145
|
+
|
|
146
|
+
**Status.** Landed 2026-04-30 (commit 35b368e). Three cases written:
|
|
147
|
+
`harm_stale_tech` (MySQL fact vs SQLite reality), `harm_mismatched_scope`
|
|
148
|
+
(global TS/Tailwind preference applied to a Ruby gem),
|
|
149
|
+
`harm_superseded_undetected` (two contradicting auth_method facts both
|
|
150
|
+
active). Structure validation passes in stub mode. Real-mode is gated
|
|
151
|
+
behind `EVAL_MODE=real` (~$2-8 per run) so the operator decides when to
|
|
152
|
+
spend; this prototype reports harm rate but doesn't enforce a threshold
|
|
153
|
+
yet — that's the 0.12 release-gate work.
|
|
154
|
+
|
|
155
|
+
→ improvements.md entry: *#49 Negative-Fact Harm Benchmark* (prototype phase).
|
|
156
|
+
Effort: ½d.
|
|
157
|
+
|
|
158
|
+
**Ship target:** ~2 weeks from 0.10.0 (mid-May 2026 at current velocity).
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## 0.12.0 — "Release Discipline" (~1 week of work)
|
|
108
163
|
|
|
109
|
-
|
|
164
|
+
Theme: *we can't ship a regression without noticing.* Internal infrastructure
|
|
165
|
+
that prevents future regressions. Not flashy but the actual prerequisite for
|
|
166
|
+
1.0's semver commitment.
|
|
167
|
+
|
|
168
|
+
### #3 Negative-fact harm benchmark (full 10-15 scenarios)
|
|
169
|
+
|
|
170
|
+
**Gap.** Every benchmark today measures whether memory **helps**. Nothing
|
|
171
|
+
measures whether memory **harms** — i.e. injects a wrong fact and Claude
|
|
172
|
+
follows it. Without this, "memory helps" is unfalsifiable.
|
|
173
|
+
|
|
174
|
+
**Acceptance.** `spec/benchmarks/dataset/harm_scenarios.yml` with 10-15 cases
|
|
175
|
+
spanning four harm classes (stale-tech, mismatched-scope, superseded-but-
|
|
176
|
+
undetected, reference-material-as-fact). Each scores `harm` if Claude follows
|
|
177
|
+
the wrong fact, `safe` otherwise. Wired into `bin/run-evals`. **>1% harm
|
|
178
|
+
rate blocks release** (configurable via `HARM_RATE_THRESHOLD`).
|
|
179
|
+
|
|
180
|
+
**Why this release.** A retrieval system that occasionally makes Claude
|
|
181
|
+
*wrong* is strictly worse than no memory; the release gate proves we're not
|
|
182
|
+
in that regime.
|
|
183
|
+
|
|
184
|
+
→ improvements.md entry: *#49 Negative-Fact Harm Benchmark* (full corpus).
|
|
185
|
+
Effort: 2d.
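The acceptance text for #3 describes a release gate: each case scores `harm` or `safe`, and a harm rate above 1% blocks release, configurable via `HARM_RATE_THRESHOLD`. The sketch below shows what that gate could look like. The `harm_gate` helper, its return shape, and the reading of the threshold as a fraction (0.01) rather than a percentage are assumptions; the punchlist does not specify them.

```ruby
# Hypothetical release gate. The 1% default and the HARM_RATE_THRESHOLD
# name come from the acceptance text; parsing it as a fraction is assumed.
def harm_gate(results, env: ENV)
  threshold = Float(env.fetch("HARM_RATE_THRESHOLD", "0.01"))
  harmed = results.count { |r| r == :harm }
  rate = results.empty? ? 0.0 : harmed.to_f / results.size
  { harm_rate: rate, blocked: rate > threshold }
end

puts harm_gate(%i[safe safe safe], env: {})
```

With only 10-15 cases, a single `harm` result already exceeds a 1% threshold, so under these assumptions the gate is effectively zero-tolerance.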
|
|
186
|
+
|
|
187
|
+
### #4 Publish the CLAUDE.md baseline in headline E2E results
|
|
188
|
+
|
|
189
|
+
**Gap.** `claude_md_adapter` exists in `spec/benchmarks/comparative/adapters/`
|
|
190
|
+
and is wired into `comparative_helper.rb`. The README's headline comparative
|
|
191
|
+
table doesn't include it. The single most important question for adoption —
|
|
192
|
+
*"is this better than a hand-written CLAUDE.md?"* — is unanswered in our
|
|
193
|
+
published numbers.
|
|
194
|
+
|
|
195
|
+
**Acceptance.** Comparative E2E report includes `CLAUDE.md baseline` row in
|
|
196
|
+
`spec/benchmarks/README.md` and in `bin/run-evals --comparative` summary.
|
|
197
|
+
README explicitly states the win/loss versus the static baseline.
|
|
198
|
+
|
|
199
|
+
**Why this release.** Cheapest item on the list — adapter built, just
|
|
200
|
+
surface the number. Pairs with #6 because it materializes once the
|
|
201
|
+
scoreboard infrastructure is there.
|
|
202
|
+
|
|
203
|
+
→ improvements.md entry: *#50 CLAUDE.md Baseline in Headline Results*.
|
|
204
|
+
Effort: 30min code + one $2-8 real-mode run.
|
|
205
|
+
|
|
206
|
+
### #6 Release-to-release benchmark scoreboard
|
|
110
207
|
|
|
111
208
|
**Gap.** Benchmark output is textual today. Nothing diff-able across versions.
|
|
112
|
-
Regressions land silently — the only reason we caught the
|
|
113
|
-
|
|
209
|
+
Regressions land silently — the only reason we caught the BM25 normalization
|
|
210
|
+
bug was a manual run.
|
|
114
211
|
|
|
115
212
|
**Acceptance.** Each `bin/run-evals` run writes
|
|
116
|
-
`spec/benchmarks/results/<version>.json`. New `bin/bench-diff`
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
213
|
+
`spec/benchmarks/results/<version>.json`. New `bin/bench-diff` compares
|
|
214
|
+
against the last tagged version's JSON and reports deltas. `/release` skill
|
|
215
|
+
reads it and refuses to ship on regressions over threshold.
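
As a rough illustration of the comparison step, the sketch below diffs two result files and flags drops beyond a threshold. The flat `{ "metric" => number }` JSON shape, the higher-is-better assumption, the metric names, the sample values, and the `BenchDiff` module are all hypothetical; the real `bin/bench-diff` format is not specified here.

```ruby
require "json"

# Hypothetical sketch: assumes each results file is a flat JSON object of
# metric name => numeric score, where higher is better.
module BenchDiff
  # Returns { metric => delta } for metrics present in both result sets.
  def self.deltas(previous, current)
    (previous.keys & current.keys).to_h { |k| [k, current[k] - previous[k]] }
  end

  # Metrics whose score dropped by more than `threshold` (absolute difference).
  def self.regressions(previous, current, threshold: 0.02)
    deltas(previous, current).select { |_, delta| delta < -threshold }
  end
end

# Made-up numbers standing in for two versions' results files.
previous = JSON.parse('{"recall_at_5": 0.82, "mrr": 0.71}')
current  = JSON.parse('{"recall_at_5": 0.74, "mrr": 0.72}')
regressed = BenchDiff.regressions(previous, current)
```

A release script could then refuse to proceed whenever `regressed` is non-empty.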
|
|
216
|
+
|
|
217
|
+
**Why this release.** The semver commitment in 1.0 *requires* this — we
|
|
218
|
+
can't promise non-regression without the infrastructure to detect it.
|
|
120
219
|
|
|
121
|
-
|
|
122
|
-
snapshot. 1.0 is the moment we commit to *not regressing* what we ship.
|
|
220
|
+
→ improvements.md entry: *#52 Benchmark Scoreboard Diff*. Effort: 1d.
|
|
123
221
|
|
|
124
|
-
|
|
222
|
+
**Ship target:** ~4 weeks from 0.10.0 (end of May 2026).
|
|
125
223
|
|
|
126
224
|
---
|
|
127
225
|
|
|
128
|
-
##
|
|
226
|
+
## 0.12.x → 1.0 — soak period (2-3 weeks)
|
|
227
|
+
|
|
228
|
+
Critical phase. Run 0.12 against real usage. Watch:
|
|
229
|
+
|
|
230
|
+
- **Harm rate stays at 0%** — release gate from #3
|
|
231
|
+
- **Hallucination rate trend** — from #2
|
|
232
|
+
- **Token budget growth** — from #1, #9
|
|
233
|
+
- **Utilization ratio** — across multiple projects
|
|
234
|
+
|
|
235
|
+
If any signal shifts unfavorably during soak, fix in 0.12.x. **Don't ship 1.0
|
|
236
|
+
from a release that hasn't observed itself for ≥2 weeks.**
|
|
237
|
+
|
|
238
|
+
This soak period is also where the relevance ratio metric (#31 from 0.10.0)
|
|
239
|
+
materializes its first real-mode measurement, and where the 0.11 trust
|
|
240
|
+
signals get a chance to become measured numbers rather than theory.
|
|
129
241
|
|
|
130
|
-
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
## 1.0.0 — "Stable Memory"
|
|
131
245
|
|
|
132
|
-
|
|
246
|
+
Theme: *ready for daily use, ready to recommend.*
|
|
133
247
|
|
|
134
|
-
|
|
135
|
-
inline for the first ~10 sessions. Closes the cold-start gap where new users
|
|
136
|
-
don't see value because they don't think to look.
|
|
248
|
+
### Post-1.0-punchlist polish (if landed during soak)
|
|
137
249
|
|
|
138
|
-
|
|
250
|
+
These were originally post-1.0 in the punchlist; if soak time permits, they
|
|
251
|
+
land in 1.0. Otherwise they ship in 1.1.
|
|
139
252
|
|
|
140
|
-
### 8
|
|
253
|
+
### #8 Real-session repeat-correction detection
|
|
141
254
|
|
|
142
|
-
The repeat-correction benchmark (#32) is synthetic; production
|
|
143
|
-
equivalent signal. Analyze `activity_events`
|
|
144
|
-
last session, the user re-stated it this session" — that's where
|
|
145
|
-
silently failing.
|
|
255
|
+
The repeat-correction benchmark (#32 from 0.10.0) is synthetic; production
|
|
256
|
+
has no equivalent signal. Analyze `activity_events` for "this fact was
|
|
257
|
+
injected last session, the user re-stated it this session" — that's where
|
|
258
|
+
memory is silently failing.
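
One way the detection could look, as a sketch only: the event shape (`:session`, `:kind`, `:fact_id`) is an assumption for illustration and is not the real `activity_events` schema, and session values are assumed to sort chronologically.

```ruby
# Hypothetical sketch: flags facts injected in one session and re-stated by
# the user in the next. The event fields are illustrative assumptions.
def repeat_corrections(events)
  by_session = events.group_by { |e| e[:session] }

  # Walk chronologically adjacent session pairs.
  by_session.keys.sort.each_cons(2).flat_map do |prev, curr|
    injected = by_session[prev].select { |e| e[:kind] == :injected }.map { |e| e[:fact_id] }
    restated = by_session[curr].select { |e| e[:kind] == :user_restated }.map { |e| e[:fact_id] }

    (injected & restated).map do |fact_id|
      { fact_id: fact_id, injected_in: prev, restated_in: curr }
    end
  end
end
```

Deciding that a user message "re-states" a fact is the hard part and is left outside this sketch.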
|
|
146
259
|
|
|
147
|
-
→ improvements.md entry:
|
|
260
|
+
→ improvements.md entry: *#54 Real-Session Repeat-Correction Detection*.
|
|
261
|
+
Effort: 2d.
|
|
148
262
|
|
|
149
|
-
### 9
|
|
263
|
+
### #9 Token-cost growth tracking
|
|
150
264
|
|
|
151
265
|
Builds on #1. Weekly digest reports "context cost grew X% over 30d" as an
|
|
152
266
|
anomaly signal that the DB is bloating or context injection is going wide.
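
The "grew X%" figure could be derived as below. This is a sketch that assumes per-session token counts are available for the previous and current 30-day windows; the helper name and inputs are hypothetical.

```ruby
# Hypothetical helper: percent change in mean per-session context tokens
# between two windows. Returns nil when no comparison is possible.
def context_cost_growth_pct(previous_window, current_window)
  return nil if previous_window.empty? || current_window.empty?

  prev_avg = previous_window.sum.fdiv(previous_window.size)
  return nil if prev_avg.zero?

  curr_avg = current_window.sum.fdiv(current_window.size)
  ((curr_avg - prev_avg) / prev_avg * 100).round(1)
end

context_cost_growth_pct([1000, 1000], [1200, 1300]) # => 25.0
```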
|
|
153
267
|
|
|
154
|
-
→ improvements.md entry:
|
|
268
|
+
→ improvements.md entry: *#55 Token-Cost Growth Tracking*. Effort: 3h after
|
|
269
|
+
#1 lands.
|
|
155
270
|
|
|
156
|
-
### 10
|
|
271
|
+
### #10 Drift dashboard
|
|
157
272
|
|
|
158
273
|
Snapshot `census` weekly, surface predicate distribution shifts on the
|
|
159
274
|
dashboard. Answers "is my fact base going off?" without a manual audit.
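
A distribution shift between two weekly snapshots could be summarized with a single number. The sketch below uses total variation distance, which is a technique chosen here for illustration and not something the plan specifies; the snapshot shape (predicate name => fact count) is also an assumption.

```ruby
# Hypothetical sketch: total variation distance between two predicate
# distributions. 0.0 means identical shares, 1.0 means fully disjoint.
def predicate_shift(before, after)
  total_before = before.values.sum.to_f
  total_after = after.values.sum.to_f
  return 0.0 if total_before.zero? || total_after.zero?

  abs_diff_sum = (before.keys | after.keys).sum do |predicate|
    (before.fetch(predicate, 0) / total_before - after.fetch(predicate, 0) / total_after).abs
  end
  abs_diff_sum / 2
end
```

A dashboard could flag a week when this value crosses a chosen cutoff, then list the predicates contributing the largest differences.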
|
|
160
275
|
|
|
161
|
-
→ improvements.md entry:
|
|
276
|
+
→ improvements.md entry: *#56 Drift Dashboard*. Effort: 1.5d.
|
|
277
|
+
|
|
278
|
+
### #11 API stability audit (NEW — added 2026-04-28)
|
|
279
|
+
|
|
280
|
+
**Gap.** "1.0 commits to semver" is meaningless without an explicit
|
|
281
|
+
public/internal split. Many of the surfaces touched in 0.9.0 / 0.10.0
|
|
282
|
+
(MCP tool schemas, hook payload shapes, CLI flags, dashboard endpoints)
|
|
283
|
+
have evolved organically and aren't formally documented as stable vs.
|
|
284
|
+
internal.
|
|
285
|
+
|
|
286
|
+
**Acceptance.**
|
|
287
|
+
|
|
288
|
+
- New `docs/api_stability.md` enumerating:
|
|
289
|
+
- **Public CLI**: every `claude-memory <subcommand>` and its flags, with stability tier
|
|
290
|
+
- **Public MCP tools**: every tool's schema, return shape, and tool-annotation hints
|
|
291
|
+
- **Public hook contract**: payload fields, return shapes, exit codes
|
|
292
|
+
- **Public Ruby API**: which classes/modules under `lib/claude_memory/` are external-facing (`Recall`, `Configuration`, `Store::StoreManager`?) vs. internal-only
|
|
293
|
+
- **Schema**: stability of column names, table names, predicate vocabulary
|
|
294
|
+
- A deprecation policy: "we'll mark X deprecated in N.x.0 and remove no earlier than (N+1).0.0"
|
|
295
|
+
- README + CLAUDE.md link to the new doc as the authoritative source
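
The deprecation rule in the list above ("deprecated in N.x.0, removed no earlier than (N+1).0.0") can be checked mechanically. A minimal sketch using RubyGems' `Gem::Version`; the helper names are hypothetical.

```ruby
require "rubygems"

# Hypothetical helpers: earliest version in which a surface deprecated in
# `deprecated_in` may be removed, i.e. the next major release.
def earliest_removal(deprecated_in)
  major = Gem::Version.new(deprecated_in).segments.first
  Gem::Version.new("#{major + 1}.0.0")
end

def removal_allowed?(deprecated_in, current_version)
  Gem::Version.new(current_version) >= earliest_removal(deprecated_in)
end

earliest_removal("1.3.0").to_s      # => "2.0.0"
removal_allowed?("1.3.0", "1.9.0")  # => false
```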
|
|
296
|
+
|
|
297
|
+
**Why this release.** Without this, the 1.0 semver promise is vibes, not a
|
|
298
|
+
contract. Future regressions in non-listed areas can be argued away; future
|
|
299
|
+
regressions in listed areas are bugs. Forces us to be honest about what
|
|
300
|
+
we're committing to.
|
|
301
|
+
|
|
302
|
+
→ improvements.md entry: *#59 API Stability Audit* (added 2026-04-28; renumbered
|
|
303
|
+
from #57 after rebase brought in Mercury-article entries #57/#58). Effort:
|
|
304
|
+
2d including the doc + deprecation-warning instrumentation for any
|
|
305
|
+
soon-to-be-removed surface.
|
|
306
|
+
|
|
307
|
+
### Release framing
|
|
308
|
+
|
|
309
|
+
README + CHANGELOG framing for 1.0 explicitly states:
|
|
310
|
+
|
|
311
|
+
- "We measured X harm rate, Y utilization, Z hallucination rate across N
|
|
312
|
+
projects over W weeks before tagging this."
|
|
313
|
+
- The public API surface is documented at `docs/api_stability.md`
|
|
314
|
+
- Deprecation policy explicit
|
|
315
|
+
|
|
316
|
+
**Ship target:** 6-8 weeks from 0.10.0 (mid-June 2026 at current velocity).
|
|
162
317
|
|
|
163
318
|
---
|
|
164
319
|
|
|
@@ -168,23 +323,49 @@ dashboard. Answers "is my fact base going off?" without a manual audit.
|
|
|
168
323
|
drawers cover the primary need.
|
|
169
324
|
- **#45 Live SSE/WebSocket feed** — polling is adequate; dashboard polish, not
|
|
170
325
|
a confidence gap.
|
|
326
|
+
- **#23 REST API endpoint** — MCP covers primary use case; defer to 1.x.
|
|
327
|
+
- **#25 HTTP MCP transport** — no startup-latency complaint to motivate it yet.
|
|
171
328
|
|
|
172
329
|
---
|
|
173
330
|
|
|
174
|
-
##
|
|
331
|
+
## Risk to flag now
|
|
332
|
+
|
|
333
|
+
The biggest hidden risk in this plan is **the harm benchmark (#3) finds
|
|
334
|
+
something.** If 10-15 scenarios with intentionally wrong facts produce >1%
|
|
335
|
+
harm rate, that's a fundamental retrieval-discipline issue that could push
|
|
336
|
+
1.0 back by months. The 3-scenario prototype in 0.11 (above) is specifically
|
|
337
|
+
designed to surface this risk earlier.
|
|
338
|
+
|
|
339
|
+
---
|
|
340
|
+
|
|
341
|
+
## Velocity assumptions
|
|
342
|
+
|
|
343
|
+
Based on actual release cadence Mar-Apr 2026:
|
|
344
|
+
|
|
345
|
+
| Release pair | Days between releases |
|
|
346
|
+
|---|---|
|
|
347
|
+
| 0.7.0 → 0.7.1 | minor patch, days |
|
|
348
|
+
| 0.7.1 → 0.8.0 | 17 |
|
|
349
|
+
| 0.8.0 → 0.9.0 | 17 |
|
|
350
|
+
| 0.9.0 → 0.9.1 | same day (patch) |
|
|
351
|
+
| 0.9.1 → 0.10.0 | 12 |
|
|
175
352
|
|
|
176
|
-
|
|
353
|
+
Average ~2 weeks per minor with substantial work landing each cycle.
|
|
177
354
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
355
|
+
| Milestone | Estimated work | Calendar target |
|
|
356
|
+
|---|---|---|
|
|
357
|
+
| 0.10.x patches | reactive | as-needed |
|
|
358
|
+
| 0.11.0 | ~1 week | ~2026-05-12 |
|
|
359
|
+
| 0.12.0 | ~1 week | ~2026-05-26 |
|
|
360
|
+
| Soak | 2-3 weeks | through ~2026-06-16 |
|
|
361
|
+
| 1.0.0 | 1-2 days release prep + #11 | ~2026-06-16 to 2026-06-23 |
|
|
181
362
|
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
land.
|
|
363
|
+
These are calendar estimates assuming roughly the same focus level as the
|
|
364
|
+
0.10.0 cycle. Real cadence will adjust based on what surfaces during soak.
|
|
185
365
|
|
|
186
366
|
---
|
|
187
367
|
|
|
188
|
-
*Last updated: 2026-04-28
|
|
189
|
-
|
|
190
|
-
|
|
368
|
+
*Last updated: 2026-04-28 (post-0.10.0). Restructured around milestone
|
|
369
|
+
versions per the path-to-1.0 plan. #7 moved up from post-1.0 to 0.11; #11
|
|
370
|
+
API stability audit added as a new 1.0 must-have; 3-scenario harm prototype
|
|
371
|
+
added to 0.11 as de-risking work for the full 0.12 benchmark.*
|
data/docs/GETTING_STARTED.md
CHANGED
|
@@ -593,8 +593,10 @@ Now that you're up and running:
|
|
|
593
593
|
| `claude-memory changes` | Recent updates |
|
|
594
594
|
| `claude-memory conflicts` | Show contradictions |
|
|
595
595
|
| `claude-memory dashboard` | Open the local web UI (0.10.0+) |
|
|
596
|
-
| `claude-memory digest --since 7` | Markdown report of the last 7 days (0.10.0+) |
|
|
596
|
+
| `claude-memory digest --since 7` | Markdown report of the last 7 days (0.10.0+; gains Context cost + Quality sections in 0.11.0) |
|
|
597
|
+
| `claude-memory show [--pending] [--source]` | Print what memory would inject at next SessionStart (0.11.0+) |
|
|
597
598
|
| `claude-memory stats --stale` | List facts not recalled recently (0.10.0+) |
|
|
599
|
+
| `claude-memory stats --tokens [--since DAYS]` | SessionStart context-token budget histogram (0.11.0+) |
|
|
598
600
|
| `claude-memory stats --tools` | MCP tool-call telemetry (0.9.0+) |
|
|
599
601
|
| `claude-memory census` | Privacy-safe predicate audit across projects (0.10.0+) |
|
|
600
602
|
| `claude-memory dedupe-conflicts --dry-run` | Preview historical conflict-row dedup (0.10.0+) |
|