RubyGems - claude_memory - Versions diffs - 0.9.1 → 0.11.0 - Mend

claude_memory 0.9.1 → 0.11.0

Files changed (77)

checksums.yaml +4 -4
data/.claude/memory.sqlite3 +0 -0
data/.claude/skills/dashboard/SKILL.md +42 -0
data/.claude-plugin/marketplace.json +1 -1
data/.claude-plugin/plugin.json +1 -1
data/CHANGELOG.md +130 -0
data/CLAUDE.md +30 -6
data/README.md +66 -2
data/db/migrations/015_add_activity_events.rb +26 -0
data/db/migrations/016_add_moment_feedback.rb +22 -0
data/db/migrations/017_add_last_recalled_at.rb +15 -0
data/docs/1_0_punchlist.md +371 -0
data/docs/EXAMPLES.md +41 -2
data/docs/GETTING_STARTED.md +33 -4
data/docs/architecture.md +22 -7
data/docs/audit-queries.md +131 -0
data/docs/dashboard.md +192 -0
data/docs/improvements.md +650 -9
data/docs/influence/cq.md +187 -0
data/docs/plugin.md +13 -6
data/docs/quality_review.md +524 -172
data/docs/reflection_memory_as_accumulating_judgment.md +67 -0
data/lib/claude_memory/activity_log.rb +86 -0
data/lib/claude_memory/commands/census_command.rb +210 -0
data/lib/claude_memory/commands/completion_command.rb +3 -0
data/lib/claude_memory/commands/dashboard_command.rb +54 -0
data/lib/claude_memory/commands/dedupe_conflicts_command.rb +55 -0
data/lib/claude_memory/commands/digest_command.rb +273 -0
data/lib/claude_memory/commands/hook_command.rb +61 -2
data/lib/claude_memory/commands/initializers/hooks_configurator.rb +7 -4
data/lib/claude_memory/commands/reclassify_references_command.rb +56 -0
data/lib/claude_memory/commands/registry.rb +7 -1
data/lib/claude_memory/commands/show_command.rb +90 -0
data/lib/claude_memory/commands/skills/distill-transcripts.md +13 -1
data/lib/claude_memory/commands/stats_command.rb +131 -2
data/lib/claude_memory/commands/sweep_command.rb +2 -0
data/lib/claude_memory/configuration.rb +16 -0
data/lib/claude_memory/core/relative_time.rb +9 -0
data/lib/claude_memory/dashboard/api.rb +610 -0
data/lib/claude_memory/dashboard/conflicts.rb +279 -0
data/lib/claude_memory/dashboard/efficacy.rb +127 -0
data/lib/claude_memory/dashboard/fact_presenter.rb +109 -0
data/lib/claude_memory/dashboard/health.rb +175 -0
data/lib/claude_memory/dashboard/index.html +2707 -0
data/lib/claude_memory/dashboard/knowledge.rb +136 -0
data/lib/claude_memory/dashboard/moments.rb +244 -0
data/lib/claude_memory/dashboard/reuse.rb +97 -0
data/lib/claude_memory/dashboard/scoped_fact_resolver.rb +95 -0
data/lib/claude_memory/dashboard/server.rb +211 -0
data/lib/claude_memory/dashboard/timeline.rb +68 -0
data/lib/claude_memory/dashboard/trust.rb +454 -0
data/lib/claude_memory/distill/bare_conclusion_detector.rb +71 -0
data/lib/claude_memory/distill/reference_material_detector.rb +78 -0
data/lib/claude_memory/hook/auto_memory_mirror.rb +112 -0
data/lib/claude_memory/hook/context_injector.rb +97 -3
data/lib/claude_memory/hook/handler.rb +191 -3
data/lib/claude_memory/mcp/handlers/management_handlers.rb +8 -0
data/lib/claude_memory/mcp/query_guide.rb +11 -0
data/lib/claude_memory/mcp/text_summary.rb +29 -0
data/lib/claude_memory/mcp/tool_definitions.rb +13 -0
data/lib/claude_memory/mcp/tools.rb +148 -0
data/lib/claude_memory/publish.rb +13 -21
data/lib/claude_memory/recall/stale_detector.rb +67 -0
data/lib/claude_memory/resolve/predicate_policy.rb +2 -0
data/lib/claude_memory/resolve/resolver.rb +41 -11
data/lib/claude_memory/store/llm_cache.rb +68 -0
data/lib/claude_memory/store/metrics_aggregator.rb +96 -0
data/lib/claude_memory/store/schema_manager.rb +1 -1
data/lib/claude_memory/store/sqlite_store.rb +47 -143
data/lib/claude_memory/store/store_manager.rb +29 -0
data/lib/claude_memory/sweep/maintenance.rb +216 -0
data/lib/claude_memory/sweep/recall_timestamp_refresher.rb +83 -0
data/lib/claude_memory/sweep/sweeper.rb +2 -0
data/lib/claude_memory/templates/hooks.example.json +5 -0
data/lib/claude_memory/version.rb +1 -1
data/lib/claude_memory.rb +24 -0
metadata +51 -1

data/docs/audit-queries.md ADDED

@@ -0,0 +1,131 @@
+# Audit Queries
+Pre-written SQL for validating that the ClaudeMemory plugin is being invoked when it should be. Run via [cq](https://github.com/technicalpickles/cq); install with `cargo install --git https://github.com/technicalpickles/cq`.
+These query Claude Code's raw transcripts (in `~/.claude/projects/`), not ClaudeMemory's own SQLite databases. That's deliberate: cq sees *all* tool calls including ones that bypassed the MCP server entirely, which is exactly the angle needed to spot activation gaps.
+For server-side telemetry (counts, latencies of MCP calls that *did* land), use `claude-memory stats --tools` against ClaudeMemory's `mcp_tool_calls` table instead.
+## Query 1 — Memory plugin activation rate
+How often is any `mcp__memory__*` tool being called, normalized by total sessions?
+```bash
+cq sql "
+WITH session_window AS (
+  SELECT DISTINCT session_id FROM messages
+),
+memory_sessions AS (
+  SELECT DISTINCT session_id FROM tool_calls
+  WHERE name LIKE 'mcp__memory__%'
+)
+SELECT
+  (SELECT count(*) FROM session_window) AS total_sessions,
+  (SELECT count(*) FROM memory_sessions) AS sessions_with_memory_call,
+  ROUND(100.0 * (SELECT count(*) FROM memory_sessions)
+        / NULLIF((SELECT count(*) FROM session_window), 0), 1) AS pct
+" --since 30d --table
+```
+**Why it matters**: a low percentage doesn't mean the plugin is broken — many sessions don't need memory. It's a denominator for the next two queries.
+## Query 2 — Sessions that asked memory-shaped questions but never called memory
+The most useful query. Surfaces user prompts where memory *should* have been the obvious tool, but Claude went elsewhere (Read, Grep, Bash) instead.
+```bash
+cq sql "
+WITH memory_sessions AS (
+  SELECT DISTINCT session_id FROM tool_calls
+  WHERE name LIKE 'mcp__memory__%'
+)
+SELECT
+  m.session_id,
+  m.timestamp,
+  left(m.text, 200) AS user_prompt
+FROM messages m
+LEFT JOIN memory_sessions ms ON m.session_id = ms.session_id
+WHERE m.type = 'user'
+  AND ms.session_id IS NULL
+  AND (
+    m.text ILIKE '%why did we%'
+    OR m.text ILIKE '%what convention%'
+    OR m.text ILIKE '%how do we usually%'
+    OR m.text ILIKE '%what did we decide%'
+    OR m.text ILIKE '%architecture%'
+    OR m.text ILIKE '%what''s the pattern%'
+  )
+ORDER BY m.timestamp DESC
+" --since 30d --table --limit 30
+```
+**What to do with results**: each row is a candidate for either (a) a tightening of MCP server instructions / skill descriptions, or (b) confirmation that the question genuinely didn't need memory and the keyword filter is too loose.
+## Query 3 — Which memory tools actually get called?
+```bash
+cq sql "
+SELECT
+  name AS tool,
+  count(*) AS invocations,
+  count(DISTINCT session_id) AS sessions
+FROM tool_calls
+WHERE name LIKE 'mcp__memory__%'
+GROUP BY name
+ORDER BY invocations DESC
+" --since 30d --table
+```
+**Expected shape**: `mcp__memory__recall`, `mcp__memory__conventions`, `mcp__memory__decisions` should dominate. Tools that never fire (`memory_fact_graph`, `memory_explain`, `memory_search_concepts`, `memory_facts_by_*`) might have description/triggering issues — same pattern as cq's "skill audit" use case.
+## Query 4 — Error rate per memory tool
+```bash
+cq sql "
+SELECT
+  tc.name AS tool,
+  count(*) AS calls,
+  sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END) AS errors,
+  ROUND(100.0 * sum(CASE WHEN tr.is_error THEN 1 ELSE 0 END)
+        / count(*), 1) AS pct_errors
+FROM tool_calls tc
+JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
+WHERE tc.name LIKE 'mcp__memory__%'
+GROUP BY tc.name
+ORDER BY errors DESC
+" --since 30d --table
+```
+**Why it matters**: a memory tool returning errors is much worse than not firing — Claude sees the failure and learns to avoid that tool. Triage anything above ~5%.
+## Query 5 — Result-size distribution (context budget hygiene)
+```bash
+cq sql "
+SELECT
+  tc.name AS tool,
+  count(*) AS calls,
+  MIN(length(tr.content)) AS min_chars,
+  ROUND(AVG(length(tr.content))) AS avg_chars,
+  MAX(length(tr.content)) AS max_chars
+FROM tool_calls tc
+JOIN tool_results tr ON tc.tool_use_id = tr.tool_use_id
+WHERE tc.name LIKE 'mcp__memory__%'
+GROUP BY tc.name
+ORDER BY avg_chars DESC
+" --since 30d --table
+```
+**Why it matters**: ClaudeMemory exposes a `compact: true` option that drops receipts for ~60% smaller responses. If averages are large, either the compact flag isn't being passed by callers or the tools that don't accept it are dumping too much.
+## When to re-run
+- Before each release — does the new version improve activation rate or reduce errors?
+- After meaningful changes to MCP server instructions / skill descriptions
+- If a user reports "the memory plugin doesn't seem to do anything" — Query 2 will usually surface the gap concretely
+## Related
+- Source for the methodology: `docs/influence/cq.md`
+- Server-side telemetry alternative: `claude-memory stats --tools --since 30`
+- cq schema reference: `cq schema --examples`
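
As an illustration outside the diffed file, the arithmetic behind Query 4 and its "~5%" triage guideline can be sketched in Ruby. This is a minimal sketch, assuming the query rows have already been exported as hashes with `:tool`, `:calls`, and `:errors` keys; `error_report`, `SAMPLE_ROWS`, and `TRIAGE_THRESHOLD_PCT` are hypothetical names, and the sample numbers are made up. Unlike the SQL, which orders by raw error count, this sketch sorts by error percentage.

```ruby
# Hypothetical helper (not part of claude_memory or cq): recomputes the
# pct_errors column from Query 4 and flags tools above the ~5% guideline.
TRIAGE_THRESHOLD_PCT = 5.0

def error_report(rows, threshold: TRIAGE_THRESHOLD_PCT)
  rows.map do |row|
    calls = row.fetch(:calls)
    errors = row.fetch(:errors)
    # Guard against division by zero for tools with no recorded calls.
    pct = calls.zero? ? 0.0 : (100.0 * errors / calls).round(1)
    { tool: row.fetch(:tool), pct_errors: pct, triage: pct > threshold }
  end.sort_by { |r| -r[:pct_errors] }
end

# Made-up sample rows in the shape Query 4 returns (tool, calls, errors).
SAMPLE_ROWS = [
  { tool: "mcp__memory__recall", calls: 200, errors: 4 },
  { tool: "mcp__memory__decisions", calls: 40, errors: 6 }
].freeze

error_report(SAMPLE_ROWS).each do |r|
  flag = r[:triage] ? "TRIAGE" : "ok"
  puts format("%-28s %5.1f%%  %s", r[:tool], r[:pct_errors], flag)
end
```

Sorting by percentage rather than raw count keeps a rarely called but frequently failing tool from hiding behind a heavily used one.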

data/docs/dashboard.md ADDED

@@ -0,0 +1,192 @@
+# ClaudeMemory Dashboard
+Local web UI for inspecting what memory knows, what it's been doing, and where
+it's going wrong. Headline feature of v0.10.0.
+## Quick start
+```bash
+claude-memory dashboard
+```
+Opens `http://localhost:3377` in your default browser. Reads from both the
+global (`~/.claude/memory.sqlite3`) and project (`.claude/memory.sqlite3`)
+databases. No write side effects from page loads — the UI is a viewer with
+explicit action buttons (reject / promote / feedback) where applicable.
+```bash
+# Custom port
+claude-memory dashboard --port 4000
+# Don't auto-open the browser (e.g. running in tmux/headless)
+claude-memory dashboard --no-open
+```
+Press `Ctrl+C` in the terminal to stop the server.
+## What each panel shows
+The dashboard is **feed-first**: the main view is a chronological stream of
+*moments* (memory activity events), with sidebar panels giving aggregate signals.
+### Sidebar — Trust
+At-a-glance signals so you can answer "is memory helping?" — and "what does
+it cost?" — in one look:
+- **This week's moments** — count of value-producing events (recall hits,
+  context injections, extractions). Includes a week-over-week delta.
+- **What memory knows about you** — up to 5 global facts rendered as plain
+  English. The "fingerprint" of your cross-project preferences.
+- **Needs review** — open conflicts (deduped to distinct contradictions) +
+  stale facts (active but not recalled in the configured window) + empty
+  recalls (queries that returned nothing).
+- **Token budget (30d)** *(0.11.0+)* — p50/p95/avg `context_tokens` injected
+  per SessionStart over the last 30 days, with sample size. Answers "what
+  does memory cost per session?" — pairs with the digest's "Context cost"
+  section and `claude-memory stats --tokens`.
+- **Quality score (live, 30d)** *(0.11.0+)* — 0–100 hallucination-rate
+  proxy. `score = 100 - (suspect_pct + bare_pct)` where suspect = facts
+  retagged as `predicate=reference` and bare = decision/convention facts
+  whose object skipped the prompt-mandated reason clause. Headline is the
+  live 30-day window; the underlying snapshot also exposes a `historical`
+  block over all active facts for context. Returns 100 on empty stores.
+- **Utilization (30d)** — of facts extracted in the last 30 days, what % has
+  Claude actually surfaced via recall or context injection. Color-coded
+  (green ≥40%, yellow ≥15%, red below). Hidden on fresh installs.
+- **Feedback (30d)** — thumbs-up/down ratio from moments you've rated.
+### Feed — Moments
+Real-time stream of memory activity, classified by kind:
+- `recall_hit` / `recall_empty` — `memory.recall*` calls, with top fact IDs
+  resolved to their sentences. Click for full payload.
+- `context_injection` — SessionStart context emitted into Claude's prompt,
+  with a preview and the facts it carried. `context_skipped` when injection
+  was empty.
+- `extraction` — facts/entities created via `memory.store_extraction`,
+  inlined with content preview.
+- `hook_ingest` / `hook_sweep` — pipeline activity.
+Each moment has a 👍/👎 button. Use them deliberately — the ratio feeds the
+Trust panel and is a calibration signal we read against retrieval quality
+benchmarks.
+Filter via query params: `kinds=recall_hit`, `before=<ISO timestamp>`.
+### Knowledge
+Active facts grouped by predicate. Sections include:
+- Decisions, Conventions, Architecture (multi-value)
+- Tech stack (uses_database, uses_framework, uses_language, deployment_platform, auth_method)
+- **References** (added 0.10.0) — facts auto-tagged as reference material
+  by `Distill::ReferenceMaterialDetector` (LOC counts, "X is a plugin…"
+  templates, author attributions). Separated from conventions to keep the
+  signal-to-noise ratio of the conventions section high.
+### Conflicts
+Open contradictions, deduped at the display layer: identical
+`(subject, predicate, object_pair)` detections collapse into one row with a
+`×N` badge. The "Needs review" sidebar count uses the deduped count, not
+raw rows.
+Each row links to:
+- Both sides of the conflict with provenance
+- A bulk-reject action ("reject all rows that match this exact contradiction")
+- The originating activity event
+### Reuse
+Most-used facts in the time window. Useful for answering "which facts are
+actually doing the work?" when you suspect memory is accumulating dead weight.
+### Activity (timeline)
+Daily rollup of facts created, content ingested, hook events fired, and
+recalls performed over the last 30 days. Click any day to drill into the
+underlying events.
+### Health
+Four checks: global database, project database, hooks installation,
+sqlite-vec coverage. Each surfaces an actionable fix string (e.g.,
+"Run `claude-memory init` to install the standard hook set"). Status
+escalates to the worst individual check (error > warning > healthy).
+### Activity drill-down
+Clicking any moment opens a modal with the parsed payload, prettified JSON,
+and — for recall events — a "what triggered this?" correlation showing the
+preceding ingest and the user prompt that motivated the recall.
+### Query tester
+Run `memory.recall*` queries inline and see scored results, with optional
+score traces (`vec_rank`, `fts_rank`, `rrf_final`) for hybrid retrieval
+debugging. Surfaces an actionable hint if FTS5 corruption is detected
+(suggests `claude-memory compact`).
+## When to use it
+- **After any session that surprised you** — was the recall actually firing?
+  Did the fact you taught get extracted? The Moments feed answers both.
+- **Before promoting a fact to global** — see what's already there in the
+  Knowledge panel, including dedupe siblings.
+- **When `claude-memory doctor` warns about conflicts** — the Conflicts
+  panel groups duplicates so you don't have to handle them one row at a time.
+- **When deciding what to keep** — the Reuse panel shows which facts have
+  earned their spot; everything else is a staleness candidate per the
+  `claude-memory stats --stale` listing.
+## What it's not
+- Not an editor for fact text (use `claude-memory promote` / `reject`).
+- Not a replacement for the CLI — for headless / scripted use, prefer
+  `claude-memory stats`, `claude-memory digest`, `claude-memory census`.
+- Not a multi-user surface — bound to localhost, single-process WEBrick.
+- Not a long-running service — runs in the foreground; close when done.
+## Architecture
+The dashboard is a thin web layer over the same `Recall`, `Conflicts`,
+`Trust`, `Moments`, etc. classes the MCP server uses. Each panel is backed by
+a dedicated module under `lib/claude_memory/dashboard/`:
+| Panel / endpoint | Module | Responsibility |
+|---|---|---|
+| Trust sidebar | `Dashboard::Trust` | Weekly moments, fingerprint, utilization, feedback |
+| Feed | `Dashboard::Moments` | Activity-event classification + presenter |
+| Knowledge | `Dashboard::Knowledge` | Predicate-grouped fact summary |
+| Conflicts | `Dashboard::Conflicts` | Dedup grouping, bulk-reject helper |
+| Reuse | `Dashboard::Reuse` | Most-used-fact ranking |
+| Health | `Dashboard::Health` | Four system checks with fix strings |
+| Timeline | `Dashboard::Timeline` | 30-day daily rollup |
+| Routing | `Dashboard::API` | HTTP-shape glue + per-endpoint formatting |
+Connections are released after each request so the dashboard never holds a
+WAL writer lock open across page loads.
+## Related CLI
+- `claude-memory digest [--since DAYS] [--output FILE]` — markdown report of
+  the same Trust + Knowledge + Conflicts + Feedback signals plus
+  **Context cost** (token-budget p50/p95) and **Quality** (score + rejection
+  rate) sections. Suitable for emailing or committing to the repo.
+- `claude-memory show [--pending] [--source SOURCE]` *(0.11.0+)* — print
+  what memory would inject at the next SessionStart in plain Markdown.
+  Same `Hook::ContextInjector` path real sessions use, so the output
+  matches what Claude actually receives. Footer reports fact count, ~token
+  estimate, and char count.
+- `claude-memory stats --tokens [--since DAYS]` *(0.11.0+)* — token budget
+  histogram (p50/p95/avg/min/max + bucketed distribution) for SessionStart
+  context injections. Same data the Trust panel's Token budget block aggregates.
+- `claude-memory census [--root DIR]` — privacy-safe cross-project
+  predicate vocabulary scan; pairs with the Knowledge panel for "what
+  predicates does my whole tree use?".
+- `claude-memory stats --stale [--stale-days N]` — list facts the dashboard
+  flags as stale.
+- `claude-memory dedupe-conflicts` / `reclassify-references` — one-shot
+  cleanups for what the Conflicts and Knowledge → References panels surface.
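
As an illustration outside the diffed file, two of the Trust sidebar calculations described in dashboard.md can be sketched in Ruby: the quality score formula `score = 100 - (suspect_pct + bare_pct)` and the Utilization colour bands. This is a minimal sketch under stated assumptions, not the gem's implementation: the method names are hypothetical, both percentages are assumed to share the same total-facts denominator, and the rounding and clamping to 0–100 are illustrative choices the source does not specify.

```ruby
# Hypothetical sketch of two Trust-sidebar calculations; names, denominator,
# rounding, and clamping are assumptions rather than claude_memory's API.

# score = 100 - (suspect_pct + bare_pct); an empty store scores 100.
def quality_score(total:, suspect:, bare:)
  return 100 if total.zero?

  suspect_pct = 100.0 * suspect / total
  bare_pct = 100.0 * bare / total
  (100 - (suspect_pct + bare_pct)).round.clamp(0, 100)
end

# Colour band for Utilization: green >= 40%, yellow >= 15%, red below.
def utilization_band(pct)
  return :green if pct >= 40
  return :yellow if pct >= 15

  :red
end

# Made-up inputs: 200 facts, 10 retagged as reference, 6 missing a reason.
puts quality_score(total: 200, suspect: 10, bare: 6)
puts utilization_band(22.5)
```

With these made-up inputs the suspect share is 5% and the bare share is 3%, so the score works out to 92, and a 22.5% utilization falls in the yellow band.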