claude_memory 0.12.0 → 0.13.0
- checksums.yaml +4 -4
- data/.claude/memory.sqlite3 +0 -0
- data/.claude/rules/claude_memory.generated.md +44 -48
- data/.claude/settings.local.json +2 -1
- data/.claude-plugin/marketplace.json +2 -2
- data/.claude-plugin/plugin.json +3 -5
- data/CHANGELOG.md +52 -0
- data/CLAUDE.md +13 -8
- data/README.md +46 -0
- data/db/migrations/019_add_observations.rb +43 -0
- data/db/migrations/020_add_observation_promotion.rb +33 -0
- data/docs/GETTING_STARTED.md +38 -0
- data/docs/api_stability.md +23 -7
- data/docs/architecture.md +18 -6
- data/docs/audit_runbook.md +67 -0
- data/docs/dashboard.md +28 -0
- data/docs/improvements.md +94 -1
- data/docs/influence/mastra-observational-memory.md +198 -0
- data/docs/influence/strands-agent-sops.md +163 -0
- data/docs/quality_review.md +45 -0
- data/docs/soak/audit_2026-06-03_agent-training-program.json +53 -0
- data/docs/soak/audit_2026-06-03_agentic.json +31 -0
- data/docs/soak/audit_2026-06-03_ai-software-architect.json +19 -0
- data/docs/soak/audit_2026-06-03_chaos_to_the_rescue.json +60 -0
- data/docs/soak/audit_2026-06-03_claude_memory.json +55 -0
- data/docs/soak/audit_2026-06-03_daily-vibe.json +59 -0
- data/docs/soak/audit_2026-06-03_minerva-sky.json +19 -0
- data/docs/soak/audit_2026-06-03_nowreading.dev.json +19 -0
- data/docs/soak/audit_2026-06-03_ups.dev.json +55 -0
- data/docs/soak/baseline_2026-06-03.md +145 -0
- data/lib/claude_memory/audit/checks.rb +149 -0
- data/lib/claude_memory/audit/runner.rb +4 -0
- data/lib/claude_memory/commands/census_command.rb +1 -1
- data/lib/claude_memory/commands/checks/embeddings_check.rb +97 -0
- data/lib/claude_memory/commands/doctor_command.rb +1 -0
- data/lib/claude_memory/commands/hook_command.rb +16 -3
- data/lib/claude_memory/commands/initializers/hooks_configurator.rb +3 -1
- data/lib/claude_memory/commands/install_skill_command.rb +4 -0
- data/lib/claude_memory/commands/observations_command.rb +367 -0
- data/lib/claude_memory/commands/registry.rb +2 -0
- data/lib/claude_memory/commands/setup_vectors_command.rb +182 -0
- data/lib/claude_memory/commands/skills/reflect.md +68 -0
- data/lib/claude_memory/commands/stats_command.rb +60 -1
- data/lib/claude_memory/dashboard/api.rb +4 -0
- data/lib/claude_memory/dashboard/index.html +154 -2
- data/lib/claude_memory/dashboard/observations.rb +115 -0
- data/lib/claude_memory/dashboard/server.rb +1 -0
- data/lib/claude_memory/distill/extraction.rb +6 -4
- data/lib/claude_memory/distill/null_distiller.rb +86 -3
- data/lib/claude_memory/distill/reference_material_detector.rb +4 -1
- data/lib/claude_memory/domain/observation.rb +118 -0
- data/lib/claude_memory/embeddings/generator.rb +1 -1
- data/lib/claude_memory/hook/context_injector.rb +100 -2
- data/lib/claude_memory/mcp/handlers/management_handlers.rb +113 -2
- data/lib/claude_memory/mcp/handlers/query_handlers.rb +48 -1
- data/lib/claude_memory/mcp/instructions_builder.rb +1 -0
- data/lib/claude_memory/mcp/query_guide.rb +28 -0
- data/lib/claude_memory/mcp/tool_definitions.rb +58 -0
- data/lib/claude_memory/mcp/tools.rb +3 -0
- data/lib/claude_memory/observe/observations_renderer.rb +49 -0
- data/lib/claude_memory/observe/reflector.rb +91 -0
- data/lib/claude_memory/publish.rb +53 -1
- data/lib/claude_memory/resolve/resolver.rb +45 -8
- data/lib/claude_memory/store/schema_manager.rb +1 -1
- data/lib/claude_memory/store/sqlite_store.rb +181 -0
- data/lib/claude_memory/sweep/maintenance.rb +15 -1
- data/lib/claude_memory/sweep/sweeper.rb +7 -1
- data/lib/claude_memory/version.rb +1 -1
- data/lib/claude_memory.rb +7 -0
- metadata +23 -1
data/docs/GETTING_STARTED.md
CHANGED
|
@@ -119,6 +119,44 @@ claude-memory promote <fact_id>
|
|
|
119
119
|
"Remember that I prefer descriptive commit messages - make that a global preference"
|
|
120
120
|
```
|
|
121
121
|
|
|
122
|
+
## Two Kinds of Memory: Facts and Observations
|
|
123
|
+
|
|
124
|
+
ClaudeMemory remembers two complementary things:
|
|
125
|
+
|
|
126
|
+
- **Facts** answer *"what is true"* — durable, structured truths about your
|
|
127
|
+
project (`uses_database: sqlite`, conventions, decisions). This is the
|
|
128
|
+
semantic layer the sections above describe.
|
|
129
|
+
- **Observations** answer *"what happened"* — an append-only narrative log of
|
|
130
|
+
the moments in your sessions ("decided to add a corroboration gate so
|
|
131
|
+
fleeting mentions don't harden into facts"). This is the **episodic** layer
|
|
132
|
+
(0.13.0+).
|
|
133
|
+
|
|
134
|
+
| | Facts | Observations |
|
|
135
|
+
|---|---|---|
|
|
136
|
+
| Captures | Durable truths | Events / narrative |
|
|
137
|
+
| Changes | Explicitly (supersession, rejection) | Automatically (dedup, consolidation, expiry) |
|
|
138
|
+
| Promotion | — | Promoted to a fact after corroboration (≥2 sightings) |
|
|
139
|
+
|
|
140
|
+
**Why this matters:** the distiller used to commit a fact the first time it
|
|
141
|
+
saw a claim — so a database named once in a comparison could become a false
|
|
142
|
+
`uses_database`. Observations make repeated sighting the gate: an observation
|
|
143
|
+
graduates to a fact only after it recurs. That's an anti-hallucination defense
|
|
144
|
+
built into the memory model.
|
|
145
|
+
|
|
146
|
+
Observations are managed for you — deduplicated and consolidated automatically
|
|
147
|
+
on `PreCompact`/`SessionEnd` at no extra API cost. To see or curate them:
|
|
148
|
+
|
|
149
|
+
```bash
|
|
150
|
+
# Inspect the episodic log (counts, promotion readiness, compression, recent)
|
|
151
|
+
claude-memory observations
|
|
152
|
+
|
|
153
|
+
# Promote a corroborated observation to a fact
|
|
154
|
+
claude-memory observations promote <id> --predicate uses_database --object sqlite
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
The dashboard's **Observations** panel shows the same at a glance, and the
|
|
158
|
+
`/reflect` skill runs a guided survey → consolidate → promote pass.
|
|
159
|
+
|
|
122
160
|
## Setting Up Your First Project
|
|
123
161
|
|
|
124
162
|
### Scenario 1: Fresh Install (New Project)
|
data/docs/api_stability.md
CHANGED
|
@@ -47,7 +47,7 @@ When ambiguous, default is **internal** — easier to promote later than demote.
|
|
|
47
47
|
|
|
48
48
|
## 2. Public CLI surface
|
|
49
49
|
|
|
50
|
-
All commands listed in `Commands::Registry::COMMANDS` are reachable via `claude-memory <subcommand>`. The full registered set (
|
|
50
|
+
All commands listed in `Commands::Registry::COMMANDS` are reachable via `claude-memory <subcommand>`. The full registered set (38 commands as of 0.12.1) is canonically stored in `lib/claude_memory/commands/registry.rb`. Stability:
|
|
51
51
|
|
|
52
52
|
### Stable commands (covered by semver)
|
|
53
53
|
|
|
@@ -90,6 +90,7 @@ May change in any minor; treat with care.
|
|
|
90
90
|
| `claude-memory recall --semantic` / `--mode=hybrid` | Semantic-recall flags depend on the embedding backend; `tfidf` is stable, `fastembed`/`api` may change configuration knobs. |
|
|
91
91
|
| `claude-memory embeddings` | Embedding-backend inspection; the JSON shape evolves with provider work. |
|
|
92
92
|
| `claude-memory import-auto-memory [--dry-run]` | Imports Claude Code auto-memory markdown files into the project DB as facts. Introduced 0.12.0 from the 2026-05-21 audit; argument shape and idempotency contract are stable but the heuristic for predicate mapping may evolve. |
|
|
93
|
+
| `claude-memory setup-vectors [--provider=NAME] [--model=NAME] [--no-reindex] [--dry-run] [--status]` | Documented opt-in path for enabling vector recall via fastembed. Introduced 0.12.1. Writes `CLAUDE_MEMORY_EMBEDDING_PROVIDER` (and optional `CLAUDE_MEMORY_EMBEDDING_MODEL`) to `.claude/settings.json` env block, then re-embeds via `IndexCommand`. fastembed remains a dev/test gem dep by design; install via `gem install fastembed` if not present. Argument shape stable; the underlying re-index implementation may evolve. |
|
|
93
94
|
|
|
94
95
|
### Internal / not for external automation
|
|
95
96
|
|
|
@@ -113,7 +114,7 @@ Renaming or repurposing a code is a major-version change.
|
|
|
113
114
|
|
|
114
115
|
## 3. Public MCP tool surface
|
|
115
116
|
|
|
116
|
-
All
|
|
117
|
+
All 28 tools registered via `MCP::ToolDefinitions.all` — 25 stable + 3 experimental (the observational-layer tools below). Argument schemas, return shapes (both `content` and `structuredContent`), and tool-annotation hints (`readOnlyHint`, `idempotentHint`, `destructiveHint`) are **stable** for the listed stable tools.
|
|
117
118
|
|
|
118
119
|
### Stable MCP tools
|
|
119
120
|
|
|
@@ -133,7 +134,7 @@ All 23 tools registered via `MCP::ToolDefinitions.all`. Argument schemas, return
|
|
|
133
134
|
| `memory.facts_by_context` | Context | Stable. |
|
|
134
135
|
| `memory.promote` | Management | Stable. |
|
|
135
136
|
| `memory.reject_fact` | Management | Stable since 0.10.0. |
|
|
136
|
-
| `memory.store_extraction` | Management | Argument schema (`facts`, `entities`, `decisions`) stable. |
|
|
137
|
+
| `memory.store_extraction` | Management | Argument schema (`facts`, `entities`, `decisions`) stable. The `observations` field (Layer-2 observer) is **experimental** while the observational layer is built out. |
|
|
137
138
|
| `memory.undistilled` | Distillation | Stable since 0.10.0. |
|
|
138
139
|
| `memory.mark_distilled` | Distillation | Stable since 0.10.0. |
|
|
139
140
|
| `memory.status` | Monitoring | Stable. |
|
|
@@ -145,6 +146,16 @@ All 23 tools registered via `MCP::ToolDefinitions.all`. Argument schemas, return
|
|
|
145
146
|
| `memory.check_setup` | Discovery | Stable. |
|
|
146
147
|
| `memory.list_projects` | Discovery | Stable since 0.10.0. |
|
|
147
148
|
|
|
149
|
+
### Experimental MCP tools
|
|
150
|
+
|
|
151
|
+
These are registered in `MCP::ToolDefinitions.all` but **not yet covered by the stability guarantees above** — argument schema and return shape may change while the feature is built out.
|
|
152
|
+
|
|
153
|
+
| Tool | Group | Status |
|
|
154
|
+
|---|---|---|
|
|
155
|
+
| `memory.observations` | Observational layer | Experimental (0.13.0+). Read-only listing of episodic observations by status/kind/priority with corroboration counts. Return shape may change. |
|
|
156
|
+
| `memory.promote_observation` | Observational layer | Experimental (0.13.0+). Promotes a corroborated observation (≥2 sightings) into a fact; refuses uncorroborated or already-promoted ones (anti-hallucination gate). Args/shape may change. |
|
|
157
|
+
| `memory.consolidate_observations` | Observational layer | Experimental (0.13.0+). Merges related observations into one synthesized row (corroboration combines, sources tombstoned via `consolidated_into`). Args/shape may change. |
|
|
158
|
+
|
|
148
159
|
### Stability of tool responses
|
|
149
160
|
|
|
150
161
|
Both response shapes are stable:
|
|
@@ -200,7 +211,7 @@ Adding a new field to `detail_json` is a stable-surface addition (non-breaking).
|
|
|
200
211
|
|
|
201
212
|
Current covered events (0.11.0):
|
|
202
213
|
|
|
203
|
-
- `hook_context`: `context_length`, `context_tokens` (since 0.11.0), `top_fact_ids`, `fact_count`.
|
|
214
|
+
- `hook_context`: `context_length`, `context_tokens` (since 0.11.0), `top_fact_ids`, `fact_count`. (`observation_count`, added with the observational layer, is additive and **experimental** — not yet on the smoke-gate manifest.)
|
|
204
215
|
- `roi_nudge`: `n`, `used`, `pct`, `prior_count` (all since 0.11.0).
|
|
205
216
|
|
|
206
217
|
`hook_ingest`, `hook_sweep`, `hook_publish` event detail fields are currently **internal** (not on the smoke-gate manifest). Promoting them to stable is a 0.12.x or later task.
|
|
@@ -254,7 +265,7 @@ If you need a feature from one of the internal classes, **open an issue** so we
|
|
|
254
265
|
|
|
255
266
|
### Schema migrations
|
|
256
267
|
|
|
257
|
-
Schema is at v18
|
|
268
|
+
Schema is at v20 (v18 shipped in 0.12.0; v19–v20 add the observational `observations` table in 0.13.0) with 20 migrations under `db/migrations/`. Migrations remain forward-compatible per the round-trip-spec convention (`feedback_round_trip_migration_specs.md`): each release's specs verify that DBs from the prior 3 schema boundaries can be migrated into the current schema without data loss.
|
|
258
269
|
|
|
259
270
|
**What's stable:**
|
|
260
271
|
|
|
@@ -267,6 +278,7 @@ Schema is at v18 as of 0.12.0 with 18 migrations under `db/migrations/`. Migrati
|
|
|
267
278
|
|
|
268
279
|
- The `vec0` virtual-table internals — sqlite-vec evolution may shift representation.
|
|
269
280
|
- `mcp_tool_calls` retention behavior (currently 90 days, configurable); the column set is stable, the retention default is not.
|
|
281
|
+
- The `observations` table (v19–v20, incl. `corroboration_count`/`promoted_at`/`promoted_fact_id`) — episodic layer. Column set may still change while the layer is experimental.
|
|
270
282
|
|
|
271
283
|
**What's internal:**
|
|
272
284
|
|
|
@@ -309,7 +321,11 @@ Run before tagging a release; wire into CI on the project's own DB to catch in-c
|
|
|
309
321
|
6. Distillation backlog < 100 (hard fail) / < 25 (warning).
|
|
310
322
|
7. Project active facts ≥ 5 (sanity floor — catches over-aggressive rejection).
|
|
311
323
|
|
|
312
|
-
Run via `bundle exec rspec spec/benchmarks/health/ --tag benchmark`.
|
|
324
|
+
Run via `bundle exec rspec spec/benchmarks/health/ --tag benchmark`. The
|
|
325
|
+
spec is **local-only** — `.claude/memory.sqlite3` is git-lfs tracked and CI
|
|
326
|
+
checkout doesn't pull LFS objects, so the spec auto-skips on CI when it
|
|
327
|
+
detects the unresolved pointer. Run it locally after `git lfs pull` to
|
|
328
|
+
validate signal contracts before tagging a release.
|
|
313
329
|
|
|
314
330
|
---
|
|
315
331
|
|
|
@@ -319,7 +335,7 @@ Listed here for honesty — these surfaces look public but are not.
|
|
|
319
335
|
|
|
320
336
|
- **Dashboard JSON HTTP API.** The `claude-memory dashboard` server's endpoints are an internal interface for the bundled UI. Don't build scripts against `GET /api/trust` etc. — endpoints, response shapes, and even URL paths may change without notice.
|
|
321
337
|
- **`activity_events.detail_json` fields not in `spec/smoke/expected_fields.yml`.** Inspecting a missing field during debugging is fine; relying on it in scripts is not.
|
|
322
|
-
- **The exact text of `additionalContext`.** The Markdown sections (`## Decisions`, `## Conventions`, `## Architecture`, `## Pending Knowledge Extraction`, `## Auto-Memory Mirror`) and their order are stable; the per-fact rendering format inside each section is tuned for prompt quality and may change.
|
|
338
|
+
- **The exact text of `additionalContext`.** The Markdown sections (`## Decisions`, `## Conventions`, `## Architecture`, `## Pending Knowledge Extraction`, `## Auto-Memory Mirror`) and their order are stable; the per-fact rendering format inside each section is tuned for prompt quality and may change. The `## Observations` and `## Observation Reflection` sections (observational layer) and the published `.claude/rules/claude_memory.observations.md` snapshot are **experimental** while the layer is built out.
|
|
323
339
|
- **Internal env vars** (anything not listed in `Configuration` instance methods or in this doc). Examples that exist but are internal: `CLAUDE_MEMORY_LOG_LEVEL`, debug flags surfaced during development.
|
|
324
340
|
- **Test/spec/fixture infrastructure.** `spec/benchmarks/`, `spec/evals/`, `spec/support/` are not public APIs.
|
|
325
341
|
- **Plugin-format paths.** `.claude-plugin/`, `scripts/serve-mcp.sh`, etc. are part of the Claude Code plugin format integration; treat them as opaque.
|
data/docs/architecture.md
CHANGED
|
@@ -14,20 +14,22 @@ ClaudeMemory is architected using Domain-Driven Design (DDD) principles with cle
|
|
|
14
14
|
│
|
|
15
15
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
16
16
|
│ Core Domain Layer │
|
|
17
|
-
│ Domain Models: Fact, Entity, Provenance, Conflict
|
|
17
|
+
│ Domain Models: Fact, Entity, Provenance, Conflict, │
|
|
18
|
+
│ Observation (episodic) │
|
|
18
19
|
│ Value Objects: SessionId, TranscriptPath, FactId │
|
|
19
20
|
│ Null Objects: NullFact, NullExplanation │
|
|
20
21
|
└──────────────────────┬──────────────────────────────────────┘
|
|
21
22
|
│
|
|
22
23
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
23
24
|
│ Business Logic Layer │
|
|
24
|
-
│ Recall → Resolve → Distill → Ingest → Publish
|
|
25
|
-
│
|
|
25
|
+
│ Recall → Resolve (semantic) → Distill → Ingest → Publish │
|
|
26
|
+
│ Observe → Reflect (episodic) → Sweep → Embeddings → │
|
|
27
|
+
│ MCP → Hook │
|
|
26
28
|
└──────────────────────┬──────────────────────────────────────┘
|
|
27
29
|
│
|
|
28
30
|
┌──────────────────────▼──────────────────────────────────────┐
|
|
29
31
|
│ Infrastructure Layer │
|
|
30
|
-
│ Store (SQLite
|
|
32
|
+
│ Store (SQLite v20 + WAL) → FileSystem → Index (FTS5+Vector)│
|
|
31
33
|
│ Templates │
|
|
32
34
|
└─────────────────────────────────────────────────────────────┘
|
|
33
35
|
```
|
|
@@ -129,10 +131,19 @@ end
|
|
|
129
131
|
- `DualQueryTemplate`: Query template handling for dual-database queries
|
|
130
132
|
|
|
131
133
|
#### Resolve (`resolve/`)
|
|
132
|
-
- Truth maintenance and conflict resolution
|
|
134
|
+
- Truth maintenance and conflict resolution (the **semantic** "what is true" layer)
|
|
133
135
|
- **Transaction safety**: Multi-step operations wrapped in DB transactions
|
|
134
136
|
- PredicatePolicy: Controls single vs. multi-value predicates
|
|
135
137
|
- Handles supersession and conflict detection
|
|
138
|
+
- Also persists observations from the extraction inside the same transaction (see Observe & Reflect)
|
|
139
|
+
|
|
140
|
+
#### Observe & Reflect (`observe/`)
|
|
141
|
+
- The **episodic** "what happened" layer, complementing the semantic fact store
|
|
142
|
+
- **Observer**: `NullDistiller` emits high-precision Layer-1 observations (regex); Claude-as-observer enriches them via the SessionStart context hook (Layer-2, no extra API cost)
|
|
143
|
+
- **Reflector** (`Observe::Reflector`): deterministic dedup + TTL-expiry of info-level observations runs on `PreCompact`/`SessionEnd`; semantic consolidation (Claude-as-reflector) rides the next turn's context hook
|
|
144
|
+
- **Append-only/tombstoning**: superseded observations are linked via `consolidated_into`, never deleted — preserving provenance
|
|
145
|
+
- **Promotion bridge**: observations are promoted to facts only after corroboration (≥2 sightings) — an anti-hallucination gate
|
|
146
|
+
- `ObservationsRenderer` formats the injected log; `Domain::Observation` is the immutable value object
|
|
136
147
|
|
|
137
148
|
#### Distill (`distill/`)
|
|
138
149
|
- Extracts facts and entities from transcripts
|
|
@@ -164,7 +175,7 @@ end
|
|
|
164
175
|
|
|
165
176
|
#### MCP (`mcp/`)
|
|
166
177
|
- Model Context Protocol server
|
|
167
|
-
- Exposes
|
|
178
|
+
- Exposes 28 tools including: recall, explain, promote, status, decisions, conventions, architecture, semantic search, check_setup, the observational-layer tools (observations, promote_observation, consolidate_observations), and more
|
|
168
179
|
- `ResponseFormatter`: Consistent MCP response formatting
|
|
169
180
|
- `SetupStatusAnalyzer`: Initialization and version status analysis
|
|
170
181
|
|
|
@@ -212,6 +223,7 @@ end
|
|
|
212
223
|
- `Reuse`: most-used facts within window
|
|
213
224
|
- `Health`: db / hooks / vec checks with actionable fix strings
|
|
214
225
|
- `Timeline`: 30-day daily rollup
|
|
226
|
+
- `Observations` (0.13.0+): episodic-layer panel — counts by status/kind/priority, corroboration + promotion readiness, source→observation compression ratio, recent timeline (first-class main-sidebar panel + Advanced tab)
|
|
215
227
|
- `FactPresenter`, `ScopedFactResolver`: shared rendering / scope-aware ID resolution
|
|
216
228
|
- Connections released after every request — no held WAL writer locks across page loads
|
|
217
229
|
- See [docs/dashboard.md](dashboard.md) for the user-facing guide
|
data/docs/audit_runbook.md
CHANGED
|
@@ -178,6 +178,73 @@ Exit code is `0` when `ok: true`, `1` otherwise. `--no-exit` always returns `0`.
|
|
|
178
178
|
3. Clean up: `claude-memory reject` the historical disputed/superseded rows (or accept them as historical record).
|
|
179
179
|
4. Re-audit.
|
|
180
180
|
|
|
181
|
+
### C011 — Orphaned observations
|
|
182
|
+
|
|
183
|
+
**Severity:** warn
|
|
184
|
+
|
|
185
|
+
**Scope:** both the project and global DBs (observations may be scoped either way).
|
|
186
|
+
|
|
187
|
+
**Triggered when:** an observation has `source_content_item_id` set but no `content_items` row with that id exists.
|
|
188
|
+
|
|
189
|
+
**Why it matters:** An observation's `source_content_item_id` is its provenance link back to the transcript chunk it was distilled from. A dangling pointer means the source row was pruned (or never existed), so the observation can no longer be explained — breaking the same provenance guarantee facts enjoy. Observations with a `nil` source (e.g. consolidated ones synthesized from several sources) are *not* flagged.
|
|
190
|
+
|
|
191
|
+
**Remediation:**
|
|
192
|
+
- Inspect with `memory.observations` (or the dashboard Observations tab).
|
|
193
|
+
- The table is append-only — do **not** delete. If the provenance is genuinely unrecoverable, let the Reflector consolidate or expire the row on the next PreCompact/SessionEnd pass.
|
|
194
|
+
|
|
195
|
+
### C012 — Observation promotion consistency
|
|
196
|
+
|
|
197
|
+
**Severity:** error
|
|
198
|
+
|
|
199
|
+
**Scope:** both DBs.
|
|
200
|
+
|
|
201
|
+
**Triggered when:** any of the following promotion-state invariants is violated —
|
|
202
|
+
- `promoted_at` is set but `promoted_fact_id` is `NULL`;
|
|
203
|
+
- `promoted_fact_id` points at a fact that does not exist;
|
|
204
|
+
- `promoted_fact_id` points at a fact that is **not** active (rejected/superseded);
|
|
205
|
+
- `promoted_fact_id` is set but `promoted_at` is `NULL`.
|
|
206
|
+
|
|
207
|
+
**Why it matters:** Promotion is meant to be atomic — `mark_observation_promoted` sets both `promoted_at` and `promoted_fact_id` pointing at a freshly-created, active fact. Half-set state means the write ran partially, or the target fact was later rejected/superseded, leaving the observation pointing at nothing usable. The promotion bridge keys off these columns, so an inconsistent row either re-promotes (duplicate facts) or is silently stuck.
|
|
208
|
+
|
|
209
|
+
**Remediation:**
|
|
210
|
+
1. `claude-memory explain <fact_id>` on the `promoted_fact_id` to see why the fact is missing/inactive.
|
|
211
|
+
2. If the fact was intentionally rejected, re-open the observation for re-promotion via `memory.promote_observation`.
|
|
212
|
+
3. If `mark_observation_promoted` half-ran, re-run promotion so both columns are set together.
|
|
213
|
+
|
|
214
|
+
### C013 — Observation tombstone-chain validity
|
|
215
|
+
|
|
216
|
+
**Severity:** error
|
|
217
|
+
|
|
218
|
+
**Scope:** both DBs.
|
|
219
|
+
|
|
220
|
+
**Triggered when:** any of the following tombstone invariants is violated —
|
|
221
|
+
- `consolidated_into` points at a non-existent observation;
|
|
222
|
+
- `consolidated_into` is a self-link (`consolidated_into == id`);
|
|
223
|
+
- a row is `status='active'` yet carries a `consolidated_into` target;
|
|
224
|
+
- a row is `status='consolidated'` yet has no `consolidated_into` keeper.
|
|
225
|
+
|
|
226
|
+
**Why it matters:** Supersession is append-only: a merged-away observation gets `status='consolidated'` and `consolidated_into` pointing at the surviving keeper, preserving lineage instead of hard-deleting (unlike Mastra's lossy drop). A broken chain corrupts that lineage — recall could surface a tombstoned row, or a consolidated row could orphan its history. A self-link or active-but-tombstoned row is a Reflector bug, not user error.
|
|
227
|
+
|
|
228
|
+
**Remediation:**
|
|
229
|
+
- Inspect with `memory.observations`.
|
|
230
|
+
- Re-running the deterministic Reflector (fires on PreCompact/SessionEnd) re-derives consolidation for dangling links.
|
|
231
|
+
- A self-link or `active` + `consolidated_into` row signals a Reflector defect — file it rather than hand-editing the append-only table.
|
|
232
|
+
|
|
233
|
+
### C014 — Observation status / corroboration sanity
|
|
234
|
+
|
|
235
|
+
**Severity:** warn
|
|
236
|
+
|
|
237
|
+
**Scope:** both DBs.
|
|
238
|
+
|
|
239
|
+
**Triggered when:** an observation has a `status` outside `active`/`consolidated`/`expired`, or a `corroboration_count` less than 1.
|
|
240
|
+
|
|
241
|
+
**Why it matters:** Every observation should carry a known lifecycle status and at least one sighting (a fresh insert counts as 1; the migration default is 1). An unknown status means a migration or an external writer bypassed `insert_observation`; a `corroboration_count < 1` means `increment_corroboration` math went negative. Both break downstream behavior — recall filters key off `status`, and the promotion gate keys off `corroboration_count`.
|
|
242
|
+
|
|
243
|
+
**Remediation:**
|
|
244
|
+
- Inspect with `memory.observations`.
|
|
245
|
+
- For a bad `corroboration_count`, re-derive sighting counts via the Reflector's dedup pass.
|
|
246
|
+
- For an unknown status, find the writer that bypassed `insert_observation` (the only sanctioned insert path).
|
|
247
|
+
|
|
181
248
|
## Adding a new check
|
|
182
249
|
|
|
183
250
|
The audit is extensible by design.
|
data/docs/dashboard.md
CHANGED
|
@@ -116,6 +116,29 @@ sqlite-vec coverage. Each surfaces an actionable fix string (e.g.,
|
|
|
116
116
|
"Run `claude-memory init` to install the standard hook set"). Status
|
|
117
117
|
escalates to the worst individual check (error > warning > healthy).
|
|
118
118
|
|
|
119
|
+
### Observations (episodic layer, 0.13.0+)
|
|
120
|
+
|
|
121
|
+
The episodic counterpart to the fact-based panels. Facts answer "what is
|
|
122
|
+
true"; **observations** are an append-only log of "what happened" in your
|
|
123
|
+
sessions. Surfaced both as a first-class sidebar panel (headline numbers)
|
|
124
|
+
and an Advanced → Observations tab (full detail):
|
|
125
|
+
|
|
126
|
+
- **Counts by status / kind / priority** — active vs. consolidated vs.
|
|
127
|
+
expired; decision / preference / event; 🔴 important / 🟡 maybe / 🟢 info.
|
|
128
|
+
- **Corroboration + promotion readiness** — how many observations have been
|
|
129
|
+
seen enough times (≥2, the corroboration gate) to be promotable to facts,
|
|
130
|
+
and the highest corroboration count seen. Promotion is the
|
|
131
|
+
anti-hallucination gate: a one-off mention never becomes a fact.
|
|
132
|
+
- **Compression ratio** — source content tokens ÷ observation tokens, the
|
|
133
|
+
Mastra-style measure of how much the episodic log condenses raw sessions.
|
|
134
|
+
- **Recent timeline** — the latest observations, newest first, with their
|
|
135
|
+
priority markers.
|
|
136
|
+
|
|
137
|
+
Promote a corroborated observation to a fact with `memory.promote_observation`
|
|
138
|
+
(or `claude-memory observations promote`), merge related ones with
|
|
139
|
+
`memory.consolidate_observations`, or run the `/reflect` skill for a guided
|
|
140
|
+
survey → consolidate → promote pass.
|
|
141
|
+
|
|
119
142
|
### Activity drill-down
|
|
120
143
|
|
|
121
144
|
Clicking any moment opens a modal with the parsed payload, prettified JSON,
|
|
@@ -190,3 +213,8 @@ WAL writer lock open across page loads.
|
|
|
190
213
|
flags as stale.
|
|
191
214
|
- `claude-memory dedupe-conflicts` / `reclassify-references` — one-shot
|
|
192
215
|
cleanups for what the Conflicts and Knowledge → References panels surface.
|
|
216
|
+
- `claude-memory observations [list|promote|consolidate]` *(0.13.0+)* — the
|
|
217
|
+
CLI mirror of the Observations panel: list/inspect the episodic log
|
|
218
|
+
(`--kind`, `--status`, `--scope`, `--json`), promote a corroborated
|
|
219
|
+
observation to a fact, or consolidate related ones. `claude-memory stats
|
|
220
|
+
--observations` prints the counts summary.
|
data/docs/improvements.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Improvements to Consider
|
|
2
2
|
|
|
3
|
-
*Updated: 2026-05-23 - Added AI Memory Systems Landscape Analysis (Nakajima/Opus 4.6 Research article, 2026-03-26) — meta-study of 7 benchmarks + ~12 systems. Four High Priority items: graph traversal as third RRF source (#64), temporal-aware retrieval (#65), bi-temporal schema cleanup (#66), LongMemEval integration (#67). One promotion: improvement #57 (provenance-strength ranking) Medium → High, validated as the "soft epistemic separation" pattern. See `docs/influence/ai-memory-systems-2026.md`. Previously: 2026-05-01 - Added Strands Agent SOPs study (article, not repo) — one M-priority item (parameter blocks in skill frontmatter); rest already implemented or deferred. See `docs/influence/strands-agent-sops.md`. Previously: 2026-04-28 (post-0.10.0) - Restructured 1.0 punchlist around milestone versions. **0.11.0 "Trust & Cost"** ships #47 (token budget), #48 (hallucination rate), #51 (claude-memory show), #53 (first-week ROI nudge — moved up from post-1.0), and a 3-scenario prototype of #49 (harm benchmark). **0.12.0 "Release Discipline"** ships #49 full corpus, #50 (CLAUDE.md baseline), #52 (benchmark scoreboard). **1.0.0** lands soak-validated #54/#55/#56 if time + new #59 API stability audit. See `docs/1_0_punchlist.md` for the full plan with calendar targets. Also added 2026-04-28: two ranking-signal gaps surfaced by the Mercury / "Why Karpathy's Second Brain Breaks" article (Zaid, 2026-04-28) — provenance-strength-aware ranking (#57) and reinforcement/decay scoring (#58). Earlier 2026-04-28 updates: opened the 1.0 punchlist track + added cq study. Previously: 2026-03-30 - Re-studied all 7 influencer repos. New recommendations: CLAUDE_CONFIG_DIR support (#26, from episodic-memory), Usage Stats / ROI Tracking (#27, from grepai v0.35.0). New Features to Avoid: AST-Aware Code Chunking (QMD), Custom Instructions via Env Var (lossless-claw v0.5.2), OpenClaw Context Injection (claude-mem v10.6.0). 
Repos with no changes: kbs (v0.2.1), claude-supermemory (v2.0.1), episodic-memory (v1.0.15). Previously: 14 features implemented through 2026-03-24.*
|
|
3
|
+
*Updated: 2026-06-17 - Added #70 (recall-preserving fact precision on real transcripts — live obs-experiment found Layer-1 fact noise from prose/comparisons/English-word collisions; claim-context gating was measured to crater the distillation benchmark Fact F1 0.958→0.64, so the lever is downstream: wire ReferenceMaterialDetector into the ingest path / Layer-2. Observation extraction was tightened in-branch; facts left at baseline). Earlier 2026-06-16 - Added #69 (self-heal the FTS rank index after concurrent ingest — live incident: hook-vs-MCP write contention leaves `ORDER BY rank` malformed while data stays intact, silently degrading recall until a manual `compact`). Earlier 2026-06-16 - Added Mastra Observational Memory study — one High Priority item (#68, episodic observation layer: Observer + Reflector + observation→fact promotion bridge) and one Medium item (compression/cache telemetry + LongMemEval episodic suite). Key insight: ClaudeMemory has no episodic layer; observations ("what happened") complement facts ("what is true"). See `docs/influence/mastra-observational-memory.md`. Previously: 2026-05-23 - Added AI Memory Systems Landscape Analysis (Nakajima/Opus 4.6 Research article, 2026-03-26) — meta-study of 7 benchmarks + ~12 systems. Four High Priority items: graph traversal as third RRF source (#64), temporal-aware retrieval (#65), bi-temporal schema cleanup (#66), LongMemEval integration (#67). One promotion: improvement #57 (provenance-strength ranking) Medium → High, validated as the "soft epistemic separation" pattern. See `docs/influence/ai-memory-systems-2026.md`. Previously: 2026-05-01 - Added Strands Agent SOPs study (article, not repo) — one M-priority item (parameter blocks in skill frontmatter); rest already implemented or deferred. See `docs/influence/strands-agent-sops.md`. Previously: 2026-04-28 (post-0.10.0) - Restructured 1.0 punchlist around milestone versions. 
**0.11.0 "Trust & Cost"** ships #47 (token budget), #48 (hallucination rate), #51 (claude-memory show), #53 (first-week ROI nudge — moved up from post-1.0), and a 3-scenario prototype of #49 (harm benchmark). **0.12.0 "Release Discipline"** ships #49 full corpus, #50 (CLAUDE.md baseline), #52 (benchmark scoreboard). **1.0.0** lands soak-validated #54/#55/#56 if time + new #59 API stability audit. See `docs/1_0_punchlist.md` for the full plan with calendar targets. Also added 2026-04-28: two ranking-signal gaps surfaced by the Mercury / "Why Karpathy's Second Brain Breaks" article (Zaid, 2026-04-28) — provenance-strength-aware ranking (#57) and reinforcement/decay scoring (#58). Earlier 2026-04-28 updates: opened the 1.0 punchlist track + added cq study. Previously: 2026-03-30 - Re-studied all 7 influencer repos. New recommendations: CLAUDE_CONFIG_DIR support (#26, from episodic-memory), Usage Stats / ROI Tracking (#27, from grepai v0.35.0). New Features to Avoid: AST-Aware Code Chunking (QMD), Custom Instructions via Env Var (lossless-claw v0.5.2), OpenClaw Context Injection (claude-mem v10.6.0). Repos with no changes: kbs (v0.2.1), claude-supermemory (v2.0.1), episodic-memory (v1.0.15). Previously: 14 features implemented through 2026-03-24.*
|
|
4
4
|
*Sources:*
|
|
5
5
|
- *[thedotmack/claude-mem](https://github.com/thedotmack/claude-mem) - Memory compression system (v10.6.3, re-studied 2026-03-30)*
|
|
6
6
|
- *[obra/episodic-memory](https://github.com/obra/episodic-memory) - Semantic conversation search (v1.0.15, re-studied 2026-03-30 — no changes)*
|
|
@@ -9,6 +9,7 @@
|
|
|
9
9
|
- *[tobi/qmd](https://github.com/tobi/qmd) - On-device hybrid search engine (v2.0.1+unreleased, re-studied 2026-03-30)*
|
|
10
10
|
- *[MadBomber/kbs](https://github.com/MadBomber/kbs) - Knowledge-Based System with RETE inference (v0.2.1, studied 2026-03-30 — no changes)*
|
|
11
11
|
- *[martian-engineering/lossless-claw](https://github.com/martian-engineering/lossless-claw) - DAG-based lossless context management (v0.5.2, re-studied 2026-03-30)*
|
|
12
|
+
- *[Mastra Observational Memory](https://mastra.ai/blog/observational-memory) - Text-based dual-agent episodic memory (studied 2026-06-16)*
|
|
12
13
|
|
|
13
14
|
This document contains only unimplemented improvements. Completed items are removed.
|
|
14
15
|
|
|
@@ -291,6 +292,61 @@ Source: 2026-04-28 1.0 readiness review (`docs/1_0_punchlist.md` #6)
|
|
|
291
292
|
|
|
292
293
|
---
|
|
293
294
|
|
|
295
|
+
### 69. Self-Heal the FTS Rank Index After Concurrent Ingest
|
|
296
|
+
|
|
297
|
+
Source: 2026-06-16 live incident on `claude/observational-layer-design-7662r9` — observed first-hand, not a study.
|
|
298
|
+
|
|
299
|
+
**Gap.** The contentless FTS5 index (`content_fts`) silently drifts into a broken state under concurrent writers: the ingest hook (`claude-memory hook ingest`) and the MCP server (`store_extraction` → `Index::LexicalFTS#index_content_item`) both write to the same WAL DB, and a large ingest produces `"Database busy, retrying"` (`Store::RetryHandler`) followed by an FTS index where **plain `MATCH` works but `... ORDER BY rank` raises `database disk image is malformed`**. `integrity_check` passes and all rows are intact, so `recall`/`recall_index` ranking is silently degraded (the rank query throws or returns nothing) while nothing looks wrong. The only fix today is for the user to manually run `claude-memory compact` (which runs `rebuild_fts` and vacuums). The problem is documented in the `docs/influence/...` gotchas and surfaced reactively in the dashboard (`lib/claude_memory/dashboard/api.rb:338`), but it is never repaired automatically. A more severe form (btree corruption, plain `MATCH` also failing) was separately seen when **two** memory MCP servers ran concurrently; de-duping to a single server removed that, but the benign rank-artifact still recurs from hook-vs-MCP contention alone.
|
|
300
|
+
|
|
301
|
+
**Implementation.**
|
|
302
|
+
|
|
303
|
+
- **Cheap probe + self-heal in the sweep tail.** In `Sweep::Maintenance`, add a `repair_fts_rank` step: run a bounded `SELECT rowid FROM content_fts WHERE content_fts MATCH 'a' ORDER BY rank LIMIT 1`; on `malformed`, call `Index::LexicalFTS#rebuild!` (already contentless-safe) instead of requiring a manual `compact`. Sweep already runs on PreCompact/SessionEnd, so recall ranking self-repairs within the session that broke it. Guard with a time budget so a huge index doesn't blow the hook timeout.
|
|
304
|
+
- **Reduce the contention that triggers it.** Raise the SQLite `busy_timeout` and use `BEGIN IMMEDIATE` for the FTS-writing transactions so the ingest hook and MCP server serialize cleanly instead of racing (the retry-loop WARN is the symptom). Confirm both the hook path and `ManagementHandlers#store_extraction` open connections with the same pragmas via `Store::RetryHandler`.
|
|
305
|
+
- **Proactive detection.** Add a `doctor` check that runs the rank probe and reports "FTS rank index needs rebuild — run `claude-memory compact`" (or auto-heals if the sweep step lands). Optionally a `roi_nudge`-style one-liner.
|
|
306
|
+
- **Option (larger).** Evaluate external-content FTS5 over `content_items` instead of contentless — more robust to `rebuild` and avoids the auxiliary-index drift entirely. Note as a follow-up, not part of the first fix.
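The probe-and-self-heal step described in the list above could be sketched roughly as follows. This is a minimal illustration, not the gem's code: `FtsRankRepair`, its `run_query`/`rebuild` callables, and the `budget_seconds` default are all hypothetical names and values; only the probe SQL and the `malformed` error text come from the item itself. The time budget here only covers the probe, so a slow probe skips the rebuild instead of pushing the hook past its timeout.

```ruby
# Hypothetical sketch: FtsRankRepair and its collaborators are illustrative,
# not ClaudeMemory's actual classes.
class FtsRankRepair
  # Probe query quoted from the improvement item.
  PROBE_SQL = "SELECT rowid FROM content_fts WHERE content_fts MATCH 'a' " \
              "ORDER BY rank LIMIT 1"

  # run_query: callable that executes SQL and raises on failure (assumed).
  # rebuild:   callable that rebuilds the FTS index (assumed wrapper).
  # clock:     callable returning monotonic seconds, injectable for tests.
  def initialize(run_query:, rebuild:, budget_seconds: 2.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @run_query = run_query
    @rebuild = rebuild
    @budget_seconds = budget_seconds
    @clock = clock
  end

  # Returns :healthy, :rebuilt, or :skipped (probe used up the budget).
  def call
    started = @clock.call
    begin
      @run_query.call(PROBE_SQL)
      return :healthy
    rescue StandardError => e
      # Only the rank-artifact is self-healed; anything else propagates.
      raise unless e.message.include?("malformed")
    end

    return :skipped if @clock.call - started > @budget_seconds

    @rebuild.call
    :rebuilt
  end
end
```

A `doctor` check could reuse the same probe and report instead of rebuilding, by passing a `rebuild` callable that only records the finding.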
|
|
307
|
+
|
|
308
|
+
**Acceptance.**
|
|
309
|
+
|
|
310
|
+
- An integration test that interleaves a large `hook ingest` with an MCP `store_extraction` against the same DB leaves `MATCH ... ORDER BY rank` working (or self-healed by the next sweep) — no manual `compact` required.
|
|
311
|
+
- `doctor` flags the rank-artifact when present.
|
|
312
|
+
- `"Database busy, retrying"` WARN frequency drops under the contention test.
|
|
313
|
+
|
|
314
|
+
**Effort.** Medium (~2 days). Self-heal step + pragmas are small; the integration test reproducing concurrent writers is the bulk.
|
|
315
|
+
|
|
316
|
+
**Why high priority.** It silently degrades `recall` — a core feature — and the user has no signal except empty/unranked results until they happen to run `compact`. Recurs on normal usage (hit live 2026-06-16). Relates to the WAL/connection-release discipline already noted for the dashboard.
|
|
317
|
+
|
|
318
|
+
---
|
|
319
|
+
|
|
320
|
+
### 70. Recall-Preserving Fact Precision on Real Transcripts (downstream, not regex)
|
|
321
|
+
|
|
322
|
+
Source: 2026-06-17 obs-experiment live session — observed first-hand.
|
|
323
|
+
|
|
324
|
+
**Gap.** Layer-1 `NullDistiller` fact extraction is noisy on real (large, mixed-content) transcripts. A live session on a tiny **Ruby + SQLite** project produced `uses_database = postgres/mysql`, `uses_framework = rails/express`, `uses_language = go`, `deployment_platform = docker` — all from prose mentions, comparisons, negations, instruction/skill text, and English-word collisions (`go` in "want this to go", `express` in "expressive"). This is the documented #48 hallucination problem, now characterized with live data.
|
|
325
|
+
|
|
326
|
+
**What was ruled out (with data).** Gating fact emission on a usage/claim verb near the entity (`using X`, `deployed on X`, `written in X`) was implemented and measured against `spec/benchmarks/distillation/extraction_spec.rb`:
|
|
327
|
+
|
|
328
|
+
| | Fact Precision | Fact Recall | F1 |
|
|
329
|
+
|---|---|---|---|
|
|
330
|
+
| baseline | 0.919 | 1.0 | 0.958 |
|
|
331
|
+
| + claim-context | 0.615 | 0.667 | **0.64** |
|
|
332
|
+
|
|
333
|
+
Regex can't separate terse legit claims (`MongoDB database`, `on AWS`, `Dockerized`) from terse prose mentions (`Postgres/MySQL buy you…`) — claim-context trades recall ~1:1 and even loses precision on the clean corpus. **Reverted.**
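For reference, the F1 column in the table above is the harmonic mean of the precision and recall columns. A small sketch (the `f1` helper name is ours, not the benchmark's) reproduces both rows:

```ruby
# Harmonic mean of precision and recall; returns 0.0 when both are zero.
def f1(precision, recall)
  return 0.0 if (precision + recall).zero?

  2.0 * precision * recall / (precision + recall)
end

puts f1(0.919, 1.0).round(3)   # baseline row       → 0.958
puts f1(0.615, 0.667).round(2) # + claim-context    → 0.64
```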
|
|
334
|
+
|
|
335
|
+
**Why distiller-level tightening is mostly off the table (2026-06-17 finding).** The benchmark *enforces* high-recall: case `ext_ent_negative` — "We looked at MongoDB but **decided against it**" — still **expects** `uses_database: mongodb`. The fact distiller is deliberately "extract every mention, filter downstream." So negation/comparison **exclusion** also regresses recall (it would drop the rejected-MongoDB case). The genuine downstream precision lever is Layer-2 / the observation→fact promotion gate, which already prevents Layer-1 noise from being *committed* as corroborated facts.
|
|
336
|
+
|
|
337
|
+
**Done (recall-safe slice):**
|
|
338
|
+
- ✅ **English-word collision fix** — `go` is matched case-sensitively (`Go`/`golang`) so the verb "go" / "go-to" no longer fires `uses_language=go`. Benchmark Fact Precision **0.919 → 0.935**, Recall held at **1.0** (it removed a real false positive — "my go-to database"). `react`/`rust`/`express` are the same collision class and can follow the same `(?-i:)` pattern, each verified against the benchmark.
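The `(?-i:)` pattern mentioned above can be illustrated with a small Ruby sketch. The constant name and the exact expression are assumptions for illustration, not the distiller's actual regex: the inline group switches off case-insensitivity for `Go` only, while `golang` stays case-insensitive.

```ruby
# Hypothetical pattern: "Go" must be capitalized exactly, "golang" may be any case.
GO_LANGUAGE = /\b(?:(?-i:Go)|golang)\b/i

samples = [
  "We want this to go faster",      # verb, should not fire
  "Postgres is my go-to database",  # "go-to", should not fire
  "The worker is written in Go",    # language, should fire
  "Ported the CLI to GoLang"        # language, should fire
]

samples.each do |text|
  puts "#{GO_LANGUAGE.match?(text)}\t#{text}"
end
```

One limitation of this sketch: a sentence-initial verb such as "Go ahead and merge" still matches, so any real pattern of this kind would need to be checked against the benchmark as the item describes.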
|
|
339
|
+
|
|
340
|
+
**Deferred / not worth it:** wiring `ReferenceMaterialDetector` into the ingest path is a no-op for current Layer-1 (it produces stack predicates, not `convention` facts the detector targets) — skip until Layer-1 emits conventions.
|
|
341
|
+
|
|
342
|
+
**Acceptance (revised).** Distiller-level work is bounded to recall-safe English-word-collision fixes (benchmark Fact F1 ≥ baseline). Broader fact precision is owned by Layer-2 + the promotion gate, not regex.
|
|
343
|
+
|
|
344
|
+
**Effort.** Medium. Wiring the existing detector is small; extending its heuristics + a real-transcript precision fixture is the bulk.
|
|
345
|
+
|
|
346
|
+
**Why this priority.** Fact noise predates the observational layer (it's #48) and is mitigated downstream (promotion gate), so it's not a blocker — but it's the most-visible remaining quality gap on real sessions. The observational layer's Layer-1 observation extraction was tightened in this branch (high-precision/low-recall); facts were deliberately left at baseline pending this recall-preserving approach.
|
|
347
|
+
|
|
348
|
+
---
|
|
349
|
+
|
|
294
350
|
## cq Study (2026-04-28)
|
|
295
351
|
|
|
296
352
|
Source: docs/influence/cq.md — usefulness-focused study (not internals)
|
|
@@ -411,6 +467,43 @@ Source: `docs/influence/ai-memory-systems-2026.md` — meta-study of the Nakajim
|
|
|
411
467
|
|
|
412
468
|
---
|
|
413
469
|
|
|
470
|
+
## Mastra Observational Memory Study (2026-06-16)
|
|
471
|
+
|
|
472
|
+
Source: `docs/influence/mastra-observational-memory.md` — architecture study of Mastra's Observational Memory (OM), a text-based dual-agent (Observer + Reflector) episodic memory that compresses raw messages into an append-only, dated observation log living in the context window. SOTA on LongMemEval (84–95%) at 3–6× compression, cache-stable by design.
|
|
473
|
+
|
|
474
|
+
**Headline finding.** In OM's taxonomy ClaudeMemory is the thing it positions against: a structured *semantic* store injected *dynamically per query*. The gap OM exposes is not retrieval quality — it's that **ClaudeMemory has no episodic layer at all.** Facts answer "what is true"; observations answer "what happened." OM is purely episodic, we are purely semantic. The two are complementary, and we already own analogues of OM's Observer (distillation pipeline) and Reflector (Resolve + Sweep) — they just emit facts, not a narrative log.
|
|
475
|
+
|
|
476
|
+
### High Priority Recommendations
|
|
477
|
+
|
|
478
|
+
- [x] **68. Episodic Observation Layer (Observer + Reflector + promotion bridge)** ⭐ — ✅ **Shipped 2026-06-16/17** (phases 1–4)
|
|
479
|
+
- Value: Adds the missing episodic half of memory (narrative "what happened" log) and a cache-stable injection mode, on top of the existing semantic fact store. The promotion bridge (observation→fact on corroboration) doubles as an anti-hallucination gate for the documented reject-churn problem (distiller commits `uses_database`/`uses_framework` facts from one-off doc example text).
|
|
480
|
+
- Evidence: `docs/influence/mastra-observational-memory.md`. Our distill pipeline (`lib/claude_memory/distill/`) is already an Observer that emits facts; `resolve/` + `sweep/` is already a Reflector over facts. No episodic store exists.
|
|
481
|
+
- Implementation (phased):
|
|
482
|
+
1. ✅ **Shipped 2026-06-16** (schema v19): `observations` table (`body`, `kind`, `priority` 🔴/🟡/🟢, `scope`, `source_content_item_id`, `consolidated_into` lineage, `token_count`); `Domain::Observation`; NullDistiller emits observation rows; Resolver persists them; `memory.observations` read tool. **Append-only with tombstoning, not lossy drop** — preserves provenance.
|
|
483
|
+
2. ✅ **Shipped 2026-06-16**: two-block SessionStart injection via `ContextInjector` — Block 1 = observation log (🔴 marked, 🟡/🟢 stripped as Mastra does for the actor) ahead of Block 2 = undistilled tail; `Observe::ObservationsRenderer` shared with the published `.claude/rules/claude_memory.observations.md` snapshot; `observation_count` added to the `hook_context` activity event for token/compression measurement.
|
|
484
|
+
3. ✅ **Shipped 2026-06-17**: `Observe::Reflector` — deterministic, free (no LLM) GC. Dedupes near-identical active observations into the newest (tombstone via `consolidated_into`) and expires stale info-level (🟢) ones past a TTL (`observation_info_ttl_days`, default 30); 🔴/🟡 never expire. Provenance-preserving (rows tombstoned, never deleted). Wired into `Sweep::Maintenance#reflect_observations` → `Sweeper#run!`, so it runs on the existing `PreCompact`/`SessionEnd` sweep — context-pressure-triggered, the analog of Mastra's ~40k-token threshold. (Semantic "merge related/surface patterns" deferred to phase 4 — needs the LLM.)
|
|
485
|
+
4. ✅ **Shipped 2026-06-17** (schema v20): the observation→fact **promotion bridge**. Dedup folds duplicates' `corroboration_count` into the keeper; once an observation crosses `Domain::Observation::PROMOTION_THRESHOLD` (2) sightings it becomes a promotion candidate. `memory.promote_observation` creates the fact via the resolver, links provenance, marks the observation promoted, and **refuses uncorroborated observations server-side** — the anti-hallucination gate. `ContextInjector` surfaces candidates in a SessionStart "## Observation Reflection" section instructing Claude to promote inline (automatic semantic reflection, no extra API cost); a manual `/reflect` skill drives deep on-demand passes. (Trigger is SessionStart rather than PreCompact — the already-wired free injection point; a PreCompact context hook is a possible future refinement.)
|
|
486
|
+
5. ✅ **Shipped 2026-06-17** (branch `claude/observational-layer-complete`): the LLM half. **Layer-2 Claude-as-observer** — the SessionStart extraction prompt asks Claude to emit episodic observations in the `observations` field of `memory.store_extraction` (coerced/validated at the handler border, persisted via the resolver), making the log rich where Layer-1 regex is high-precision/low-recall. **Semantic reflection** — `memory.consolidate_observations` merges related-but-differently-worded observations into one synthesized row with *combined* corroboration (which can tip it over the promotion gate), tombstoning the sources; surfaced in the reflection section + `/reflect`. (Also fixed a latent Liskov bug: `ReferenceMaterialDetector#reclassify` dropped observations when a fact was present.)
|
|
487
|
+
6. ✅ **Shipped 2026-06-17** (branch `claude/observational-layer-complete`): observability + measurement. `Dashboard::Observations` panel (`/api/observations`, Advanced → Observations tab) — counts by status/kind/priority, corroboration + promotion readiness, recent timeline, and a **compression ratio** (source content tokens ÷ observation tokens, Mastra-style). The compression metric is the measurement half of design rec E; a full LongMemEval-style episodic benchmark remains (overlaps #67).
|
|
488
|
+
7. ✅ **Shipped 2026-06-18** (branch `claude/observational-layer-complete`): polish. **PreCompact reflection trigger** — `claude-memory hook context` injects only the reflection nudge (`ContextInjector#reflection_context`) on PreCompact (context pressure, the Mastra token-threshold analog), wired into the standard PreCompact hook set; not the full snapshot. **Observation↔fact provenance** — `memory.observations` exposes status/corroboration_count/promoted_fact_id/consolidated_into; `memory.explain(fact_id)` shows `promoted_from_observations` (reverse link via `observations_for_fact`). Full observational layer complete; remaining: LongMemEval episodic benchmark (#67).
|
|
489
|
+
- Effort: Large, phased. Phase 1 ~2-3 days; full arc ~2 weeks. Reuses distill/resolve/sweep/publish/context-hook machinery and `context_tokens` telemetry.
|
|
490
|
+
- Trade-off: reflection is automatic on *lifecycle events* (compaction/session boundaries), not a wall clock — Claude Code has no timer/cron hook, and Routines/subagents incur separate token budgets (rejected). Observer/Reflector reuse the existing session (no extra API cost). **Augments dynamic recall, does not replace it (user-confirmed 2026-06-16).** See claude-code-guide consultation in the influence doc.
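The deterministic Reflector pass (phase 3) and the corroboration gate (phase 4) described above can be sketched in a few lines. Everything here is a simplified illustration under stated assumptions: the `Observation` struct, the `:high`/`:medium`/`:info` symbols standing in for 🔴/🟡/🟢, the whitespace-and-case normalization used as the "near-identical" test, and the `ReflectorSketch` module are all hypothetical; only the threshold of 2, the 30-day default TTL, and the tombstone-via-`consolidated_into` behavior come from the text.

```ruby
# Hypothetical stand-in for the real domain object.
Observation = Struct.new(:id, :body, :priority, :created_at,
                         :corroboration_count, :consolidated_into, :status,
                         keyword_init: true)

module ReflectorSketch
  PROMOTION_THRESHOLD = 2   # sightings needed before promotion (from the text)
  INFO_TTL_DAYS = 30        # default TTL for info-level rows (from the text)

  module_function

  # Assumed notion of "near-identical": same text modulo case and whitespace.
  def normalize(body)
    body.downcase.gsub(/\s+/, " ").strip
  end

  # Dedupes active observations into the newest and expires stale info rows.
  # Rows are tombstoned by status, never deleted.
  def reflect(observations, now:)
    active = observations.select { |o| o.status == :active }

    active.group_by { |o| normalize(o.body) }.each_value do |group|
      keeper = group.max_by(&:created_at)
      (group - [keeper]).each do |dup|
        keeper.corroboration_count += dup.corroboration_count
        dup.consolidated_into = keeper.id
        dup.status = :consolidated
      end
    end

    observations.each do |o|
      next unless o.status == :active && o.priority == :info

      age_days = (now - o.created_at) / 86_400.0
      o.status = :expired if age_days > INFO_TTL_DAYS
    end
    observations
  end

  def promotable?(observation)
    observation.status == :active &&
      observation.corroboration_count >= PROMOTION_THRESHOLD
  end

  # The gate: uncorroborated observations are refused.
  def promote!(observation)
    raise ArgumentError, "uncorroborated observation" unless promotable?(observation)

    observation.status = :promoted
    observation
  end
end
```

In this sketch a duplicate sighting is what pushes an observation over the gate: two single-sighting rows fold into one keeper with a count of 2, which then becomes a promotion candidate.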
|
|
491
|
+
|
|
492
|
+
### Medium Priority
|
|
493
|
+
|
|
494
|
+
- [ ] **Compression / cache telemetry + LongMemEval episodic suite** (see influence doc rec E)
|
|
495
|
+
- Value: Report compression ratio and token reduction on Trust/Health panels using existing `context_tokens` events (0.11.0). Add a LongMemEval-style long-session suite to DevMemBench to score the episodic layer. Overlaps with existing item #67 (LongMemEval integration) — coordinate.
|
|
496
|
+
- Effort: Medium. Depends on #68 phase 1-2.
|
|
497
|
+
|
|
498
|
+
### Features to Avoid (from this study)
|
|
499
|
+
|
|
500
|
+
- **Two always-on background LLM agents** — violates the no-separate-API-call convention. Observer = context-hook injection; Reflector = deterministic shell-side GC + `PreCompact`-injected semantic consolidation (rides the existing session).
|
|
501
|
+
- **Claude Code Routines / subagents for recurring reflection** — Routines run as a separate scheduled cloud session; subagents run in their own context (~7× tokens). Both incur extra spend; reserve only for an explicitly opted-in one-off backfill.
|
|
502
|
+
- **Lossy drop on reflection** ("never forgives") — we tombstone via `consolidated_into` and retain raw `content_items`; provenance is non-negotiable.
|
|
503
|
+
- **Replacing dynamic recall with a wholesale-loaded log** — augment, don't replace; keep `memory.recall` for targeted lookups.
|
|
504
|
+
|
|
505
|
+
---
|
|
506
|
+
|
|
414
507
|
## Medium Priority
|
|
415
508
|
|
|
416
509
|
### ~~18. Shell Completion for CLI~~ ✅ Implemented 2026-03-20
|