nlm-memory 0.4.1 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/cli/nlm.js +221 -32
- package/dist/cli/nlm.js.map +1 -1
- package/dist/core/adapters/cursor.d.ts +45 -0
- package/dist/core/adapters/cursor.js +397 -0
- package/dist/core/adapters/cursor.js.map +1 -0
- package/dist/core/adapters/from-source.js +10 -0
- package/dist/core/adapters/from-source.js.map +1 -1
- package/dist/core/adapters/windsurf.d.ts +44 -0
- package/dist/core/adapters/windsurf.js +299 -0
- package/dist/core/adapters/windsurf.js.map +1 -0
- package/dist/core/hook/claude-settings.d.ts +12 -5
- package/dist/core/hook/claude-settings.js +21 -6
- package/dist/core/hook/claude-settings.js.map +1 -1
- package/dist/core/sources/source-registry.d.ts +1 -1
- package/dist/core/sources/source-registry.js +18 -0
- package/dist/core/sources/source-registry.js.map +1 -1
- package/dist/core/storage/sqlite-session-store.d.ts +2 -0
- package/dist/core/storage/sqlite-session-store.js +38 -2
- package/dist/core/storage/sqlite-session-store.js.map +1 -1
- package/dist/hook/hook-auth.d.ts +13 -0
- package/dist/hook/hook-auth.js +19 -0
- package/dist/hook/hook-auth.js.map +1 -0
- package/dist/hook/prompt-recall-hook.js +7 -1
- package/dist/hook/prompt-recall-hook.js.map +1 -1
- package/dist/hook/session-start-hook.js +4 -1
- package/dist/hook/session-start-hook.js.map +1 -1
- package/dist/hook/stop-hook.js +4 -1
- package/dist/hook/stop-hook.js.map +1 -1
- package/dist/http/app.d.ts +2 -0
- package/dist/http/app.js +74 -0
- package/dist/http/app.js.map +1 -1
- package/dist/install/claude-code.js +1 -1
- package/dist/install/claude-code.js.map +1 -1
- package/dist/install/cursor.d.ts +25 -0
- package/dist/install/cursor.js +43 -0
- package/dist/install/cursor.js.map +1 -0
- package/dist/install/nlm-dir-perms.d.ts +19 -0
- package/dist/install/nlm-dir-perms.js +43 -0
- package/dist/install/nlm-dir-perms.js.map +1 -0
- package/dist/install/ollama.d.ts +18 -1
- package/dist/install/ollama.js +68 -10
- package/dist/install/ollama.js.map +1 -1
- package/dist/install/setup.d.ts +4 -0
- package/dist/install/setup.js +141 -18
- package/dist/install/setup.js.map +1 -1
- package/dist/install/windsurf.d.ts +25 -0
- package/dist/install/windsurf.js +43 -0
- package/dist/install/windsurf.js.map +1 -0
- package/dist/shared/types.d.ts +4 -0
- package/dist/ui/assets/{index-BA6IpU8g.css → index-C8cpwbYJ.css} +1 -1
- package/dist/ui/assets/index-CB50QnL-.js +69 -0
- package/dist/ui/index.html +2 -2
- package/logs/CHANGELOG/CHANGELOG-2026.md +186 -0
- package/logs/CHANGELOG/CHANGELOG.md +107 -235
- package/migrations/014_sources_cursor.sql +30 -0
- package/migrations/015_sources_windsurf.sql +30 -0
- package/package.json +1 -1
- package/plugin/scripts/prompt-recall-hook.mjs +55 -4
- package/plugin/scripts/stop-hook.mjs +57 -6
- package/src/cli/nlm.ts +224 -31
- package/src/core/adapters/cursor.ts +486 -0
- package/src/core/adapters/from-source.ts +10 -0
- package/src/core/adapters/windsurf.ts +386 -0
- package/src/core/hook/claude-settings.ts +30 -9
- package/src/core/sources/source-registry.ts +19 -1
- package/src/core/storage/sqlite-session-store.ts +46 -1
- package/src/hook/hook-auth.ts +18 -0
- package/src/hook/prompt-recall-hook.ts +7 -1
- package/src/hook/session-start-hook.ts +4 -1
- package/src/hook/stop-hook.ts +4 -1
- package/src/http/app.ts +78 -0
- package/src/install/claude-code.ts +1 -1
- package/src/install/cursor.ts +68 -0
- package/src/install/nlm-dir-perms.ts +55 -0
- package/src/install/ollama.ts +86 -10
- package/src/install/setup.ts +138 -17
- package/src/install/windsurf.ts +68 -0
- package/src/shared/types.ts +4 -0
- package/src/ui/components/SessionDrawer.tsx +97 -34
- package/src/ui/pages/River.tsx +90 -44
- package/src/ui/pages/Search.tsx +357 -64
- package/src/ui/pages/Thread.tsx +267 -56
- package/src/ui/styles.css +129 -5
- package/tests/integration/getbyids-sqlite.test.ts +40 -0
- package/tests/integration/hook-claude-settings.test.ts +14 -1
- package/tests/integration/mcp.test.ts +12 -0
- package/tests/integration/source-registry.test.ts +5 -3
- package/tests/unit/core/adapters/cursor.test.ts +485 -0
- package/tests/unit/core/adapters/windsurf.test.ts +416 -0
- package/dist/ui/assets/index-B_qIVV0k.js +0 -69
package/dist/ui/index.html
CHANGED
@@ -4,8 +4,8 @@
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
 <title>nlm memory</title>
-<script type="module" crossorigin src="/ui/assets/index-
-<link rel="stylesheet" crossorigin href="/ui/assets/index-
+<script type="module" crossorigin src="/ui/assets/index-CB50QnL-.js"></script>
+<link rel="stylesheet" crossorigin href="/ui/assets/index-C8cpwbYJ.css">
 </head>
 <body>
 <div id="root"></div>
@@ -1,4 +1,52 @@
 
+## 2026-05-28 — C1: OpenCode adapter (SQLite-based, `opencode/1.0`)
+
+OpenCode stores all sessions in a single SQLite DB (`~/Library/Application Support/opencode/opencode.db` on macOS, `$XDG_DATA_HOME/opencode/opencode.db` on Linux) rather than per-session JSONL files. The adapter reads it via `better-sqlite3` in readonly mode, reusing the same `TranscriptAdapter` port as Claude Code, Hermes, and pi.
+
+**What ships**
+
+- `src/core/adapters/opencode.ts` (new) — `OpenCodeAdapter` class. `detect()` checks for the DB file. `discover()` queries `session WHERE time_archived IS NULL` with optional `time_updated >= since` filter. `parseSession(sessionId)` joins the `session`, `message`, and `part` tables: extracts `text` parts (non-ignored) and `tool` parts (summarized as `[tool: <name>]`), skips structural parts (step-start/finish, reasoning, compaction, snapshot, patch, agent, retry). Label comes from `session.title` unless it's `"New session"`, in which case it falls back to the first user turn. `gitBranch` read from `.git/HEAD` in `session.directory`. `sourcePath` is `${dbPath}::${sessionId}`.
+- `migrations/010_sources_opencode.sql` (new) — SQLite table-recreate migration to add `"opencode"` to the `sources.kind` CHECK constraint (SQLite does not support `ALTER COLUMN`). Copies existing rows, drops old table, renames new.
+- `src/core/adapters/from-source.ts` — `"opencode"` case added to `adapterFromSource` switch.
+- `src/core/sources/source-registry.ts` — `SourceKind` union extended; `seedDefaults()` now seeds 4 presets (added OpenCode row, auto-enabled if DB exists).
+- `tests/unit/core/adapters/opencode.test.ts` (new) — 15 tests: detect enabled/disabled, discover (all sessions, archived exclusion, since filter, absent DB), parseSession (null for unknown, null for no usable turns, turn count + roles, ignored-part skipping, tool-part summarization, title label, fallback label, sourcePath format, projectDir, absent DB, ISO timestamps), and metadata assertions.
+- `tests/integration/source-registry.test.ts` — two assertions updated: "seeds three presets" → "seeds four presets"; kind list updated to include `"opencode"`.
+
+**Architecture note**
+
+The `discover()` / `parseSession()` contract treats session IDs (not file paths) as the identifying string — the interface's `path: string` param is opaque, so this is valid. Users with OpenCode already installed get the source auto-enabled on first `nlm migrate` + daemon restart with no manual configuration.
+
+**Tests: 488 pass** (was 470 before this session). All 57 test files green, build clean.
+
+**Next:** README rewrite (D) — drop "self-improving accuracy" promise; lead with the three moats (editable timeline, cross-runtime MCP reach, 97.2% R@5). Then NousResearch Hermes adapter (#165, P1).
+
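The C1 entry above says `parseSession` keeps non-ignored `text` parts, summarizes `tool` parts as `[tool: <name>]`, drops structural parts, and labels the session from its title unless that title is `"New session"`. A minimal TypeScript sketch of those rules follows. The `OcPart` and `Turn` shapes and the helper names are assumptions for illustration, not the adapter's actual types, and the SQL join over `session`, `message`, and `part` is omitted.

```typescript
// Hypothetical, simplified shape of one OpenCode `part` row (assumption:
// field names are illustrative, not taken from the adapter).
interface OcPart {
  type: string;      // "text", "tool", "step-start", "reasoning", ...
  text?: string;     // body of a text part
  ignored?: boolean; // text parts flagged as ignored are skipped
  tool?: string;     // tool name on a tool part
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Keeps non-ignored text parts and summarizes tool parts as `[tool: <name>]`.
// Every other part type (step-start/finish, reasoning, compaction, snapshot,
// patch, agent, retry, ...) falls through and is dropped.
function partsToText(parts: readonly OcPart[]): string {
  const out: string[] = [];
  for (const p of parts) {
    if (p.type === "text" && !p.ignored && p.text) {
      out.push(p.text);
    } else if (p.type === "tool" && p.tool) {
      out.push(`[tool: ${p.tool}]`);
    }
  }
  return out.join("\n");
}

// Label: the session title unless it is the "New session" placeholder
// (or empty); otherwise the first user turn; null if there is none.
function sessionLabel(title: string | null, turns: readonly Turn[]): string | null {
  if (title && title !== "New session") return title;
  const firstUser = turns.find((t) => t.role === "user");
  return firstUser ? firstUser.text : null;
}

// sourcePath format stated in the entry: `${dbPath}::${sessionId}`.
function sourcePathFor(dbPath: string, sessionId: string): string {
  return `${dbPath}::${sessionId}`;
}
```

Keeping the `::`-joined `sourcePath` lets one DB file back many sessions while the adapter port still receives a single opaque string.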
+## 2026-05-28 — Code review: HOOK_SCRIPT_MARKERS bug caught and patched (44fec62)
+
+`code-review:code-review` skill run against commits `10c16ac..285fe9e`. One confirmed bug found and fixed: `HOOK_SCRIPT_MARKERS` in `claude-settings.ts` did not include the three Phase 2 hook filenames (`session-start-hook.js`, `pre-compact-hook.js`, `subagent-start-hook.js`). Consequence: `nlm hook uninstall` silently left all three hooks behind; each reinstall appended a duplicate instead of replacing. Live settings had two `SessionStart` NLM entries. Fix: added three filenames to `HOOK_SCRIPT_MARKERS`, updated stale file-level comment, rebuilt, reinstalled. Settings deduplicated (1 entry per event × 6 hooks). 436/436 tests pass. No other confirmed bugs from the review — four lower-confidence items scored below 80 and were not acted on.
+
+**State:** `nlm v0.3.0` installed globally. 6 hooks clean in `~/.claude/settings.json`. Shadow mode live.
+
+**Next:** `nlm useful-scan` CLI (B1 full); C1 OpenCode adapter #180 (P1); B3 extract-triples redesign; tests for `session-start-hook.ts`.
+
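The code-review entry hinges on how uninstall recognizes NLM's hooks: by matching script filenames from `HOOK_SCRIPT_MARKERS`. The sketch below is loosely modeled on the Claude Code `settings.json` hook layout (event, then matcher groups, then `{ type, command }` entries) and shows why a missing filename leaves that hook behind and makes each reinstall append a duplicate. `stripNlmHooks`, `installHook`, and the marker lists used in the check are hypothetical.

```typescript
interface HookCommand {
  type: string;    // "command" in Claude Code settings
  command: string; // shell command that runs the hook script
}

interface HookGroup {
  matcher?: string;
  hooks: HookCommand[];
}

// Simplified `hooks` section: event name -> matcher groups.
type HooksConfig = Record<string, HookGroup[]>;

// True when a command runs one of NLM's hook scripts.
function isNlmCommand(cmd: string, markers: readonly string[]): boolean {
  return markers.some((m) => cmd.includes(m));
}

// Removes every NLM hook command; drops groups and events left empty.
// Commands whose script is missing from `markers` survive untouched.
function stripNlmHooks(hooks: HooksConfig, markers: readonly string[]): HooksConfig {
  const out: HooksConfig = {};
  for (const [event, groups] of Object.entries(hooks)) {
    const kept = groups
      .map((g) => ({ ...g, hooks: g.hooks.filter((h) => !isNlmCommand(h.command, markers)) }))
      .filter((g) => g.hooks.length > 0);
    if (kept.length > 0) out[event] = kept;
  }
  return out;
}

// Install = strip, then append. Idempotent only if `command` matches a marker.
function installHook(hooks: HooksConfig, event: string, command: string, markers: readonly string[]): HooksConfig {
  const out = stripNlmHooks(hooks, markers);
  out[event] = [...(out[event] ?? []), { hooks: [{ type: "command", command }] }];
  return out;
}
```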
+## 2026-05-28 — Deploy v0.3.0: 6 hooks live; cite_session double-count fixed; useful_hit_rate stub; session-start source added
+
+Four commits on main (`976e549` → `d013caf`). All 436 tests green throughout.
+
+1. **B2 double-count fix** (`976e549`): `citation-detect.ts` was re-detecting `cite_session` tool_uses in the Stop hook and writing a second citation log entry. MCP handler already calls `appendCitation()` directly. Fix: skip `cite_session` in Stop hook detector; updated 5 tests in `citation-detect-cite-session.test.ts`.
+2. **B1 stub** (`976e549`): added `useful_hit_rate: null` to `StatsResult` + both `recallStats()` return paths. Daily digest shows "pending" cleanly instead of a field-access error. Unblocks schema for future `nlm useful-scan` CLI.
+3. **Phase 2 hook wiring** (`becb591`): `ALL_HOOKS` now includes SessionStart, PreCompact, SubagentStart. Version string corrected 0.2.0-dev → 0.3.0.
+4. **session-start source** (`d013caf`): `src/hook/session-start-hook.ts` written against current interfaces (stale dist imported `loadSurfacedForBudget` that no longer exists). `ClaudeHookEvent` union extended with `SessionStart` + `SubagentStart`.
+
+**State:** `nlm v0.3.0` installed globally, all 6 hooks active in shadow mode. Live measurement window open.
+
+**Next:** `nlm useful-scan` CLI (B1 full implementation); B3 extract-triples redesign; C1 OpenCode adapter #180.
+
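Item 1 of the deploy entry relies on one rule: the `cite_session` MCP handler already calls `appendCitation()`, so the Stop-hook detector must skip that tool or every explicit citation is logged twice. A sketch, assuming tool names follow the `mcp__nlm-memory__*` pattern mentioned elsewhere in this changelog and a simplified detector that treats a tool_use as citing a surfaced session when its input contains that session's ID. `detectCitedIds` and `ToolUse` are hypothetical names.

```typescript
interface ToolUse {
  name: string;   // e.g. "mcp__nlm-memory__get_session" (assumed naming)
  input: unknown; // tool arguments as sent by the model
}

// Tools whose MCP handler already writes its own citation entry.
const SELF_LOGGING_TOOLS = new Set<string>(["mcp__nlm-memory__cite_session"]);

// Simplified detector: a tool_use cites a surfaced session when its
// serialized input mentions that session's ID.
function detectCitedIds(toolUses: readonly ToolUse[], surfacedIds: readonly string[]): string[] {
  const cited = new Set<string>();
  for (const tu of toolUses) {
    if (SELF_LOGGING_TOOLS.has(tu.name)) continue; // already logged by the handler
    const input = JSON.stringify(tu.input ?? null);
    for (const id of surfacedIds) {
      if (input.includes(id)) cited.add(id);
    }
  }
  return Array.from(cited);
}
```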
+## 2026-05-28 — D4 thesis pivot: citation moat downgraded permanently; adapter breadth + editable timeline elevated; Phase 0/2/3 engineering landed
+
+Full-day arc on 2026-05-27 producing three clusters of work: a 3-agent audit exposing recall-layer defects, five engineering branches integrated (Phases 0/2/3 of the 90-day plan), and a D4 strategic-pivot decision ending in a permanent thesis revision. The cite_session MCP tool lands on this branch (`phase-1c-cite-tool`) as the last Phase 0 piece.
+
+**D4 thesis pivot (permanent):** citation-trained-reranker moat hypothesis fails on fundamentals (corpus too small at ~3,800 rows/year, cross-operator pooling violates local-first). Citation feedback loop's new role: quality-monitoring only. Three elevated moats: (1) editable timeline/supersedence — schema-level, retrofit-impossible; (2) cross-runtime reach via MCP; (3) passive corpus quality at 97.2% R@5. Adapter breadth elevated to primary workstream.
+
 ## 2026-05-28 — B1 full: nlm useful-scan CLI + useful_hit_rate live in GET /api/recall/stats
 
 Shipped the full useful-scan implementation. The `useful_hit_rate: null` stub is now a real ratio backed by `~/.nlm/useful-hit-log.jsonl`.
@@ -1387,3 +1435,141 @@ The 21 kw-only failures split into two modes:
 **Next priorities:** Build E′ (conditional asymmetric RRF); Mode B floor fix; alt-embedding A/B deferred; stop hook citation rate; cross-runtime adapters.
 
 Vault: full diagnosis filed at `Ventures/nlm-memory/track-record.md` (2026-05-26 second entry).
+
+
+## 2026-05-27 — Codex CLI adapter: marketplace plugin + MCP config wiring + interactive-mode hook dispatch
+
+Cross-runtime adapter work, first target landed. NLM is now installable on Codex CLI via `nlm connect codex`, which registers a local plugin marketplace, installs the `nlm-memory` plugin, writes a sentinel-bracketed `[mcp_servers.nlm-memory]` block to `~/.codex/config.toml`, and (optionally with `--with-hooks`) drops a legacy `~/.codex/hooks.json` fallback. Designed to mirror agentmemory's distribution pattern but the integration surface for current Codex (0.134.0) is materially different from both Codex Desktop and the wiki's 2026-05-23 prediction.
+
+**What ships**
+
+- `plugin/.codex-plugin/plugin.json` — Codex plugin manifest declaring `mcpServers: "./.mcp.json"` and `hooks: "./hooks/hooks.json"` pointers
+- `plugin/hooks/hooks.json` — `UserPromptSubmit` + `Stop` event registrations, scripts referenced via `${CLAUDE_PLUGIN_ROOT}`
+- `plugin/.mcp.json` — MCP server registration (spawns `nlm mcp` over stdio); duplicated by the direct config.toml writer for redundancy
+- `plugin/scripts/{prompt-recall-hook,stop-hook}.mjs` — esbuild single-file bundles of the existing TS hook entries, build pinned in `scripts/build-codex-plugin.mjs`
+- `.agents/plugins/marketplace.json` — marketplace manifest declaring the plugin and its source path (`./plugin`)
+- `src/install/codex.ts` — `connectCodex` / `disconnectCodex` / `writeMcpServerToConfig` / `removeMcpServerFromConfig` / `writeLegacyHooks` / `removeLegacyHooks`. Marketplace + plugin add are delegated to the `codex` binary (it owns trust + snapshot state); MCP config and hooks.json are written directly with sentinel markers so disconnect can strip exact regions without touching user-authored content.
+- `src/cli/nlm.ts` — `nlm connect codex` and `nlm disconnect codex` commands. Flags: `--source <owner/repo>` (default `pbmagnet4/nlm-memory-ts`), `--local` shortcut for dev, `--with-hooks` to also write the legacy fallback, `--dry-run`.
+
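The `src/install/codex.ts` bullet above describes writing the MCP block between sentinel markers so that disconnect can strip exactly that region. A minimal sketch of an idempotent upsert/remove pair follows. The sentinel strings are hypothetical (the changelog does not give the real ones), and the TOML body in the check only illustrates a `[mcp_servers.nlm-memory]` entry that launches `nlm mcp`.

```typescript
// Hypothetical sentinel lines; the markers codex.ts actually writes are not
// given in this changelog.
const BEGIN = "# >>> nlm-memory (managed) >>>";
const END = "# <<< nlm-memory (managed) <<<";

// Removes the managed region (sentinels included) and leaves all other text intact.
function removeManagedBlock(text: string): string {
  const start = text.indexOf(BEGIN);
  if (start === -1) return text;
  const endIdx = text.indexOf(END, start);
  if (endIdx === -1) return text; // unmatched sentinel: do not guess, leave file alone
  let after = endIdx + END.length;
  if (text[after] === "\n") after += 1; // also drop the closing sentinel's newline
  return text.slice(0, start) + text.slice(after);
}

// Strips any existing managed region, then appends a fresh one, so repeated
// connects converge on exactly one block.
function upsertManagedBlock(text: string, body: string): string {
  const base = removeManagedBlock(text);
  const sep = base === "" || base.endsWith("\n") ? "" : "\n";
  return `${base}${sep}${BEGIN}\n${body.trimEnd()}\n${END}\n`;
}
```

Because removal only touches text between the two sentinels, user-authored settings above or below the block survive both connect and disconnect.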
+**The four wrong-then-right turns worth keeping in memory**
+
+1. *Codex hooks are not Claude-Code-shape settings.json entries.* The 2026-05-23 wiki claim of "identical schema, ~95% script reuse" was wrong on the install mechanism. Codex uses a marketplace + plugin architecture. Hook *contract* (events, stdin payload, stdout convention) is identical to Claude Code; install path is entirely different. Script logic reuses verbatim.
+2. *Marketplace requires a `.agents/plugins/marketplace.json` at the repo root.* First connect attempt failed with `marketplace root does not contain a supported manifest` until that file landed. Reverse-engineered from `~/.codex/.tmp/plugins/.agents/plugins/marketplace.json` shipped by `openai-curated`.
+3. *The marketplace policy field is enum-constrained.* `authentication: "NONE"` rejected as `unknown variant`; only `"ON_INSTALL"` and `"ON_USE"` accepted. NLM has no auth to do, so `"ON_USE"` was picked as a no-op-on-use default. Marketplace went green after the swap.
+4. *`--dangerously-bypass-hook-trust` is misleadingly named.* The flag warns "hooks may run without review for this invocation" but in practice does not bypass trust at all. Hooks dispatched only after persisting trust via an interactive Codex session. Once trust landed in `[hooks.state]`, hooks fired in subsequent `codex exec` (non-interactive) calls too. The bypass flag's real role is unclear.
+
+**Verified end-to-end** (`019e69fa-4ea1-7b10-8c66-70bda64ba086` is the codex session used for final validation)
+
+- ✅ `codex plugin marketplace add ./` (local source) succeeds
+- ✅ `codex plugin add nlm-memory@nlm-memory-ts` produces `installed, enabled` in `codex plugin list`
+- ✅ Cached plugin at `~/.codex/plugins/cache/nlm-memory-ts/nlm-memory/0.3.0/` contains all expected files including dotfile dirs (`.codex-plugin/`, `.mcp.json`)
+- ✅ `[mcp_servers.nlm-memory]` block written to `~/.codex/config.toml` between sentinels; idempotent under repeated connects; cleanly stripped on disconnect
+- ✅ `UserPromptSubmit` hook dispatches from plugin path: codex stdout shows `hook: UserPromptSubmit` / `hook: UserPromptSubmit Completed`, hook-log gains an entry with codex session UUID (`019e...`), recall ran, gate evaluated, would-inject populated, shadow mode logged correctly
+- ✅ Plugin-only default (`nlm connect codex` without `--with-hooks`) fires UserPromptSubmit exactly once per turn. The earlier double-fire with `--with-hooks` enabled (plugin path + legacy `~/.codex/hooks.json` both fired) is exactly why `--with-hooks` stays opt-in
+- ✅ `codex_features list` confirms `hooks: stable, true` (so the runtime supports them) but `plugin_hooks: removed, false` (the older feature flag is dead; current path is the `hooks` engine with plugin-bundled config pointers)
+
+**Not yet verified**
+
+- ⏳ `Stop` hook dispatch — needs a one-time interactive trust approval before it fires (Codex only prompts for trust on hooks that have a chance to run; `codex exec` -p with bypass-trust did not surface a Stop prompt). Will land on Edward's next interactive `codex` turn.
+- ⏳ Remote marketplace install (`codex plugin marketplace add pbmagnet4/nlm-memory-ts`). The local install is the harder code path (the marketplace.json had to be authored from scratch); remote install reuses the same files via git fetch. Verifying in this session's tail after the GitHub push.
+
+**Trust mechanics, for the future**
+
+Codex persists hook trust per `(source, event, ...)` tuple under `[hooks.state]` in `config.toml`. Once a user approves a hook the first time, subsequent invocations (including `codex exec`) fire without prompting. The hash is content-addressed — a release that changes a script binary requires re-trust. This means `nlm connect codex` from a fresh install always requires one interactive `codex` turn to bootstrap trust before hooks fire; we cannot do that step on the user's behalf.
+
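The trust-mechanics paragraph notes that Codex's hook trust is content-addressed, so a release that changes a hook script needs fresh approval. The sketch below illustrates only that property, using a hypothetical key built from source, event, and a SHA-256 digest of the script; Codex's actual `[hooks.state]` format and hash function are not described here and may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical trust key: source + event + SHA-256 of the script contents.
function trustKey(source: string, event: string, script: string): string {
  const digest = createHash("sha256").update(script, "utf8").digest("hex");
  return `${source}:${event}:${digest}`;
}

// A hook runs without prompting only if its *current* key was approved.
// Any change to the script bytes yields a new digest, hence a new key.
function isTrusted(approved: ReadonlySet<string>, source: string, event: string, script: string): boolean {
  return approved.has(trustKey(source, event, script));
}
```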
+**Build pipeline**
+
+`npm run build` now chains `build:server` (tsc) + `build:ui` (vite) + `build:codex-plugin` (esbuild). The codex-plugin build is single-file per entry (no dependency tree shipped), platform=node, format=esm, target=node20. Each .mjs is under 10KB.
+
+**Tests**
+
+414 unit + integration pass unchanged. No new test files added in this commit — the install path is exercised by the verified end-to-end smoke flow (`nlm connect codex --local` → `codex exec` → hook-log delta inspection). Test surface for install/codex.ts and the build script should land in a follow-up.
+
+**Wiki correction owed**
+
+`Whtnxt Agent Vault/Ventures/nlm-memory/learnings.md` line 218 lists Codex CLI as "`~/.codex/` JSON-config hooks (identical schema to Claude Code) … ~95% script reuse from Claude Code". The script reuse claim is correct (the .ts files port verbatim); the install-mechanism claim is wrong (marketplace + plugin, not settings.json). Wiki update is the next priority after this commit lands.
+
+**Next priorities** (revised from the morning's stack)
+
+1. Wiki update correcting the 2026-05-23 cross-runtime hook landscape table and adding a Codex plugin Tool Lesson. ← **Up next.**
+2. Stop hook validation on Edward's first interactive codex turn (passive — happens whenever).
+3. NousResearch Hermes Agent (#165) — has the cleanest `plugin.yaml` hook surface and was identified in the wiki as the next runtime worth a real adapter. I can validate it end-to-end without a TTY, unlike Codex.
+4. Mode B pre-mortem and alt-embedding A/B remain shelved.
+
+## 2026-05-27 — Stop-hook multi-turn citation detection: useful_hit_rate goes from structurally 0% to a real metric
+
+Bug-fix to the Stop hook's citation detector. The previous implementation scanned only the LAST assistant turn of the transcript, but `tool_use` blocks live in earlier turns — the typical pattern is `tool_use → tool_result → prose summary`, and Stop fires after the summary. The detector saw prose, found no tool_use, logged 0 citations. Production evidence: 348 Stop firings with surfaced IDs, **zero** citations recorded, despite 23 real `mcp__nlm-memory__*` tool_uses in the matching transcripts over the last 7 days.
+
+**Diagnosis path.** Cross-referenced `~/.nlm/hook-log.jsonl` (stop entries, all `citedIds:[]`) against `~/.claude/projects/<workspace>/<conv>.jsonl` (real assistant turns). Drilled into `1fc5a8f1-00fa-4ff5-85e7-a239072082b2`: recall hook surfaced `cc_7ff73609-…`, the assistant called `get_session({id:"cc_7ff73609-…"})` in turn N-1, then wrote a prose summary in turn N; the Stop hook scanned only turn N and logged `citedIds:[]`. Confirmed by code path at `transcript.ts:48` — the loop returns on the first assistant line found walking from the end.
+
+**Changes**
+- `src/core/hook/transcript.ts` — added `readAllAssistantTurns(transcriptPath): ReadonlyArray<AssistantTurn>` that returns every assistant turn in order. Kept `readLastAssistantTurn` as a thin wrapper (single test caller; back-compat for non-Stop callers).
+- `src/core/hook/cite-memo.ts` (new) — per-conversation cited-set memo mirroring `memo.ts`. Same state dir (`~/.nlm/hook-state/`, overridable via `NLM_HOOK_STATE_DIR`), filename suffix `.cited.json` so memo-sweep's existing dir-walk cleans both surfaced and cited memos by mtime. `loadCited` / `recordCited` / `clearCited`.
+- `src/hook/stop-hook.ts` — `runStopHook` now reads all assistant turns, unions text + tool_uses across them, runs `detectCitations` over the union, dedupes against `loadCited(conversationId)`, posts the fresh ones, and persists via `recordCited`. The `responsePreview` stays as the LAST turn's prose (that's the text Edward saw when Stop fired). Daemon remains blind-append; dedup is hook-local.
+- `src/hook/session-end-hook.ts` — `runSessionEnd` now also calls `clearCited` so both memos are cleaned on session close.
+- `scripts/backfill-citations.mjs` (new) — one-shot historical replay. Walks `~/.nlm/hook-log.jsonl` to collect surfaced-ID sets per conversation, finds matching transcripts under `~/.claude/projects/`, runs the same detector, dedupes against existing `~/.nlm/citation-log.jsonl` entries, appends fresh citations with a `backfill:true` marker. Idempotent. Dry-run by default; `--commit` writes.
+
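The Changes list replaces last-turn scanning with a union over every assistant turn, then dedupes against a per-conversation cited set before posting. The sketch below assumes a simplified transcript line shape loosely modeled on Claude Code's JSONL (`type`, `message.content` blocks), covers only the tool_use half of the detector, and uses an in-memory `Map` in place of the `.cited.json` memo files. `allAssistantTurns` and `freshCitations` are hypothetical names. As in the entry, the memo is updated regardless of whether a later post succeeds.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown }
  | { type: "tool_result"; content: unknown };

interface TranscriptLine {
  type: string;                     // "user", "assistant", ...
  message?: { content: Block[] };
}

// Every assistant turn, in order. The old code stopped at the last one,
// which is usually prose written after the tool_use turn.
function allAssistantTurns(jsonl: string): Block[][] {
  const turns: Block[][] = [];
  for (const raw of jsonl.split("\n")) {
    if (!raw.trim()) continue;
    let line: TranscriptLine | null;
    try {
      line = JSON.parse(raw) as TranscriptLine | null;
    } catch {
      continue; // skip a corrupt line rather than failing the hook
    }
    const content = line?.type === "assistant" ? line.message?.content : undefined;
    if (Array.isArray(content)) turns.push(content);
  }
  return turns;
}

// Stand-in for the per-conversation `.cited.json` memo files.
const citedMemo = new Map<string, Set<string>>();

// Union tool_uses across all turns, detect surfaced IDs in their inputs,
// and return only IDs not yet recorded for this conversation.
function freshCitations(conversationId: string, jsonl: string, surfacedIds: readonly string[]): string[] {
  const found = new Set<string>();
  for (const turn of allAssistantTurns(jsonl)) {
    for (const b of turn) {
      if (b.type !== "tool_use") continue;
      const input = JSON.stringify(b.input ?? null);
      for (const id of surfacedIds) if (input.includes(id)) found.add(id);
    }
  }
  const seen = citedMemo.get(conversationId) ?? new Set<string>();
  const fresh = Array.from(found).filter((id) => !seen.has(id));
  for (const id of fresh) seen.add(id); // record before any post attempt
  citedMemo.set(conversationId, seen);
  return fresh;
}
```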
+**Validation**
+- Tests: 414 unit + integration tests pass (was 396, +18 new). New cases cover: tool_use detected when it's in an earlier turn and the last turn is prose-only (the real-world pattern); dedup across repeated Stop firings on a growing transcript; local memo update even when `postCitation` fails (no double-count on next fire); 10 `cite-memo` cases (load/record/clear/corrupt-file/non-array/path-safety); 3 `readAllAssistantTurns` cases; 2 new session-end cases.
+- Typecheck clean on changes (pre-existing `SessionEnd` error in `hook-claude-settings.test.ts` is unrelated and predates this work).
+- Backfill dry-run against the live `~/.nlm/hook-log.jsonl`: 42 conversations had surfaced IDs, 37 had a matching transcript, **4 conversations contain at least one tool_use citation the old detector missed**. Lower than the upper bound suggests by raw tool-use count (23) because many tool_uses were `recall_sessions`/`recall_facts` (no surfaced-ID-in-input — those are pull, not push-follow-up). The 4 captured citations are the ones where the model actually drilled into a surfaced session via `get_session(id=...)`.
+
+**Impact.** `useful_hit_rate` (cited / surfaced) goes from a structural 0% to a real signal. This is the training-data substrate for the future learned reranker (each row in the citation log is a `(query, returned_id, was_cited)` triple once joined against `~/.nlm/query_log.jsonl` by `conversation_id`). The 348 stop firings that previously generated zero training rows would have generated ~10-15 if the detector had been working — small but real, and growing with every conversation going forward.
+
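The Impact paragraph defines `useful_hit_rate` as cited / surfaced, and the B1 stub in the v0.3.0 deploy entry returned `null` while there was nothing to measure. The exact counting unit is not stated in this changelog; the sketch below assumes distinct (conversation, session) pairs, counts only citations of sessions that were actually surfaced, and returns `null` for an empty denominator. `usefulHitRate` and `Hit` are hypothetical names.

```typescript
interface Hit {
  conversationId: string;
  sessionId: string;
}

// One plausible reading of "cited / surfaced": distinct cited pairs over
// distinct surfaced pairs. Citations of never-surfaced sessions are ignored.
function usefulHitRate(surfaced: readonly Hit[], cited: readonly Hit[]): number | null {
  const key = (h: Hit): string => `${h.conversationId}\u0000${h.sessionId}`;
  const surfacedKeys = new Set(surfaced.map(key));
  if (surfacedKeys.size === 0) return null; // nothing surfaced: metric undefined
  const citedKeys = new Set(cited.map(key).filter((k) => surfacedKeys.has(k)));
  return citedKeys.size / surfacedKeys.size;
}
```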
+**Methodology note worth keeping.** The bug was diagnosable in <10 minutes by cross-referencing two existing log streams (hook-log.jsonl × Claude Code transcripts) before touching code. Tomorrow's-self version of this rule: when a telemetry metric reads structurally zero, scan the raw inputs the metric is supposed to consume before assuming the metric is correct. Filing in `Operations/what-works/code-quality.md` candidate set.
+
+**Next priorities (unchanged from earlier today's update):**
+
+1. ~~Stop hook citation rate.~~ Shipped.
+2. Pre-mortem Mode B before any code. Ceiling +1.5% hybrid temporal — current recommendation is to shelve unless a separate driver emerges.
+3. Cross-runtime hook adapters (Hermes / pi / Codex). Unchanged.
+4. Alt-embedding A/B — still deferred.
+
+**Source:** Whtnxt Agent orchestrator session 2026-05-27 (continuation from Build F ship). Diagnosis grounded in `~/.nlm/hook-log.jsonl` (342 stop entries, 0 citations) and `~/.claude/projects/-Users-echalupa-Documents-Coding-Projects-Whtnxt-Agent/*.jsonl` (23 NLM tool_uses across 7 days).
+
+## 2026-05-27 — Build F shipped: force-include keyword rank-1 on temporal+entity shape; hybrid temporal +3.0 / aggregate +0.8 / hybrid beats keyword for the first time
+
+Single session arc, ~6 hours: Build E′ (asymmetric RRF multiplicative boost) shipped → harness-tested → falsified by head-baseline → reverted → diagnosed via per-question `results.json` → Probes 1 & 2 designed and run → Build F (post-merge force-include) shipped → confirmed by clean A/B head-baseline → shipped. Three full harness runs (1 cold ~50 min + 2 hot ~25s) plus two probe scripts. Zero false ships.
+
+**Build E′ (falsified path, recorded for audit trail).** Built `src/core/recall/query-shape.ts` with `detectQueryShape(query)` returning `{hasTemporal, hasNamedEntity}` (temporal regex covers "N days/weeks/months ago", "last <day>", "when did", "before/after I", "yesterday/today/tomorrow"; named-entity accepts ALL-CAPS acronyms and mixed-case tokens, excludes days of week and month names to avoid Mode B false-fires). Modified `mergeHybrid` to accept a `boostKeyword` param and multiply the keyword leg's `1/(RRF_K + rank_kw)` by 1.75 on shape match. Added 27 unit tests for `detectQueryShape`. Harness run `2026-05-26-16-39-52` (n=500, ~48 min, partial cache): hybrid temporal 91.0 → 92.5 / aggregate 95.8 → 96.4. Head-baseline rerun with boost disabled on the same cache (`2026-05-26-16-57-47`, 26.3s): **byte-identical numbers**. The lift was 100% cache enrichment from the 7,500→5,500 chunk-size change populating new embeddings; the boost contributed zero. Post-mortem probe: detector fires on 23/133 temporal queries, but on those 23 the multiplicative boost changed zero top-5 results — the boost magnitude (1.75×) was too small to overcome the "session appears in both lists at lower rank" advantage in RRF. Reverted; recorded in [[track-record]].
+
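Build E′ paired a query-shape detector with a 1.75× multiplier on the keyword leg of reciprocal rank fusion (RRF). A sketch of both pieces follows. The regexes and token rules only approximate the cues the entry lists (they are not the ones in `query-shape.ts`), and `RRF_K = 60` is an assumption (a common RRF default; NLM's value is not stated). The check reproduces two behaviors from the entry: "last Tuesday" is temporal but yields no named entity, and a session returned only by keyword at rank 1 stays out of the top 5 even when boosted, because sessions present in both lists collect two RRF terms.

```typescript
const DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"];
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];
// Day and month names never count as named entities (avoids Mode B false-fires).
const NOT_ENTITIES = new Set<string>([...DAYS, ...MONTHS]);

// Rough approximations of the temporal cues listed in the entry.
const TEMPORAL_PATTERNS: RegExp[] = [
  /\b\d+\s+(?:days?|weeks?|months?)\s+ago\b/i,
  new RegExp(`\\blast\\s+(?:${DAYS.join("|")})\\b`, "i"),
  /\bwhen did\b/i,
  /\b(?:before|after)\s+i\b/i,
  /\b(?:yesterday|today|tomorrow)\b/i,
];

interface QueryShape {
  hasTemporal: boolean;
  hasNamedEntity: boolean;
}

function detectQueryShape(query: string): QueryShape {
  const hasTemporal = TEMPORAL_PATTERNS.some((re) => re.test(query));
  const tokens = query.match(/[A-Za-z]+/g) ?? [];
  const hasNamedEntity = tokens.some((tok, i) => {
    if (NOT_ENTITIES.has(tok.toLowerCase())) return false;
    const acronym = tok.length >= 2 && tok === tok.toUpperCase(); // e.g. "NLM"
    // Mixed case, skipping the first word so a sentence-initial capital
    // ("When", "Who") does not count as an entity.
    const mixed = i > 0 && /[A-Z]/.test(tok) && /[a-z]/.test(tok);
    return acronym || mixed;
  });
  return { hasTemporal, hasNamedEntity };
}

const RRF_K = 60;           // assumption: common RRF default
const KEYWORD_BOOST = 1.75; // multiplier from the entry

// RRF over two ranked ID lists (1-based ranks). With boostKeyword set, every
// keyword-leg contribution 1/(RRF_K + rank) is multiplied by KEYWORD_BOOST.
function mergeHybrid(semantic: readonly string[], keyword: readonly string[], boostKeyword: boolean, limit: number): string[] {
  const scores = new Map<string, number>();
  const addLeg = (ids: readonly string[], weight: number): void => {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (RRF_K + i + 1));
    });
  };
  addLeg(semantic, 1);
  addLeg(keyword, boostKeyword ? KEYWORD_BOOST : 1);
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([id]) => id);
}
```

Because the boost scales the whole keyword leg, it also lifts every session that appears in both lists, which is why the keyword-only rank-1 session cannot overtake them.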
+**Build F (shipped).** Replaced the failed multiplicative boost with a post-merge **force-include**: when shape is `temporal && namedEntity`, ensure `kwHits[0].session.id` is in the merged top-`limit` set; if not, insert at position `limit - 1`, displacing the lowest-confidence merged hit. Sidesteps RRF arithmetic entirely. ~10 lines in `forceIncludeKeywordTop()` helper at `src/core/recall/recall-service.ts`; detector unchanged from E′.
+
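Build F drops the score arithmetic and edits the result list directly. A sketch of the force-include step as the entry describes it: if the keyword leg's top session is missing from the merged top-`limit`, it takes slot `limit - 1` and the lowest-ranked merged hit falls out. `RecallHit` is a simplified stand-in that mirrors the `kwHits[0].session.id` access in the entry, and `finalizeResults` is a hypothetical wrapper for the shape gate.

```typescript
// Simplified hit shape mirroring `kwHits[0].session.id` from the entry.
interface RecallHit {
  session: { id: string };
  score: number;
}

interface QueryShape {
  hasTemporal: boolean;
  hasNamedEntity: boolean;
}

// Ensures the keyword leg's rank-1 session is in the merged top-`limit`.
// If it is missing, it takes slot limit - 1 (the last slot), displacing
// the lowest-ranked merged hit.
function forceIncludeKeywordTop(merged: readonly RecallHit[], kwHits: readonly RecallHit[], limit: number): RecallHit[] {
  const top = merged.slice(0, limit);
  const kwTop = kwHits[0];
  if (kwTop === undefined || top.some((h) => h.session.id === kwTop.session.id)) return top;
  if (top.length < limit) return [...top, kwTop];
  return [...top.slice(0, limit - 1), kwTop];
}

// Only temporal + named-entity queries take the force-include branch.
function finalizeResults(shape: QueryShape, merged: readonly RecallHit[], kwHits: readonly RecallHit[], limit: number): RecallHit[] {
  return shape.hasTemporal && shape.hasNamedEntity
    ? forceIncludeKeywordTop(merged, kwHits, limit)
    : merged.slice(0, limit);
}
```

Since at most one slot changes, and only on the gated shape, the blast radius on other query types is bounded by the detector's fire rate.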
+**Pre-build probes justified the build.** Probe 1 joined each hybrid temporal miss's keyword `returnedIds` against the dataset's `answer_session_ids` to compute keyword's rank for the gold session — on the 7 KW-FOUND misses, 5 had keyword rank=1 and 2 were within rank 5 (force-include trivially recovers all 7 if the detector fires). Probe 2 measured detector fire rate by `question_type`: 17.3% on temporal-reasoning, 0% on the two paraphrase types (single-session-preference, single-session-assistant), 1.4-2.6% on the other non-temporal types — bounded blast radius of ~5 queries across 367 non-temporal questions.
+
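Probe 1 is a join between keyword `returnedIds` and the dataset's `answer_session_ids`. A sketch of that join, bucketing misses by the gold session's keyword rank; only those two field names come from the entry, while the `MissRecord` shape, `goldRank`, and the bucket names are assumptions.

```typescript
interface MissRecord {
  questionId: string;
  returnedIds: string[];        // keyword leg's ranked session IDs
  answer_session_ids: string[]; // gold sessions from the dataset
}

interface RankBuckets {
  rank1: number;
  rank2to5: number;
  deeper: number;
  notFound: number;
}

// 1-based rank of the best-ranked gold session, or null if keyword missed it.
function goldRank(returnedIds: readonly string[], goldIds: readonly string[]): number | null {
  const gold = new Set(goldIds);
  const idx = returnedIds.findIndex((id) => gold.has(id));
  return idx === -1 ? null : idx + 1;
}

// Buckets hybrid misses by where the keyword leg ranked the gold session.
function bucketByKeywordRank(misses: readonly MissRecord[]): RankBuckets {
  const b: RankBuckets = { rank1: 0, rank2to5: 0, deeper: 0, notFound: 0 };
  for (const m of misses) {
    const r = goldRank(m.returnedIds, m.answer_session_ids);
    if (r === null) b.notFound++;
    else if (r === 1) b.rank1++;
    else if (r <= 5) b.rank2to5++;
    else b.deeper++;
  }
  return b;
}
```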
+**Clean A/B (same hot cache, identical code except the force-include branch).** Build F (`2026-05-26-22-47-07`, cold rebuild ~85 min) vs head-baseline boost-off (`2026-05-26-22-56-53`, 22.1s on now-hot cache):
+
+| Metric | Off | On | Δ |
+|---|---|---|---|
+| hybrid aggregate | 96.4 | **97.2** | **+0.8** |
+| hybrid temporal | 92.5 | **95.5** | **+3.0** |
+| all other types | byte-identical | byte-identical | 0 |
+| keyword aggregate | 96.6 | 96.6 | 0 |
+| semantic aggregate | 91.6 | 91.6 | 0 |
+
+Zero regression on any question type. Detector unchanged from E′ — the difference is force-include sidestepping the RRF math rather than trying to outmuscle it.
+
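The table's deltas can be cross-checked against question counts. With 133 temporal questions and n = 500 overall (both figures from this entry), the only hit counts that round to the reported R@5 values are 123 → 127 temporal and 482 → 486 aggregate. Both rows therefore reflect the same net gain of four temporal questions, consistent with every other type being byte-identical. The sketch assumes R@5 is reported as a percentage rounded to one decimal place.

```typescript
const TEMPORAL_N = 133; // temporal-reasoning questions (from this changelog)
const TOTAL_N = 500;    // full harness run, n=500 (from this changelog)

// R@5 as a percentage rounded to one decimal place, matching the table.
function r5(hits: number, n: number): number {
  return Math.round((1000 * hits) / n) / 10;
}

// Smallest hit count whose rounded R@5 equals the reported figure.
function hitsFor(reported: number, n: number): number {
  for (let h = 0; h <= n; h++) {
    if (r5(h, n) === reported) return h;
  }
  throw new Error(`no hit count gives ${reported}% at n=${n}`);
}

// Net questions recovered by Build F; both evaluate to 4.
const temporalGain = hitsFor(95.5, TEMPORAL_N) - hitsFor(92.5, TEMPORAL_N);
const aggregateGain = hitsFor(97.2, TOTAL_N) - hitsFor(96.4, TOTAL_N);
console.log({ temporalGain, aggregateGain });
```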
+**Hybrid finally beats keyword on aggregate** (97.2 > 96.6) — first time on this benchmark. Resolves the structural tension from 2026-05-25 where keyword led aggregate R@5. The 2026-05-23 MCP default flip to hybrid is now backed by k=5 numbers, not just the k=20 ablation.
+
+**Gate check vs the 2026-05-26 brief:** target was `hybrid temporal R@5 ≥ +4 (target ~95+)`. Landed at +3.0 / 95.5 — one question shy of +4 but inside the 95+ landing target. The miss is "Who did I meet with during the lunch last Tuesday?" — detector skips because day-of-week is excluded from the named-entity set (necessary to avoid Mode B false-fires). Adding day-of-week as NE would catch this one question but cost the Mode B exclusions. Not worth the trade at scale.
+
+**Tests:** 186 unit tests pass (added 27 for `detectQueryShape`); typecheck clean on changes (pre-existing `SessionEnd` error in `hook-claude-settings.test.ts` unrelated). Daemon unchanged (Build F is recall-path code, not ingest/embed).
+
+**Operational gotcha filed.** Mid-session, `~/.cache/longmemeval/{embeddings.sqlite,longmemeval_s_cleaned.json}` vanished between two harness runs — macOS Sonoma+ auto-cleanup of `~/.cache/` during an idle window. Cost ~90 min of cold rebuild + 277 MB redownload. Mitigation: move the cache outside `~/.cache/` via `LONGMEMEVAL_CACHE_DIR=$HOME/.local/share/longmemeval` before the next harness run. Full diagnosis in `Operations/Tool Lessons/longmemeval-harness.md` (vault) — also captures the harness performance envelope and the pre-build probing methodology.
+
+**Methodology lesson worth keeping.** Two-to-five-line probe scripts catch dead hypotheses cheaper than a full harness run. Pattern: (a) probe detector fire rate on the target distribution, (b) probe detector fire rate on the non-target distribution (blast radius), (c) probe the failure mode's mechanism (rank position, candidate-set membership). Run before harness; the result is right whether or not the build ships. Filed in `Ventures/nlm-memory/track-record.md` and `Operations/Tool Lessons/longmemeval-harness.md`. Candidate addition to `Operations/what-works/code-quality.md` if the pattern recurs outside NLM.
+
+**Next priorities (updated):**
+
+1. **Stop hook citation rate.** Now the highest-leverage moat work — hybrid is structurally sound at 97.2 aggregate; further R@5 work hits diminishing returns until a different lever gets pre-mortem'd.
+2. **Pre-mortem Mode B before any code.** Only 2 of 10 hybrid temporal misses are both-leg misses. Ceiling on a successful Mode B fix is +2/133 = +1.5% hybrid temporal. Probe: can a query-time date parser actually resolve those 2 questions' answer windows? If under 50%, the build doesn't justify itself.
+3. **Cross-runtime hook adapters** (Hermes / pi / Codex). Unchanged from prior handoff.
+4. **Alt-embedding A/B** — still deferred. Hybrid 97.2 is a higher floor than the alt-embedding work was originally framed against. Reopen only when migration 010 is justified by a separate driver.
+
+**Source:** Whtnxt Agent orchestrator session 2026-05-26 → 2026-05-27 (continuation); harness reports `reports/longmemeval/2026-05-26-16-39-52/` (E′ on partial cache), `…16-57-47/` (head-baseline boost off, byte-identical to E′), `…22-47-07/` (Build F on cold rebuild), `…22-56-53/` (head-baseline force-include off, same hot cache as 22-47-07). Probe scripts ephemeral at `/tmp/nlm-eprime/`.
+
+_Older entries archived in CHANGELOG-2026.md_
+
+
+
+_Older entries archived in CHANGELOG-2026.md_