nlm-memory 0.4.2 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (285)
  1. package/README.md +72 -34
  2. package/dist/cli/nlm.js +223 -33
  3. package/dist/cli/nlm.js.map +1 -1
  4. package/dist/core/adapters/cursor.d.ts +45 -0
  5. package/dist/core/adapters/cursor.js +397 -0
  6. package/dist/core/adapters/cursor.js.map +1 -0
  7. package/dist/core/adapters/from-source.js +10 -0
  8. package/dist/core/adapters/from-source.js.map +1 -1
  9. package/dist/core/adapters/windsurf.d.ts +44 -0
  10. package/dist/core/adapters/windsurf.js +299 -0
  11. package/dist/core/adapters/windsurf.js.map +1 -0
  12. package/dist/core/hook/claude-settings.d.ts +12 -5
  13. package/dist/core/hook/claude-settings.js +21 -6
  14. package/dist/core/hook/claude-settings.js.map +1 -1
  15. package/dist/core/sources/source-registry.d.ts +1 -1
  16. package/dist/core/sources/source-registry.js +18 -0
  17. package/dist/core/sources/source-registry.js.map +1 -1
  18. package/dist/core/storage/sqlite-session-store.d.ts +2 -0
  19. package/dist/core/storage/sqlite-session-store.js +38 -2
  20. package/dist/core/storage/sqlite-session-store.js.map +1 -1
  21. package/dist/hook/hook-auth.d.ts +13 -0
  22. package/dist/hook/hook-auth.js +19 -0
  23. package/dist/hook/hook-auth.js.map +1 -0
  24. package/dist/hook/prompt-recall-hook.js +7 -1
  25. package/dist/hook/prompt-recall-hook.js.map +1 -1
  26. package/dist/hook/session-start-hook.js +4 -1
  27. package/dist/hook/session-start-hook.js.map +1 -1
  28. package/dist/hook/stop-hook.js +4 -1
  29. package/dist/hook/stop-hook.js.map +1 -1
  30. package/dist/http/app.d.ts +2 -0
  31. package/dist/http/app.js +76 -1
  32. package/dist/http/app.js.map +1 -1
  33. package/dist/install/claude-code.js +1 -1
  34. package/dist/install/claude-code.js.map +1 -1
  35. package/dist/install/cursor.d.ts +25 -0
  36. package/dist/install/cursor.js +43 -0
  37. package/dist/install/cursor.js.map +1 -0
  38. package/dist/install/nlm-dir-perms.d.ts +19 -0
  39. package/dist/install/nlm-dir-perms.js +43 -0
  40. package/dist/install/nlm-dir-perms.js.map +1 -0
  41. package/dist/install/ollama.d.ts +18 -1
  42. package/dist/install/ollama.js +62 -7
  43. package/dist/install/ollama.js.map +1 -1
  44. package/dist/install/setup.d.ts +4 -0
  45. package/dist/install/setup.js +141 -18
  46. package/dist/install/setup.js.map +1 -1
  47. package/dist/install/windsurf.d.ts +25 -0
  48. package/dist/install/windsurf.js +43 -0
  49. package/dist/install/windsurf.js.map +1 -0
  50. package/dist/mcp/server.js +20 -1
  51. package/dist/mcp/server.js.map +1 -1
  52. package/dist/shared/types.d.ts +4 -0
  53. package/dist/ui/assets/{index-BA6IpU8g.css → index-Beo8psd-.css} +1 -1
  54. package/dist/ui/assets/index-CSPTTeeM.js +69 -0
  55. package/dist/ui/index.html +2 -2
  56. package/package.json +26 -1
  57. package/plugin/scripts/prompt-recall-hook.mjs +55 -4
  58. package/plugin/scripts/stop-hook.mjs +57 -6
  59. package/.agents/plugins/marketplace.json +0 -20
  60. package/.github/workflows/ci.yml +0 -30
  61. package/dist/ui/assets/index-B_qIVV0k.js +0 -69
  62. package/docs/methodology/re-derivation-rate.md +0 -112
  63. package/docs/methodology/useful-hit-rate.md +0 -79
  64. package/docs/plans/2026-05-20-fts5-lexical-recall.md +0 -1088
  65. package/docs/plans/2026-05-20-recall-daemon-wedge-fix.md +0 -662
  66. package/docs/plans/2026-05-20-recall-hook-design.md +0 -131
  67. package/docs/plans/2026-05-20-recall-hook-implementation.md +0 -1222
  68. package/docs/plans/desktop-product.md +0 -69
  69. package/docs/plans/factstore-design.md +0 -236
  70. package/logs/CHANGELOG/CHANGELOG-2026.md +0 -1389
  71. package/logs/CHANGELOG/CHANGELOG.md +0 -337
  72. package/migrations/000_initial_schema.sql +0 -174
  73. package/migrations/001_entity_type_rename.sql +0 -17
  74. package/migrations/002_adapter_state_extend.sql +0 -12
  75. package/migrations/003_session_embeddings.sql +0 -11
  76. package/migrations/004_facts.sql +0 -46
  77. package/migrations/005_sources.sql +0 -31
  78. package/migrations/006_providers.sql +0 -33
  79. package/migrations/007_source_tokens.sql +0 -17
  80. package/migrations/008_fts_rebuild.sql +0 -9
  81. package/migrations/009_session_embedding_chunks.sql +0 -46
  82. package/migrations/010_sources_opencode.sql +0 -30
  83. package/migrations/011_sources_hermes_agent.sql +0 -30
  84. package/migrations/012_sources_aider.sql +0 -30
  85. package/migrations/013_adapter_state_failure_count.sql +0 -12
  86. package/plugin-hermes-agent/README.md +0 -49
  87. package/plugin-hermes-agent/__init__.py +0 -75
  88. package/plugin-hermes-agent/plugin.yaml +0 -15
  89. package/scripts/backfill-citations.mjs +0 -0
  90. package/scripts/build-codex-plugin.mjs +0 -61
  91. package/scripts/deepseek-probe.mjs +0 -67
  92. package/scripts/extract-triples.mjs +0 -207
  93. package/scripts/longmemeval/embedding-cache.ts +0 -77
  94. package/scripts/longmemeval/fetch-dataset.sh +0 -25
  95. package/scripts/longmemeval/run-harness.ts +0 -315
  96. package/scripts/longmemeval/scorer.ts +0 -99
  97. package/scripts/longmemeval/tsconfig.json +0 -9
  98. package/scripts/longmemeval/types.ts +0 -35
  99. package/scripts/nlm-daily-digest.py +0 -239
  100. package/scripts/nlm-daily-digest.sh +0 -28
  101. package/src/cli/classify-parity.ts +0 -257
  102. package/src/cli/launchctl-helpers.ts +0 -49
  103. package/src/cli/nlm.ts +0 -885
  104. package/src/core/actions/actions-log.ts +0 -118
  105. package/src/core/actions/overlay.ts +0 -117
  106. package/src/core/adapters/aider.ts +0 -205
  107. package/src/core/adapters/claude-code.ts +0 -293
  108. package/src/core/adapters/common.ts +0 -54
  109. package/src/core/adapters/from-source.ts +0 -57
  110. package/src/core/adapters/hermes-agent.ts +0 -240
  111. package/src/core/adapters/hermes.ts +0 -277
  112. package/src/core/adapters/jsonl-generic.ts +0 -208
  113. package/src/core/adapters/opencode.ts +0 -281
  114. package/src/core/adapters/pi.ts +0 -264
  115. package/src/core/classifier/prompt.ts +0 -200
  116. package/src/core/dataset/build-dataset.ts +0 -463
  117. package/src/core/embedding/chunk-body.ts +0 -76
  118. package/src/core/embedding/embed-backfill.ts +0 -210
  119. package/src/core/embedding/embed-normalize.ts +0 -135
  120. package/src/core/facts/backfill-facts.ts +0 -254
  121. package/src/core/facts/extract-facts.ts +0 -50
  122. package/src/core/hook/citation-detect.ts +0 -124
  123. package/src/core/hook/cite-memo.ts +0 -68
  124. package/src/core/hook/claude-settings.ts +0 -166
  125. package/src/core/hook/gate.ts +0 -25
  126. package/src/core/hook/hook-log.ts +0 -41
  127. package/src/core/hook/memo-sweep.ts +0 -164
  128. package/src/core/hook/memo.ts +0 -67
  129. package/src/core/hook/pointer-block.ts +0 -26
  130. package/src/core/hook/select.ts +0 -32
  131. package/src/core/hook/transcript.ts +0 -121
  132. package/src/core/ingest/ingest-session.ts +0 -111
  133. package/src/core/providers/provider-models.ts +0 -100
  134. package/src/core/providers/provider-registry.ts +0 -196
  135. package/src/core/recall/citation-log.ts +0 -108
  136. package/src/core/recall/filter.ts +0 -27
  137. package/src/core/recall/index.ts +0 -6
  138. package/src/core/recall/match-fields.ts +0 -40
  139. package/src/core/recall/query-log.ts +0 -149
  140. package/src/core/recall/query-shape.ts +0 -66
  141. package/src/core/recall/recall-service.ts +0 -320
  142. package/src/core/recall/recent-log.ts +0 -59
  143. package/src/core/recall/tokenize.ts +0 -18
  144. package/src/core/recall/useful-scan.ts +0 -336
  145. package/src/core/recall-facts/fact-query-log.ts +0 -150
  146. package/src/core/recall-facts/fact-recall-service.ts +0 -327
  147. package/src/core/scheduler/scan-once.ts +0 -142
  148. package/src/core/scheduler/scheduler.ts +0 -225
  149. package/src/core/sources/source-registry.ts +0 -260
  150. package/src/core/storage/db-restore.ts +0 -133
  151. package/src/core/storage/live-status.ts +0 -45
  152. package/src/core/storage/migrate.ts +0 -72
  153. package/src/core/storage/sqlite-fact-store.ts +0 -304
  154. package/src/core/storage/sqlite-session-store.ts +0 -765
  155. package/src/hook/prompt-recall-hook.ts +0 -174
  156. package/src/hook/session-end-hook.ts +0 -81
  157. package/src/hook/session-start-hook.ts +0 -165
  158. package/src/hook/stop-hook.ts +0 -236
  159. package/src/http/app.ts +0 -1137
  160. package/src/install/claude-code.ts +0 -128
  161. package/src/install/codex.ts +0 -367
  162. package/src/install/hermes-agent.ts +0 -76
  163. package/src/install/hermes.ts +0 -78
  164. package/src/install/ollama.ts +0 -211
  165. package/src/install/setup.ts +0 -368
  166. package/src/llm/classifier-box.ts +0 -64
  167. package/src/llm/deepseek-client.ts +0 -150
  168. package/src/llm/env-autoload.ts +0 -55
  169. package/src/llm/ollama-client.ts +0 -189
  170. package/src/mcp/server.ts +0 -534
  171. package/src/ports/fact-store.ts +0 -102
  172. package/src/ports/llm-client.ts +0 -52
  173. package/src/ports/logger.ts +0 -16
  174. package/src/ports/session-store.ts +0 -45
  175. package/src/ports/transcript-adapter.ts +0 -55
  176. package/src/shared/types.ts +0 -145
  177. package/src/ui/App.tsx +0 -58
  178. package/src/ui/components/PromoteOpenButton.tsx +0 -65
  179. package/src/ui/components/SessionDrawer.tsx +0 -136
  180. package/src/ui/components/SideNav.tsx +0 -162
  181. package/src/ui/components/Skeleton.tsx +0 -107
  182. package/src/ui/index.html +0 -13
  183. package/src/ui/lib/actions.ts +0 -30
  184. package/src/ui/lib/api.ts +0 -92
  185. package/src/ui/lib/dataset.ts +0 -141
  186. package/src/ui/lib/registries.ts +0 -155
  187. package/src/ui/lib/view-settings.ts +0 -41
  188. package/src/ui/main.tsx +0 -15
  189. package/src/ui/pages/Live.tsx +0 -229
  190. package/src/ui/pages/Pulse.tsx +0 -415
  191. package/src/ui/pages/Recall.tsx +0 -190
  192. package/src/ui/pages/River.tsx +0 -308
  193. package/src/ui/pages/Search.tsx +0 -93
  194. package/src/ui/pages/Stub.tsx +0 -9
  195. package/src/ui/pages/Thread.tsx +0 -262
  196. package/src/ui/pages/settings/Classifier.tsx +0 -227
  197. package/src/ui/pages/settings/Data.tsx +0 -190
  198. package/src/ui/pages/settings/Index.tsx +0 -65
  199. package/src/ui/pages/settings/Labels.tsx +0 -224
  200. package/src/ui/pages/settings/Providers.tsx +0 -305
  201. package/src/ui/pages/settings/SettingsSubnav.tsx +0 -28
  202. package/src/ui/pages/settings/Sources.tsx +0 -326
  203. package/src/ui/pages/settings/Views.tsx +0 -96
  204. package/src/ui/styles.css +0 -1766
  205. package/src/ui/tsconfig.json +0 -21
  206. package/src/ui/vite.config.ts +0 -19
  207. package/tests/fixtures/claude_code/short_session.jsonl +0 -2
  208. package/tests/fixtures/claude_code/standard_iso.jsonl +0 -4
  209. package/tests/fixtures/claude_code/tool_heavy.jsonl +0 -8
  210. package/tests/fixtures/claude_code/with_subagent.jsonl +0 -7
  211. package/tests/fixtures/facts.ts +0 -17
  212. package/tests/fixtures/golden-corpus.ts +0 -85
  213. package/tests/fixtures/hermes/paired_request_dump.json +0 -24
  214. package/tests/fixtures/hermes/paired_session.json +0 -23
  215. package/tests/fixtures/hermes/request_dump.json +0 -28
  216. package/tests/fixtures/hermes/session_iso.json +0 -38
  217. package/tests/fixtures/hermes/session_unix.json +0 -38
  218. package/tests/fixtures/hermes/system_only.json +0 -18
  219. package/tests/fixtures/pi/error-connection-abort.jsonl +0 -8
  220. package/tests/fixtures/pi/short-successful.jsonl +0 -5
  221. package/tests/fixtures/pi/with-custom-message.jsonl +0 -6
  222. package/tests/fixtures/sessions.ts +0 -22
  223. package/tests/integration/backfill-facts.test.ts +0 -362
  224. package/tests/integration/citation-explicit.test.ts +0 -111
  225. package/tests/integration/cite-event.test.ts +0 -169
  226. package/tests/integration/cite-memo.test.ts +0 -87
  227. package/tests/integration/db-restore.test.ts +0 -153
  228. package/tests/integration/embed-backfill.test.ts +0 -176
  229. package/tests/integration/fact-supersedence.test.ts +0 -313
  230. package/tests/integration/fts-index.test.ts +0 -60
  231. package/tests/integration/getbyids-sqlite.test.ts +0 -60
  232. package/tests/integration/hermes-agent-hooks.test.ts +0 -248
  233. package/tests/integration/hook-claude-settings.test.ts +0 -205
  234. package/tests/integration/hook-log.test.ts +0 -54
  235. package/tests/integration/hook-memo.test.ts +0 -68
  236. package/tests/integration/hook-pre-compact.test.ts +0 -105
  237. package/tests/integration/hook-subagent-start.test.ts +0 -102
  238. package/tests/integration/http.test.ts +0 -401
  239. package/tests/integration/keyword-search-fts.test.ts +0 -66
  240. package/tests/integration/mcp-recall-logging.test.ts +0 -88
  241. package/tests/integration/mcp.test.ts +0 -248
  242. package/tests/integration/memo-sweep.test.ts +0 -91
  243. package/tests/integration/prompt-recall-hook.test.ts +0 -88
  244. package/tests/integration/provider-registry.test.ts +0 -107
  245. package/tests/integration/recall-golden.test.ts +0 -59
  246. package/tests/integration/recall-sqlite.test.ts +0 -169
  247. package/tests/integration/scheduler.test.ts +0 -391
  248. package/tests/integration/session-end-hook.test.ts +0 -48
  249. package/tests/integration/session-start-hook.test.ts +0 -126
  250. package/tests/integration/source-registry.test.ts +0 -120
  251. package/tests/integration/sqlite-fact-store.test.ts +0 -346
  252. package/tests/integration/stop-hook.test.ts +0 -560
  253. package/tests/integration/wal-checkpoint.test.ts +0 -49
  254. package/tests/unit/cli/launchctl-helpers.test.ts +0 -60
  255. package/tests/unit/core/adapters/aider.test.ts +0 -230
  256. package/tests/unit/core/adapters/claude-code.test.ts +0 -118
  257. package/tests/unit/core/adapters/hermes-agent.test.ts +0 -329
  258. package/tests/unit/core/adapters/hermes.test.ts +0 -81
  259. package/tests/unit/core/adapters/jsonl-generic.test.ts +0 -142
  260. package/tests/unit/core/adapters/opencode.test.ts +0 -354
  261. package/tests/unit/core/adapters/pi.test.ts +0 -110
  262. package/tests/unit/core/classifier/prompt.test.ts +0 -126
  263. package/tests/unit/core/embedding/chunk-body.test.ts +0 -100
  264. package/tests/unit/core/facts/extract-facts.test.ts +0 -117
  265. package/tests/unit/core/filter.test.ts +0 -40
  266. package/tests/unit/core/hook/citation-detect-cite-session.test.ts +0 -96
  267. package/tests/unit/core/hook/citation-detect.test.ts +0 -124
  268. package/tests/unit/core/hook/gate.test.ts +0 -29
  269. package/tests/unit/core/hook/pointer-block.test.ts +0 -22
  270. package/tests/unit/core/hook/select.test.ts +0 -66
  271. package/tests/unit/core/match-fields.test.ts +0 -39
  272. package/tests/unit/core/mcp-cite-session.test.ts +0 -51
  273. package/tests/unit/core/providers/provider-models.test.ts +0 -101
  274. package/tests/unit/core/query-shape.test.ts +0 -92
  275. package/tests/unit/core/recall-facts/fact-recall-service.test.ts +0 -258
  276. package/tests/unit/core/recall-service.test.ts +0 -200
  277. package/tests/unit/core/storage/live-status.test.ts +0 -54
  278. package/tests/unit/core/tokenize.test.ts +0 -32
  279. package/tests/unit/core/useful-scan.test.ts +0 -537
  280. package/tests/unit/llm/embed.test.ts +0 -93
  281. package/tests/unit/llm/ollama-client.test.ts +0 -124
  282. package/tests/unit/scripts/longmemeval-scorer.test.ts +0 -114
  283. package/tsconfig.json +0 -31
  284. package/tsconfig.test.json +0 -11
  285. package/vitest.config.ts +0 -22
@@ -1,112 +0,0 @@
1
- # re-derivation_rate — design
2
-
3
- ## Why
4
-
5
- `re_derivation_rate` is NLM's strategic metric — the operator-outcome number that competitors (mem0, agentmemory, Letta) cannot match because their destructive lifecycle (decay, auto-forget) erases the data needed to compute it. It is the headline number for Pulse, the cron digest, and any public marketing scorecard. Detection rule, methodology, and a reproducible script live here so the metric is auditable.
6
-
7
- ## Plain-language definition
8
-
9
- A *re-derivation* is when an operator (you, in any AI runtime) solves the same problem twice across multiple sessions without recall of the prior solution. It is the tax NLM exists to eliminate: every re-derivation is a session where memory could have helped but didn't.
10
-
11
- `re_derivation_rate` over a window = (re-derivation events) / (decision events) in that window.
12
-
13
- `re_derivations_prevented` = recall events marked useful (the per-event flag behind `useful_hit_rate`) AND whose returned session contained the matching decision. This is the inverse of a re-derivation: the events where memory *did* help.
14
-
15
- ## Detection rule (V1)
16
-
17
- A pair of sessions `(A, B)` is a re-derivation iff all of the following hold:
18
-
19
- 1. **Same entity.** A and B share at least one entity in their respective `entities` arrays.
20
- 2. **Same decision normalized.** A `decision` marker in A and a `decision` marker in B normalize to overlapping content. Normalization: lowercase, tokenize, strip stopwords. The two decisions match when the Jaccard similarity of their token sets is ≥ 0.6.
21
- 3. **Temporal gap.** `B.started_at - A.started_at >= 7 days`.
22
- 4. **No supersedence link.** No `session_edges` row of kind `supersedes` connects A and B in either direction.
23
- 5. **No continues link.** No `session_edges` row of kind `continues` connects A and B.
24
- 6. **No intervening recall.** Between A.started_at and B.started_at, no recall event in `query-log.jsonl` or `hook-log.jsonl` returned A's id (would mean B's operator was aware of A and chose not to link).
25
-
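Rule 2 can be sketched as follows. This is a minimal illustration, not the project's implementation: the stopword list, the tokenizer regex, and the function names `normalize`, `jaccard`, and `same_decision` are assumptions, since the design doc specifies only the steps and the 0.6 threshold.

```python
import re

# Hypothetical stopword list; the design doc does not specify one.
STOPWORDS = frozenset(
    {"a", "an", "the", "to", "of", "for", "in", "on", "and", "or", "we", "it", "is"}
)


def normalize(decision: str) -> frozenset:
    """Lowercase, tokenize on alphanumeric runs, then drop stopwords."""
    tokens = re.findall(r"[a-z0-9_]+", decision.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_decision(a: str, b: str, threshold: float = 0.6) -> bool:
    """True when two decision strings overlap enough to count as the same decision."""
    return jaccard(normalize(a), normalize(b)) >= threshold
```

Working on sets rather than token lists means repeated words do not inflate the score, which seems to fit the keyword-overlap intent of the rule.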
26
- When all six are true, `B` is a re-derivation of `A`. Count B (not A) — the metric measures fresh re-derivations, not the original.
27
-
28
- ## Edge cases and resolutions
29
-
30
- - **Three sessions A, B, C** where B re-derives A and C re-derives B: count B and C, not A.
31
- - **Trivial decisions.** Decisions under N tokens (default 6) are excluded — "yes ship it" is not a meaningful decision to track.
32
- - **High-frequency entities.** If an entity has >50 sessions in the window, raise the Jaccard threshold from 0.6 to 0.75 to reduce false positives (common topics inevitably overlap on trivial keywords).
33
- - **Probe / test entities.** Sessions whose label matches probe patterns (see useful-hit-rate.md) are excluded from both sides.
34
-
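The two numeric edge cases above could look like this in code. It is a sketch under assumptions: the helper names and signatures are hypothetical, and the doc does not say whether the token count for trivial decisions is taken before or after stopword removal, so the caller decides which token list to pass in.

```python
def jaccard_threshold(entity_session_count: int,
                      base: float = 0.6,
                      high_frequency: float = 0.75,
                      cutoff: int = 50) -> float:
    """Return the stricter threshold for entities with more than `cutoff` sessions."""
    return high_frequency if entity_session_count > cutoff else base


def is_trivial(tokens, min_tokens: int = 6) -> bool:
    """Decisions shorter than `min_tokens` tokens are excluded from detection."""
    return len(tokens) < min_tokens
```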
35
- ## Computation algorithm
36
-
37
- ```python
38
- def find_re_derivations(sessions, edges, recalls, window_days):
39
-     pairs = []
40
-     decisions = collect_decisions(sessions) # one row per (session_id, normalized_decision_tokens, entities)
41
-     for ent in distinct_entities(decisions):
42
-         ent_decisions = sorted(by_session_start([d for d in decisions if ent in d.entities]))
43
-         for i, a in enumerate(ent_decisions):
44
-             for b in ent_decisions[i+1:]:
45
-                 if days_between(a, b) < 7: continue
46
-                 if days_between(a, b) > window_days: break
47
-                 if jaccard(a.tokens, b.tokens) < threshold(ent): continue
48
-                 if has_edge(edges, a, b, ("supersedes", "continues")): continue
49
-                 if recall_returned_a_between(recalls, a, b): continue
50
-                 pairs.append((a, b))
51
-     return pairs
52
- ```
53
-
54
- Runs over the existing canonical SQLite database (`sessions` + `session_edges`) and the recall log JSONL files. Existing tables need no schema change or migration. Computed in a single pass; results are cached by `(window_start, window_end)` in a new `re_derivation_log` table.
55
-
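The algorithm leans on three helpers it leaves undefined. A possible shape for them is sketched below. The data layouts are assumptions: edges as `(from_id, to_id, kind)` tuples and recall events as `(timestamp, returned_ids)` pairs. The signatures also take ids and timestamps directly rather than the decision rows used in the pseudocode.

```python
from datetime import datetime, timedelta


def days_between(a_start: datetime, b_start: datetime) -> float:
    """Elapsed days from session A's start to session B's start."""
    return (b_start - a_start) / timedelta(days=1)


def has_edge(edges, a_id, b_id, kinds=("supersedes", "continues")) -> bool:
    """True if an edge of one of `kinds` links A and B in either direction."""
    return any(
        kind in kinds and {src, dst} == {a_id, b_id}
        for src, dst, kind in edges
    )


def recall_returned_a_between(recalls, a_id, a_start, b_start) -> bool:
    """True if any recall between the two session starts returned A's id."""
    return any(
        a_start <= ts <= b_start and a_id in returned_ids
        for ts, returned_ids in recalls
    )
```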
56
- ## Storage
57
-
58
- - New table `re_derivation_log`: `(window_start, window_end, computed_at, session_a_id, session_b_id, entity, jaccard, decision_a, decision_b)`. One row per detected pair. Re-computable; deletable; not source of truth.
59
- - New endpoint field on `/api/recall/stats`: `re_derivation_count_7d`, `re_derivations_prevented_7d`.
60
- - Pulse: new headline tile showing both numbers and the weekly trend.
61
-
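A rough sketch of the cache table follows, using Python's standard `sqlite3` module. The column names are taken from the list above. The column types, the primary key, and the `record_pairs` helper are assumptions made for illustration. Because the log is re-computable, the helper replaces all rows for a window instead of appending.

```python
import sqlite3

# Column names come from the design doc; types and primary key are assumptions.
DDL = """
CREATE TABLE IF NOT EXISTS re_derivation_log (
    window_start TEXT NOT NULL,
    window_end   TEXT NOT NULL,
    computed_at  TEXT NOT NULL,
    session_a_id TEXT NOT NULL,
    session_b_id TEXT NOT NULL,
    entity       TEXT NOT NULL,
    jaccard      REAL NOT NULL,
    decision_a   TEXT NOT NULL,
    decision_b   TEXT NOT NULL,
    PRIMARY KEY (window_start, window_end, session_a_id, session_b_id, entity)
)
"""


def record_pairs(conn, window_start, window_end, computed_at, pairs):
    """Replace the cached rows for one window.

    pairs: iterable of (a_id, b_id, entity, jaccard, decision_a, decision_b).
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(DDL)
        conn.execute(
            "DELETE FROM re_derivation_log WHERE window_start = ? AND window_end = ?",
            (window_start, window_end),
        )
        conn.executemany(
            "INSERT INTO re_derivation_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            [(window_start, window_end, computed_at, *p) for p in pairs],
        )
```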
62
- ## CLI
63
-
64
- - `nlm re-derivation scan` — recomputes the log for a window. Default last 30 days.
65
- - `nlm re-derivation list --since 7d` — lists detected pairs with the matched decisions for human review (false-positive triage).
66
- - `nlm re-derivation explain <session-b-id>` — for one B, show why it was flagged (matched A, decision overlap, why no recall covered it).
67
-
68
- ## Calibration loop
69
-
70
- Re-derivation detection is heuristic. False positives waste reader trust; false negatives undersell the metric. Calibrate weekly for the first month after V1:
71
-
72
- 1. Run `nlm re-derivation list --since 7d`
73
- 2. Edward reviews each flagged pair
74
- 3. Mark `true_re_derivation: true|false` in a `re_derivation_feedback` table
75
- 4. Adjust Jaccard threshold + minimum decision length until precision and recall both exceed 70% on Edward's review
76
-
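The calibration targets can be computed as below. This sketch assumes the feedback is available as a list of booleans, one per reviewed pair. Reviewing only flagged pairs yields precision. Recall also needs a count of re-derivations the detector missed, which this sketch assumes comes from a separately labeled sample. The function names are hypothetical.

```python
from typing import Optional, Sequence


def precision(review_flags: Sequence[bool]) -> Optional[float]:
    """Share of flagged pairs the reviewer confirmed as true re-derivations."""
    return sum(review_flags) / len(review_flags) if review_flags else None


def recall(confirmed: int, missed: int) -> Optional[float]:
    """Share of all known re-derivations that the detector flagged."""
    total = confirmed + missed
    return confirmed / total if total else None
```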
77
- After 4 weeks of calibration, freeze the parameters and publish them in `docs/methodology/re-derivation-rate.md` for external use.
78
-
79
- ## Public scorecard format
80
-
81
- For external publication (gated on the marketing-readiness checklist):
82
-
83
- ```
84
- Edward's corpus, week of YYYY-MM-DD:
85
- Sessions in window: N
86
- Decisions in window: M
87
- Re-derivations detected: X
88
- Re-derivations prevented: Y (recall returned the matching prior session)
89
- Re-derivation rate: X / M = Z.Z%
90
- Methodology: docs/methodology/re-derivation-rate.md
91
- Calibration set: docs/calibration/re-derivation-2026-MM.md
92
- ```
93
-
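As an illustration of the arithmetic behind the scorecard, a small formatter might look like this. The function name and parameters are hypothetical, and printing `n/a` when the window contains no decisions is an assumption, since the doc does not cover that case.

```python
def format_scorecard(week: str, sessions: int, decisions: int,
                     detected: int, prevented: int) -> str:
    """Render the weekly scorecard lines; the rate is detected / decisions."""
    rate = f"{100 * detected / decisions:.1f}%" if decisions else "n/a"
    return "\n".join([
        f"Edward's corpus, week of {week}:",
        f"Sessions in window: {sessions}",
        f"Decisions in window: {decisions}",
        f"Re-derivations detected: {detected}",
        f"Re-derivations prevented: {prevented} (recall returned the matching prior session)",
        f"Re-derivation rate: {detected} / {decisions} = {rate}",
    ])
```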
94
- Publish weekly to the repo. The trend (rate falling over time as NLM gets more useful) is the marketing story.
95
-
96
- ## Why competitors cannot match this
97
-
98
- agentmemory's 4-tier lifecycle decays old observations and auto-forgets stale facts. Without the historical session record intact, there is no Session A to detect a re-derivation against — the data is gone. mem0 uses passive extraction and accretion, with no native concept of session identity that would let you pair A and B. Letta's core memory is in-context, not historical.
99
-
100
- NLM's supersedence + full-session retention is the prerequisite for this metric. It is the strategic moat made measurable.
101
-
102
- ## Out of scope (V1)
103
-
104
- - Cross-runtime re-derivation (decision in Claude Code, re-derived in Hermes). Requires reliable entity normalization across adapters; defer to V2.
105
- - Semantic similarity instead of Jaccard (would catch paraphrased decisions but requires embedding every decision). Defer.
106
- - Automatic supersedence link suggestion from detected re-derivations. The metric should measure, not act, until calibrated.
107
-
108
- ## Implementation phasing
109
-
110
- 1. **Phase 1 (after #152, #153, #154 ship):** implement detection algorithm + CLI + scan command. No UI changes. Validate on Edward's corpus.
111
- 2. **Phase 2 (after 2 weeks of calibration):** wire `re_derivation_count_7d` into `/api/recall/stats` and the daily digest. Pulse tile.
112
- 3. **Phase 3 (gated on marketing readiness):** publish first weekly scorecard publicly. Repo README. Landing site.
@@ -1,79 +0,0 @@
1
- # useful_hit_rate — design
2
-
3
- ## Why
4
-
5
- `hit_rate` reports the fraction of recall calls that returned ≥1 row. With the MCP default now hybrid, that number is structurally close to 100%, because semantic search always returns *something*. `hit_rate` no longer separates "found stuff" from "found stuff that mattered." `useful_hit_rate` is the metric we actually want: the fraction of recall calls whose returned results were referenced within the next few assistant turns.
6
-
7
- This is the signal that lets us answer "is NLM serving its intended purpose" with evidence instead of opinion, and it's an input to the headline re-derivation rate metric (see [re-derivation-rate.md](re-derivation-rate.md) — pending).
8
-
9
- ## Definitions
10
-
11
- **A recall event** is one of:
12
- - A hook fire (logged in `~/.nlm/hook-log.jsonl` with `wouldInject` ids)
13
- - An MCP `recall_sessions` / `recall_facts` call (logged in `~/.nlm/query-log.jsonl`)
14
- - An HTTP `/api/recall` call (logged in `~/.nlm/query-log.jsonl`)
15
-
16
- **A useful recall** is a recall event where:
17
- - At least one of the returned session ids OR session labels appears in a subsequent assistant message in the same conversation transcript, AND
18
- - The match occurs within 3 assistant turns of the recall, AND
19
- - The recall is not a probe (excluded query patterns: `concurrency probe`, `test probe`, `path test`, `recall test`, smoke/cutover patterns)
20
-
21
- **`useful_hit_rate`** = (useful recalls) / (real recalls) over the reporting window.
22
-
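A minimal sketch of the rate itself, given already-classified events. Two things here are assumptions: probes are detected by case-insensitive substring match on the four literal patterns listed above (the smoke/cutover patterns are not spelled out, so they are omitted), and unmeasurable events (`None`) are left out of the denominator, which the definition does not settle.

```python
# Literal probe patterns from the definition; smoke/cutover patterns are unspecified.
PROBE_PATTERNS = ("concurrency probe", "test probe", "path test", "recall test")


def is_probe(query: str) -> bool:
    """True when the query text matches one of the excluded probe patterns."""
    q = query.lower()
    return any(pattern in q for pattern in PROBE_PATTERNS)


def useful_hit_rate(events):
    """events: iterable of (query, useful), where useful is True, False, or None."""
    measured = [useful for query, useful in events
                if not is_probe(query) and useful is not None]
    return sum(measured) / len(measured) if measured else None
```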
23
- ## Detection algorithm
24
-
25
- ```
26
- for each real recall event in window:
27
-     transcript = find_transcript(event.conversationId)
28
-     if transcript is None:
29
-         mark useful = null (unmeasurable)
30
-         continue
31
-     next_assistant_msgs = transcript.messages_after(event.ts, role="assistant", limit=3)
32
-     haystack = " ".join(m.content for m in next_assistant_msgs)
33
-     for hit_id in event.returnedIds:
34
-         if hit_id in haystack or session_label(hit_id) in haystack:
35
-             mark useful = true; break
36
-     else:
37
-         mark useful = false
38
- ```
39
-
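The pseudocode above translates fairly directly into runnable Python. The sketch below is one possible reading, not the shipped scanner. The `RecallEvent` and `Message` shapes, the `labels` lookup, and the assumption that the transcript is already sorted by timestamp are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class RecallEvent:
    ts: float
    returned_ids: Sequence[str]


@dataclass
class Message:
    ts: float
    role: str
    content: str


def classify(event: RecallEvent,
             transcript: Optional[Sequence[Message]],
             labels: Dict[str, str],
             max_turns: int = 3) -> Tuple[Optional[bool], Optional[str]]:
    """Return (useful, matched_id); useful is None when no transcript exists."""
    if transcript is None:
        return None, None  # unmeasurable
    # Assumes `transcript` is ordered by timestamp.
    following = [m for m in transcript
                 if m.role == "assistant" and m.ts > event.ts][:max_turns]
    haystack = " ".join(m.content for m in following)
    for hit_id in event.returned_ids:
        label = labels.get(hit_id)
        if hit_id in haystack or (label and label in haystack):
            return True, hit_id
    return False, None
```

Returning the matched id alongside the flag lines up with the `matchedId` field in the log entry described under Storage.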
40
- ## Data flow
41
-
42
- 1. **Hook recalls** have `conversationId` directly. Transcript path: `~/.claude/projects/<sanitized-project>/<conversationId>.jsonl`.
43
- 2. **MCP recalls** currently have no conversation context in `query-log.jsonl`. Adding `x-claude-session-id` capture to the MCP server is a prerequisite for measuring MCP useful_hit_rate.
44
- 3. **HTTP recalls** are operator-driven (UI browsing) and excluded from this metric — `useful_hit_rate` measures agent recall usefulness, not UI search satisfaction.
45
-
46
- ## Storage
47
-
48
- - New log file `~/.nlm/useful-hit-log.jsonl`, one entry per scanned recall:
49
- ```json
50
- {"ts": "...", "source": "hook|mcp", "conversationId": "...", "returnedIds": [...], "useful": true|false|null, "matchedId": "...", "scannedAt": "..."}
51
- ```
52
- - New CLI: `nlm useful-scan` — scans the last 24h of recalls, joins against transcripts, appends to the log
53
- - New endpoint field: `/api/recall/stats` includes `useful_hit_rate` and `useful_hit_count` over the same window as `hit_rate`
54
-
55
- ## Out of scope (V1)
56
-
57
- - MCP useful_hit_rate (blocked on conversation-id capture; track as follow-up)
58
- - Real-time useful-hit detection (V1 is batch-scan, run on the daily digest cron)
59
- - Distinguishing "agent quoted the recall" vs "agent acted on it" (the former is a proxy for the latter; V2 could refine)
60
- - HTTP UI click-through (different metric — would live under a separate `ui_click_rate`)
61
-
62
- ## V1 scope (shipping now)
63
-
64
- - Ship the daily digest cron consuming existing `hit_rate` (this doc justifies the upgrade path)
65
- - Add stub field `useful_hit_rate: null` to `/api/recall/stats` so the digest schema is forward-compatible
66
- - Implement the scanner + CLI in a follow-up commit (target: within 7 days)
67
-
68
- ## Why batch-scan instead of a real-time hook
69
-
70
- A second Claude Code hook (`Stop` or `PostToolUse`) could compute usefulness in real time. Rejected because:
71
- - Doubles installation surface (two hooks per agent runtime)
72
- - Adds per-turn latency for a metric the user reads once/day
73
- - Doesn't generalize to Hermes, pi, Codex, Gemini, Aider (no equivalent post-turn hook on most)
74
- - Batch-scan reads the same transcript files the daemon already polls
75
-
76
- ## Open questions
77
-
78
- - Hit-label heuristic: substring match is cheap but noisy. Worth fuzzy matching session label tokens? Defer until V1 data shows the false-positive rate.
79
- - Window for scan: hour-bucket vs day-bucket? Daily-bucket for now to match the digest cadence; revisit if cron interval changes.