nlm-memory 0.5.0 → 0.5.2
- package/README.md +89 -34
- package/dist/cli/digest.d.ts +20 -0
- package/dist/cli/digest.js +142 -0
- package/dist/cli/digest.js.map +1 -0
- package/dist/cli/nlm.d.ts +1 -0
- package/dist/cli/nlm.js +25 -1
- package/dist/cli/nlm.js.map +1 -1
- package/dist/core/digest/compose.d.ts +38 -0
- package/dist/core/digest/compose.js +93 -0
- package/dist/core/digest/compose.js.map +1 -0
- package/dist/core/digest/hook-liveness.d.ts +32 -0
- package/dist/core/digest/hook-liveness.js +54 -0
- package/dist/core/digest/hook-liveness.js.map +1 -0
- package/dist/http/app.js +2 -1
- package/dist/http/app.js.map +1 -1
- package/dist/mcp/server.js +20 -1
- package/dist/mcp/server.js.map +1 -1
- package/dist/ui/assets/{index-C8cpwbYJ.css → index-Beo8psd-.css} +1 -1
- package/dist/ui/assets/{index-CB50QnL-.js → index-CSPTTeeM.js} +8 -8
- package/dist/ui/index.html +2 -2
- package/package.json +26 -1
- package/.agents/plugins/marketplace.json +0 -20
- package/.github/workflows/ci.yml +0 -30
- package/docs/methodology/re-derivation-rate.md +0 -112
- package/docs/methodology/useful-hit-rate.md +0 -79
- package/docs/plans/2026-05-20-fts5-lexical-recall.md +0 -1088
- package/docs/plans/2026-05-20-recall-daemon-wedge-fix.md +0 -662
- package/docs/plans/2026-05-20-recall-hook-design.md +0 -131
- package/docs/plans/2026-05-20-recall-hook-implementation.md +0 -1222
- package/docs/plans/desktop-product.md +0 -69
- package/docs/plans/factstore-design.md +0 -236
- package/logs/CHANGELOG/CHANGELOG-2026.md +0 -1575
- package/logs/CHANGELOG/CHANGELOG.md +0 -209
- package/migrations/000_initial_schema.sql +0 -174
- package/migrations/001_entity_type_rename.sql +0 -17
- package/migrations/002_adapter_state_extend.sql +0 -12
- package/migrations/003_session_embeddings.sql +0 -11
- package/migrations/004_facts.sql +0 -46
- package/migrations/005_sources.sql +0 -31
- package/migrations/006_providers.sql +0 -33
- package/migrations/007_source_tokens.sql +0 -17
- package/migrations/008_fts_rebuild.sql +0 -9
- package/migrations/009_session_embedding_chunks.sql +0 -46
- package/migrations/010_sources_opencode.sql +0 -30
- package/migrations/011_sources_hermes_agent.sql +0 -30
- package/migrations/012_sources_aider.sql +0 -30
- package/migrations/013_adapter_state_failure_count.sql +0 -12
- package/migrations/014_sources_cursor.sql +0 -30
- package/migrations/015_sources_windsurf.sql +0 -30
- package/plugin-hermes-agent/README.md +0 -49
- package/plugin-hermes-agent/__init__.py +0 -75
- package/plugin-hermes-agent/plugin.yaml +0 -15
- package/scripts/backfill-citations.mjs +0 -0
- package/scripts/build-codex-plugin.mjs +0 -61
- package/scripts/deepseek-probe.mjs +0 -67
- package/scripts/extract-triples.mjs +0 -207
- package/scripts/longmemeval/embedding-cache.ts +0 -77
- package/scripts/longmemeval/fetch-dataset.sh +0 -25
- package/scripts/longmemeval/run-harness.ts +0 -315
- package/scripts/longmemeval/scorer.ts +0 -99
- package/scripts/longmemeval/tsconfig.json +0 -9
- package/scripts/longmemeval/types.ts +0 -35
- package/scripts/nlm-daily-digest.py +0 -239
- package/scripts/nlm-daily-digest.sh +0 -28
- package/src/cli/classify-parity.ts +0 -257
- package/src/cli/launchctl-helpers.ts +0 -49
- package/src/cli/nlm.ts +0 -1078
- package/src/core/actions/actions-log.ts +0 -118
- package/src/core/actions/overlay.ts +0 -117
- package/src/core/adapters/aider.ts +0 -205
- package/src/core/adapters/claude-code.ts +0 -293
- package/src/core/adapters/common.ts +0 -54
- package/src/core/adapters/cursor.ts +0 -486
- package/src/core/adapters/from-source.ts +0 -67
- package/src/core/adapters/hermes-agent.ts +0 -240
- package/src/core/adapters/hermes.ts +0 -277
- package/src/core/adapters/jsonl-generic.ts +0 -208
- package/src/core/adapters/opencode.ts +0 -281
- package/src/core/adapters/pi.ts +0 -264
- package/src/core/adapters/windsurf.ts +0 -386
- package/src/core/classifier/prompt.ts +0 -200
- package/src/core/dataset/build-dataset.ts +0 -463
- package/src/core/embedding/chunk-body.ts +0 -76
- package/src/core/embedding/embed-backfill.ts +0 -210
- package/src/core/embedding/embed-normalize.ts +0 -135
- package/src/core/facts/backfill-facts.ts +0 -254
- package/src/core/facts/extract-facts.ts +0 -50
- package/src/core/hook/citation-detect.ts +0 -124
- package/src/core/hook/cite-memo.ts +0 -68
- package/src/core/hook/claude-settings.ts +0 -187
- package/src/core/hook/gate.ts +0 -25
- package/src/core/hook/hook-log.ts +0 -41
- package/src/core/hook/memo-sweep.ts +0 -164
- package/src/core/hook/memo.ts +0 -67
- package/src/core/hook/pointer-block.ts +0 -26
- package/src/core/hook/select.ts +0 -32
- package/src/core/hook/transcript.ts +0 -121
- package/src/core/ingest/ingest-session.ts +0 -111
- package/src/core/providers/provider-models.ts +0 -100
- package/src/core/providers/provider-registry.ts +0 -196
- package/src/core/recall/citation-log.ts +0 -108
- package/src/core/recall/filter.ts +0 -27
- package/src/core/recall/index.ts +0 -6
- package/src/core/recall/match-fields.ts +0 -40
- package/src/core/recall/query-log.ts +0 -149
- package/src/core/recall/query-shape.ts +0 -66
- package/src/core/recall/recall-service.ts +0 -320
- package/src/core/recall/recent-log.ts +0 -59
- package/src/core/recall/tokenize.ts +0 -18
- package/src/core/recall/useful-scan.ts +0 -336
- package/src/core/recall-facts/fact-query-log.ts +0 -150
- package/src/core/recall-facts/fact-recall-service.ts +0 -327
- package/src/core/scheduler/scan-once.ts +0 -142
- package/src/core/scheduler/scheduler.ts +0 -225
- package/src/core/sources/source-registry.ts +0 -278
- package/src/core/storage/db-restore.ts +0 -133
- package/src/core/storage/live-status.ts +0 -45
- package/src/core/storage/migrate.ts +0 -72
- package/src/core/storage/sqlite-fact-store.ts +0 -304
- package/src/core/storage/sqlite-session-store.ts +0 -810
- package/src/hook/hook-auth.ts +0 -18
- package/src/hook/prompt-recall-hook.ts +0 -180
- package/src/hook/session-end-hook.ts +0 -81
- package/src/hook/session-start-hook.ts +0 -168
- package/src/hook/stop-hook.ts +0 -239
- package/src/http/app.ts +0 -1215
- package/src/install/claude-code.ts +0 -128
- package/src/install/codex.ts +0 -367
- package/src/install/cursor.ts +0 -68
- package/src/install/hermes-agent.ts +0 -76
- package/src/install/hermes.ts +0 -78
- package/src/install/nlm-dir-perms.ts +0 -55
- package/src/install/ollama.ts +0 -284
- package/src/install/setup.ts +0 -489
- package/src/install/windsurf.ts +0 -68
- package/src/llm/classifier-box.ts +0 -64
- package/src/llm/deepseek-client.ts +0 -150
- package/src/llm/env-autoload.ts +0 -55
- package/src/llm/ollama-client.ts +0 -189
- package/src/mcp/server.ts +0 -534
- package/src/ports/fact-store.ts +0 -102
- package/src/ports/llm-client.ts +0 -52
- package/src/ports/logger.ts +0 -16
- package/src/ports/session-store.ts +0 -45
- package/src/ports/transcript-adapter.ts +0 -55
- package/src/shared/types.ts +0 -149
- package/src/ui/App.tsx +0 -58
- package/src/ui/components/PromoteOpenButton.tsx +0 -65
- package/src/ui/components/SessionDrawer.tsx +0 -199
- package/src/ui/components/SideNav.tsx +0 -162
- package/src/ui/components/Skeleton.tsx +0 -107
- package/src/ui/index.html +0 -13
- package/src/ui/lib/actions.ts +0 -30
- package/src/ui/lib/api.ts +0 -92
- package/src/ui/lib/dataset.ts +0 -141
- package/src/ui/lib/registries.ts +0 -155
- package/src/ui/lib/view-settings.ts +0 -41
- package/src/ui/main.tsx +0 -15
- package/src/ui/pages/Live.tsx +0 -229
- package/src/ui/pages/Pulse.tsx +0 -415
- package/src/ui/pages/Recall.tsx +0 -190
- package/src/ui/pages/River.tsx +0 -354
- package/src/ui/pages/Search.tsx +0 -386
- package/src/ui/pages/Stub.tsx +0 -9
- package/src/ui/pages/Thread.tsx +0 -473
- package/src/ui/pages/settings/Classifier.tsx +0 -227
- package/src/ui/pages/settings/Data.tsx +0 -190
- package/src/ui/pages/settings/Index.tsx +0 -65
- package/src/ui/pages/settings/Labels.tsx +0 -224
- package/src/ui/pages/settings/Providers.tsx +0 -305
- package/src/ui/pages/settings/SettingsSubnav.tsx +0 -28
- package/src/ui/pages/settings/Sources.tsx +0 -326
- package/src/ui/pages/settings/Views.tsx +0 -96
- package/src/ui/styles.css +0 -1890
- package/src/ui/tsconfig.json +0 -21
- package/src/ui/vite.config.ts +0 -19
- package/tests/fixtures/claude_code/short_session.jsonl +0 -2
- package/tests/fixtures/claude_code/standard_iso.jsonl +0 -4
- package/tests/fixtures/claude_code/tool_heavy.jsonl +0 -8
- package/tests/fixtures/claude_code/with_subagent.jsonl +0 -7
- package/tests/fixtures/facts.ts +0 -17
- package/tests/fixtures/golden-corpus.ts +0 -85
- package/tests/fixtures/hermes/paired_request_dump.json +0 -24
- package/tests/fixtures/hermes/paired_session.json +0 -23
- package/tests/fixtures/hermes/request_dump.json +0 -28
- package/tests/fixtures/hermes/session_iso.json +0 -38
- package/tests/fixtures/hermes/session_unix.json +0 -38
- package/tests/fixtures/hermes/system_only.json +0 -18
- package/tests/fixtures/pi/error-connection-abort.jsonl +0 -8
- package/tests/fixtures/pi/short-successful.jsonl +0 -5
- package/tests/fixtures/pi/with-custom-message.jsonl +0 -6
- package/tests/fixtures/sessions.ts +0 -22
- package/tests/integration/backfill-facts.test.ts +0 -362
- package/tests/integration/citation-explicit.test.ts +0 -111
- package/tests/integration/cite-event.test.ts +0 -169
- package/tests/integration/cite-memo.test.ts +0 -87
- package/tests/integration/db-restore.test.ts +0 -153
- package/tests/integration/embed-backfill.test.ts +0 -176
- package/tests/integration/fact-supersedence.test.ts +0 -313
- package/tests/integration/fts-index.test.ts +0 -60
- package/tests/integration/getbyids-sqlite.test.ts +0 -100
- package/tests/integration/hermes-agent-hooks.test.ts +0 -248
- package/tests/integration/hook-claude-settings.test.ts +0 -218
- package/tests/integration/hook-log.test.ts +0 -54
- package/tests/integration/hook-memo.test.ts +0 -68
- package/tests/integration/hook-pre-compact.test.ts +0 -105
- package/tests/integration/hook-subagent-start.test.ts +0 -102
- package/tests/integration/http.test.ts +0 -401
- package/tests/integration/keyword-search-fts.test.ts +0 -66
- package/tests/integration/mcp-recall-logging.test.ts +0 -88
- package/tests/integration/mcp.test.ts +0 -260
- package/tests/integration/memo-sweep.test.ts +0 -91
- package/tests/integration/prompt-recall-hook.test.ts +0 -88
- package/tests/integration/provider-registry.test.ts +0 -107
- package/tests/integration/recall-golden.test.ts +0 -59
- package/tests/integration/recall-sqlite.test.ts +0 -169
- package/tests/integration/scheduler.test.ts +0 -391
- package/tests/integration/session-end-hook.test.ts +0 -48
- package/tests/integration/session-start-hook.test.ts +0 -126
- package/tests/integration/source-registry.test.ts +0 -122
- package/tests/integration/sqlite-fact-store.test.ts +0 -346
- package/tests/integration/stop-hook.test.ts +0 -560
- package/tests/integration/wal-checkpoint.test.ts +0 -49
- package/tests/unit/cli/launchctl-helpers.test.ts +0 -60
- package/tests/unit/core/adapters/aider.test.ts +0 -230
- package/tests/unit/core/adapters/claude-code.test.ts +0 -118
- package/tests/unit/core/adapters/cursor.test.ts +0 -485
- package/tests/unit/core/adapters/hermes-agent.test.ts +0 -329
- package/tests/unit/core/adapters/hermes.test.ts +0 -81
- package/tests/unit/core/adapters/jsonl-generic.test.ts +0 -142
- package/tests/unit/core/adapters/opencode.test.ts +0 -354
- package/tests/unit/core/adapters/pi.test.ts +0 -110
- package/tests/unit/core/adapters/windsurf.test.ts +0 -416
- package/tests/unit/core/classifier/prompt.test.ts +0 -126
- package/tests/unit/core/embedding/chunk-body.test.ts +0 -100
- package/tests/unit/core/facts/extract-facts.test.ts +0 -117
- package/tests/unit/core/filter.test.ts +0 -40
- package/tests/unit/core/hook/citation-detect-cite-session.test.ts +0 -96
- package/tests/unit/core/hook/citation-detect.test.ts +0 -124
- package/tests/unit/core/hook/gate.test.ts +0 -29
- package/tests/unit/core/hook/pointer-block.test.ts +0 -22
- package/tests/unit/core/hook/select.test.ts +0 -66
- package/tests/unit/core/match-fields.test.ts +0 -39
- package/tests/unit/core/mcp-cite-session.test.ts +0 -51
- package/tests/unit/core/providers/provider-models.test.ts +0 -101
- package/tests/unit/core/query-shape.test.ts +0 -92
- package/tests/unit/core/recall-facts/fact-recall-service.test.ts +0 -258
- package/tests/unit/core/recall-service.test.ts +0 -200
- package/tests/unit/core/storage/live-status.test.ts +0 -54
- package/tests/unit/core/tokenize.test.ts +0 -32
- package/tests/unit/core/useful-scan.test.ts +0 -537
- package/tests/unit/llm/embed.test.ts +0 -93
- package/tests/unit/llm/ollama-client.test.ts +0 -124
- package/tests/unit/scripts/longmemeval-scorer.test.ts +0 -114
- package/tsconfig.json +0 -31
- package/tsconfig.test.json +0 -11
- package/vitest.config.ts +0 -22
package/dist/ui/index.html
CHANGED
@@ -4,8 +4,8 @@
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
 <title>nlm memory</title>
-<script type="module" crossorigin src="/ui/assets/index-
-<link rel="stylesheet" crossorigin href="/ui/assets/index-
+<script type="module" crossorigin src="/ui/assets/index-CSPTTeeM.js"></script>
+<link rel="stylesheet" crossorigin href="/ui/assets/index-Beo8psd-.css">
 </head>
 <body>
 <div id="root"></div>
package/package.json
CHANGED
@@ -1,12 +1,37 @@
 {
   "name": "nlm-memory",
-  "version": "0.5.
+  "version": "0.5.2",
   "description": "Local-first non-linear memory operating system for AI operators.",
   "type": "module",
   "license": "Apache-2.0",
   "engines": {
     "node": ">=20.0.0"
   },
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/pbmagnet4/nlm-memory-ts.git"
+  },
+  "homepage": "https://github.com/pbmagnet4/nlm-memory-ts#readme",
+  "bugs": {
+    "url": "https://github.com/pbmagnet4/nlm-memory-ts/issues"
+  },
+  "keywords": [
+    "ai",
+    "memory",
+    "mcp",
+    "claude-code",
+    "codex",
+    "hermes",
+    "local-first",
+    "recall",
+    "session-memory"
+  ],
+  "files": [
+    "dist",
+    "plugin",
+    "LICENSE",
+    "README.md"
+  ],
   "bin": {
     "nlm": "dist/cli/nlm.js"
   },
package/.agents/plugins/marketplace.json
DELETED
@@ -1,20 +0,0 @@
-{
-  "name": "nlm-memory-ts",
-  "interface": {
-    "displayName": "nlm-memory"
-  },
-  "plugins": [
-    {
-      "name": "nlm-memory",
-      "source": {
-        "source": "local",
-        "path": "./plugin"
-      },
-      "policy": {
-        "installation": "AVAILABLE",
-        "authentication": "ON_USE"
-      },
-      "category": "Coding"
-    }
-  ]
-}
package/.github/workflows/ci.yml
DELETED
@@ -1,30 +0,0 @@
-name: CI
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-    branches: [main]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - uses: actions/setup-node@v4
-        with:
-          node-version: "20"
-          cache: npm
-
-      - name: Install dependencies
-        run: npm ci
-
-      - name: Typecheck
-        run: npm run typecheck
-
-      - name: Test
-        run: npm test
-
-      - name: Build (server)
-        run: npm run build
package/docs/methodology/re-derivation-rate.md
DELETED
@@ -1,112 +0,0 @@
-# re-derivation_rate — design
-
-## Why
-
-`re_derivation_rate` is NLM's strategic metric — the operator-outcome number that competitors (mem0, agentmemory, Letta) cannot match because their destructive lifecycle (decay, auto-forget) erases the data needed to compute it. It is the headline number for Pulse, the cron digest, and any public marketing scorecard. Detection rule, methodology, and a reproducible script live here so the metric is auditable.
-
-## Plain-language definition
-
-A *re-derivation* is when an operator (you, in any AI runtime) solves the same problem twice across multiple sessions without recall of the prior solution. It is the tax NLM exists to eliminate: every re-derivation is a session where memory could have helped but didn't.
-
-`re_derivation_rate` over a window = (re-derivation events) / (decision events) in that window.
-
-`re_derivations_prevented` = recall events whose `useful_hit_rate` is true AND whose returned session contained the matching decision. Inverse of re-derivation: the events where memory *did* help.
-
-## Detection rule (V1)
-
-A pair of sessions `(A, B)` is a re-derivation iff all of the following hold:
-
-1. **Same entity.** A and B share at least one entity in their respective `entities` arrays.
-2. **Same decision normalized.** A `decision` marker in A and a `decision` marker in B normalize to overlapping content. Normalization: lowercase, strip stopwords, tokenize, Jaccard similarity ≥ 0.6.
-3. **Temporal gap.** `B.started_at - A.started_at >= 7 days`.
-4. **No supersedence link.** No `session_edges` row of kind `supersedes` connects A and B in either direction.
-5. **No continues link.** No `session_edges` row of kind `continues` connects A and B.
-6. **No intervening recall.** Between A.started_at and B.started_at, no recall event in `query-log.jsonl` or `hook-log.jsonl` returned A's id (would mean B's operator was aware of A and chose not to link).
-
-When all six are true, `B` is a re-derivation of `A`. Count B (not A) — the metric measures fresh re-derivations, not the original.
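The normalization in rule 2 of the removed design note (lowercase, strip stopwords, tokenize, Jaccard ≥ 0.6) can be sketched as below. This is a minimal illustration and not the package's implementation: the stopword list, the regex tokenizer, and the function names `normalize`, `jaccard`, and `same_decision` are all assumptions.

```python
import re

# Hypothetical stopword list; the design note does not say which list NLM uses.
STOPWORDS = frozenset(
    {"a", "an", "the", "to", "of", "for", "in", "on", "and", "or", "is", "we"}
)


def normalize(decision: str) -> set[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", decision.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_decision(d1: str, d2: str, threshold: float = 0.6) -> bool:
    """Rule 2: two decision markers match when their token sets overlap enough."""
    return jaccard(normalize(d1), normalize(d2)) >= threshold
```

Because the comparison is set-based, word order and repeated words do not affect the score, and paraphrases that share few tokens will not match. The note lists semantic similarity as out of scope for V1.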
-
-## Edge cases and resolutions
-
-- **Three sessions A, B, C** where B re-derives A and C re-derives B: count B and C, not A.
-- **Trivial decisions.** Decisions under N tokens (default 6) are excluded — "yes ship it" is not a meaningful decision to track.
-- **High-frequency entities.** If an entity has >50 sessions in the window, scale the Jaccard threshold up to 0.75 to reduce false positives (common topics will inevitably overlap in keyword-trivial ways).
-- **Probe / test entities.** Sessions whose label matches probe patterns (see useful-hit-rate.md) are excluded from both sides.
-
-## Computation algorithm
-
-```python
-def find_re_derivations(sessions, edges, recalls, window_days):
-    pairs = []
-    decisions = collect_decisions(sessions) # one row per (session_id, normalized_decision_tokens, entities)
-    for ent in distinct_entities(decisions):
-        ent_decisions = sorted(by_session_start([d for d in decisions if ent in d.entities]))
-        for i, a in enumerate(ent_decisions):
-            for b in ent_decisions[i+1:]:
-                if days_between(a, b) < 7: continue
-                if days_between(a, b) > window_days: break
-                if jaccard(a.tokens, b.tokens) < threshold(ent): continue
-                if has_edge(edges, a, b, ("supersedes", "continues")): continue
-                if recall_returned_a_between(recalls, a, b): continue
-                pairs.append((a, b))
-    return pairs
-```
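The removed pseudocode leaves its helpers (`collect_decisions`, `has_edge`, `recall_returned_a_between`, and so on) undefined. A self-contained sketch of the same pairing loop follows. The `Decision` record, the `linked` and `recalled` argument shapes, and the fixed threshold are assumptions for illustration. Unlike the original, it walks all decisions once and checks for a shared entity, so a pair that shares several entities is recorded only once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Decision:  # hypothetical record shape, not the package's type
    session_id: str
    started_at: datetime
    tokens: frozenset
    entities: frozenset


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_re_derivations(decisions, linked, recalled, window_days=30,
                        min_gap_days=7, threshold=0.6):
    """Return (a, b) pairs where session b re-derives session a.

    linked:   set of frozenset({id_a, id_b}) joined by a supersedes/continues edge
    recalled: iterable of (timestamp, session_id) for sessions a recall returned
    """
    pairs, seen = [], set()
    ordered = sorted(decisions, key=lambda d: d.started_at)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            gap = b.started_at - a.started_at
            if gap > timedelta(days=window_days):
                break  # sorted by start time, so every later b is further away
            if gap < timedelta(days=min_gap_days):
                continue  # rule 3: temporal gap
            if not (a.entities & b.entities):
                continue  # rule 1: shared entity
            if jaccard(a.tokens, b.tokens) < threshold:
                continue  # rule 2: overlapping decision
            if frozenset((a.session_id, b.session_id)) in linked:
                continue  # rules 4 and 5: no supersedes/continues edge
            if any(sid == a.session_id and a.started_at <= ts <= b.started_at
                   for ts, sid in recalled):
                continue  # rule 6: no intervening recall of a
            key = (a.session_id, b.session_id)
            if key not in seen:
                seen.add(key)
                pairs.append((a, b))
    return pairs
```

The per-entity threshold scaling (0.75 for entities with more than 50 sessions) and the minimum decision length from the edge-case list are left out to keep the sketch short. Both would slot in as extra filters before the Jaccard check.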
-
-Runs over the existing canonical sqlite (sessions + session_edges) and the recall log jsonl files. No new schema, no migration. Computed in a single pass; results cached by `(window_start, window_end)` in a new `re_derivation_log` table.
-
-## Storage
-
-- New table `re_derivation_log`: `(window_start, window_end, computed_at, session_a_id, session_b_id, entity, jaccard, decision_a, decision_b)`. One row per detected pair. Re-computable; deletable; not source of truth.
-- New endpoint field on `/api/recall/stats`: `re_derivation_count_7d`, `re_derivations_prevented_7d`.
-- Pulse: new headline tile showing both numbers and the weekly trend.
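The `re_derivation_log` cache described in the storage list could look roughly like this with Python's standard `sqlite3` module. The column names are taken from the removed note. The column types, the lack of keys or indexes, and the helpers `replace_window` and `count_flagged` are assumptions for this sketch.

```python
import sqlite3

# Column names follow the design note; types are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS re_derivation_log (
    window_start TEXT NOT NULL,
    window_end   TEXT NOT NULL,
    computed_at  TEXT NOT NULL,
    session_a_id TEXT NOT NULL,
    session_b_id TEXT NOT NULL,
    entity       TEXT NOT NULL,
    jaccard      REAL NOT NULL,
    decision_a   TEXT NOT NULL,
    decision_b   TEXT NOT NULL
)
"""


def replace_window(conn, window_start, window_end, rows):
    """Drop the cached rows for one window, then insert the fresh scan."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM re_derivation_log "
            "WHERE window_start = ? AND window_end = ?",
            (window_start, window_end),
        )
        conn.executemany(
            "INSERT INTO re_derivation_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )


def count_flagged(conn, window_start, window_end):
    """Count distinct B sessions, since the metric counts B rather than A."""
    (n,) = conn.execute(
        "SELECT COUNT(DISTINCT session_b_id) FROM re_derivation_log "
        "WHERE window_start = ? AND window_end = ?",
        (window_start, window_end),
    ).fetchone()
    return n
```

Deleting and rewriting a whole window matches the note's description of the table as re-computable and not a source of truth.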
-
-## CLI
-
-- `nlm re-derivation scan` — recomputes the log for a window. Default last 30 days.
-- `nlm re-derivation list --since 7d` — lists detected pairs with the matched decisions for human review (false-positive triage).
-- `nlm re-derivation explain <session-b-id>` — for one B, show why it was flagged (matched A, decision overlap, why no recall covered it).
-
-## Calibration loop
-
-Re-derivation detection is heuristic. False positives waste reader trust; false negatives undersell the metric. Calibration weekly for the first month after V1:
-
-1. Run `nlm re-derivation list --since 7d`
-2. Edward reviews each flagged pair
-3. Mark `true_re_derivation: true|false` in a `re_derivation_feedback` table
-4. Adjust Jaccard threshold + minimum decision length until precision/recall both > 70% on Edward's review
-
-After 4 weeks of calibration, freeze the parameters and publish them in `docs/methodology/re-derivation-rate.md` for external use.
-
-## Public scorecard format
-
-For external publication (gated on the marketing-readiness checklist):
-
-```
-Edward's corpus, week of YYYY-MM-DD:
-Sessions in window: N
-Decisions in window: M
-Re-derivations detected: X
-Re-derivations prevented: Y (recall returned the matching prior session)
-Re-derivation rate: X / M = Z.Z%
-Methodology: docs/methodology/re-derivation-rate.md
-Calibration set: docs/calibration/re-derivation-2026-MM.md
-```
-
-Publish weekly to the repo. The trend (rate falling over time as NLM gets more useful) is the marketing story.
-
-## Why competitors cannot match this
-
-agentmemory's 4-tier lifecycle decays old observations and auto-forgets stale facts. Without the historical session record intact, there is no Session A to detect a re-derivation against — the data is gone. mem0 uses passive extraction and accretion, with no native concept of session identity that would let you pair A and B. Letta's core memory is in-context, not historical.
-
-NLM's supersedence + full-session retention is the prerequisite for this metric. It is the strategic moat made measurable.
-
-## Out of scope (V1)
-
-- Cross-runtime re-derivation (decision in Claude Code, re-derived in Hermes). Requires reliable entity normalization across adapters; defer to V2.
-- Semantic similarity instead of Jaccard (would catch paraphrased decisions but requires embedding every decision). Defer.
-- Automatic supersedence link suggestion from detected re-derivations. The metric should measure, not act, until calibrated.
-
-## Implementation phasing
-
-1. **Phase 1 (after #152, #153, #154 ship):** implement detection algorithm + CLI + scan command. No UI changes. Validate on Edward's corpus.
-2. **Phase 2 (after 2 weeks of calibration):** wire `re_derivation_count_7d` into `/api/recall/stats` and the daily digest. Pulse tile.
-3. **Phase 3 (gated on marketing readiness):** publish first weekly scorecard publicly. Repo README. Landing site.
package/docs/methodology/useful-hit-rate.md
DELETED
@@ -1,79 +0,0 @@
-# useful_hit_rate — design
-
-## Why
-
-`hit_rate` reports the fraction of recall calls that returned ≥1 row. With the MCP default now hybrid, that number is structurally close to 100% — semantic always returns *something*. `hit_rate` no longer separates "found stuff" from "found stuff that mattered." `useful_hit_rate` is the metric we actually want: the fraction of recall calls whose returned results were referenced in the next assistant turn.
-
-This is the signal that lets us answer "is NLM serving its intended purpose" with evidence instead of opinion, and it's an input to the headline re-derivation rate metric (see [re-derivation-rate.md](re-derivation-rate.md) — pending).
-
-## Definitions
-
-**A recall event** is one of:
-- A hook fire (logged in `~/.nlm/hook-log.jsonl` with `wouldInject` ids)
-- An MCP `recall_sessions` / `recall_facts` call (logged in `~/.nlm/query-log.jsonl`)
-- An HTTP `/api/recall` call (logged in `~/.nlm/query-log.jsonl`)
-
-**A useful recall** is a recall event where:
-- At least one of the returned session ids OR session labels appears in the next assistant message in the same conversation transcript, AND
-- The match occurs within 3 assistant turns of the recall, AND
-- The recall is not a probe (excluded query patterns: `concurrency probe`, `test probe`, `path test`, `recall test`, smoke/cutover patterns)
-
-**`useful_hit_rate`** = (useful recalls) / (real recalls) over the reporting window.
-
-## Detection algorithm
-
-```
-for each real recall event in window:
-    transcript = find_transcript(event.conversationId)
-    if transcript is None:
-        mark useful = null (unmeasurable)
-        continue
-    next_assistant_msgs = transcript.messages_after(event.ts, role="assistant", limit=3)
-    haystack = " ".join(m.content for m in next_assistant_msgs)
-    for hit_id in event.returnedIds:
-        if hit_id in haystack or session_label(hit_id) in haystack:
-            mark useful = true; break
-    else:
-        mark useful = false
-```
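The detection pseudocode above can be turned into a small runnable sketch. The `Message` record, numeric timestamps, and the `labels` lookup dict are assumptions made here; the removed note only names `find_transcript`, `messages_after`, and `session_label` without defining them. The rate helper leaves unmeasurable events out of the denominator, which is also an assumption, since the note does not say how `null` results are counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:  # hypothetical transcript row
    ts: float
    role: str
    content: str


def is_useful(event_ts, returned_ids, labels, transcript, max_turns=3) -> Optional[bool]:
    """True or False per the substring rule; None when no transcript exists."""
    if transcript is None:
        return None  # unmeasurable
    following = sorted(
        (m for m in transcript if m.role == "assistant" and m.ts > event_ts),
        key=lambda m: m.ts,
    )
    haystack = " ".join(m.content for m in following[:max_turns])
    for hit_id in returned_ids:
        label = labels.get(hit_id)
        # Guard against empty labels, which would match any haystack.
        if hit_id in haystack or (label and label in haystack):
            return True
    return False


def useful_hit_rate(results):
    """Useful recalls over measurable recalls; None if nothing was measurable."""
    measured = [r for r in results if r is not None]
    return sum(measured) / len(measured) if measured else None
```

Plain substring matching is cheap but can misfire on short ids or generic labels, which is the same concern the note raises under its open questions.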
-
-## Data flow
-
-1. **Hook recalls** have `conversationId` directly. Transcript path: `~/.claude/projects/<sanitized-project>/<conversationId>.jsonl`.
-2. **MCP recalls** currently have no conversation context in `query-log.jsonl`. Adding `x-claude-session-id` capture to the MCP server is a prerequisite for measuring MCP useful_hit_rate.
-3. **HTTP recalls** are operator-driven (UI browsing) and excluded from this metric — `useful_hit_rate` measures agent recall usefulness, not UI search satisfaction.
-
-## Storage
-
-- New log file `~/.nlm/useful-hit-log.jsonl`, one entry per scanned recall:
-```json
-{"ts": "...", "source": "hook|mcp", "conversationId": "...", "returnedIds": [...], "useful": true|false|null, "matchedId": "...", "scannedAt": "..."}
-```
-- New CLI: `nlm useful-scan` — scans the last 24h of recalls, joins against transcripts, appends to the log
-- New endpoint field: `/api/recall/stats` includes `useful_hit_rate` and `useful_hit_count` over the same window as `hit_rate`
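Appending one entry per scanned recall to a JSONL log, as the storage list describes, might look like the sketch below. The field names come from the removed note's JSON example. The helper names, the keyword arguments, and the choice to reject sources other than `hook` and `mcp` are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

ALLOWED_SOURCES = ("hook", "mcp")  # the note excludes HTTP recalls from this metric


def append_useful_hit(log_path, *, ts, source, conversation_id, returned_ids,
                      useful, matched_id=None):
    """Append one scan result as a single JSON line and return the entry."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    entry = {
        "ts": ts,
        "source": source,
        "conversationId": conversation_id,
        "returnedIds": list(returned_ids),
        "useful": useful,  # True, False, or None (serialized as null)
        "matchedId": matched_id,
        "scannedAt": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_useful_hits(log_path):
    """Load every non-empty line of the log as a dict."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

An append-only file keeps the scanner independent of the SQLite store, in line with the other `~/.nlm/*.jsonl` logs the note mentions.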
-
-## Out of scope (V1)
-
-- MCP useful_hit_rate (blocked on conversation-id capture; track as follow-up)
-- Real-time useful-hit detection (V1 is batch-scan, run on the daily digest cron)
-- Distinguishing "agent quoted the recall" vs "agent acted on it" (the former is a proxy for the latter; V2 could refine)
-- HTTP UI click-through (different metric — would live under a separate `ui_click_rate`)
-
-## V1 scope (shipping now)
-
-- Ship the daily digest cron consuming existing `hit_rate` (this doc justifies the upgrade path)
-- Add stub field `useful_hit_rate: null` to `/api/recall/stats` so the digest schema is forward-compatible
-- Implement the scanner + CLI in a follow-up commit (target: within 7 days)
-
-## Why batch-scan vs hook-vs-hook real-time
-
-A second Claude Code hook (`Stop` or `PostToolUse`) could compute usefulness in real time. Rejected because:
-- Doubles installation surface (two hooks per agent runtime)
-- Adds per-turn latency for a metric the user reads once/day
-- Doesn't generalize to Hermes, pi, Codex, Gemini, Aider (no equivalent post-turn hook on most)
-- Batch-scan reads the same transcript files the daemon already polls
-
-## Open questions
-
-- Hit-label heuristic: substring match is cheap but noisy. Worth fuzzy matching session label tokens? Defer until V1 data shows the false-positive rate.
-- Window for scan: hour-bucket vs day-bucket? Daily-bucket for now to match the digest cadence; revisit if cron interval changes.