la-machina-engine 0.5.0 → 0.7.0

package/README.md CHANGED
@@ -22,7 +22,7 @@ npm install la-machina-engine
 
  **v0.3.0 — published on npm; production-ready core, evolving feature surface.**
 
- - **1214** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
+ - **1553** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
  - Zero top-level `node:` imports — runs on Node.js AND Cloudflare Workers
  - 14 live workflow tests (W1–W14) verified against OpenRouter, real R2, real MCP servers
  - Pause/resume + async runs + webhooks + state.json + R2 binding storage adapter
@@ -1033,6 +1033,299 @@ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
  services are configured. Absent `config.api` → tool never registered,
  no prompt mention.
 
+ ### Tool-result offload — keep the context lean on chatty runs
+
+ Research-style runs often fire a dozen `WebSearch` / `WebFetch` /
+ `ApiCall` invocations and each one can return tens of KB. The raw
+ payloads flood the main context even before the compactor kicks in.
+ Tool-result offload is the **preventive** fix: when a tool returns
+ more bytes than your threshold, the engine stores the full content
+ under the agent's log path and replaces the in-context message with
+ a short deterministic summary + a `ref` token. A built-in `FetchData`
+ tool rehydrates the original payload on demand — same UX as
+ `SkillPage` for skills.
+
+ Off by default. Flip it on at the engine level:
+
+ ```ts
+ const engine = initEngine({
+   compaction: {
+     toolResultOffload: {
+       enabled: true,
+       thresholdBytes: 2048, // default: 2048 (2 KB)
+       maxPreviewChars: 500, // default: 500
+     },
+   },
+ })
+ ```
+
+ When enabled, every tool result whose body exceeds `thresholdBytes`
+ lands at
+ `projects/{runId}/nodes/{nodeId}/toolResults/{toolUseId}.json`
+ (subagents offload under their own `subagents/{agentId}/toolResults/…`),
+ and the model sees a summary like:
+
+ ```
+ [WebSearch] Array of 10 items (14.2 KB).
+ First item preview: {"title":"…","url":"…"}
+ Use FetchData with ref="toolu_abc" to read the full array.
+ ```
+
+ The model calls `FetchData({ ref: "toolu_abc" })` when it actually
+ needs the raw bytes. One extra round-trip, and only when the
+ information is actually required.
+
+ **Per-run override** (SaaS pattern — one engine, different thresholds
+ per tenant or per task):
+
+ ```ts
+ await engine.run({
+   task: '…',
+   compaction: {
+     toolResultOffload: { enabled: true, thresholdBytes: 4096 },
+   },
+ })
+ ```
+
+ **Behavioural invariants:**
+
+ - **Error tool results are never offloaded.** The model needs them
+   verbatim to adapt; replacing them with a ref would break debugging.
+ - **`FetchData`'s own output is never re-offloaded.** Would trap the
+   model in a hydrate loop.
+ - **Each agent sees only its own refs.** A subagent can't `FetchData`
+   a parent's offloaded blob — `FetchData` is bound to each agent's
+   log path at construction.
+ - **Refs survive resume.** Blobs live in the same storage adapter as
+   transcripts and snapshots; resuming a paused run still has access
+   to every offloaded payload.
+ - **Strict `>` at threshold.** A result at exactly `thresholdBytes`
+   stays inline.
+
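Reviewer sketch, not part of the package diff: the strict-`>` threshold rule, the "errors stay inline" rule, and the shape of the summary shown above could look roughly like this. `decideOffload` and `summarizeArrayOrText` are hypothetical names; the engine's real internals are not visible in this diff, and the summary wording only mimics the README example.

```typescript
// Hypothetical helpers for illustration; not the engine's actual code.
interface OffloadDecision {
  offload: boolean
  bytes: number
}

// Strict `>`: a result whose UTF-8 size equals the threshold stays inline.
// Error results are never offloaded, whatever their size.
function decideOffload(
  rawContent: string,
  thresholdBytes: number,
  isError: boolean,
): OffloadDecision {
  const bytes = new TextEncoder().encode(rawContent).length
  return { offload: !isError && bytes > thresholdBytes, bytes }
}

// Shape-aware summary: JSON arrays get an item count plus a preview of the
// first item; everything else falls back to the first N characters.
function summarizeArrayOrText(
  toolName: string,
  rawContent: string,
  ref: string,
  maxPreviewChars: number,
): string {
  const kb = (new TextEncoder().encode(rawContent).length / 1024).toFixed(1)
  let parsed: unknown
  try {
    parsed = JSON.parse(rawContent)
  } catch {
    parsed = undefined // not JSON, use the plain-text fallback below
  }
  if (Array.isArray(parsed)) {
    const preview = JSON.stringify(parsed[0] ?? null).slice(0, maxPreviewChars)
    return (
      `[${toolName}] Array of ${parsed.length} items (${kb} KB).\n` +
      `First item preview: ${preview}\n` +
      `Use FetchData with ref="${ref}" to read the full array.`
    )
  }
  return (
    `[${toolName}] ${kb} KB of text.\n` +
    `Preview: ${rawContent.slice(0, maxPreviewChars)}\n` +
    `Use FetchData with ref="${ref}" to read the full content.`
  )
}
```

With `thresholdBytes: 2048`, a 2048-byte result stays inline under this sketch and a 2049-byte result is offloaded, which matches the "Strict `>` at threshold" invariant.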
+ **Pluggable summarizer** — the default is deterministic
+ (shape-aware for JSON arrays and top-level objects, first-N-chars
+ for arbitrary text). Users wanting semantic summaries can plug in
+ their own:
+
+ ```ts
+ toolResultOffload: {
+   enabled: true,
+   summarizer: async (ctx) => {
+     // ctx = { toolName, toolInput, rawContent, rawBytes, ref, maxPreviewChars }
+     // Return a string; it MUST include `ctx.ref` so the model can
+     // call FetchData to rehydrate the full content.
+     return myCustomSummary(ctx)
+   },
+ }
+ ```
+
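Reviewer sketch, not part of the package diff: one way a custom summarizer might satisfy the "MUST include `ctx.ref`" rule. The field names come from the comment in the block above; the field types, the `SummarizerContext` interface name, and both helper functions are assumptions made for illustration.

```typescript
// Field names follow the README comment; the types are assumed.
interface SummarizerContext {
  toolName: string
  toolInput: unknown
  rawContent: string
  rawBytes: number
  ref: string
  maxPreviewChars: number
}

// Hypothetical summary builder: keeps the first three non-empty lines.
function buildFirstLinesSummary(ctx: SummarizerContext): string {
  const preview = ctx.rawContent
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .slice(0, 3)
    .join(' | ')
    .slice(0, ctx.maxPreviewChars)
  // The ref has to appear in the text, otherwise the model has no way
  // to ask FetchData for the full payload.
  return (
    `[${ctx.toolName}] ${ctx.rawBytes} bytes. Preview: ${preview}\n` +
    `Use FetchData with ref="${ctx.ref}" for the full content.`
  )
}

// Async wrapper with the same shape as the `summarizer` callback above.
const firstLinesSummarizer = async (ctx: SummarizerContext): Promise<string> =>
  buildFirstLinesSummary(ctx)

// Optional guard a caller could run over any summarizer's output.
function assertIncludesRef(summary: string, ref: string): string {
  if (!summary.includes(ref)) {
    throw new Error(`summary is missing ref ${ref}`)
  }
  return summary
}
```

Running the guard in a unit test for each custom summarizer is a cheap way to catch a summary that silently drops the ref.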
+ > **Coming in a future release:** an engine-shipped LLM summarizer
+ > (call a small/fast model to write a real summary instead of the
+ > deterministic one). Track via Plan 021's "Deferred with triggers"
+ > list. The `summarizer` callback IS the extension point the LLM
+ > version will plug into, so users implementing custom summarizers
+ > today are on the stable surface.
+
+ **Disabling:** `tools.disabled: ['FetchData']` removes the tool even
+ when offload is enabled. Absent `config.compaction.toolResultOffload`
+ → tool not registered, no threshold checks, no storage writes, no
+ prompt mention.
+
+ **When offload actually saves tokens.** Offload wins when (a) a run
+ makes many tool calls and (b) the model rarely needs the full payload
+ of most of them — e.g. a research run with 12 `WebSearch` calls where
+ only 2 are deep-read for quotes. If the model needs the full content
+ of every tool call (a single-shot "fetch this large thing and answer")
+ offload costs an extra `FetchData` round-trip without saving anything.
+ Rule of thumb: leave it off for single-tool-call workflows, turn it
+ on for multi-call research / browsing / repeated-API flows. Benchmark
+ on your workload — the live test at
+ `scripts/workflows/w17-offload-live.mjs` shows how.
+
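Reviewer sketch, not part of the package diff: a back-of-envelope model of the trade-off described above. The 14,000-byte payload and 300-byte summary are assumed figures chosen for illustration, bytes are used as a stand-in for tokens, and `estimateContextBytes` is a hypothetical helper, not an engine API.

```typescript
interface OffloadEstimate {
  inlineBytes: number // context cost with offload disabled
  offloadBytes: number // context cost with offload enabled
  savedBytes: number // positive means offload helped
}

// Rough model: without offload every payload lands in context; with
// offload every call leaves a summary and only the rehydrated calls
// add their full payload back via FetchData.
function estimateContextBytes(
  calls: number,
  payloadBytes: number,
  summaryBytes: number,
  rehydratedCalls: number,
): OffloadEstimate {
  const inlineBytes = calls * payloadBytes
  const offloadBytes = calls * summaryBytes + rehydratedCalls * payloadBytes
  return { inlineBytes, offloadBytes, savedBytes: inlineBytes - offloadBytes }
}

// Research run from the text: 12 searches, 2 of them deep-read.
const research = estimateContextBytes(12, 14_000, 300, 2)
// inlineBytes = 168000, offloadBytes = 3600 + 28000 = 31600, savedBytes = 136400

// Single-shot run: 1 call whose full payload is always needed.
const singleShot = estimateContextBytes(1, 14_000, 300, 1)
// inlineBytes = 14000, offloadBytes = 14300, savedBytes = -300
```

Under these assumptions the research run keeps roughly a fifth of the bytes in context, while the single-shot run pays for the summary and the extra round-trip without saving anything, which is the rule of thumb the README states.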
+ ### Knowledge base — `SearchKnowledge` + `ReadKnowledge`
+
+ When the model needs to look things up in your tenant's docs without
+ loading whole files into context, opt into the knowledge base. Two
+ built-in tools — `SearchKnowledge` (token-overlap-ranked snippets)
+ and `ReadKnowledge` (one section or a whole file) — let an agent
+ walk a per-tenant vault on demand.
+
+ **Layout.** Each tenant gets a folder at
+ `workspaces/{workspaceId}/knowledge/`, sibling to `.claude/`. Top-
+ level subfolders are *bases* — independent corpora each with their
+ own pre-built `_index.json`:
+
+ ```
+ workspaces/acme-corp/
+ ├── .claude/  # engine state — transcripts, memory, …
+ └── knowledge/
+     ├── hr-policies/  # base: "hr-policies"
+     │   ├── _index.json  # built by writeKnowledgeIndex()
+     │   ├── handbook.md
+     │   └── remote-work.md
+     └── sales-playbook/  # base: "sales-playbook"
+         ├── _index.json
+         └── q1/
+             └── pricing.md
+ ```
+
+ **Build the index** when the corpus changes — for each base, the
+ indexer walks its subtree, splits markdown at heading boundaries,
+ tokenises section bodies, and writes one `_index.json` per base:
+
+ ```ts
+ import { writeKnowledgeIndex, R2StorageAdapter } from 'la-machina-engine'
+
+ const k = new R2StorageAdapter(r2Config, 'workspaces/acme-corp/knowledge')
+ await writeKnowledgeIndex({ adapter: k, base: 'hr-policies' })
+ await writeKnowledgeIndex({ adapter: k, base: 'sales-playbook' })
+ ```
+
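Reviewer sketch, not part of the package diff: a minimal illustration of "split markdown at heading boundaries, tokenise section bodies, rank by token overlap". All names here (`splitSections`, `tokenizeText`, `overlapScore`, `rankSections`) and the tiny stop-word list are hypothetical; the engine's own `tokenize()` / `scoreOverlap()` and its `_index.json` schema are not shown in this diff and may differ.

```typescript
interface Section {
  heading: string
  anchor: string
  body: string
}

function slugify(heading: string): string {
  return heading.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '')
}

// Starts a new section at every ATX heading. Text before the first
// heading is dropped in this simplified version.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = []
  let current: Section | undefined
  for (const line of markdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line)
    if (match) {
      const heading = (match[1] ?? '').trim()
      current = { heading, anchor: slugify(heading), body: '' }
      sections.push(current)
    } else if (current) {
      current.body += line + '\n'
    }
  }
  return sections
}

// Assumed stop-word list, kept deliberately short for the example.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'of', 'to', 'and', 'our', 'what'])

// Lowercase, split on non-alphanumerics, drop stop words, de-duplicate.
function tokenizeText(text: string): string[] {
  const tokens = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
  return Array.from(new Set(tokens))
}

// Score = number of query tokens that also occur in the section.
function overlapScore(queryTokens: string[], sectionTokens: string[]): number {
  const present = new Set(sectionTokens)
  return queryTokens.filter((t) => present.has(t)).length
}

function rankSections(query: string, sections: Section[], topK: number): Section[] {
  const queryTokens = tokenizeText(query)
  return sections
    .map((section) => ({ section, score: overlapScore(queryTokens, tokenizeText(section.body)) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.section)
}
```

Because scoring is plain set overlap, the ranking is deterministic and needs no model call; the cost is that synonyms and paraphrases do not match.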
+ **Configure the engine** to enable the tools (off by default):
+
+ ```ts
+ const engine = initEngine({
+   storage: { provider: 'r2', /* … */ },
+   knowledge: {
+     enabled: true, // engine-level capability flag
+     maxSearchResults: 5, // top-K per SearchKnowledge call
+     maxReadBytes: 10_000, // ReadKnowledge truncation cap
+   },
+ })
+ ```
+
+ **Per-run scoping.** Folders are runtime-only — pass them via
+ `RunOptions.knowledge.folders`. Sub-paths inside a base work too
+ (e.g., `'sales-playbook/q1'` only sees Q1 content):
+
+ ```ts
+ await engine.run({
+   task: 'What is our 401k match rate?',
+   knowledge: {
+     folders: ['hr-policies', 'sales-playbook/q1'],
+     external: [
+       // External file links — fetched on demand, never indexed.
+       // `headers` are runtime-only and NEVER persist anywhere.
+       {
+         name: 'product-catalog',
+         description: 'Product catalog CSV with unit pricing',
+         url: 'https://api.acme.example/catalog.csv',
+         format: 'csv',
+         headers: { Authorization: 'Bearer sk_real_token' },
+       },
+     ],
+   },
+ })
+ ```
+
+ The model then calls:
+
+ - `SearchKnowledge({ query: '401k matching' })` → top-K ranked
+   snippets, each with a `ref` like `hr-policies/handbook.md#benefits`
+ - `ReadKnowledge({ ref: 'hr-policies/handbook.md#benefits' })` →
+   full body of that section
+ - `ReadKnowledge({ ref: 'ext:product-catalog' })` → fetches the
+   registered URL with its headers, runs the `csv` extractor, returns
+   text
+
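Reviewer sketch, not part of the package diff: how the three kinds of `ref` in the list above might be told apart before dispatch. `parseRef` and the `ParsedRef` type are hypothetical; the section and `ext:` forms are taken from the README examples, while the whole-file form (a path with no `#anchor`) is an assumption.

```typescript
type ParsedRef =
  | { kind: 'external'; name: string }
  | { kind: 'section'; base: string; path: string; anchor: string }
  | { kind: 'file'; base: string; path: string }

// Hypothetical parser; the engine's parseKnowledgeRef may behave differently.
function parseRef(ref: string): ParsedRef {
  // `ext:<name>` points at a registered external link.
  if (ref.startsWith('ext:')) {
    const name = ref.slice('ext:'.length)
    if (name.length === 0) throw new Error('external ref has no name')
    return { kind: 'external', name }
  }
  // Everything else is `<base>/<path>` with an optional `#anchor`.
  const hashAt = ref.indexOf('#')
  const filePart = hashAt === -1 ? ref : ref.slice(0, hashAt)
  const slashAt = filePart.indexOf('/')
  if (slashAt <= 0 || slashAt === filePart.length - 1) {
    throw new Error('ref must look like <base>/<path>')
  }
  const base = filePart.slice(0, slashAt)
  const path = filePart.slice(slashAt + 1)
  if (hashAt === -1) return { kind: 'file', base, path }
  return { kind: 'section', base, path, anchor: ref.slice(hashAt + 1) }
}
```

This sketch only classifies the ref. Path-safety checks (traversal, absolute paths, scope) would still need to run on `base` and `path` before any storage read.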
+ **Format support.** Native: `md`, `txt`, `json`, `csv`, `html` (script/
+ style stripped, entities decoded, whitespace collapsed). Optional:
+ `pdf` (via `pdf-parse`) and `docx` (via `mammoth`). Both have
+ `requiresNode: true` — on Workers without those packages installed,
+ they return a structured `ERR_KNOWLEDGE_FORMAT_UNSUPPORTED` error.
+
+ **Path safety.** All folder + ref strings flow through one validator
+ in `src/knowledge/scope.ts` that rejects absolute paths, traversal
+ (`..`), unsafe characters, and out-of-base file refs. A dedicated
+ test (`scope.test.ts`) pins the behaviour — every weakening would
+ open a tenant-boundary hole.
+
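Reviewer sketch, not part of the package diff: the kinds of checks the path-safety paragraph describes. `isSafeRelativePath` and `inScope` are hypothetical names, and the allowed character set is an assumption; the real validator in `src/knowledge/scope.ts` is not shown here and may be stricter.

```typescript
// Assumed allow-list: letters, digits, dot, underscore, hyphen.
const SAFE_SEGMENT = /^[A-Za-z0-9._-]+$/

// Rejects absolute paths, backslashes, `.` / `..` segments, empty
// segments (e.g. `a//b`), and any character outside the allow-list.
function isSafeRelativePath(path: string): boolean {
  if (path.length === 0 || path.startsWith('/') || path.includes('\\')) return false
  return path
    .split('/')
    .every((segment) => segment !== '.' && segment !== '..' && SAFE_SEGMENT.test(segment))
}

// A file is in scope only if it is safe AND sits at or below one of the
// folders granted for this run. The trailing `/` in the prefix check
// stops `sales-playbook/q1` from matching `sales-playbook/q10/...`.
function inScope(filePath: string, folders: string[]): boolean {
  if (!isSafeRelativePath(filePath)) return false
  return folders.some(
    (folder) =>
      isSafeRelativePath(folder) &&
      (filePath === folder || filePath.startsWith(folder + '/')),
  )
}
```

Validating first and prefix-matching second matters: a check that only compared prefixes would accept `hr-policies/../other-tenant/secret.md`.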
+ **External link headers — non-persistence guarantee.** External
+ `headers` live entirely inside the tool factory closure and on the
+ `init.headers` of one `fetch` call per request. They never reach the
+ LLM, the transcript, `state.json`, snapshots, or any storage write.
+ A sentinel-based test suite (`externalLinkSecrets.test.ts`) and the
+ live R2 test (`scripts/workflows/w20-knowledge-r2.mjs`) verify this:
+ the live test seeds a known sentinel into the `Authorization` header,
+ runs against real R2, and reads every transcript shard back from the
+ bucket asserting zero leaks.
+
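Reviewer sketch, not part of the package diff: the closure pattern the paragraph above describes, plus a sentinel-style check. `createExternalReader`, `FetchLike`, and `fakeFetch` are hypothetical and exist only for this illustration; the real tool factory and its tests are not shown in the diff.

```typescript
// Minimal fetch shape so the sketch runs without a network.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>

interface ExternalLink {
  name: string
  url: string
  headers: Record<string, string>
}

// Headers are captured by the closure and handed to fetch only. The
// returned value and the error messages never contain them.
function createExternalReader(links: ExternalLink[], fetchImpl: FetchLike) {
  const byName = new Map(links.map((link): [string, ExternalLink] => [link.name, link]))
  return async function readExternal(name: string): Promise<{ ref: string; text: string }> {
    const link = byName.get(name)
    if (!link) throw new Error(`unknown external link: ${name}`)
    const response = await fetchImpl(link.url, { headers: link.headers })
    if (!response.ok) throw new Error(`external fetch failed with status ${response.status}`)
    return { ref: `ext:${name}`, text: await response.text() }
  }
}

// Sentinel-style demo: the fake fetch records what it was sent.
const SENTINEL = 'Bearer sentinel-123'
let seenAuthorization = ''
const fakeFetch: FetchLike = async (_url, init) => {
  seenAuthorization = init.headers['Authorization'] ?? ''
  return { ok: true, status: 200, text: async () => 'sku,price\nA1,9.99' }
}

const readExternal = createExternalReader(
  [
    {
      name: 'product-catalog',
      url: 'https://api.acme.example/catalog.csv',
      headers: { Authorization: SENTINEL },
    },
  ],
  fakeFetch,
)
```

A test in this style would call `readExternal('product-catalog')`, confirm the fake saw the sentinel, and then assert that the serialized result (and anything later written to storage) does not contain it.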
+ **Composing with offload.** If a `ReadKnowledge` result exceeds your
+ `compaction.toolResultOffload.thresholdBytes`, the offload pipeline
+ takes over — the body is written under `toolResults/`, the model
+ sees a summary + ref, and `FetchData` rehydrates on demand. The two
+ features compose without any extra wiring.
+
+ **Disabling.** `tools.disabled: ['SearchKnowledge', 'ReadKnowledge']`
+ turns the tools off even when knowledge is enabled. Absent
+ `config.knowledge.enabled` → no adapter built, no tools registered,
+ no prompt mention.
+
+ #### Codebase layout
+
+ The knowledge subsystem is small and self-contained. If you need to
+ extend it (new format extractor, custom scorer, alternative index
+ schema), these are the files involved:
+
+ ```
+ src/
+ ├── knowledge/  # subsystem (self-contained)
+ │   ├── types.ts  # V1-suffixed public types (KnowledgeFolderRefV1, KnowledgeExternalLinkV1, KnowledgeFormatV1, ResolvedKnowledgeConfigV1, RunKnowledgeOptionsV1, SectionEntryV1, KnowledgeIndexV1, …)
+ │   ├── scope.ts  # parseFolderRef / parseKnowledgeRef / refInScope — load-bearing path safety
+ │   ├── tokenize.ts  # tokenize() + scoreOverlap() — deterministic, no LLM
+ │   ├── indexer.ts  # buildKnowledgeIndex() / writeKnowledgeIndex() — section split + wiki-link extraction
+ │   └── extractors.ts  # getExtractor(format) — md/txt/json/csv/html native; pdf/docx lazy-import
+
+ ├── tools/
+ │   ├── searchKnowledge.ts  # createSearchKnowledgeTool() — token-overlap ranked snippets
+ │   └── readKnowledge.ts  # createReadKnowledgeTool() — section / file / ext: ref dispatch
+
+ ├── storage/
+ │   ├── interface.ts  # adds optional `EngineStorage.knowledge?`
+ │   └── factory.ts  # builds the knowledge adapter at workspaces/{ws}/knowledge/ when enabled
+
+ ├── config/
+ │   ├── types.ts  # ResolvedConfig.knowledge?: ResolvedKnowledgeConfigV1
+ │   ├── schema.ts  # KnowledgeConfigResolved zod schema (scalars only — no folders/headers)
+ │   └── merge.ts  # KNOWLEDGE_DEFAULTS + fillKnowledgeDefaults()
+
+ ├── engine/
+ │   ├── engine.ts  # resolveKnowledgeRuntime() + buildToolRegistry knowledge wire-up
+ │   └── types.ts  # adds `knowledge?: RunKnowledgeOptionsV1` to RunOptions/ResumeOptions
+
+ └── index.ts  # public exports: writeKnowledgeIndex, buildKnowledgeIndex,
+               # createSearchKnowledgeTool, createReadKnowledgeTool, getExtractor,
+               # KnowledgeFormatV1, KnowledgeIndexV1, RunKnowledgeOptionsV1, …
+
+ test/
+ ├── unit/
+ │   ├── knowledge/
+ │   │   ├── tokenize.test.ts  # 15 tests — stop-words, dedup, scoring
+ │   │   ├── indexer.test.ts  # 16 tests — section split, wiki-links, recursion
+ │   │   ├── scope.test.ts  # path-safety pin (every weakening = tenant-boundary hole)
+ │   │   ├── extractors.test.ts  # 17 tests — all formats including pdf/docx fallbacks
+ │   │   └── externalLinkSecrets.test.ts  # 7 sentinel-based non-persistence tests
+ │   ├── tools/
+ │   │   ├── searchKnowledge.test.ts  # caching, multi-base, sub-path, cap, factory rejection
+ │   │   └── readKnowledge.test.ts  # 18 tests — all three ref kinds + error paths
+ │   └── config/
+ │       └── knowledgeSchema.test.ts  # 13 tests — defaults, partials, header rejection
+
+ └── integration/engine/
+     ├── knowledgeE2E.test.ts  # 6 scenarios — registration, round-trip, disabled, sub-path, subagent inheritance
+     ├── knowledgeWithOffload.test.ts  # large ReadKnowledge → offload blob + clean transcript
+     └── knowledgeMultiBase.test.ts  # multi-base ranking, base-prefixed refs, indexed + external mix
+
+ scripts/workflows/
+ ├── w20-knowledge-r2.mjs  # live R2 — vault search/read + external link
+ └── w21-external-files-knowledge.mjs  # live R2 — md/json/csv/html external file round-trip
+ ```
+
+ `src/knowledge/` is the only directory you need to touch to add a
+ new format. Append a new `KnowledgeExtractorV1` to `extractors.ts`
+ and add the type to `KnowledgeFormatV1` in `types.ts` — everything
+ else is dispatched off `getExtractor(format)`.
+
  ### Sync vs. async — when to use which
 
  | Scenario | Use |
@@ -1266,6 +1559,7 @@ All features ported 1:1 from La-Machina's production runtime. Pure JS, Workers-c
  - [x] 22 built-in tools
  - [x] Custom tool registration via `defineTool()`
  - [x] Device path blocking (/dev/zero, /dev/random, /proc/kcore)
+ - [x] Knowledge base (`SearchKnowledge` + `ReadKnowledge`) — opt-in, per-tenant vault under `workspaces/{ws}/knowledge/`, section-level indexing, format extractors (md/txt/json/csv/html native; pdf/docx via optional deps), external link headers with non-persistence guarantee
 
  ### Agent Hierarchy
  - [x] Subagent spawning with depth tracking (SubagentRegistry)
@@ -1292,7 +1586,7 @@ All features ported 1:1 from La-Machina's production runtime. Pure JS, Workers-c
  - [x] Workers compatibility (zero top-level node: imports)
 
  ### Testing
- - [x] 960 tests across 86 files
+ - [x] 1553 tests across 142 files
  - [x] 10 live workflow tests (W1-W10) against OpenRouter + R2
  - [x] Coverage: 81% lines, 85% branches, 91% functions
  - [x] CI pipeline (lint + typecheck + test + coverage gates)
@@ -1376,7 +1670,7 @@ Features intentionally not ported — either Anthropic-only, CLI-specific, or de
  ```bash
  npm install
  npm run build # tsup → dist/ (ESM + CJS + .d.ts)
- npm test # 1214 tests (~12s with bun)
+ npm test # 1553 tests (~30s with vitest)
  npm run test:watch # watch mode
  npm run test:coverage # with coverage gates
  npm run typecheck # TypeScript strict
@@ -1415,11 +1709,12 @@ publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
 
  | Category | Files | Tests |
  |----------|-------|-------|
- | Unit | 70+ | ~870 |
- | Integration | 15+ | ~130 |
+ | Unit | 113+ | ~1200 |
+ | Integration | 15+ | ~150 |
  | E2E | 5 | ~30 |
  | Coverage additions | 20+ | ~130 |
- | **Total** | **115+** | **1214** (current `bun test` count; 8 pre-existing Bun timer failures unrelated) |
+ | Knowledge (Plan 023) | 11 | ~85 |
+ | **Total** | **142** | **1553** (current `vitest run`; 8 skipped pre-existing) |
 
  ### Live Workflow Tests
 
@@ -1439,6 +1734,10 @@ publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
  | W12 | Multi-agent + MCP + skills + HITL (parent gates child's publish) | 4 |
  | W13 | Per-run skill override (inline body + URL + fetch cache) | 4 |
  | W14 | MCP auth refresh + sampling round-trip (stdio + http) | n/a (integration) |
+ | W17 | Tool-result offload (FetchData rehydrate) | 4 |
+ | W19 | Kitchen-sink R2 (subagent + HITL + ApiCall + offload + skills + memory + hooks + webhook) | 12 |
+ | W20 | Knowledge base on R2 (vault search/read + external link + bearer non-persistence) | 8 |
+ | W21 | External-file knowledge on R2 — md/json/csv/html round-trip + 401 bounds | 5 |
 
  ---