la-machina-engine 0.4.0 → 0.6.0

package/README.md CHANGED
@@ -495,7 +495,7 @@ engine.run({ runId, nodeId, task })
 
  ### Storage Adapter
 
- Two backends, same interface:
+ Three backends, same interface and the same relative layout on all of them:
 
  | Adapter | Backend | Use |
  |---------|---------|-----|
@@ -503,6 +503,33 @@ Two backends, same interface:
  | `R2StorageAdapter` | Cloudflare R2 via S3 protocol | Node / anywhere with S3 creds |
  | `R2BindingStorageAdapter` | Cloudflare R2 native binding (`env.BUCKET`) | Cloudflare Workers (`provider: 'r2-binding'`) |
 
+ **Path layout (identical across all three backends):**
+
+ ```
+ {rootPath}/workspaces/{workspaceId}/.claude/    ← tenant root
+ ├── memory/                                     ← tenant-shared, survives across runs
+ ├── skills/                                     ← (if config.skills.autoload)
+ └── projects/{runId}/nodes/{nodeId}/
+     ├── state.json, snapshot.json, 000000.jsonl, meta.json
+     └── subagents/{agentId}/…                   ← recursive, same shape
+ ```
+
+ `workspaces/` is a namespace guard (keeps engine data separate from
+ anything else in a shared bucket/filesystem); `.claude/` marks
+ engine-owned content. Both cost one directory level each.
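
To make the layout concrete, here is a minimal sketch of how a caller might compute these paths. The names `NodeLocation`, `tenantRoot`, and `nodeDir` are hypothetical and not part of the engine's API; the sketch also assumes `rootPath` is non-empty and that `/` is the separator on every backend.

```ts
// Hypothetical helpers for illustration only; not engine API.
interface NodeLocation {
  rootPath: string
  workspaceId: string
  runId: string
  nodeId: string
  /** Nested subagent IDs, outermost first (empty or omitted for the main agent). */
  subagentIds?: string[]
}

/** Tenant root: {rootPath}/workspaces/{workspaceId}/.claude */
function tenantRoot(rootPath: string, workspaceId: string): string {
  // Drop trailing slashes so the join below never produces "//".
  const root = rootPath.replace(/\/+$/, '')
  return [root, 'workspaces', workspaceId, '.claude'].join('/')
}

/** Directory holding state.json, snapshot.json, etc. for one agent. */
function nodeDir(loc: NodeLocation): string {
  const parts = [
    tenantRoot(loc.rootPath, loc.workspaceId),
    'projects',
    loc.runId,
    'nodes',
    loc.nodeId,
  ]
  // Each subagent level repeats the same shape under subagents/{agentId}.
  for (const agentId of loc.subagentIds ?? []) {
    parts.push('subagents', agentId)
  }
  return parts.join('/')
}
```

Because the layout is relative to `rootPath`, the same helper would apply whether `rootPath` is a filesystem directory or a bucket prefix.
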
+
+ **The workspace IS the tenant boundary.** One `workspaceId` per
+ tenant; nothing is shared across workspaces. The previous
+ `global` storage scope was removed in v0.5.0 — see migration note
+ below.
+
+ > **Migration from pre-0.5.0**: if you had data at `{rootPath}/.claude/`
+ > (the old global scope), move it under your workspace root:
+ > `mv {rootPath}/.claude {rootPath}/workspaces/{workspaceId}/.claude`.
+ > `config.memory.scope: 'global'` still parses but emits a
+ > deprecation warning and is rewritten to `'workspace'`; it'll be
+ > rejected outright in v1.0.0.
+
  ### Smart Memory
 
  Per-workspace learning across runs:
@@ -1006,6 +1033,115 @@ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
  services are configured. Absent `config.api` → tool never registered,
  no prompt mention.
 
+ ### Tool-result offload — keep the context lean on chatty runs
+
+ Research-style runs often fire a dozen `WebSearch` / `WebFetch` /
+ `ApiCall` invocations and each one can return tens of KB. The raw
+ payloads flood the main context even before the compactor kicks in.
+ Tool-result offload is the **preventive** fix: when a tool returns
+ more bytes than your threshold, the engine stores the full content
+ under the agent's log path and replaces the in-context message with
+ a short deterministic summary + a `ref` token. A built-in `FetchData`
+ tool rehydrates the original payload on demand — same UX as
+ `SkillPage` for skills.
+
+ Off by default. Flip it on at the engine level:
+
+ ```ts
+ const engine = initEngine({
+   compaction: {
+     toolResultOffload: {
+       enabled: true,
+       thresholdBytes: 2048, // default: 2048 (2 KB)
+       maxPreviewChars: 500, // default: 500
+     },
+   },
+ })
+ ```
+
+ When enabled, every tool result whose body exceeds `thresholdBytes`
+ lands at
+ `projects/{runId}/nodes/{nodeId}/toolResults/{toolUseId}.json`
+ (subagents offload under their own `subagents/{agentId}/toolResults/…`),
+ and the model sees a summary like:
+
+ ```
+ [WebSearch] Array of 10 items (14.2 KB).
+ First item preview: {"title":"…","url":"…"}
+ Use FetchData with ref="toolu_abc" to read the full array.
+ ```
+
+ The model calls `FetchData({ ref: "toolu_abc" })` when it actually
+ needs the raw bytes. One extra round-trip, and only when the
+ information is actually required.
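
The store-then-rehydrate round trip described above can be sketched with an in-memory map standing in for the storage adapter. This is an illustration under stated assumptions, not the engine's implementation: `blobs`, `offloadIfLarge`, and `fetchData` are hypothetical names, and the placeholder text is simpler than the engine's shape-aware summary.

```ts
// Illustrative only: an in-memory stand-in for the engine's storage adapter.
const blobs = new Map<string, string>()

/** UTF-8 size of a string in bytes. */
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length
}

/**
 * Returns what the model would see for a tool result: the raw content when it
 * fits within the threshold, otherwise a short placeholder carrying the ref.
 */
function offloadIfLarge(
  toolName: string,
  toolUseId: string,
  content: string,
  thresholdBytes = 2048,
  maxPreviewChars = 500,
): string {
  const bytes = byteLength(content)
  // Only results strictly larger than the threshold are offloaded.
  if (bytes <= thresholdBytes) return content
  blobs.set(toolUseId, content)
  const preview = content.slice(0, maxPreviewChars)
  return [
    `[${toolName}] ${bytes} bytes offloaded.`,
    `Preview: ${preview}`,
    `Use FetchData with ref="${toolUseId}" to read the full content.`,
  ].join('\n')
}

/** Stand-in for the FetchData tool: returns the stored payload for a ref. */
function fetchData(ref: string): string {
  const blob = blobs.get(ref)
  if (blob === undefined) throw new Error(`unknown ref: ${ref}`)
  return blob
}
```

The placeholder stays bounded by `maxPreviewChars` plus a fixed overhead, so its size does not grow with the payload.
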
+
+ **Per-run override** (SaaS pattern — one engine, different thresholds
+ per tenant or per task):
+
+ ```ts
+ await engine.run({
+   task: '…',
+   compaction: {
+     toolResultOffload: { enabled: true, thresholdBytes: 4096 },
+   },
+ })
+ ```
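
The README does not spell out how per-run values combine with engine-level ones. One plausible reading, shown here purely as an assumption, is a field-by-field merge in which per-run values win and unset fields fall back to the documented defaults. `OffloadConfig` and `resolveOffloadConfig` are hypothetical names.

```ts
// Assumed merge semantics for illustration; the engine's real behaviour may differ.
interface OffloadConfig {
  enabled?: boolean
  thresholdBytes?: number
  maxPreviewChars?: number
}

interface ResolvedOffloadConfig {
  enabled: boolean
  thresholdBytes: number
  maxPreviewChars: number
}

// Defaults as documented in the README: off, 2048 bytes, 500 preview chars.
const OFFLOAD_DEFAULTS: ResolvedOffloadConfig = {
  enabled: false,
  thresholdBytes: 2048,
  maxPreviewChars: 500,
}

/** Later sources override earlier ones: defaults < engine-level < per-run. */
function resolveOffloadConfig(
  engineLevel: OffloadConfig = {},
  perRun: OffloadConfig = {},
): ResolvedOffloadConfig {
  return { ...OFFLOAD_DEFAULTS, ...engineLevel, ...perRun }
}
```

Under this reading, a tenant-specific `thresholdBytes` can be set per run without restating `maxPreviewChars`.
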
+
+ **Behavioural invariants:**
+
+ - **Error tool results are never offloaded.** The model needs them
+   verbatim to adapt; replacing them with a ref would break debugging.
+ - **`FetchData`'s own output is never re-offloaded.** Would trap the
+   model in a hydrate loop.
+ - **Each agent sees only its own refs.** A subagent can't `FetchData`
+   a parent's offloaded blob — `FetchData` is bound to each agent's
+   log path at construction.
+ - **Refs survive resume.** Blobs live in the same storage adapter as
+   transcripts and snapshots; resuming a paused run still has access
+   to every offloaded payload.
+ - **Strict `>` at threshold.** A result at exactly `thresholdBytes`
+   stays inline.
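
Three of the invariants above (errors stay inline, `FetchData` output is never re-offloaded, strict `>`) reduce to a small predicate, and the per-agent scoping can be modelled by binding a reader to one log path. The sketch below is illustrative only; `ToolResult`, `shouldOffload`, `blobKey`, and `makeFetchData` are hypothetical names, and a `Map` stands in for the storage adapter.

```ts
// Hypothetical shapes for illustration; none of these names come from the engine.
interface ToolResult {
  toolName: string
  isError: boolean
  content: string
}

/** True only for successful, non-FetchData results strictly above the threshold. */
function shouldOffload(result: ToolResult, thresholdBytes: number): boolean {
  if (result.isError) return false // errors stay verbatim
  if (result.toolName === 'FetchData') return false // avoid a hydrate loop
  const bytes = new TextEncoder().encode(result.content).length
  return bytes > thresholdBytes // exactly at the threshold stays inline
}

/** Storage key for an offloaded blob, scoped to one agent's log path. */
function blobKey(logPath: string, ref: string): string {
  return `${logPath}/toolResults/${ref}.json`
}

/** Builds a FetchData-like reader that can only resolve refs under `logPath`. */
function makeFetchData(
  store: Map<string, string>,
  logPath: string,
): (ref: string) => string {
  return (ref) => {
    const blob = store.get(blobKey(logPath, ref))
    if (blob === undefined) throw new Error(`no offloaded result for ref "${ref}"`)
    return blob
  }
}
```

Since the log path is fixed when the reader is built, a subagent's reader looks under its own `subagents/{agentId}` prefix and never finds a parent's blob.
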
+
+ **Pluggable summarizer** — the default is deterministic
+ (shape-aware for JSON arrays and top-level objects, first-N-chars
+ for arbitrary text). Users wanting semantic summaries can plug in
+ their own:
+
+ ```ts
+ toolResultOffload: {
+   enabled: true,
+   summarizer: async (ctx) => {
+     // ctx = { toolName, toolInput, rawContent, rawBytes, ref, maxPreviewChars }
+     // Return a string; it MUST include `ctx.ref` so the model can
+     // call FetchData to rehydrate the full content.
+     return myCustomSummary(ctx)
+   },
+ }
+ ```
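
As one possible body for `myCustomSummary`, the sketch below reports a line count and a bounded preview, and always embeds the ref. The field names come from the comment in the snippet above, but their types are assumptions (in particular, `rawContent` is assumed to be a string), and `buildSummary` and `lineCountSummarizer` are hypothetical names.

```ts
// Field names follow the README comment; the field types are assumed.
interface SummarizerContext {
  toolName: string
  toolInput: unknown
  rawContent: string
  rawBytes: number
  ref: string
  maxPreviewChars: number
}

/** Synchronous core, so the summary text is easy to unit-test. */
function buildSummary(ctx: SummarizerContext): string {
  const lineCount = ctx.rawContent.split('\n').length
  const preview = ctx.rawContent.slice(0, ctx.maxPreviewChars)
  return [
    `[${ctx.toolName}] ${lineCount} lines, ${ctx.rawBytes} bytes.`,
    `Preview: ${preview}`,
    // The ref must appear so the model can call FetchData later.
    `Use FetchData with ref="${ctx.ref}" to read the full content.`,
  ].join('\n')
}

/** Async wrapper matching the callback shape shown in the README snippet. */
const lineCountSummarizer = async (ctx: SummarizerContext): Promise<string> =>
  buildSummary(ctx)
```

Keeping the core synchronous and wrapping it in an async function leaves room to swap in a model-backed summary later without changing the callback's shape.
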
+
+ > **Coming in a future release:** an engine-shipped LLM summarizer
+ > (call a small/fast model to write a real summary instead of the
+ > deterministic one). Track via Plan 021's "Deferred with triggers"
+ > list. The `summarizer` callback IS the extension point the LLM
+ > version will plug into, so users implementing custom summarizers
+ > today are on the stable surface.
+
+ **Disabling:** `tools.disabled: ['FetchData']` removes the tool even
+ when offload is enabled. Absent `config.compaction.toolResultOffload`
+ → tool not registered, no threshold checks, no storage writes, no
+ prompt mention.
+
+ **When offload actually saves tokens.** Offload wins when (a) a run
+ makes many tool calls and (b) the model rarely needs the full payload
+ of most of them — e.g. a research run with 12 `WebSearch` calls where
+ only 2 are deep-read for quotes. If the model needs the full content
+ of every tool call (a single-shot "fetch this large thing and answer")
+ offload costs an extra `FetchData` round-trip without saving anything.
+ Rule of thumb: leave it off for single-tool-call workflows, turn it
+ on for multi-call research / browsing / repeated-API flows. Benchmark
+ on your workload — the live test at
+ `scripts/workflows/w17-offload-live.mjs` shows how.
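
The trade-off above can be put into rough numbers. The sketch below is a back-of-envelope model, not a measurement: it counts each payload once, ignores the cost of re-sending context on later turns and the overhead of the `FetchData` call itself, and the token figures in the usage lines are assumed values chosen for illustration. `OffloadScenario` and `estimateTokensSaved` are hypothetical names.

```ts
// Back-of-envelope model; all names and figures here are illustrative assumptions.
interface OffloadScenario {
  toolCalls: number // total tool calls in the run
  avgPayloadTokens: number // average size of a full tool result
  avgSummaryTokens: number // average size of the in-context placeholder
  rehydrated: number // calls whose full payload is later fetched
}

/** Positive result: offload saves tokens. Negative: offload costs tokens. */
function estimateTokensSaved(s: OffloadScenario): number {
  const inline = s.toolCalls * s.avgPayloadTokens
  const offloaded =
    s.toolCalls * s.avgSummaryTokens + s.rehydrated * s.avgPayloadTokens
  return inline - offloaded
}

// Research run: 12 searches, 2 deep-read.
const research = estimateTokensSaved({
  toolCalls: 12,
  avgPayloadTokens: 3500,
  avgSummaryTokens: 150,
  rehydrated: 2,
}) // 33200

// Single-shot fetch: the one payload is always needed.
const singleShot = estimateTokensSaved({
  toolCalls: 1,
  avgPayloadTokens: 3500,
  avgSummaryTokens: 150,
  rehydrated: 1,
}) // -150
```

The single-shot case comes out negative because the summary is pure overhead when every payload is rehydrated anyway, which matches the rule of thumb in the text.
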
+
  ### Sync vs. async — when to use which
 
  | Scenario | Use |