npm - la-machina-engine - Versions diffs - 0.4.0 → 0.6.0 - Mend

la-machina-engine 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md CHANGED

@@ -495,7 +495,7 @@ engine.run({ runId, nodeId, task })
 ### Storage Adapter
-Two backends, same interface:
+Three backends, same interface and the same relative layout on all of them:
 | Adapter | Backend | Use |
 |---------|---------|-----|
@@ -503,6 +503,33 @@ Two backends, same interface:
 | `R2StorageAdapter` | Cloudflare R2 via S3 protocol | Node / anywhere with S3 creds |
 | `R2BindingStorageAdapter` | Cloudflare R2 native binding (`env.BUCKET`) | Cloudflare Workers (`provider: 'r2-binding'`) |
+**Path layout (identical across all three backends):**
+```
+{rootPath}/workspaces/{workspaceId}/.claude/   ← tenant root
+├── memory/              ← tenant-shared, survives across runs
+├── skills/              ← (if config.skills.autoload)
+└── projects/{runId}/nodes/{nodeId}/
+    ├── state.json, snapshot.json, 000000.jsonl, meta.json
+    └── subagents/{agentId}/…    ← recursive, same shape
+```
+`workspaces/` is a namespace guard that keeps engine data separate from
+anything else in a shared bucket or filesystem; `.claude/` marks
+engine-owned content. Each adds one directory level to every path.
+**The workspace IS the tenant boundary.** Use one `workspaceId` per
+tenant; nothing is shared across workspaces. The previous
+`global` storage scope was removed in v0.5.0; see the migration note
+below.
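+> **Migration from pre-0.5.0**: if you had data at `{rootPath}/.claude/`
+> (the old global scope), move it under your workspace root. On a
+> filesystem backend, create the parent directory first:
+> `mkdir -p {rootPath}/workspaces/{workspaceId} && mv {rootPath}/.claude {rootPath}/workspaces/{workspaceId}/.claude`
+> (`mv` fails if the parent is missing, and nests the folder as
+> `.claude/.claude` if the destination already exists). On R2, copy the
+> objects to the new key prefix instead.
+> `config.memory.scope: 'global'` still parses, but it emits a
+> deprecation warning and is rewritten to `'workspace'`; it will be
+> rejected outright in v1.0.0.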
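The layout above can be sketched as a handful of path builders. This is a minimal illustration, assuming keys are plain `/`-joined strings (valid both as filesystem paths and as R2 object keys); `joinKey`, `tenantRoot`, `memoryDir`, `nodeDir` and `subagentDir` are hypothetical helpers, not engine exports.

```typescript
// Hypothetical helpers mirroring the documented layout; not engine exports.
// Keys are "/"-joined so they work as filesystem paths and R2 object keys.
function joinKey(...parts: string[]): string {
  return parts
    .filter((p) => p.length > 0) // skip empty segments, e.g. rootPath ""
    .map((p, i) =>
      // keep a leading "/" on the first segment (absolute paths); strip
      // leading/trailing slashes elsewhere so joins never double up
      i === 0 ? p.replace(/\/+$/, "") : p.replace(/^\/+|\/+$/g, ""),
    )
    .join("/");
}

// {rootPath}/workspaces/{workspaceId}/.claude: the tenant root
function tenantRoot(rootPath: string, workspaceId: string): string {
  return joinKey(rootPath, "workspaces", workspaceId, ".claude");
}

// Tenant-shared memory that survives across runs
function memoryDir(rootPath: string, workspaceId: string): string {
  return joinKey(tenantRoot(rootPath, workspaceId), "memory");
}

// Per-node directory: state.json, snapshot.json, 000000.jsonl, meta.json
function nodeDir(rootPath: string, workspaceId: string, runId: string, nodeId: string): string {
  return joinKey(tenantRoot(rootPath, workspaceId), "projects", runId, "nodes", nodeId);
}

// Subagents nest recursively with the same shape as their parent directory
function subagentDir(parentDir: string, agentId: string): string {
  return joinKey(parentDir, "subagents", agentId);
}
```

For example, `nodeDir("/data", "acme", "run-1", "n1")` yields `/data/workspaces/acme/.claude/projects/run-1/nodes/n1`, and an empty `rootPath` (bucket root) yields keys starting at `workspaces/`.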
 ### Smart Memory
 Per-workspace learning across runs:
@@ -1006,6 +1033,115 @@ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
 services are configured. Absent `config.api` → tool never registered,
 no prompt mention.
+### Tool-result offload — keep the context lean on chatty runs
+Research-style runs often fire a dozen `WebSearch` / `WebFetch` /
+`ApiCall` invocations, and each one can return tens of KB. The raw
+payloads flood the main context before the compactor even kicks in.
+Tool-result offload is the **preventive** fix: when a tool returns
+more bytes than your threshold, the engine stores the full content
+under the agent's log path and replaces the in-context message with
+a short summary (deterministic by default) plus a `ref` token. A
+built-in `FetchData` tool rehydrates the original payload on demand,
+the same UX that `SkillPage` provides for skills.
+Off by default. Flip it on at the engine level:
+```ts
+const engine = initEngine({
+  compaction: {
+    toolResultOffload: {
+      enabled: true,
+      thresholdBytes: 2048,      // default: 2048 (2 KB)
+      maxPreviewChars: 500,      // default: 500
+    },
+  },
+})
+```
+When enabled, every non-error tool result whose body exceeds `thresholdBytes`
+lands at
+`projects/{runId}/nodes/{nodeId}/toolResults/{toolUseId}.json`
+(subagents offload under their own `subagents/{agentId}/toolResults/…`),
+and the model sees a summary like:
+```
+[WebSearch] Array of 10 items (14.2 KB).
+First item preview: {"title":"…","url":"…"}
+Use FetchData with ref="toolu_abc" to read the full array.
+```
+The model calls `FetchData({ ref: "toolu_abc" })` only when it needs
+the raw bytes: one extra round-trip, paid only for the payloads it
+actually reads.
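The store-then-rehydrate flow can be sketched with an in-memory map standing in for the storage adapter. This assumes the ref is the `toolUseId` (as the `toolu_abc` example suggests) and that the blob holds the raw content string; `Store`, `blobKey`, `offload` and `makeFetchData` are hypothetical names, and the real adapters and blob format may differ.

```typescript
// Hypothetical in-memory stand-in for a storage adapter: key -> blob.
type Store = Map<string, string>;

// Blob key under an agent's log path, e.g. "projects/r1/nodes/n1"
// or "projects/r1/nodes/n1/subagents/a1".
function blobKey(logPath: string, ref: string): string {
  return `${logPath}/toolResults/${ref}.json`;
}

// Persist the full payload and return the ref the summary will carry.
// ASSUMPTION: the ref is the toolUseId and the blob is the raw content.
function offload(store: Store, logPath: string, toolUseId: string, content: string): string {
  store.set(blobKey(logPath, toolUseId), content);
  return toolUseId;
}

// A FetchData tool bound to one agent's log path at construction, so it
// can only resolve refs that were offloaded under that path.
function makeFetchData(store: Store, logPath: string): (input: { ref: string }) => string {
  return ({ ref }) => {
    const blob = store.get(blobKey(logPath, ref));
    if (blob === undefined) throw new Error(`Unknown ref: ${ref}`);
    return blob;
  };
}
```

Binding `makeFetchData` to a single `logPath` is what keeps a subagent from resolving a parent's ref: the same ref maps to a different key under the subagent's own path.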
+**Per-run override** (SaaS pattern — one engine, different thresholds
+per tenant or per task):
+```ts
+await engine.run({
+  task: '…',
+  compaction: {
+    toolResultOffload: { enabled: true, thresholdBytes: 4096 },
+  },
+})
+```
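How a per-run block combines with the engine-level one isn't spelled out here. A minimal sketch, assuming field-by-field precedence (run, then engine, then the documented defaults of off / 2048 / 500); `OffloadConfig`, `ResolvedOffload` and `resolveOffload` are hypothetical.

```typescript
// Hypothetical shapes for the offload options shown above.
interface OffloadConfig {
  enabled?: boolean;
  thresholdBytes?: number;
  maxPreviewChars?: number;
}

interface ResolvedOffload {
  enabled: boolean;
  thresholdBytes: number;
  maxPreviewChars: number;
}

// Documented defaults: off, 2048-byte threshold, 500-char preview.
const DEFAULTS: ResolvedOffload = { enabled: false, thresholdBytes: 2048, maxPreviewChars: 500 };

// ASSUMPTION: per-run fields win field by field over engine-level fields,
// and anything neither level sets falls back to the defaults.
function resolveOffload(engineLevel: OffloadConfig = {}, runLevel: OffloadConfig = {}): ResolvedOffload {
  return {
    enabled: runLevel.enabled ?? engineLevel.enabled ?? DEFAULTS.enabled,
    thresholdBytes: runLevel.thresholdBytes ?? engineLevel.thresholdBytes ?? DEFAULTS.thresholdBytes,
    maxPreviewChars: runLevel.maxPreviewChars ?? engineLevel.maxPreviewChars ?? DEFAULTS.maxPreviewChars,
  };
}
```

Using `??` rather than `||` keeps an explicit per-run `enabled: false` from being overridden by an engine-level `true`.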
+**Behavioural invariants:**
+- **Error tool results are never offloaded.** The model needs them
+  verbatim to adapt; replacing them with a ref would break debugging.
+- **`FetchData`'s own output is never re-offloaded.** Offloading it
+  would trap the model in a hydrate loop.
+- **Each agent sees only its own refs.** A subagent can't `FetchData`
+  a parent's offloaded blob, because each agent's `FetchData` is bound
+  to that agent's log path at construction.
+- **Refs survive resume.** Blobs live in the same storage adapter as
+  transcripts and snapshots; resuming a paused run still has access
+  to every offloaded payload.
+- **Strict `>` at threshold.** A result at exactly `thresholdBytes`
+  stays inline.
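The three per-result invariants (errors, `FetchData` output, strict `>`) fit in one predicate. A sketch, assuming "bytes" means the UTF-8 length of the result body as a string; `ToolResult`, `byteLength` and `shouldOffload` are hypothetical names, not engine API.

```typescript
// Hypothetical, reduced view of a tool result.
interface ToolResult {
  toolName: string;
  content: string;
  isError: boolean;
}

// ASSUMPTION: "bytes" = UTF-8 length of the result body.
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

function shouldOffload(result: ToolResult, thresholdBytes: number): boolean {
  if (result.isError) return false; // errors stay verbatim so the model can adapt
  if (result.toolName === "FetchData") return false; // avoid a hydrate loop
  return byteLength(result.content) > thresholdBytes; // strict: == stays inline
}
```

Counting bytes rather than `.length` matters for non-ASCII text: `"\u00e9".repeat(1025)` is 1,025 characters but 2,050 UTF-8 bytes, so it crosses a 2,048-byte threshold.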
+**Pluggable summarizer:** the default is deterministic (shape-aware
+for JSON arrays and top-level objects, first N characters for
+arbitrary text). If you want semantic summaries, plug in your own:
+```ts
+toolResultOffload: {
+  enabled: true,
+  summarizer: async (ctx) => {
+    // ctx = { toolName, toolInput, rawContent, rawBytes, ref, maxPreviewChars }
+    // Return a string; it MUST include `ctx.ref` so the model can
+    // call FetchData to rehydrate the full content.
+    return myCustomSummary(ctx)
+  },
+}
+```
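A custom summarizer only has to return a string that contains `ctx.ref`. The sketch below is a deterministic, shape-aware summarizer in the spirit of the default, not the engine's implementation. The field types in `SummarizerContext` are assumed, since the README lists only the names, and `shapeAwareSummary`, `formatKB` and `preview` are hypothetical.

```typescript
// ASSUMED field types; the README lists the field names only.
interface SummarizerContext {
  toolName: string;
  toolInput: unknown;
  rawContent: string;
  rawBytes: number;
  ref: string;
  maxPreviewChars: number;
}

function formatKB(bytes: number): string {
  return `${(bytes / 1024).toFixed(1)} KB`;
}

// Truncate to the preview budget, marking the cut with an ellipsis.
function preview(text: string, max: number): string {
  return text.length <= max ? text : `${text.slice(0, max)}…`;
}

async function shapeAwareSummary(ctx: SummarizerContext): Promise<string> {
  const head = `[${ctx.toolName}]`;
  const size = formatKB(ctx.rawBytes);
  // Every branch ends with this footer, so ctx.ref is always present.
  const footer = `Use FetchData with ref="${ctx.ref}" to read the full content.`;

  let parsed: unknown;
  try {
    parsed = JSON.parse(ctx.rawContent);
  } catch {
    parsed = undefined; // not JSON: fall through to a plain-text preview
  }

  if (Array.isArray(parsed)) {
    const first = parsed.length > 0 ? JSON.stringify(parsed[0]) : "(empty)";
    return `${head} Array of ${parsed.length} items (${size}).\nFirst item preview: ${preview(first, ctx.maxPreviewChars)}\n${footer}`;
  }
  if (parsed !== null && typeof parsed === "object") {
    const keys = Object.keys(parsed).join(", ");
    return `${head} Object with keys: ${preview(keys, ctx.maxPreviewChars)} (${size}).\n${footer}`;
  }
  return `${head} Text (${size}): ${preview(ctx.rawContent, ctx.maxPreviewChars)}\n${footer}`;
}
```

Wire it in with `toolResultOffload: { enabled: true, summarizer: shapeAwareSummary }`. Because every branch appends the same `FetchData` footer, the `ctx.ref` requirement holds even for non-JSON text.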
+> **Coming in a future release:** an engine-shipped LLM summarizer
+> (call a small/fast model to write a real summary instead of the
+> deterministic one). Track via Plan 021's "Deferred with triggers"
+> list. The `summarizer` callback IS the extension point the LLM
+> version will plug into, so users implementing custom summarizers
+> today are on the stable surface.
+**Disabling:** `tools.disabled: ['FetchData']` removes the tool even
+when offload is enabled. Absent `config.compaction.toolResultOffload`
+→ tool not registered, no threshold checks, no storage writes, no
+prompt mention.
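The registration rule can be written as a predicate. A sketch, assuming a configured-but-`enabled: false` offload block also leaves `FetchData` unregistered (the README only states the absent case); `EngineConfig` is a reduced, hypothetical view of the config, and `isFetchDataRegistered` is not an engine function.

```typescript
// ASSUMED config shape, reduced to the fields this rule reads.
interface EngineConfig {
  compaction?: { toolResultOffload?: { enabled?: boolean } };
  tools?: { disabled?: string[] };
}

// FetchData is registered only when offload is configured and enabled,
// and the user has not listed it in tools.disabled.
function isFetchDataRegistered(config: EngineConfig): boolean {
  const offload = config.compaction?.toolResultOffload;
  if (!offload?.enabled) return false; // absent (or disabled) offload block
  return !(config.tools?.disabled ?? []).includes("FetchData");
}
```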
+**When offload actually saves tokens.** Offload wins when (a) a run
+makes many tool calls and (b) the model needs the full payload of only
+a few of them, e.g. a research run with 12 `WebSearch` calls where
+only 2 are deep-read for quotes. If the model needs the full content
+of every tool call (a single-shot "fetch this large thing and answer"),
+offload adds an extra `FetchData` round-trip and saves nothing.
+Rule of thumb: leave it off for single-tool-call workflows, and turn it
+on for multi-call research, browsing, or repeated-API flows. Benchmark
+on your workload; the live test at
+`scripts/workflows/w17-offload-live.mjs` shows how.
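The trade-off above comes down to simple arithmetic. This rough model uses the bytes of tool output in context as a proxy for tokens and ignores the `FetchData` request itself. `Workload`, `inlineBytes` and `offloadedBytes` are hypothetical names, and the numbers you feed in are your own estimates.

```typescript
// Rough context-size model; every input is an estimate you supply.
interface Workload {
  calls: number;        // tool calls in the run
  deepReads: number;    // calls whose full payload the model fetches
  payloadBytes: number; // average raw result size
  summaryBytes: number; // average size of an offload summary
}

// Without offload, every raw payload lands in context.
function inlineBytes(w: Workload): number {
  return w.calls * w.payloadBytes;
}

// With offload, every call contributes a summary, and each deep read
// brings the full payload back in via FetchData (never re-offloaded).
function offloadedBytes(w: Workload): number {
  return w.calls * w.summaryBytes + w.deepReads * w.payloadBytes;
}
```

Take 12 calls averaging 14,000 bytes, 2 deep reads and 600-byte summaries: that is 168,000 bytes inline versus 35,200 offloaded. With one call that is always read in full, offload adds 600 bytes (14,600 versus 14,000).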
 ### Sync vs. async — when to use which
 | Scenario | Use |