npm - la-machina-engine - Versions diffs - 0.5.0 → 0.7.0 - Mend

la-machina-engine 0.5.0 → 0.7.0


Files changed (8)

package/README.md CHANGED

@@ -22,7 +22,7 @@ npm install la-machina-engine
 **v0.3.0 — published on npm; production-ready core, evolving feature surface.**
-- **1214** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
+- **1553** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
 - Zero top-level `node:` imports — runs on Node.js AND Cloudflare Workers
 - 14 live workflow tests (W1–W14) verified against OpenRouter, real R2, real MCP servers
 - Pause/resume + async runs + webhooks + state.json + R2 binding storage adapter
@@ -1033,6 +1033,299 @@ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
 services are configured. Absent `config.api` → tool never registered,
 no prompt mention.
+### Tool-result offload — keep the context lean on chatty runs
+Research-style runs often fire a dozen `WebSearch` / `WebFetch` /
+`ApiCall` invocations and each one can return tens of KB. The raw
+payloads flood the main context even before the compactor kicks in.
+Tool-result offload is the **preventive** fix: when a tool returns
+more bytes than your threshold, the engine stores the full content
+under the agent's log path and replaces the in-context message with
+a short deterministic summary + a `ref` token. A built-in `FetchData`
+tool rehydrates the original payload on demand — same UX as
+`SkillPage` for skills.
+Off by default. Flip it on at the engine level:
+```ts
+const engine = initEngine({
+  compaction: {
+    toolResultOffload: {
+      enabled: true,
+      thresholdBytes: 2048,      // default: 2048 (2 KB)
+      maxPreviewChars: 500,      // default: 500
+    },
+  },
+})
+```
+When enabled, every tool result whose body exceeds `thresholdBytes`
+lands at
+`projects/{runId}/nodes/{nodeId}/toolResults/{toolUseId}.json`
+(subagents offload under their own `subagents/{agentId}/toolResults/…`),
+and the model sees a summary like:
+```
+[WebSearch] Array of 10 items (14.2 KB).
+First item preview: {"title":"…","url":"…"}
+Use FetchData with ref="toolu_abc" to read the full array.
+```
+The model calls `FetchData({ ref: "toolu_abc" })` when it actually
+needs the raw bytes. One extra round-trip, and only when the
+information is actually required.
+**Per-run override** (SaaS pattern — one engine, different thresholds
+per tenant or per task):
+```ts
+await engine.run({
+  task: '…',
+  compaction: {
+    toolResultOffload: { enabled: true, thresholdBytes: 4096 },
+  },
+})
+```
+**Behavioural invariants:**
+- **Error tool results are never offloaded.** The model needs them
+  verbatim to adapt; replacing them with a ref would break debugging.
+- **`FetchData`'s own output is never re-offloaded.** Would trap the
+  model in a hydrate loop.
+- **Each agent sees only its own refs.** A subagent can't `FetchData`
+  a parent's offloaded blob — `FetchData` is bound to each agent's
+  log path at construction.
+- **Refs survive resume.** Blobs live in the same storage adapter as
+  transcripts and snapshots; resuming a paused run still has access
+  to every offloaded payload.
+- **Strict `>` at threshold.** A result at exactly `thresholdBytes`
+  stays inline.
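Illustration only, not part of the published README: the invariants above can be read as a small gating function. The sketch below is a guess at that logic, not the engine's code. The names `decideOffload` and `shouldOffload` are hypothetical, and measuring size as UTF-8 bytes is an assumption.

```typescript
// Hypothetical helpers sketching the offload gate; not the engine's actual code.
interface OffloadDecision {
  offload: boolean
  bytes: number
}

// Measure the payload as UTF-8 bytes (an assumption) and compare to the threshold.
function decideOffload(rawContent: string, thresholdBytes: number): OffloadDecision {
  const bytes = new TextEncoder().encode(rawContent).length
  // Strict ">": a payload of exactly thresholdBytes stays inline.
  return { offload: bytes > thresholdBytes, bytes }
}

// Error results and FetchData's own output are never offloaded.
function shouldOffload(
  toolName: string,
  isError: boolean,
  rawContent: string,
  thresholdBytes: number,
): boolean {
  if (isError || toolName === 'FetchData') return false
  return decideOffload(rawContent, thresholdBytes).offload
}
```

With the default threshold of 2048, a 2048-byte result stays inline and a 2049-byte result is offloaded, unless it is an error or comes from `FetchData`.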
+**Pluggable summarizer** — the default is deterministic
+(shape-aware for JSON arrays and top-level objects, first-N-chars
+for arbitrary text). Users wanting semantic summaries can plug in
+their own:
+```ts
+toolResultOffload: {
+  enabled: true,
+  summarizer: async (ctx) => {
+    // ctx = { toolName, toolInput, rawContent, rawBytes, ref, maxPreviewChars }
+    // Return a string; it MUST include `ctx.ref` so the model can
+    // call FetchData to rehydrate the full content.
+    return myCustomSummary(ctx)
+  },
+}
+```
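Illustration only, not part of the published README: a custom summarizer could look roughly like the sketch below. The field names come from the comment in the snippet above, but their types are guesses. `SummarizerContext`, `buildSummary`, and `firstLineSummarizer` are hypothetical names.

```typescript
// Field names follow the README comment; the field types are assumptions.
interface SummarizerContext {
  toolName: string
  toolInput: unknown
  rawContent: string
  rawBytes: number
  ref: string
  maxPreviewChars: number
}

// Build a one-line preview plus a rehydration hint that always contains ctx.ref.
function buildSummary(ctx: SummarizerContext): string {
  const firstLine = ctx.rawContent.split('\n', 1)[0] ?? ''
  const preview = firstLine.slice(0, ctx.maxPreviewChars)
  const kb = (ctx.rawBytes / 1024).toFixed(1)
  return (
    `[${ctx.toolName}] ${kb} KB. Starts with: ${preview}\n` +
    `Use FetchData with ref="${ctx.ref}" to read the full content.`
  )
}

// Async wrapper matching the `summarizer: async (ctx) => …` shape shown above.
async function firstLineSummarizer(ctx: SummarizerContext): Promise<string> {
  return buildSummary(ctx)
}
```

Keeping the string-building step synchronous makes it easy to unit-test that the ref is always present in the output.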
+> **Coming in a future release:** an engine-shipped LLM summarizer
+> (call a small/fast model to write a real summary instead of the
+> deterministic one). Track via Plan 021's "Deferred with triggers"
+> list. The `summarizer` callback IS the extension point the LLM
+> version will plug into, so users implementing custom summarizers
+> today are on the stable surface.
+**Disabling:** `tools.disabled: ['FetchData']` removes the tool even
+when offload is enabled. Absent `config.compaction.toolResultOffload`
+→ tool not registered, no threshold checks, no storage writes, no
+prompt mention.
+**When offload actually saves tokens.** Offload wins when (a) a run
+makes many tool calls and (b) the model rarely needs the full payload
+of most of them — e.g. a research run with 12 `WebSearch` calls where
+only 2 are deep-read for quotes. If the model needs the full content
+of every tool call (a single-shot "fetch this large thing and answer")
+offload costs an extra `FetchData` round-trip without saving anything.
+Rule of thumb: leave it off for single-tool-call workflows, turn it
+on for multi-call research / browsing / repeated-API flows. Benchmark
+on your workload — the live test at
+`scripts/workflows/w17-offload-live.mjs` shows how.
+### Knowledge base — `SearchKnowledge` + `ReadKnowledge`
+When the model needs to look things up in your tenant's docs without
+loading whole files into context, opt into the knowledge base. Two
+built-in tools — `SearchKnowledge` (token-overlap-ranked snippets)
+and `ReadKnowledge` (one section or a whole file) — let an agent
+walk a per-tenant vault on demand.
+**Layout.** Each tenant gets a folder at
+`workspaces/{workspaceId}/knowledge/`, sibling to `.claude/`. Top-
+level subfolders are *bases* — independent corpora each with their
+own pre-built `_index.json`:
+```
+workspaces/acme-corp/
+├── .claude/                      # engine state — transcripts, memory, …
+└── knowledge/
+    ├── hr-policies/              # base: "hr-policies"
+    │   ├── _index.json           # built by writeKnowledgeIndex()
+    │   ├── handbook.md
+    │   └── remote-work.md
+    └── sales-playbook/           # base: "sales-playbook"
+        ├── _index.json
+        └── q1/
+            └── pricing.md
+```
+**Build the index** when the corpus changes — for each base, the
+indexer walks its subtree, splits markdown at heading boundaries,
+tokenises section bodies, and writes one `_index.json` per base:
+```ts
+import { writeKnowledgeIndex, R2StorageAdapter } from 'la-machina-engine'
+const k = new R2StorageAdapter(r2Config, 'workspaces/acme-corp/knowledge')
+await writeKnowledgeIndex({ adapter: k, base: 'hr-policies' })
+await writeKnowledgeIndex({ adapter: k, base: 'sales-playbook' })
+```
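Illustration only, not part of the published README: "splits markdown at heading boundaries" could work along the lines of the sketch below. This is not the package's `indexer.ts`. The `Section` shape, the `slugify` rule for anchors, and the choice to ignore text before the first heading are all assumptions.

```typescript
// Hypothetical section shape; the real index schema is not shown in the diff.
interface Section {
  anchor: string
  heading: string
  body: string
}

// Assumed anchor rule: lowercase, non-alphanumerics collapsed to "-".
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
}

// Split on ATX headings ("#" to "######"); fenced code blocks are not special-cased.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = []
  let current: Section | null = null
  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line)
    if (match) {
      if (current) sections.push(current)
      const heading = (match[2] ?? '').trim()
      current = { anchor: slugify(heading), heading, body: '' }
    } else if (current) {
      current.body += line + '\n'
    }
    // Text before the first heading is ignored in this sketch.
  }
  if (current) sections.push(current)
  return sections
}
```

Under these assumptions, a `# Benefits` heading in `handbook.md` would give a section whose anchor matches the `#benefits` suffix in refs such as `hr-policies/handbook.md#benefits`.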
+**Configure the engine** to enable the tools (off by default):
+```ts
+const engine = initEngine({
+  storage: { provider: 'r2', /* … */ },
+  knowledge: {
+    enabled: true,           // engine-level capability flag
+    maxSearchResults: 5,     // top-K per SearchKnowledge call
+    maxReadBytes: 10_000,    // ReadKnowledge truncation cap
+  },
+})
+```
+**Per-run scoping.** Folders are runtime-only — pass them via
+`RunOptions.knowledge.folders`. Sub-paths inside a base work too
+(e.g., `'sales-playbook/q1'` only sees Q1 content):
+```ts
+await engine.run({
+  task: 'What is our 401k match rate?',
+  knowledge: {
+    folders: ['hr-policies', 'sales-playbook/q1'],
+    external: [
+      // External file links — fetched on demand, never indexed.
+      // `headers` are runtime-only and NEVER persist anywhere.
+      {
+        name: 'product-catalog',
+        description: 'Product catalog CSV with unit pricing',
+        url: 'https://api.acme.example/catalog.csv',
+        format: 'csv',
+        headers: { Authorization: 'Bearer sk_real_token' },
+      },
+    ],
+  },
+})
+```
+The model then calls:
+- `SearchKnowledge({ query: '401k matching' })` → top-K ranked
+  snippets, each with a `ref` like `hr-policies/handbook.md#benefits`
+- `ReadKnowledge({ ref: 'hr-policies/handbook.md#benefits' })` →
+  full body of that section
+- `ReadKnowledge({ ref: 'ext:product-catalog' })` → fetches the
+  registered URL with its headers, runs the `csv` extractor, returns
+  text
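Illustration only, not part of the published README: token-overlap ranking, as `SearchKnowledge` is described, can be sketched as below. The function names echo `tokenize()` and `scoreOverlap()` from the codebase layout, but the bodies, the stop-word list, and the `rank` helper are assumptions, not the package's code.

```typescript
// Tiny assumed stop-word list; the real list is not shown in the diff.
const STOP_WORDS = new Set(['a', 'an', 'and', 'is', 'of', 'our', 'the', 'to', 'what'])

// Lowercase, split on non-alphanumerics, drop stop-words, de-duplicate.
function tokenize(text: string): string[] {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? []
  return Array.from(new Set(words.filter((w) => !STOP_WORDS.has(w))))
}

// Score = number of query tokens that also appear in the section.
function scoreOverlap(queryTokens: string[], sectionTokens: string[]): number {
  const section = new Set(sectionTokens)
  return queryTokens.filter((t) => section.has(t)).length
}

interface IndexedSection {
  ref: string
  body: string
}

// Return the topK sections with a non-zero score, highest score first.
function rank(query: string, sections: IndexedSection[], topK: number): IndexedSection[] {
  const q = tokenize(query)
  return sections
    .map((s) => ({ section: s, score: scoreOverlap(q, tokenize(s.body)) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => r.section)
}
```

The scoring is deterministic and involves no model call, which is consistent with the "deterministic, no LLM" note in the layout. It does no stemming, so "matching" would not match "match" in this sketch.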
+**Format support.** Native: `md`, `txt`, `json`, `csv`, `html` (script/
+style stripped, entities decoded, whitespace collapsed). Optional:
+`pdf` (via `pdf-parse`) and `docx` (via `mammoth`). Both have
+`requiresNode: true` — on Workers without those packages installed,
+they return a structured `ERR_KNOWLEDGE_FORMAT_UNSUPPORTED` error.
+**Path safety.** All folder + ref strings flow through one validator
+in `src/knowledge/scope.ts` that rejects absolute paths, traversal
+(`..`), unsafe characters, and out-of-base file refs. A dedicated
+test (`scope.test.ts`) pins the behaviour — every weakening would
+open a tenant-boundary hole.
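Illustration only, not part of the published README: the path-safety rules listed above (no absolute paths, no traversal, no unsafe characters, no out-of-base refs) could be enforced roughly as follows. This is not the contents of `src/knowledge/scope.ts`. `isSafeRef`, this `refInScope` body, and the allowed character set are assumptions.

```typescript
// Assumed allow-list for a single path segment.
const SAFE_SEGMENT = /^[A-Za-z0-9._-]+$/

// Reject empty refs, absolute paths, backslashes, "." / ".." and unsafe characters.
function isSafeRef(ref: string): boolean {
  if (ref.length === 0 || ref.startsWith('/') || ref.includes('\\')) return false
  return ref
    .split('/')
    .every((seg) => seg !== '.' && seg !== '..' && SAFE_SEGMENT.test(seg))
}

// A file ref is in scope only if it sits at or below one of the allowed folders.
function refInScope(fileRef: string, allowedFolders: string[]): boolean {
  if (!isSafeRef(fileRef)) return false
  return allowedFolders.some(
    (folder) => fileRef === folder || fileRef.startsWith(folder + '/'),
  )
}
```

Comparing against `folder + '/'` rather than the bare folder name keeps a sibling such as `hr-policies-secret` from passing as a child of `hr-policies`.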
+**External link headers — non-persistence guarantee.** External
+`headers` live entirely inside the tool factory closure and on the
+`init.headers` of one `fetch` call per request. They never reach the
+LLM, the transcript, `state.json`, snapshots, or any storage write.
+A sentinel-based test suite (`externalLinkSecrets.test.ts`) and the
+live R2 test (`scripts/workflows/w20-knowledge-r2.mjs`) verify this:
+the live test seeds a known sentinel into the `Authorization` header,
+runs against real R2, and reads every transcript shard back from the
+bucket asserting zero leaks.
+**Composing with offload.** If a `ReadKnowledge` result exceeds your
+`compaction.toolResultOffload.thresholdBytes`, the offload pipeline
+takes over — the body is written under `toolResults/`, the model
+sees a summary + ref, and `FetchData` rehydrates on demand. The two
+features compose without any extra wiring.
+**Disabling.** `tools.disabled: ['SearchKnowledge', 'ReadKnowledge']`
+turns the tools off even when knowledge is enabled. Absent
+`config.knowledge.enabled` → no adapter built, no tools registered,
+no prompt mention.
+#### Codebase layout
+The knowledge subsystem is small and self-contained. If you need to
+extend it (new format extractor, custom scorer, alternative index
+schema), these are the files involved:
+```
+src/
+├── knowledge/                          # subsystem (self-contained)
+│   ├── types.ts                        # V1-suffixed public types (KnowledgeFolderRefV1, KnowledgeExternalLinkV1, KnowledgeFormatV1, ResolvedKnowledgeConfigV1, RunKnowledgeOptionsV1, SectionEntryV1, KnowledgeIndexV1, …)
+│   ├── scope.ts                        # parseFolderRef / parseKnowledgeRef / refInScope — load-bearing path safety
+│   ├── tokenize.ts                     # tokenize() + scoreOverlap() — deterministic, no LLM
+│   ├── indexer.ts                      # buildKnowledgeIndex() / writeKnowledgeIndex() — section split + wiki-link extraction
+│   └── extractors.ts                   # getExtractor(format) — md/txt/json/csv/html native; pdf/docx lazy-import
+│
+├── tools/
+│   ├── searchKnowledge.ts              # createSearchKnowledgeTool() — token-overlap ranked snippets
+│   └── readKnowledge.ts                # createReadKnowledgeTool() — section / file / ext: ref dispatch
+│
+├── storage/
+│   ├── interface.ts                    # adds optional `EngineStorage.knowledge?`
+│   └── factory.ts                      # builds the knowledge adapter at workspaces/{ws}/knowledge/ when enabled
+│
+├── config/
+│   ├── types.ts                        # ResolvedConfig.knowledge?: ResolvedKnowledgeConfigV1
+│   ├── schema.ts                       # KnowledgeConfigResolved zod schema (scalars only — no folders/headers)
+│   └── merge.ts                        # KNOWLEDGE_DEFAULTS + fillKnowledgeDefaults()
+│
+├── engine/
+│   ├── engine.ts                       # resolveKnowledgeRuntime() + buildToolRegistry knowledge wire-up
+│   └── types.ts                        # adds `knowledge?: RunKnowledgeOptionsV1` to RunOptions/ResumeOptions
+│
+└── index.ts                            # public exports: writeKnowledgeIndex, buildKnowledgeIndex,
+                                        # createSearchKnowledgeTool, createReadKnowledgeTool, getExtractor,
+                                        # KnowledgeFormatV1, KnowledgeIndexV1, RunKnowledgeOptionsV1, …
+test/
+├── unit/
+│   ├── knowledge/
+│   │   ├── tokenize.test.ts            # 15 tests — stop-words, dedup, scoring
+│   │   ├── indexer.test.ts             # 16 tests — section split, wiki-links, recursion
+│   │   ├── scope.test.ts               # path-safety pin (every weakening = tenant-boundary hole)
+│   │   ├── extractors.test.ts          # 17 tests — all formats including pdf/docx fallbacks
+│   │   └── externalLinkSecrets.test.ts # 7 sentinel-based non-persistence tests
+│   ├── tools/
+│   │   ├── searchKnowledge.test.ts     # caching, multi-base, sub-path, cap, factory rejection
+│   │   └── readKnowledge.test.ts       # 18 tests — all three ref kinds + error paths
+│   └── config/
+│       └── knowledgeSchema.test.ts     # 13 tests — defaults, partials, header rejection
+│
+└── integration/engine/
+    ├── knowledgeE2E.test.ts            # 6 scenarios — registration, round-trip, disabled, sub-path, subagent inheritance
+    ├── knowledgeWithOffload.test.ts    # large ReadKnowledge → offload blob + clean transcript
+    └── knowledgeMultiBase.test.ts      # multi-base ranking, base-prefixed refs, indexed + external mix
+scripts/workflows/
+├── w20-knowledge-r2.mjs                # live R2 — vault search/read + external link
+└── w21-external-files-knowledge.mjs    # live R2 — md/json/csv/html external file round-trip
+```
+`src/knowledge/` is the only directory you need to touch to add a
+new format. Append a new `KnowledgeExtractorV1` to `extractors.ts`
+and add the type to `KnowledgeFormatV1` in `types.ts` — everything
+else is dispatched off `getExtractor(format)`.
 ### Sync vs. async — when to use which
 | Scenario | Use |
@@ -1266,6 +1559,7 @@ All features ported 1:1 from La-Machina's production runtime. Pure JS, Workers-c
 - [x] 22 built-in tools
 - [x] Custom tool registration via `defineTool()`
 - [x] Device path blocking (/dev/zero, /dev/random, /proc/kcore)
+- [x] Knowledge base (`SearchKnowledge` + `ReadKnowledge`) — opt-in, per-tenant vault under `workspaces/{ws}/knowledge/`, section-level indexing, format extractors (md/txt/json/csv/html native; pdf/docx via optional deps), external link headers with non-persistence guarantee
 ### Agent Hierarchy
 - [x] Subagent spawning with depth tracking (SubagentRegistry)
@@ -1292,7 +1586,7 @@ All features ported 1:1 from La-Machina's production runtime. Pure JS, Workers-c
 - [x] Workers compatibility (zero top-level node: imports)
 ### Testing
-- [x] 960 tests across 86 files
+- [x] 1553 tests across 142 files
 - [x] 10 live workflow tests (W1-W10) against OpenRouter + R2
 - [x] Coverage: 81% lines, 85% branches, 91% functions
 - [x] CI pipeline (lint + typecheck + test + coverage gates)
@@ -1376,7 +1670,7 @@ Features intentionally not ported — either Anthropic-only, CLI-specific, or de
 ```bash
 npm install
 npm run build          # tsup → dist/ (ESM + CJS + .d.ts)
-npm test               # 1214 tests (~12s with bun)
+npm test               # 1553 tests (~30s with vitest)
 npm run test:watch     # watch mode
 npm run test:coverage  # with coverage gates
 npm run typecheck      # TypeScript strict
@@ -1415,11 +1709,12 @@ publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
 | Category | Files | Tests |
 |----------|-------|-------|
-| Unit | 70+ | ~870 |
-| Integration | 15+ | ~130 |
+| Unit | 113+ | ~1200 |
+| Integration | 15+ | ~150 |
 | E2E | 5 | ~30 |
 | Coverage additions | 20+ | ~130 |
-| **Total** | **115+** | **1214** (current `bun test` count; 8 pre-existing Bun timer failures unrelated) |
+| Knowledge (Plan 023) | 11 | ~85 |
+| **Total** | **142** | **1553** (current `vitest run`; 8 skipped pre-existing) |
 ### Live Workflow Tests
@@ -1439,6 +1734,10 @@ publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
 | W12 | Multi-agent + MCP + skills + HITL (parent gates child's publish) | 4 |
 | W13 | Per-run skill override (inline body + URL + fetch cache) | 4 |
 | W14 | MCP auth refresh + sampling round-trip (stdio + http) | n/a (integration) |
+| W17 | Tool-result offload (FetchData rehydrate) | 4 |
+| W19 | Kitchen-sink R2 (subagent + HITL + ApiCall + offload + skills + memory + hooks + webhook) | 12 |
+| W20 | Knowledge base on R2 (vault search/read + external link + bearer non-persistence) | 8 |
+| W21 | External-file knowledge on R2 — md/json/csv/html round-trip + 401 bounds | 5 |
 ---