
la-machina-engine 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md CHANGED

@@ -920,6 +920,92 @@ bunx wrangler dev --local
 Everything else (state.json, webhooks, polling, resume, recovery) works unchanged.
+### External APIs — the `ApiCall` built-in
+When you configure one or more services via `config.api`, the engine
+auto-registers an `ApiCall` tool that lets the model call tenant-scoped
+external HTTP APIs without ever seeing credentials. The model picks
+a service name from a closed enum; the engine injects auth from
+your `env` map (or a `resolveAuth` callback for dynamic schemes).
+```ts
+const engine = initEngine({
+  // …model, storage, etc.
+  api: {
+    services: [
+      {
+        name: 'widgets',
+        baseUrl: 'https://api.acme.example/v1',
+        auth: { type: 'bearer', tokenRef: 'widgets:token' },
+        allowedPaths: [/^\/widgets(\/\d+)?$/],   // optional safety rail
+      },
+    ],
+    env: { 'widgets:token': 'sk_real_token' },    // loaded from your vault
+  },
+})
+```
+The model sees `ApiCall` with a `service` enum locked to your list.
+When it calls `ApiCall({ service: 'widgets', method: 'POST',
+path: '/widgets', body: {...} })`, the engine resolves the bearer
+token from `env`, attaches it as `Authorization: Bearer ...`, and
+fetches. The token never enters the model's context, the transcript,
+state.json, logs, or any response field — a dedicated test suite
+(`apiCallSecretIsolation.test.ts`) enforces this.
+**Multi-tenant SaaS** — pass per-tenant services via `RunOptions.api`
+instead of `config.api`, so one engine instance serves many tenants:
+```ts
+await engine.run({
+  task: '...',
+  api: {
+    services: tenantServices,
+    env: tenantEnv,
+  },
+})
+```
+**Auth types** (the first four are zero-code, the last is the escape hatch):
+| Type | Shape | Header produced | Use for |
+|---|---|---|---|
+| `none` | `{ type: 'none' }` | — | Public APIs |
+| `bearer` | `{ type: 'bearer', tokenRef }` | `Authorization: Bearer <env[tokenRef]>` | OpenAI, GitHub PAT, Airtable |
+| `header` | `{ type: 'header', name, valueRef }` | `<name>: <env[valueRef]>` | SendGrid (`X-API-Key`), any single-header API |
+| `basic` | `{ type: 'basic', userRef, passRef }` | `Authorization: Basic <base64(user:pass)>` | Twilio, Bitbucket |
+| `custom` | `{ type: 'custom', id }` | Whatever `resolveAuth` returns | OAuth refresh, HMAC signing, JWT minting |
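The mapping in the table above can be sketched as a small pure function. This is an illustration only, not the engine's actual code: the `ServiceAuth` and `Env` types and the `lookup` and `buildAuthHeaders` helpers are hypothetical names that mirror the "Shape" and "Header produced" columns.

```typescript
// Hypothetical types mirroring the "Shape" column of the auth table.
type ServiceAuth =
  | { type: 'none' }
  | { type: 'bearer'; tokenRef: string }
  | { type: 'header'; name: string; valueRef: string }
  | { type: 'basic'; userRef: string; passRef: string }
  | { type: 'custom'; id: string }

type Env = Record<string, string>

// Resolve a secret reference; fail loudly rather than send an empty credential.
function lookup(env: Env, ref: string): string {
  const value = env[ref]
  if (value === undefined) throw new Error(`missing env entry: ${ref}`)
  return value
}

// Illustrative only: maps the four zero-code auth types to request headers.
function buildAuthHeaders(auth: ServiceAuth, env: Env): Record<string, string> {
  switch (auth.type) {
    case 'none':
      return {}
    case 'bearer':
      return { Authorization: `Bearer ${lookup(env, auth.tokenRef)}` }
    case 'header':
      return { [auth.name]: lookup(env, auth.valueRef) }
    case 'basic': {
      // btoa only handles Latin-1 input; fine for typical API credentials.
      const pair = `${lookup(env, auth.userRef)}:${lookup(env, auth.passRef)}`
      return { Authorization: `Basic ${btoa(pair)}` }
    }
    case 'custom':
      // The README delegates this case to resolveAuth; out of scope here.
      throw new Error('custom auth requires resolveAuth')
  }
}
```

Because the function only ever reads from `env`, the secret values stay on the engine side of the boundary, which is the property the README describes.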
+For `custom`, supply `resolveAuth(auth, ctx)`: an async function the
+engine calls per dispatch. The `ctx` carries `serviceName`, `method`,
+`path` so HMAC-style schemes can sign the request context.
+```ts
+api: {
+  services: [{ name: 'gdrive', baseUrl: '...', auth: { type: 'custom', id: 'oauth:google' } }],
+  resolveAuth: async (auth, ctx) => {
+    if (auth.type === 'custom' && auth.id === 'oauth:google') {
+      // tenantId and oauthCache are assumed to come from the enclosing request scope
+      const token = await oauthCache.getFreshAccessToken(tenantId)
+      return { Authorization: `Bearer ${token}` }
+    }
+    return {}
+  },
+}
+```
+**Safety rails** enforced per-call: service enum lockdown, per-service
+`allowedPaths` + `allowedMethods`, `maxBodyBytes` cap,
+`maxResponseBytes` cap, case-insensitive auth-header sanitizer (the
+model cannot spoof `Authorization` via `input.headers`).
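A minimal sketch of how rails like these might be checked before dispatch, under stated assumptions: `ServiceRails`, `RESERVED`, `sanitizeHeaders`, and `checkCall` are hypothetical names, and the reserved-header list is a guess at what a sanitizer would cover. It is not the engine's implementation.

```typescript
// Hypothetical per-service limits, named after the options listed above.
interface ServiceRails {
  allowedPaths?: RegExp[]
  allowedMethods?: string[]
  maxBodyBytes?: number
}

// Header names the model must never be able to set (assumed list).
const RESERVED = new Set(['authorization', 'proxy-authorization'])

// Drop reserved headers regardless of casing, e.g. "AUTHORIZATION".
function sanitizeHeaders(input: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {}
  for (const name of Object.keys(input)) {
    if (!RESERVED.has(name.toLowerCase())) out[name] = input[name]
  }
  return out
}

// Return a rejection reason, or null when the call passes every rail.
function checkCall(rails: ServiceRails, method: string, path: string, body?: string): string | null {
  if (rails.allowedMethods && rails.allowedMethods.indexOf(method.toUpperCase()) === -1) {
    return 'method not allowed'
  }
  // Patterns should not use the `g` flag, since RegExp.test is stateful with it.
  if (rails.allowedPaths && !rails.allowedPaths.some((re) => re.test(path))) {
    return 'path not allowed'
  }
  if (rails.maxBodyBytes !== undefined && body !== undefined) {
    // Measure encoded bytes, not UTF-16 code units.
    if (new TextEncoder().encode(body).length > rails.maxBodyBytes) return 'body too large'
  }
  return null
}
```

Lower-casing the header name before the lookup is what makes the check case-insensitive; a direct key comparison would let `authorization` or `AUTHORIZATION` slip through.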
+**Observability:** `onRequest` / `onResponse` hooks fire around each
+dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
+— no secrets — for metering, billing, audit logs.
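As a sketch of what a metering hook could look like, the snippet below aggregates per-service usage from response events. The event fields come from the list above; the assumption that `onResponse` receives all of them in a single object, and the `ApiCallEvent`, `Usage`, and `createMeter` names, are hypothetical.

```typescript
// Event shape taken from the field list above; delivery as one object is assumed.
interface ApiCallEvent {
  service: string
  method: string
  path: string
  status: number
  latencyMs: number
  bytesIn: number
}

interface Usage {
  calls: number
  errors: number
  bytesIn: number
  totalLatencyMs: number
}

// Build an in-memory meter whose onResponse can be passed as the hook.
function createMeter() {
  const usage = new Map<string, Usage>()
  return {
    onResponse(event: ApiCallEvent): void {
      const u = usage.get(event.service) ?? { calls: 0, errors: 0, bytesIn: 0, totalLatencyMs: 0 }
      u.calls += 1
      if (event.status >= 400) u.errors += 1
      u.bytesIn += event.bytesIn
      u.totalLatencyMs += event.latencyMs
      usage.set(event.service, u)
    },
    // Copy totals into a plain object, e.g. for a billing export.
    snapshot(): Record<string, Usage> {
      const out: Record<string, Usage> = {}
      usage.forEach((value, key) => { out[key] = { ...value } })
      return out
    },
  }
}
```

A per-run or per-tenant meter instance keeps totals separate when services are supplied through `RunOptions.api`.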
+**Disabling:** `tools.disabled: ['ApiCall']` turns it off even when
+services are configured. If `config.api` is absent, the tool is never
+registered and the prompt never mentions it.
 ### Sync vs. async — when to use which
 | Scenario | Use |
@@ -929,8 +1015,109 @@ Everything else (state.json, webhooks, polling, resume, recovery) works unchange
 | Long task, client can't block | `engine.start()` + `getStatus` / `waitFor` |
 | HITL in a web app (user closes tab) | `engine.start()` + webhook on `paused` |
 | Cloudflare Workers (any non-trivial run) | `storage.provider: 'r2-binding'` + DO + `preferBindingTransport` |
+| Worker needs Bash / stdio MCP | Async + `config.runner` → handoff to Node (see below) |
 | Server crash recovery | `engine.recoverOrphanedRuns()` on startup |
+### Runner contract — Node-only tools on Workers
+Cloudflare Workers can't spawn processes, which means no Bash, no
+stdio-based MCP, no ripgrep. When the engine detects this, it
+replaces each such tool with a **capability stub** — same name, same
+description (so the model still sees it in the catalogue), but calling
+the stub returns `isError: true` with a structured message.
+You have two options when a Worker run needs those tools:
+- **Sync run** (`engine.run()`) — the stub executes, the model
+  adapts its answer ("I couldn't run Bash in this environment, so
+  here's what I can tell you…"), and the run completes `status: 'done'`
+  with `meta.capabilitiesMissing: ['Bash', …]` so callers can detect
+  missing capabilities and decide whether to retry elsewhere.
+- **Async run** (`engine.start()` / `engine.resumeAsync()`) with
+  `config.runner` set — the engine intercepts the stub call, pauses
+  with reason `'handoff_to_runner'`, and POSTs `{ runId }` to your
+  runner. The runner is a separate Node process that reads the
+  snapshot from the same R2 bucket, resumes with real tools
+  registered, and writes the final state back. The Worker's
+  `engine.waitFor(runId)` returns `'done'` once the runner finishes.
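For the sync path, a caller might inspect `meta.capabilitiesMissing` along these lines. This is a sketch under stated assumptions: `RunResultLike` is a hypothetical minimal shape containing only the fields named above, and the helper names are illustrative.

```typescript
// Hypothetical minimal view of a run result; only fields named in the README.
interface RunResultLike {
  status: string
  meta?: { capabilitiesMissing?: string[] }
}

// List the tools that were stubbed out during a completed run.
function missingCapabilities(result: RunResultLike): string[] {
  if (result.status !== 'done') return []
  return result.meta?.capabilitiesMissing ?? []
}

// Decide whether to re-run elsewhere, given the tools this task truly needs.
function shouldRetryOnNode(result: RunResultLike, required: string[]): boolean {
  const missing = new Set(missingCapabilities(result))
  return required.some((tool) => missing.has(tool))
}
```

Passing an explicit `required` list lets a caller accept a degraded answer when the missing tool was optional for the task.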
+The engine ships **no runner package** — you build yours against the
+HTTP contract below. A ~100-line reference implementation you can
+fork lives at [`examples/runner-node/`](examples/runner-node/).
+#### Configuring the Worker side
+```ts
+const engine = initEngine({
+  // …storage, model, etc.
+  runner: {
+    url: 'https://runner.tenant-a.internal/continue',
+    secret: process.env.RUNNER_SECRET,   // shared with the runner
+  },
+})
+```
+Leave `runner` unset to disable handoff entirely — stubbed tools then
+fall back to the sync-style graceful degradation even on async runs.
+#### The HTTP contract
+A runner must implement two endpoints:
+**`POST /continue`** — called by the engine when an async run hits a
+Node-only tool.
+```
+Headers:
+  Authorization: Bearer <secret>   # MUST match config.runner.secret
+  Content-Type: application/json
+Body:
+  { "runId": string }
+Response:
+  202 Accepted   — runner accepted, will process in background
+  401 Unauthorized — bad bearer
+  400 Bad Request  — missing / malformed runId
+```
+Behavior:
+1. Verify the bearer token.
+2. Call `engine.resumeAsync({ runId })` on a runner-side engine
+   configured with:
+   - The **same R2 bucket + rootPath + workspaceId** as the Worker
+   - The real Node-only tools registered (Bash, stdio MCP, etc.)
+   - The same LLM provider config
+   - **No `config.runner`** (the runner doesn't hand off further)
+3. Return 202 immediately; the engine's own background executor
+   finishes the run and writes state back to R2.
+**`GET /health`** — returns 200 when the runner accepts `/continue`.
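The `/continue` behavior above can be sketched as a transport-agnostic handler that returns only the status code. Everything here is an assumption made for illustration: `handleContinue`, `safeEqual`, and `ContinueRequest` are hypothetical names, and `resume` stands in for a call to `engine.resumeAsync` on the runner-side engine. It is not the reference implementation from `examples/runner-node/`.

```typescript
// Stand-in for the runner-side engine.resumeAsync (assumed signature).
type Resume = (args: { runId: string }) => Promise<unknown>

// Hypothetical request view: the Authorization header plus the parsed JSON body.
interface ContinueRequest {
  authorization?: string
  body: unknown
}

// Compare without an early exit on the first differing character.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false
  let diff = 0
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  return diff === 0
}

// Returns the HTTP status the runner should send for POST /continue.
function handleContinue(req: ContinueRequest, secret: string, resume: Resume): number {
  // Step 1: verify the bearer token.
  if (!req.authorization || !safeEqual(req.authorization, `Bearer ${secret}`)) return 401
  // Reject a missing or malformed runId.
  const runId = (req.body as { runId?: unknown } | null)?.runId
  if (typeof runId !== 'string' || runId.length === 0) return 400
  // Step 2: kick off the resume without awaiting completion.
  void resume({ runId }).catch((err) => console.error('resume failed', err))
  // Step 3: acknowledge immediately; the run finishes in the background.
  return 202
}
```

Keeping the handler free of any HTTP framework means the same logic can sit behind Node's `http` module, Express, or any other server, and can be unit-tested without opening a socket.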
+#### POST failures
+If the runner POST throws (network error) or returns non-2xx, the
+engine flips the run to `status: 'failed'` with error code
+`ERR_RUNNER_UNREACHABLE` before finalizing — callers never see a
+silent hang. Rotate the bearer secret by updating both ends and
+redeploying; in-flight runs during the rotation fail with
+`ERR_RUNNER_UNREACHABLE` and can be retried.
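The failure rule above (a thrown POST or a non-2xx response both map to `ERR_RUNNER_UNREACHABLE`) might look roughly like this on the sending side. This is a sketch, not the engine's code: `notifyRunner`, `PostFn`, and `HandoffResult` are hypothetical names, and the POST function is injected so the logic can be exercised without a network.

```typescript
// Minimal response view; `ok` follows fetch semantics (true for 2xx).
interface PostResponse { ok: boolean; status: number }

// Injected transport, e.g. (url, init) => fetch(url, init).
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<PostResponse>

type HandoffResult =
  | { ok: true }
  | { ok: false; code: 'ERR_RUNNER_UNREACHABLE'; detail: string }

// POST { runId } to the runner and fold every failure mode into one error code.
async function notifyRunner(post: PostFn, url: string, secret: string, runId: string): Promise<HandoffResult> {
  try {
    const res = await post(url, {
      method: 'POST',
      headers: { Authorization: `Bearer ${secret}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ runId }),
    })
    if (res.ok) return { ok: true }
    // Non-2xx, including a 401 from a rotated secret.
    return { ok: false, code: 'ERR_RUNNER_UNREACHABLE', detail: `runner returned HTTP ${res.status}` }
  } catch (err) {
    // Network-level failure: DNS, connection refused, timeout.
    const detail = err instanceof Error ? err.message : String(err)
    return { ok: false, code: 'ERR_RUNNER_UNREACHABLE', detail }
  }
}
```

Collapsing both branches into a single code matches the described behavior during secret rotation, where the in-flight 401 surfaces as the same retryable error.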
+#### Per-tenant isolation
+This is a deployment concern, not an engine concern. Run **one runner process per
+tenant** when secrets must not be shared or when tenants need resource
+isolation. Each tenant's Worker points at its matching runner URL;
+the engine doesn't know or care about the topology.
+#### What's deferred
+Tool-level proxying (hopping to Node for a single tool call and back),
+multi-runner failover, runner → Worker sampling, and replay-protected
+signatures are intentionally out of scope for v1. See
+[`plans/019-runner-pattern-per-tenant.md`](plans/019-runner-pattern-per-tenant.md)
+for the full deferred list + triggers.
 ---
 ## Agent Hierarchy