la-machina-engine 0.3.0 → 0.4.0

package/README.md CHANGED
@@ -920,6 +920,92 @@ bunx wrangler dev --local
 
  Everything else (state.json, webhooks, polling, resume, recovery) works unchanged.
 
+ ### External APIs — the `ApiCall` built-in
+
+ When you configure one or more services via `config.api`, the engine
+ auto-registers an `ApiCall` tool that lets the model call tenant-scoped
+ external HTTP APIs without ever seeing credentials. The model picks
+ a service name from a closed enum; the engine injects auth from
+ your `env` map (or a `resolveAuth` callback for dynamic schemes).
+
+ ```ts
+ const engine = initEngine({
+   // …model, storage, etc.
+   api: {
+     services: [
+       {
+         name: 'widgets',
+         baseUrl: 'https://api.acme.example/v1',
+         auth: { type: 'bearer', tokenRef: 'widgets:token' },
+         allowedPaths: [/^\/widgets(\/\d+)?$/], // optional safety rail
+       },
+     ],
+     env: { 'widgets:token': 'sk_real_token' }, // loaded from your vault
+   },
+ })
+ ```
+
+ The model sees `ApiCall` with a `service` enum locked to your list.
+ When it calls `ApiCall({ service: 'widgets', method: 'POST',
+ path: '/widgets', body: {...} })`, the engine resolves the bearer
+ token from `env`, attaches it as `Authorization: Bearer ...`, and
+ fetches. The token never enters the model's context, the transcript,
+ state.json, logs, or any response field — a dedicated test suite
+ (`apiCallSecretIsolation.test.ts`) enforces this.
+
+ **Multi-tenant SaaS** — pass per-tenant services via `RunOptions.api`
+ instead of `config.api`, so one engine instance serves many tenants:
+
+ ```ts
+ await engine.run({
+   task: '...',
+   api: {
+     services: tenantServices,
+     env: tenantEnv,
+   },
+ })
+ ```
+
+ **Auth types** (the first four are zero-code, the last is the escape hatch):
+
+ | Type | Shape | Header produced | Use for |
+ |---|---|---|---|
+ | `none` | `{ type: 'none' }` | — | Public APIs |
+ | `bearer` | `{ type: 'bearer', tokenRef }` | `Authorization: Bearer <env[tokenRef]>` | OpenAI, GitHub PAT, Airtable |
+ | `header` | `{ type: 'header', name, valueRef }` | `<name>: <env[valueRef]>` | SendGrid (`X-API-Key`), any single-header API |
+ | `basic` | `{ type: 'basic', userRef, passRef }` | `Authorization: Basic <base64(user:pass)>` | Twilio, Bitbucket |
+ | `custom` | `{ type: 'custom', id }` | Whatever `resolveAuth` returns | OAuth refresh, HMAC signing, JWT minting |
+
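As a rough illustration of the table above, the four zero-code auth types could map to headers along the lines of the sketch below. `Auth`, `lookup`, and `buildAuthHeaders` are hypothetical names chosen for this example; the engine's real internals are not shown in this diff and may differ.

```ts
// Hypothetical sketch: how the zero-code auth types might map to headers.
// The type and function names are assumptions, not the engine's real API.
type Auth =
  | { type: 'none' }
  | { type: 'bearer'; tokenRef: string }
  | { type: 'header'; name: string; valueRef: string }
  | { type: 'basic'; userRef: string; passRef: string }

// Resolves a reference against the env map and fails loudly when it is absent.
function lookup(env: Record<string, string>, ref: string): string {
  const value = env[ref]
  if (value === undefined) throw new Error(`missing env entry: ${ref}`)
  return value
}

function buildAuthHeaders(auth: Auth, env: Record<string, string>): Record<string, string> {
  switch (auth.type) {
    case 'none':
      return {}
    case 'bearer':
      return { Authorization: `Bearer ${lookup(env, auth.tokenRef)}` }
    case 'header':
      return { [auth.name]: lookup(env, auth.valueRef) }
    case 'basic': {
      const pair = `${lookup(env, auth.userRef)}:${lookup(env, auth.passRef)}`
      // btoa handles Latin-1 only; non-ASCII credentials would need UTF-8 encoding first.
      return { Authorization: `Basic ${btoa(pair)}` }
    }
  }
}
```

Because the model only ever supplies the service name, a function of this shape can run entirely on the engine side, which is what keeps the resolved values out of the model's context.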
979
+ For `custom`, supply `resolveAuth(auth, ctx)`: an async function the
980
+ engine calls per dispatch. The `ctx` carries `serviceName`, `method`,
981
+ `path` so HMAC-style schemes can sign the request context.
982
+
983
+ ```ts
984
+ api: {
985
+ services: [{ name: 'gdrive', baseUrl: '...', auth: { type: 'custom', id: 'oauth:google' } }],
986
+ resolveAuth: async (auth, ctx) => {
987
+ if (auth.type === 'custom' && auth.id === 'oauth:google') {
988
+ const token = await oauthCache.getFreshAccessToken(tenantId)
989
+ return { Authorization: `Bearer ${token}` }
990
+ }
991
+ return {}
992
+ },
993
+ }
994
+ ```
995
+
+ **Safety rails** enforced per-call: service enum lockdown, per-service
+ `allowedPaths` + `allowedMethods`, `maxBodyBytes` cap,
+ `maxResponseBytes` cap, case-insensitive auth-header sanitizer (the
+ model cannot spoof `Authorization` via `input.headers`).
+
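Two of these rails can be pictured with the sketch below: a method/path check and a case-insensitive header filter. `ServiceRails`, `isCallAllowed`, and `stripProtectedHeaders` are hypothetical names, and treating an unset `allowedPaths` or `allowedMethods` as "allow all" is an assumption rather than documented engine behavior.

```ts
// Hypothetical sketch of two of the rails; names and shapes are assumptions.
interface ServiceRails {
  allowedPaths?: RegExp[]
  allowedMethods?: string[]
}

// True when the call passes the per-service method and path rails.
// Assumption: an unset rail imposes no restriction.
function isCallAllowed(rails: ServiceRails, method: string, path: string): boolean {
  const methodOk =
    !rails.allowedMethods ||
    rails.allowedMethods.some((m) => m.toUpperCase() === method.toUpperCase())
  // Patterns should not use the `g` flag, since RegExp.test is stateful with it.
  const pathOk = !rails.allowedPaths || rails.allowedPaths.some((re) => re.test(path))
  return methodOk && pathOk
}

// Drops model-supplied headers whose name matches a protected name, ignoring case.
function stripProtectedHeaders(
  headers: Record<string, string>,
  protectedNames: string[],
): Record<string, string> {
  const blocked = new Set(protectedNames.map((n) => n.toLowerCase()))
  const clean: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    if (!blocked.has(name.toLowerCase())) clean[name] = value
  }
  return clean
}
```

Comparing lowercased names matters because HTTP header names are case-insensitive, so `authorization`, `Authorization`, and `AUTHORIZATION` all have to be treated as the same header.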
+ **Observability:** `onRequest` / `onResponse` hooks fire around each
+ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
+ — no secrets — for metering, billing, audit logs.
+
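A metering hook built on that payload might look like the sketch below. The field names come from the list above, but their types, the `createUsageMeter` helper, and the decision to count any status of 400 or higher as an error are assumptions. Where the hook is wired into the config is not shown in this diff, so that part is left out.

```ts
// Event shape taken from the field list above; the exact types are assumptions.
interface ApiCallEvent {
  service: string
  method: string
  path: string
  status: number
  latencyMs: number
  bytesIn: number
}

interface ServiceUsage {
  calls: number
  errors: number
  bytesIn: number
  totalLatencyMs: number
}

// Returns an onResponse-style handler plus the per-service map it accumulates into.
function createUsageMeter() {
  const usage = new Map<string, ServiceUsage>()
  const onResponse = (event: ApiCallEvent): void => {
    const entry =
      usage.get(event.service) ?? { calls: 0, errors: 0, bytesIn: 0, totalLatencyMs: 0 }
    entry.calls += 1
    if (event.status >= 400) entry.errors += 1 // assumption: 4xx/5xx count as errors
    entry.bytesIn += event.bytesIn
    entry.totalLatencyMs += event.latencyMs
    usage.set(event.service, entry)
  }
  return { usage, onResponse }
}
```

Keeping the accumulator keyed by service name lines up with the closed service enum, so a billing job can iterate the map without ever touching request bodies or credentials.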
+ **Disabling:** `tools.disabled: ['ApiCall']` turns it off even when
+ services are configured. Absent `config.api` → tool never registered,
+ no prompt mention.
+
  ### Sync vs. async — when to use which
 
  | Scenario | Use |
@@ -929,8 +1015,109 @@ Everything else (state.json, webhooks, polling, resume, recovery) works unchange
  | Long task, client can't block | `engine.start()` + `getStatus` / `waitFor` |
  | HITL in a web app (user closes tab) | `engine.start()` + webhook on `paused` |
  | Cloudflare Workers (any non-trivial run) | `storage.provider: 'r2-binding'` + DO + `preferBindingTransport` |
+ | Worker needs Bash / stdio MCP | Async + `config.runner` → handoff to Node (see below) |
  | Server crash recovery | `engine.recoverOrphanedRuns()` on startup |
 
+ ### Runner contract — Node-only tools on Workers
+
+ Cloudflare Workers can't spawn processes, which means no Bash, no
+ stdio-based MCP, no ripgrep. When the engine detects this, it
+ replaces each such tool with a **capability stub** — same name, same
+ description (so the model still sees it in the catalogue), but calling
+ the stub returns `isError: true` with a structured message.
+
+ You have two options when a Worker run needs those tools:
+
+ - **Sync run** (`engine.run()`) — the stub executes, the model
+   adapts its answer ("I couldn't run Bash in this environment, so
+   here's what I can tell you…"), and the run completes `status: 'done'`
+   with `meta.capabilitiesMissing: ['Bash', …]` so callers can detect
+   missing capabilities and decide whether to retry elsewhere.
+ - **Async run** (`engine.start()` / `engine.resumeAsync()`) with
+   `config.runner` set — the engine intercepts the stub call, pauses
+   with reason `'handoff_to_runner'`, and POSTs `{ runId }` to your
+   runner. The runner is a separate Node process that reads the
+   snapshot from the same R2 bucket, resumes with real tools
+   registered, and writes the final state back. Worker's
+   `engine.waitFor(runId)` returns `'done'` once the runner finishes.
+
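For the sync path, a caller could check the reported gaps with something like the sketch below. Only `status` and `meta.capabilitiesMissing` come from the text above; `RunResultLike` and `missingRequiredTools` are hypothetical names, and the rest of the result shape is assumed.

```ts
// Hypothetical result shape: only `status` and `meta.capabilitiesMissing`
// come from the text above; everything else is an assumption.
interface RunResultLike {
  status: string
  meta?: { capabilitiesMissing?: string[] }
}

// Returns the required tools that the run reported as unavailable.
function missingRequiredTools(result: RunResultLike, required: string[]): string[] {
  const missing = new Set(result.meta?.capabilitiesMissing ?? [])
  return required.filter((tool) => missing.has(tool))
}
```

A non-empty return value would be the signal to retry the task on a host where those tools are real, since `status: 'done'` alone does not say whether the model had to work around a stub.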
+ The engine ships **no runner package** — you build yours against the
+ HTTP contract below. A ~100-line reference implementation you can
+ fork lives at [`examples/runner-node/`](examples/runner-node/).
+
+ #### Configuring the Worker side
+
+ ```ts
+ const engine = initEngine({
+   // …storage, model, etc.
+   runner: {
+     url: 'https://runner.tenant-a.internal/continue',
+     secret: process.env.RUNNER_SECRET, // shared with the runner
+   },
+ })
+ ```
+
+ Leave `runner` unset to disable handoff entirely — stubbed tools then
+ fall back to the sync-style graceful degradation even on async runs.
+
+ #### The HTTP contract
+
+ A runner must implement two endpoints:
+
+ **`POST /continue`** — called by the engine when an async run hits a
+ Node-only tool.
+
+ ```
+ Headers:
+   Authorization: Bearer <secret>   # MUST match config.runner.secret
+   Content-Type: application/json
+
+ Body:
+   { "runId": string }
+
+ Response:
+   202 Accepted — runner accepted, will process in background
+   401 Unauthorized — bad bearer
+   400 Bad Request — missing / malformed runId
+ ```
+
+ Behavior:
+ 1. Verify the bearer token.
+ 2. Call `engine.resumeAsync({ runId })` on a runner-side engine
+    configured with:
+    - The **same R2 bucket + rootPath + workspaceId** as the Worker
+    - The real Node-only tools registered (Bash, stdio MCP, etc.)
+    - The same LLM provider config
+    - **No `config.runner`** (the runner doesn't hand off further)
+ 3. Return 202 immediately; the engine's own background executor
+    finishes the run and writes state back to R2.
+
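The three steps above can be sketched as a transport-agnostic function that maps a request to a status code. This is not the reference implementation in `examples/runner-node/`; `ContinueRequest`, `sameSecret`, and `handleContinue` are hypothetical names, and `resume` stands in for the runner-side `engine.resumeAsync`.

```ts
// Hypothetical, transport-agnostic sketch of the /continue handler logic.
// `resume` stands in for the runner-side `engine.resumeAsync({ runId })`.
interface ContinueRequest {
  authorization?: string // raw Authorization header value
  body: string // raw request body
}

// Compares every position instead of exiting on the first mismatch.
function sameSecret(a: string, b: string): boolean {
  let diff = a.length ^ b.length
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0)
  }
  return diff === 0
}

function handleContinue(
  req: ContinueRequest,
  secret: string,
  resume: (args: { runId: string }) => Promise<unknown>,
): number {
  // Step 1: verify the bearer token (an empty configured secret never matches).
  const presented = req.authorization?.startsWith('Bearer ') ? req.authorization.slice(7) : ''
  if (secret.length === 0 || !sameSecret(presented, secret)) return 401

  let runId: unknown
  try {
    runId = (JSON.parse(req.body) as { runId?: unknown }).runId
  } catch {
    return 400 // malformed JSON, or a JSON value with no properties (e.g. null)
  }
  if (typeof runId !== 'string' || runId.length === 0) return 400

  // Steps 2 and 3: start the resume in the background and answer 202 right away.
  resume({ runId }).catch((err) => console.error('resume failed', err))
  return 202
}
```

Separating the decision logic from the HTTP server keeps it easy to test and lets the same function sit behind `node:http`, Express, or any other framework the runner happens to use.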
+ **`GET /health`** — returns 200 when the runner accepts `/continue`.
+
+ #### POST failures
+
+ If the runner POST throws (network error) or returns non-2xx, the
+ engine flips the run to `status: 'failed'` with error code
+ `ERR_RUNNER_UNREACHABLE` before finalizing — callers never see a
+ silent hang. Rotate the bearer secret by updating both ends and
+ redeploying; in-flight runs during the rotation fail with
+ `ERR_RUNNER_UNREACHABLE` and can be retried.
+
+ #### Per-tenant isolation
+
+ Deployment concern, not engine concern. Run **one runner process per
+ tenant** when secrets must not be shared or when tenants need resource
+ isolation. Each tenant's Worker points at its matching runner URL;
+ the engine doesn't know or care about the topology.
+
+ #### What's deferred
+
+ Tool-level proxying (hopping to Node for a single tool call and back),
+ multi-runner failover, runner → Worker sampling, and replay-protected
+ signatures are intentionally out of scope for v1. See
+ [`plans/019-runner-pattern-per-tenant.md`](plans/019-runner-pattern-per-tenant.md)
+ for the full deferred list + triggers.
+
  ---
 
  ## Agent Hierarchy