la-machina-engine 0.3.0 → 0.5.0

package/README.md CHANGED
@@ -495,7 +495,7 @@ engine.run({ runId, nodeId, task })
 
  ### Storage Adapter
 
- Two backends, same interface:
+ Three backends, same interface and the same relative layout on all of them:
 
  | Adapter | Backend | Use |
  |---------|---------|-----|
@@ -503,6 +503,33 @@ Two backends, same interface:
  | `R2StorageAdapter` | Cloudflare R2 via S3 protocol | Node / anywhere with S3 creds |
  | `R2BindingStorageAdapter` | Cloudflare R2 native binding (`env.BUCKET`) | Cloudflare Workers (`provider: 'r2-binding'`) |
 
+ **Path layout (identical across all three backends):**
+
+ ```
+ {rootPath}/workspaces/{workspaceId}/.claude/ ← tenant root
+ ├── memory/ ← tenant-shared, survives across runs
+ ├── skills/ ← (if config.skills.autoload)
+ └── projects/{runId}/nodes/{nodeId}/
+     ├── state.json, snapshot.json, 000000.jsonl, meta.json
+     └── subagents/{agentId}/… ← recursive, same shape
+ ```
+
+ `workspaces/` is a namespace guard (keeps engine data separate from
+ anything else in a shared bucket/filesystem); `.claude/` marks
+ engine-owned content. Both cost one directory level each.
+
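As a rough illustration of the layout above, the directory for a single node can be derived from four identifiers. The helper names `tenantRoot` and `nodeDir` are hypothetical and not part of the engine's API; only the path segments come from the README text.

```ts
// Hypothetical helpers (not engine API): build paths that follow the documented layout.
function tenantRoot(rootPath: string, workspaceId: string): string {
  // Drop trailing slashes so the joined path has no doubled separators.
  const root = rootPath.replace(/\/+$/, '')
  return `${root}/workspaces/${workspaceId}/.claude`
}

function nodeDir(rootPath: string, workspaceId: string, runId: string, nodeId: string): string {
  return `${tenantRoot(rootPath, workspaceId)}/projects/${runId}/nodes/${nodeId}`
}

console.log(nodeDir('/data/', 'acme', 'run-1', 'root'))
```

Because the layout is relative to `rootPath`, the same string works as a filesystem path or as an object-key prefix in a bucket.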
+ **The workspace IS the tenant boundary.** One `workspaceId` per
+ tenant; nothing is shared across workspaces. The previous
+ `global` storage scope was removed in v0.5.0 — see migration note
+ below.
+
+ > **Migration from pre-0.5.0**: if you had data at `{rootPath}/.claude/`
+ > (the old global scope), move it under your workspace root:
+ > `mv {rootPath}/.claude {rootPath}/workspaces/{workspaceId}/.claude`.
+ > `config.memory.scope: 'global'` still parses but emits a
+ > deprecation warning and is rewritten to `'workspace'`; it'll be
+ > rejected outright in v1.0.0.
+
  ### Smart Memory
 
  Per-workspace learning across runs:
@@ -920,6 +947,92 @@ bunx wrangler dev --local
 
  Everything else (state.json, webhooks, polling, resume, recovery) works unchanged.
 
+ ### External APIs — the `ApiCall` built-in
+
+ When you configure one or more services via `config.api`, the engine
+ auto-registers an `ApiCall` tool that lets the model call tenant-scoped
+ external HTTP APIs without ever seeing credentials. The model picks
+ a service name from a closed enum; the engine injects auth from
+ your `env` map (or a `resolveAuth` callback for dynamic schemes).
+
+ ```ts
+ const engine = initEngine({
+   // …model, storage, etc.
+   api: {
+     services: [
+       {
+         name: 'widgets',
+         baseUrl: 'https://api.acme.example/v1',
+         auth: { type: 'bearer', tokenRef: 'widgets:token' },
+         allowedPaths: [/^\/widgets(\/\d+)?$/], // optional safety rail
+       },
+     ],
+     env: { 'widgets:token': 'sk_real_token' }, // loaded from your vault
+   },
+ })
+ ```
+
+ The model sees `ApiCall` with a `service` enum locked to your list.
+ When it calls `ApiCall({ service: 'widgets', method: 'POST',
+ path: '/widgets', body: {...} })`, the engine resolves the bearer
+ token from `env`, attaches it as `Authorization: Bearer ...`, and
+ fetches. The token never enters the model's context, the transcript,
+ state.json, logs, or any response field — a dedicated test suite
+ (`apiCallSecretIsolation.test.ts`) enforces this.
+
983
+ **Multi-tenant SaaS** — pass per-tenant services via `RunOptions.api`
984
+ instead of `config.api`, so one engine instance serves many tenants:
985
+
986
+ ```ts
987
+ await engine.run({
988
+ task: '...',
989
+ api: {
990
+ services: tenantServices,
991
+ env: tenantEnv,
992
+ },
993
+ })
994
+ ```
995
+
996
+ **Auth types** (the first four are zero-code, the last is the escape hatch):
997
+
998
+ | Type | Shape | Header produced | Use for |
999
+ |---|---|---|---|
1000
+ | `none` | `{ type: 'none' }` | — | Public APIs |
1001
+ | `bearer` | `{ type: 'bearer', tokenRef }` | `Authorization: Bearer <env[tokenRef]>` | OpenAI, GitHub PAT, Airtable |
1002
+ | `header` | `{ type: 'header', name, valueRef }` | `<name>: <env[valueRef]>` | SendGrid (`X-API-Key`), any single-header API |
1003
+ | `basic` | `{ type: 'basic', userRef, passRef }` | `Authorization: Basic <base64(user:pass)>` | Twilio, Bitbucket |
1004
+ | `custom` | `{ type: 'custom', id }` | Whatever `resolveAuth` returns | OAuth refresh, HMAC signing, JWT minting |
1005
+
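To make the "Header produced" column concrete, here is a minimal sketch of how the four zero-code auth types could map to request headers. The `authHeaders` and `lookup` functions are hypothetical and written only for illustration; the engine's internal implementation is not shown in this diff and may differ, for example in how it reports a missing `env` entry.

```ts
// Shapes copied from the "Shape" column; `custom` is left to resolveAuth and omitted here.
type ZeroCodeAuth =
  | { type: 'none' }
  | { type: 'bearer'; tokenRef: string }
  | { type: 'header'; name: string; valueRef: string }
  | { type: 'basic'; userRef: string; passRef: string }

// Hypothetical helper: resolve a ref against env and fail loudly when it is absent.
function lookup(env: Record<string, string>, ref: string): string {
  const value = env[ref]
  if (value === undefined) throw new Error(`missing env entry: ${ref}`)
  return value
}

function authHeaders(auth: ZeroCodeAuth, env: Record<string, string>): Record<string, string> {
  switch (auth.type) {
    case 'none':
      return {}
    case 'bearer':
      return { Authorization: `Bearer ${lookup(env, auth.tokenRef)}` }
    case 'header':
      return { [auth.name]: lookup(env, auth.valueRef) }
    case 'basic': {
      // btoa is a global in Workers, browsers, and recent Node; it only accepts Latin-1 input.
      const pair = `${lookup(env, auth.userRef)}:${lookup(env, auth.passRef)}`
      return { Authorization: `Basic ${btoa(pair)}` }
    }
  }
}
```

Keeping only refs in the service definition means the definition itself can be logged or shown to the model without exposing the secret values held in `env`.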
1006
+ For `custom`, supply `resolveAuth(auth, ctx)`: an async function the
1007
+ engine calls per dispatch. The `ctx` carries `serviceName`, `method`,
1008
+ `path` so HMAC-style schemes can sign the request context.
1009
+
1010
+ ```ts
1011
+ api: {
1012
+ services: [{ name: 'gdrive', baseUrl: '...', auth: { type: 'custom', id: 'oauth:google' } }],
1013
+ resolveAuth: async (auth, ctx) => {
1014
+ if (auth.type === 'custom' && auth.id === 'oauth:google') {
1015
+ const token = await oauthCache.getFreshAccessToken(tenantId)
1016
+ return { Authorization: `Bearer ${token}` }
1017
+ }
1018
+ return {}
1019
+ },
1020
+ }
1021
+ ```
1022
+
1023
+ **Safety rails** enforced per-call: service enum lockdown, per-service
1024
+ `allowedPaths` + `allowedMethods`, `maxBodyBytes` cap,
1025
+ `maxResponseBytes` cap, case-insensitive auth-header sanitizer (the
1026
+ model cannot spoof `Authorization` via `input.headers`).
1027
+
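A sketch of how two of these rails might be checked before a request is dispatched, assuming `allowedPaths` is a list of regular expressions (as in the `widgets` example) and `allowedMethods` is a list of upper-case method names. The function names, the error strings, and the exact set of stripped header names are assumptions; the byte caps are left out.

```ts
// Assumed per-service config; field names follow the README, the types are guesses.
interface ServiceRails {
  allowedPaths?: RegExp[]
  allowedMethods?: string[] // assumed to be upper-case, e.g. 'GET'
}

interface CallInput {
  method: string
  path: string
  headers?: Record<string, string>
}

// Header names the model may never set, compared case-insensitively.
// The engine's real deny-list is not documented here, so this set is an assumption.
const AUTH_HEADERS = new Set(['authorization', 'proxy-authorization'])

function stripAuthHeaders(headers: Record<string, string> = {}): Record<string, string> {
  const clean: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    if (!AUTH_HEADERS.has(name.toLowerCase())) clean[name] = value
  }
  return clean
}

// Returns an error string when a rail rejects the call, or null when it passes.
function checkRails(rails: ServiceRails, input: CallInput): string | null {
  if (rails.allowedMethods && !rails.allowedMethods.includes(input.method.toUpperCase())) {
    return `method not allowed: ${input.method}`
  }
  if (rails.allowedPaths && !rails.allowedPaths.some((re) => re.test(input.path))) {
    return `path not allowed: ${input.path}`
  }
  return null
}
```

Sanitizing model-supplied headers before the engine attaches its own auth header means a spoofed `AUTHORIZATION` or `authorization` entry cannot override or accompany the real credential.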
1028
+ **Observability:** `onRequest` / `onResponse` hooks fire around each
1029
+ dispatch with `{ service, method, path, status, latencyMs, bytesIn }`
1030
+ — no secrets — for metering, billing, audit logs.
1031
+
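One way to use these hooks for metering is a small in-memory accumulator keyed by service. This is a sketch only: the `UsageMeter` class is hypothetical, and since the README does not say which fields are present on `onRequest` versus `onResponse`, the response-side fields are treated as optional.

```ts
// Field names come from the README; optionality is an assumption.
interface ApiEvent {
  service: string
  method: string
  path: string
  status?: number
  latencyMs?: number
  bytesIn?: number
}

interface Usage {
  calls: number
  errors: number
  bytesIn: number
  totalLatencyMs: number
}

// Hypothetical meter: aggregates per-service totals from response events.
class UsageMeter {
  private readonly byService = new Map<string, Usage>()

  record(event: ApiEvent): void {
    const usage = this.byService.get(event.service) ?? { calls: 0, errors: 0, bytesIn: 0, totalLatencyMs: 0 }
    usage.calls += 1
    if ((event.status ?? 0) >= 400) usage.errors += 1
    usage.bytesIn += event.bytesIn ?? 0
    usage.totalLatencyMs += event.latencyMs ?? 0
    this.byService.set(event.service, usage)
  }

  usageFor(service: string): Usage | undefined {
    return this.byService.get(service)
  }
}
```

A meter like this could be wired in as `onResponse: (event) => meter.record(event)`, assuming the hook receives a single event object of the shape listed above.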
1032
+ **Disabling:** `tools.disabled: ['ApiCall']` turns it off even when
1033
+ services are configured. Absent `config.api` → tool never registered,
1034
+ no prompt mention.
1035
+
923
1036
  ### Sync vs. async — when to use which
924
1037
 
925
1038
  | Scenario | Use |
@@ -929,8 +1042,109 @@ Everything else (state.json, webhooks, polling, resume, recovery) works unchange
  | Long task, client can't block | `engine.start()` + `getStatus` / `waitFor` |
  | HITL in a web app (user closes tab) | `engine.start()` + webhook on `paused` |
  | Cloudflare Workers (any non-trivial run) | `storage.provider: 'r2-binding'` + DO + `preferBindingTransport` |
+ | Worker needs Bash / stdio MCP | Async + `config.runner` → handoff to Node (see below) |
  | Server crash recovery | `engine.recoverOrphanedRuns()` on startup |
 
+ ### Runner contract — Node-only tools on Workers
+
+ Cloudflare Workers can't spawn processes, which means no Bash, no
+ stdio-based MCP, no ripgrep. When the engine detects this, it
+ replaces each such tool with a **capability stub** — same name, same
+ description (so the model still sees it in the catalogue), but calling
+ the stub returns `isError: true` with a structured message.
+
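The capability-stub idea can be sketched as a wrapper that keeps a tool's name and description but replaces its behavior. Everything in this block is an assumption made for illustration: the `Tool` and `ToolResult` interfaces, the `capabilityStub` function, and the `capability_missing` code are hypothetical, since the engine's actual tool interface and message format are not shown in this diff.

```ts
// Hypothetical tool shape; the engine's real interface may differ.
interface ToolResult {
  isError: boolean
  content: string
}

interface Tool {
  name: string
  description: string
  call(input: unknown): Promise<ToolResult>
}

// Keep the tool visible in the catalogue, but make every call fail with a structured message.
function capabilityStub(tool: Pick<Tool, 'name' | 'description'>, missing: Set<string>): Tool {
  return {
    name: tool.name,
    description: tool.description,
    async call(): Promise<ToolResult> {
      // Record the tool name so the caller can report which capabilities were unavailable.
      missing.add(tool.name)
      return {
        isError: true,
        content: JSON.stringify({ code: 'capability_missing', tool: tool.name }),
      }
    },
  }
}
```

Returning an error result instead of throwing lets the model read the failure and adjust its answer, rather than aborting the run.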
+ You have two options when a Worker run needs those tools:
+
+ - **Sync run** (`engine.run()`) — the stub executes, the model
+   adapts its answer ("I couldn't run Bash in this environment, so
+   here's what I can tell you…"), and the run completes `status: 'done'`
+   with `meta.capabilitiesMissing: ['Bash', …]` so callers can detect
+   missing capabilities and decide whether to retry elsewhere.
+ - **Async run** (`engine.start()` / `engine.resumeAsync()`) with
+   `config.runner` set — the engine intercepts the stub call, pauses
+   with reason `'handoff_to_runner'`, and POSTs `{ runId }` to your
+   runner. The runner is a separate Node process that reads the
+   snapshot from the same R2 bucket, resumes with real tools
+   registered, and writes the final state back. Worker's
+   `engine.waitFor(runId)` returns `'done'` once the runner finishes.
+
+ The engine ships **no runner package** — you build yours against the
+ HTTP contract below. A ~100-line reference implementation you can
+ fork lives at [`examples/runner-node/`](examples/runner-node/).
+
+ #### Configuring the Worker side
+
+ ```ts
+ const engine = initEngine({
+   // …storage, model, etc.
+   runner: {
+     url: 'https://runner.tenant-a.internal/continue',
+     secret: process.env.RUNNER_SECRET, // shared with the runner
+   },
+ })
+ ```
+
+ Leave `runner` unset to disable handoff entirely — stubbed tools then
+ fall back to the sync-style graceful degradation even on async runs.
+
+ #### The HTTP contract
+
+ A runner must implement two endpoints:
+
+ **`POST /continue`** — called by the engine when an async run hits a
+ Node-only tool.
+
+ ```
+ Headers:
+   Authorization: Bearer <secret> # MUST match config.runner.secret
+   Content-Type: application/json
+
+ Body:
+   { "runId": string }
+
+ Response:
+   202 Accepted — runner accepted, will process in background
+   401 Unauthorized — bad bearer
+   400 Bad Request — missing / malformed runId
+ ```
+
1111
+ Behavior:
1112
+ 1. Verify the bearer token.
1113
+ 2. Call `engine.resumeAsync({ runId })` on a runner-side engine
1114
+ configured with:
1115
+ - The **same R2 bucket + rootPath + workspaceId** as the Worker
1116
+ - The real Node-only tools registered (Bash, stdio MCP, etc.)
1117
+ - The same LLM provider config
1118
+ - **No `config.runner`** (the runner doesn't hand off further)
1119
+ 3. Return 202 immediately; the engine's own background executor
1120
+ finishes the run and writes state back to R2.
1121
+
1122
+ **`GET /health`** — returns 200 when the runner accepts `/continue`.
1123
+
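The `POST /continue` behavior can be sketched as a framework-agnostic function that maps a request to a status code. The `ContinueRequest` type and `handleContinue` function are hypothetical; `resume` stands in for a call such as `engine.resumeAsync({ runId })` on the runner-side engine, and this is not the reference implementation from `examples/runner-node/`.

```ts
// Hypothetical request shape, independent of any HTTP framework.
interface ContinueRequest {
  authorization?: string // raw Authorization header value, if present
  body: unknown // parsed JSON body
}

function handleContinue(
  req: ContinueRequest,
  secret: string,
  resume: (runId: string) => Promise<unknown>,
): number {
  // 1. Verify the bearer token before looking at the body.
  if (req.authorization !== `Bearer ${secret}`) return 401

  // 2. The body must be an object carrying a non-empty string runId.
  const runId = (req.body as { runId?: unknown } | null)?.runId
  if (typeof runId !== 'string' || runId.length === 0) return 400

  // 3. Start the resume without awaiting it, then acknowledge immediately.
  resume(runId).catch((err) => console.error('resume failed', runId, err))
  return 202
}
```

A production runner would likely compare the token in constant time and apply the same check to any other endpoint that exposes run data; both are left out here to keep the sketch short.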
+ #### POST failures
+
+ If the runner POST throws (network error) or returns non-2xx, the
+ engine flips the run to `status: 'failed'` with error code
+ `ERR_RUNNER_UNREACHABLE` before finalizing — callers never see a
+ silent hang. Rotate the bearer secret by updating both ends and
+ redeploying; in-flight runs during the rotation fail with
+ `ERR_RUNNER_UNREACHABLE` and can be retried.
+
+ #### Per-tenant isolation
+
+ Deployment concern, not engine concern. Run **one runner process per
+ tenant** when secrets must not be shared or when tenants need resource
+ isolation. Each tenant's Worker points at its matching runner URL;
+ the engine doesn't know or care about the topology.
+
+ #### What's deferred
+
+ Tool-level proxying (hopping to Node for a single tool call and back),
+ multi-runner failover, runner → Worker sampling, and replay-protected
+ signatures are intentionally out of scope for v1. See
+ [`plans/019-runner-pattern-per-tenant.md`](plans/019-runner-pattern-per-tenant.md)
+ for the full deferred list + triggers.
+
  ---
 
  ## Agent Hierarchy