npm - la-machina-engine - Versions diffs - 0.3.0 - Mend

la-machina-engine 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.md ADDED

@@ -0,0 +1,1239 @@
+# la-machina-engine
+[![npm version](https://img.shields.io/npm/v/la-machina-engine.svg)](https://www.npmjs.com/package/la-machina-engine)
+[![npm downloads](https://img.shields.io/npm/dm/la-machina-engine.svg)](https://www.npmjs.com/package/la-machina-engine)
+[![CI](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/ci.yml/badge.svg)](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/ci.yml)
+[![Publish](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/publish.yml/badge.svg)](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/publish.yml)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+**Headless, multi-provider LLM agent engine for workflow automation.**
+A library, not a CLI. You `import` it, give it a task, and it runs a bounded agent loop — streaming an LLM, dispatching tools, spawning subagents, persisting a durable transcript — until the task is done, paused, or fails. Memory learns across runs. Runs pause mid-execution and resume later with full state. Storage is pluggable — local filesystem in dev, Cloudflare R2 in production, same code.
+Built for embedding inside a workflow orchestrator (e.g. an n8n-style DAG runner where each node needs an LLM brain). If you want a terminal chatbot, use Claude Code. If you want the brain that runs inside each node of a production workflow, use this.
+```bash
+npm install la-machina-engine
+```
+---
+## Status
+**v0.3.0 — published on npm; production-ready core, evolving feature surface.**
+- **1214** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
+- Zero top-level `node:` imports — runs on Node.js AND Cloudflare Workers
+- 14 live workflow tests (W1–W14) verified against OpenRouter, real R2, real MCP servers
+- Pause/resume + async runs + webhooks + state.json + R2 binding storage adapter
+- MCP support: stdio, http (Streamable + Workers-safe binding transport), sse — with auth refresh + sampling
+- Skills: disk-backed default + per-run override (inline body or HTTPS url)
+- Subagent gate propagation (opt-in) — parent can pause when a child's tool is gated
+---
+## Table of Contents
+- [Design Principles](#design-principles)
+- [Install](#install)
+- [Quick Start](#quick-start)
+- [CLI Test Harness](#cli-test-harness)
+- [Response Format](#response-format)
+- [Core Concepts](#core-concepts)
+- [Async API (start / waitFor / webhooks)](#async-api)
+- [Multi-Provider Support](#multi-provider-support)
+- [Agent Hierarchy](#agent-hierarchy)
+- [Configuration Reference](#configuration-reference)
+- [What's Implemented](#whats-implemented)
+- [What's Deferred](#whats-deferred)
+- [Architecture](#architecture)
+- [Development](#development)
+- [License](#license)
+---
+## Design Principles
+- **Zero-config works.** `initEngine()` with no arguments runs, given `ANTHROPIC_API_KEY` in the environment.
+- **Every knob has a default.** No config option is required; all are overridable.
+- **Headless.** No terminal UI, no React, no Ink. Plain Node/Workers library.
+- **Cloud-native.** Storage is pluggable — local filesystem in dev, Cloudflare R2 in production.
+- **Pausable.** Runs can suspend mid-turn via a gate callback and resume later with full state.
+- **Workers-compatible.** Zero top-level `node:` imports. All platform-specific code is lazy-loaded.
+- **Multi-provider.** Anthropic native + Vercel AI SDK (OpenAI, Google, OpenRouter, 75+ providers).
+- **Error-isolated.** A misbehaving tool cannot crash the agent loop. The loop never throws.
+- **TypeScript-first.** Strict mode, Zod-inferred types, discriminated unions for all results.
+---
+## Install
+```bash
+npm install la-machina-engine
+```
+Requires Node 20+. Also runs on Bun and on Cloudflare Workers (with R2 storage).
+The package is published as a single bundled module (~330 KB ESM, ~340 KB CJS) with full TypeScript types. Built with [tsup](https://tsup.egoist.dev/), CI publishes with [Sigstore provenance](https://docs.npmjs.com/generating-provenance-statements).
+---
+## Quick Start
+```ts
+import { initEngine } from 'la-machina-engine'
+const engine = initEngine() // uses ANTHROPIC_API_KEY from env
+const response = await engine.run({
+  nodeId: 'node_xyz',
+  task: 'Summarize the contents of data.csv and propose 3 insights.',
+  // runId optional — auto-generated as run_<uuid> if omitted
+})
+if (response.status === 'done') {
+  console.log(response.data)       // text string (or JSON object if outputFormat: 'json')
+  console.log(`runId: ${response.runId}`)
+  console.log(`${response.meta.turns} turns, ${response.meta.tokensUsed.input + response.meta.tokensUsed.output} tokens`)
+}
+```
+### With OpenRouter (multi-provider)
+```ts
+import { initEngine, Engine } from 'la-machina-engine'
+const apiKey = process.env.OPENROUTER_API_KEY!   // e.g. 'sk-or-...'
+const engine = new Engine(
+  initEngine({
+    model: {
+      provider: 'proxy',
+      modelId: 'google/gemini-2.5-pro',
+      apiKey,
+      baseURL: 'https://openrouter.ai/api',
+    },
+  }).config,
+  {
+    // OpenRouter expects a Bearer token instead of the x-api-key header
+    fetch: async (input, init = {}) => {
+      const headers = new Headers(init.headers ?? {})
+      headers.delete('x-api-key')
+      headers.set('Authorization', 'Bearer ' + apiKey)
+      return fetch(input, { ...init, headers })
+    },
+  },
+)
+```
+### With R2 cloud storage
+```ts
+const engine = initEngine({
+  storage: {
+    provider: 'r2',
+    rootPath: 'my-project',
+    workspaceId: 'production',
+    r2: {
+      bucket: 'my-bucket',
+      region: 'auto',
+      accessKeyId: '...',
+      secretAccessKey: '...',
+      endpoint: 'https://xxx.r2.cloudflarestorage.com',
+    },
+  },
+})
+```
+---
+## CLI Test Harness
+A minimal interactive CLI for testing the engine directly:
+```bash
+node cli.mjs                                    # interactive REPL
+node cli.mjs "your task here"                   # one-shot
+node cli.mjs --model anthropic/claude-sonnet-4 "task"
+```
+The REPL keeps the conversation across prompts (multi-turn). Commands: `/clear` (reset), `/turns` (info).
+Env vars: `OPENROUTER_API_KEY` (required), `ENGINE_MODEL`, `ENGINE_STORAGE`, `ENGINE_MAX_TURNS`.
+---
+## Response Format
+`engine.run()` and `engine.resume()` return `EngineResponse` directly — one flat shape for every status.
+```ts
+const response = await engine.run({
+  nodeId: 'n1',
+  task: '...',
+  // runId optional — auto-generated if omitted
+})
+// response.runId, response.status, response.data, response.meta, response.errors, response.timestamp
+```
+**Single shape, every status. Client always reads `response.data` and tracks `response.runId`.**
+### Done — Text Mode (default)
+```json
+{
+  "runId": "run_abc",
+  "status": "done",
+  "data": "The analysis shows revenue grew 15% year-over-year with strongest performance in Q4.",
+  "meta": {
+    "nodeId": "analyze",
+    "turns": 5,
+    "tokensUsed": { "input": 12500, "output": 3200 },
+    "durationMs": 8500,
+    "output": "The analysis shows revenue grew 15% year-over-year with strongest performance in Q4.",
+    "transcript": { "path": "projects/run_abc/nodes/analyze", "lastShardIndex": 0 }
+  },
+  "errors": [],
+  "timestamp": 1712966400000
+}
+```
+### Done — JSON Mode with Schema
+```ts
+import { z } from 'zod'
+const result = await engine.run({
+  nodeId: 'extract',
+  task: 'Fetch example.com and extract pricing tiers',
+  outputFormat: 'json',
+  outputSchema: z.object({
+    tiers: z.array(z.object({ name: z.string(), price: z.number() })),
+  }),
+})
+```
+```json
+{
+  "runId": "run_abc",
+  "status": "done",
+  "data": {
+    "tiers": [
+      { "name": "Starter", "price": 29 },
+      { "name": "Pro", "price": 99 },
+      { "name": "Enterprise", "price": 299 }
+    ]
+  },
+  "meta": {
+    "nodeId": "extract",
+    "turns": 3,
+    "tokensUsed": { "input": 8000, "output": 1500 },
+    "durationMs": 12000,
+    "output": "{\"tiers\":[{\"name\":\"Starter\",\"price\":29},{\"name\":\"Pro\",\"price\":99},{\"name\":\"Enterprise\",\"price\":299}]}",
+    "transcript": { "path": "projects/run_abc/nodes/extract", "lastShardIndex": 0 }
+  },
+  "errors": [],
+  "timestamp": 1712966400000
+}
+```
+### Done — JSON Mode, Parse Failed
+When the model doesn't return valid JSON despite instructions:
+```json
+{
+  "runId": "run_abc",
+  "status": "done",
+  "data": "Here's the pricing: Starter at $29, Pro at $99, Enterprise at $299.",
+  "meta": {
+    "nodeId": "extract",
+    "turns": 2,
+    "tokensUsed": { "input": 5000, "output": 800 },
+    "durationMs": 6000,
+    "output": "Here's the pricing: Starter at $29, Pro at $99, Enterprise at $299."
+  },
+  "errors": [],
+  "timestamp": 1712966400000
+}
+```
+`data` falls back to the raw text. Check `typeof response.data === 'object'` before treating it as structured output.
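
A minimal TypeScript sketch of that check. `DoneResponse` is a hand-written approximation of the fields shown above, not a type exported by the package, and `hasStructuredData` is a hypothetical helper. It also rules out `null`, because `typeof null === 'object'` in JavaScript.

```typescript
// Hand-written approximation of the response fields used here.
type DoneResponse = {
  runId: string
  status: 'done'
  data: unknown
  errors: { code: string; message: string }[]
}

// True when JSON mode produced a parsed object or array instead of raw text.
function hasStructuredData(
  res: DoneResponse,
): res is DoneResponse & { data: Record<string, unknown> | unknown[] } {
  // typeof null is also 'object', so exclude it explicitly.
  return typeof res.data === 'object' && res.data !== null
}

const parsed: DoneResponse = { runId: 'run_abc', status: 'done', data: { tiers: [] }, errors: [] }
const fallback: DoneResponse = { runId: 'run_abc', status: 'done', data: 'Starter at $29...', errors: [] }

console.log(hasStructuredData(parsed))   // true
console.log(hasStructuredData(fallback)) // false
```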
+### Paused — Human Approval Needed
+`data` is the **content** the human reviews — not internal details like paths. Path info is in `meta.pendingToolCall`.
+```json
+{
+  "runId": "run_abc",
+  "status": "paused",
+  "data": "# Q4 Revenue Summary\n\nTotal revenue: $705,000...",
+  "meta": {
+    "nodeId": "write-report",
+    "turns": 3,
+    "tokensUsed": { "input": 8500, "output": 2100 },
+    "durationMs": 9200,
+    "snapshot": {
+      "version": 1,
+      "status": "paused",
+      "runId": "run_abc",
+      "nodeId": "write-report",
+      "pausedAt": "2026-04-13T14:30:00.000Z",
+      "pauseReason": "gate_required",
+      "messageCount": 6,
+      "lastShardIndex": 3,
+      "lastMessageUuid": "a1b2c3d4-...",
+      "pendingToolCall": {
+        "toolName": "Write",
+        "toolUseId": "toolu_abc123",
+        "input": { "path": "reports/q4-summary.md", "content": "# Q4 Revenue Summary..." },
+        "calledAt": "2026-04-13T14:30:00.000Z"
+      },
+      "tokensUsedSoFar": { "input": 8500, "output": 2100 },
+      "turnsUsed": 3
+    },
+    "pendingToolCall": {
+      "toolName": "Write",
+      "toolUseId": "toolu_abc123",
+      "input": { "path": "reports/q4-summary.md", "content": "..." },
+      "calledAt": "2026-04-13T14:30:00.000Z"
+    },
+    "pauseReason": "gate_required",
+    "transcript": { "path": "projects/run_abc/nodes/write-report", "lastShardIndex": 3 }
+  },
+  "errors": [],
+  "timestamp": 1712966400000
+}
+```
+**`data`** = the content of the blocked tool call that the human reviews (for `Write`, the file body).
+**`meta.snapshot`** = pass to `engine.resume()` to continue.
+**`meta.pendingToolCall`** = shortcut to see what tool was blocked.
+### Paused — Topic Selection (Custom Tool)
+When the agent tries to Write a JSON file with choices, `data` is the JSON content string:
+```json
+{
+  "runId": "blog-001",
+  "status": "paused",
+  "data": "{\"topics\":[{\"title\":\"AI Trade Wars\",\"angle\":\"startup impact\"},{\"title\":\"EU Migration Reform\",\"angle\":\"policy analysis\"},{\"title\":\"Climate Summit\",\"angle\":\"developing nations\"}]}",
+  "meta": {
+    "nodeId": "research",
+    "turns": 2,
+    "tokensUsed": { "input": 5000, "output": 1200 },
+    "durationMs": 7500,
+    "snapshot": { "...": "same shape as in the previous example" },
+    "pendingToolCall": {
+      "toolName": "Write",
+      "toolUseId": "toolu_xyz",
+      "input": { "path": "topics.json", "content": "{\"topics\":[...]}" },
+      "calledAt": "2026-04-13T10:00:00.000Z"
+    },
+    "pauseReason": "gate_required"
+  },
+  "errors": [],
+  "timestamp": 1712966400000
+}
+```
+Resume with the user's choice:
+```ts
+await engine.resume({
+  snapshot: response.meta.snapshot,
+  gateAnswer: 'Approved. User selected Topic 1: AI Trade Wars.',
+})
+```
+### Failed — Max Turns Exceeded
+```json
+{
+  "runId": "run_abc",
+  "status": "failed",
+  "data": null,
+  "meta": {
+    "nodeId": "complex-task",
+    "turns": 0,
+    "tokensUsed": { "input": 0, "output": 0 },
+    "durationMs": 45000,
+    "transcript": { "path": "projects/run_abc/nodes/complex-task", "lastShardIndex": 8 }
+  },
+  "errors": [
+    {
+      "code": "ERR_MAX_TURNS",
+      "message": "Run exceeded max turns"
+    }
+  ],
+  "timestamp": 1712966400000
+}
+```
+### Failed — API Error After Retries
+```json
+{
+  "runId": "run_abc",
+  "status": "failed",
+  "data": null,
+  "meta": {
+    "nodeId": "task",
+    "turns": 0,
+    "tokensUsed": { "input": 0, "output": 0 },
+    "durationMs": 12000,
+    "transcript": { "path": "projects/run_abc/nodes/task", "lastShardIndex": 0 }
+  },
+  "errors": [
+    {
+      "code": "ERR_RATE_LIMIT",
+      "message": "429 Too Many Requests"
+    }
+  ],
+  "timestamp": 1712966400000
+}
+```
+### Failed — Max Tokens Recovery Exhausted
+```json
+{
+  "runId": "run_abc",
+  "status": "failed",
+  "data": null,
+  "meta": {
+    "nodeId": "long-task",
+    "turns": 0,
+    "tokensUsed": { "input": 0, "output": 0 },
+    "durationMs": 30000,
+    "transcript": { "path": "projects/run_abc/nodes/long-task", "lastShardIndex": 5 }
+  },
+  "errors": [
+    {
+      "code": "ERR_MAX_TOKENS",
+      "message": "max_tokens recovery exhausted after 3 attempts"
+    }
+  ],
+  "timestamp": 1712966400000
+}
+```
+### Error Codes Reference
+| Code | When | Retryable? |
+|------|------|-----------|
+| `ERR_MAX_TURNS` | Run exceeded `execution.maxTurns` | No — increase maxTurns |
+| `ERR_MAX_TOKENS` | max_tokens recovery failed after 3 attempts | No — reduce task scope |
+| `ERR_RUN_TIMEOUT` | Run exceeded `execution.runTimeoutMs` | No — increase timeout |
+| `ERR_RATE_LIMIT` | 429 after retry backoff exhausted | Yes — wait and retry |
+| `ERR_API` | 500/502/503 after retries | Yes — transient |
+| `ERR_API_OVERLOADED` | 529 (overloaded) returned five consecutive times | Yes — wait longer |
+| `ERR_AUTH` | 401/403 (invalid or unauthorized API key) | No — fix credentials |
+| `ERR_CONFIG` | Invalid configuration | No — fix config |
+| `ERR_STREAM_PARSE` | Malformed API response | No — provider issue |
+| `ERR_STREAM_INCOMPLETE` | Stream ended without message_stop | Yes — transient |
+| `ERR_UNEXPECTED_STOP` | Unknown stop reason from API | No — investigate |
+| `SCHEMA_VALIDATION_FAILED` | JSON output doesn't match outputSchema | No — adjust schema or task |
+| `JSON_PARSE_FAILED` | Model didn't return valid JSON | No — adjust task |
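
The table can be turned into a retry decision for a workflow runner. A sketch only: `isRetryable` is a hypothetical helper, and treating a response as retryable only when *every* error code is transient is a design choice of this example, not documented engine behaviour.

```typescript
// Codes marked "Yes" in the Retryable? column above.
const RETRYABLE_CODES = new Set([
  'ERR_RATE_LIMIT',
  'ERR_API',
  'ERR_API_OVERLOADED',
  'ERR_STREAM_INCOMPLETE',
])

type EngineError = { code: string; message: string }

// Retry only if the response carries errors and all of them are transient.
function isRetryable(errors: EngineError[]): boolean {
  return errors.length > 0 && errors.every((e) => RETRYABLE_CODES.has(e.code))
}

console.log(isRetryable([{ code: 'ERR_RATE_LIMIT', message: '429 Too Many Requests' }])) // true
console.log(isRetryable([{ code: 'ERR_AUTH', message: 'invalid key' }]))                 // false
```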
+### Workflow Runner Integration
+```ts
+// Step 1: Run — runId optional, auto-generated if omitted
+const response = await engine.run({
+  nodeId: 'extract',
+  task: 'Fetch pricing from example.com',
+  outputFormat: 'json',
+  outputSchema: pricingSchema,
+})
+switch (response.status) {
+  case 'done':
+    // data is already typed/validated per outputSchema (JSON mode)
+    // or a plain text string (text mode)
+    passToNextNode(response.data)
+    break
+  case 'paused':
+    // Client only needs to remember the runId
+    saveToApprovalQueue({
+      runId: response.runId,
+      pendingAction: response.meta.pendingToolCall?.toolName,
+      data: response.data,  // what to show the human
+    })
+    notifyHuman('Approval needed')
+    break
+  case 'failed':
+    logErrors(response.errors)            // [{ code, message }]
+    retryOrEscalate(response)
+    break
+}
+// Step 2: Later, after approval, resume the paused run by passing just the runId
+if (response.status === 'paused') {
+  const resumed = await engine.resume({
+    runId: response.runId,
+    gateAnswer: 'Approved by manager',
+  })
+}
+```
+---
+## Core Concepts
+### The Run Lifecycle
+```
+engine.run({ runId, nodeId, task })
+  │
+  ├─ Build: storage, client, tools, memory, prompt, transcript
+  ├─ preRun hook
+  ├─ agentLoop:
+  │   while (!done) {
+  │     normalize messages (strip blocks, ensure alternation, tool pairing)
+  │     API streamMessage (with reactive recovery on max_tokens/413)
+  │     collect text + thinking + tool_use blocks
+  │     dispatch tools via StreamingToolExecutor (parallel safe, serial unsafe)
+  │     truncate results > 100K chars
+  │     postTurn + stopHooks (can prevent continuation)
+  │   }
+  ├─ postRun hook (always fires)
+  └─ return: done | paused | failed
+```
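
The control flow in the diagram can be sketched as a plain bounded loop. This is an illustration under simplifying assumptions, not the engine's implementation: `callModel` and `runTool` are hypothetical stand-ins for streaming and tool dispatch, tools run serially, and only the turn cap, the 100K-character truncation and the stop hooks are modelled.

```typescript
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text: string; toolCalls: ToolCall[] }
type LoopResult =
  | { status: 'done'; output: string; turns: number }
  | { status: 'failed'; code: 'ERR_MAX_TURNS'; turns: number }

// Mirrors "truncate results > 100K chars" in the diagram.
const MAX_RESULT_CHARS = 100_000

function truncateResult(result: string): string {
  return result.length > MAX_RESULT_CHARS
    ? result.slice(0, MAX_RESULT_CHARS) + '\n[truncated]'
    : result
}

async function runBoundedLoop(opts: {
  maxTurns: number
  callModel: (history: string[]) => Promise<ModelTurn> // hypothetical: one streamed API call
  runTool: (call: ToolCall) => Promise<string>        // hypothetical: one tool dispatch
  stopHooks?: Array<(turn: ModelTurn) => boolean>     // returning true ends the run
}): Promise<LoopResult> {
  const history: string[] = []
  for (let turn = 1; turn <= opts.maxTurns; turn++) {
    const reply = await opts.callModel(history)
    history.push(reply.text)
    // A reply with no tool calls means the model considers the task finished.
    if (reply.toolCalls.length === 0) {
      return { status: 'done', output: reply.text, turns: turn }
    }
    // Serial dispatch for simplicity; the engine runs safe tools in parallel.
    for (const call of reply.toolCalls) {
      history.push(truncateResult(await opts.runTool(call)))
    }
    // Assumption: a stop hook that fires ends the run as 'done'.
    if (opts.stopHooks?.some((hook) => hook(reply))) {
      return { status: 'done', output: reply.text, turns: turn }
    }
  }
  return { status: 'failed', code: 'ERR_MAX_TURNS', turns: opts.maxTurns }
}
```

The loop never throws on its own: exhausting the turn budget is reported as a `failed` result, matching the `ERR_MAX_TURNS` response shape.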
+### Storage Adapter
+Two storage backends (local filesystem and Cloudflare R2), three adapters, one interface:
+| Adapter | Backend | Use |
+|---------|---------|-----|
+| `LocalStorageAdapter` | `node:fs/promises` (lazy import) | Dev, tests |
+| `R2StorageAdapter` | Cloudflare R2 via S3 protocol | Node / anywhere with S3 creds |
+| `R2BindingStorageAdapter` | Cloudflare R2 native binding (`env.BUCKET`) | Cloudflare Workers (`provider: 'r2-binding'`) |
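
The README does not list the adapter methods, so the interface below is a hypothetical minimal key/value shape (`get`, `put`, `list`), used only to show how calling code stays backend-agnostic. The in-memory class is a convenient test double; none of these names are package exports.

```typescript
// Hypothetical minimal adapter shape; the package's real interface may differ.
interface StorageAdapter {
  get(key: string): Promise<string | null>
  put(key: string, value: string): Promise<void>
  list(prefix: string): Promise<string[]>
}

// In-memory implementation, useful for tests of code that takes an adapter.
class MemoryStorageAdapter implements StorageAdapter {
  private readonly objects = new Map<string, string>()

  async get(key: string): Promise<string | null> {
    return this.objects.get(key) ?? null
  }

  async put(key: string, value: string): Promise<void> {
    this.objects.set(key, value)
  }

  async list(prefix: string): Promise<string[]> {
    return Array.from(this.objects.keys())
      .filter((key) => key.startsWith(prefix))
      .sort()
  }
}

// Written against the interface, so any backend can be passed in.
async function saveState(storage: StorageAdapter, runId: string, nodeId: string, state: object): Promise<void> {
  await storage.put(`projects/${runId}/nodes/${nodeId}/state.json`, JSON.stringify(state))
}
```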
+### Smart Memory
+Per-workspace learning across runs:
+- **Profile** — agent identity
+- **Rules** — behavioral constraints (always/never/when)
+- **Lessons** — facts learned from prior runs (token-budgeted)
+- **Episodes** — session-level observations (JSONL per session)
+Modes: `off` (stateless), `read-only` (recall only), `read-write` (self-improving).
+### Skills
+Markdown docs the model can pull on demand via the `SkillPage` tool. Two resolution modes, both drive the same runtime contract:
+**1. Disk-backed (default)** — one directory per skill:
+```
+{storage-root}/workspaces/{ws}/.claude/skills/
+├── memo-style/
+│   ├── SKILL.md          ← required — name + description + body
+│   └── pages/
+│       └── examples.md   ← optional multi-page skill
+└── brand-voice/
+    └── SKILL.md
+```
+Enable via `config.skills.autoload: true`. The engine lists directory entries at run start, emits `name + description` into the system prompt, and lazy-loads bodies when the model calls `SkillPage`.
+**2. Per-run override** — bind a specific skill bundle to one `engine.run()` / `engine.resumeAsync()` call without touching storage:
+```ts
+await engine.run({
+  runId, nodeId, task,
+  skills: [
+    {
+      name: 'memo-style',
+      description: 'Internal memo format.',
+      body: '# memo-style\n\n## TL;DR\n...',   // inline = zero-latency
+    },
+    {
+      name: 'brand-voice',
+      description: 'Company tone and voice.',
+      url: 'https://cdn.acme.com/skills/brand-voice/v3/SKILL.md',   // lazy fetch, cached per run
+      headers: { Authorization: 'Bearer ...' },
+      pages: {
+        examples: { url: 'https://cdn.acme.com/skills/brand-voice/v3/examples.md' },
+      },
+    },
+  ],
+})
+```
+Override **replaces** disk discovery for that run — the model sees exactly the skills you list, nothing from `config.skills.path`. Useful in per-node workflow engines where each node needs a different bundle.
+Security: set `config.skills.allowedHosts` (e.g. `['cdn.acme.com']`) to restrict URL fetches. Undefined = open (dev default). Requests outside the allowlist throw before hitting the network.
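
A sketch of that allowlist rule. Two details are assumptions rather than documented behaviour: hosts are matched exactly against `URL.hostname` (no wildcard or subdomain matching), and non-HTTPS URLs are rejected, following the "HTTPS url" wording in the Status section. `assertAllowedSkillUrl` is a hypothetical helper.

```typescript
// undefined allowlist = open; otherwise the hostname must be listed.
// Throws before any network request is made.
function assertAllowedSkillUrl(rawUrl: string, allowedHosts?: string[]): URL {
  const url = new URL(rawUrl) // throws TypeError on a malformed URL
  // Assumption: only https URLs are accepted.
  if (url.protocol !== 'https:') {
    throw new Error(`Skill URL must use https: ${rawUrl}`)
  }
  // Assumption: exact hostname match, no wildcards or subdomains.
  if (allowedHosts !== undefined && !allowedHosts.includes(url.hostname)) {
    throw new Error(`Host not in skills.allowedHosts: ${url.hostname}`)
  }
  return url
}
```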
+Caching: within one run, each URL is fetched at most once — subsequent `SkillPage` calls for the same skill/page are served from memory. Cache is per `InlineSkillSource` instance, so a fresh `engine.run()` always re-reads.
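
Per-run caching can be sketched as a map of in-flight promises keyed by URL. `PerRunSkillCache` and the injected `fetchText` function are hypothetical names, not the package's `InlineSkillSource` API. Caching the promise rather than the resolved text also deduplicates concurrent requests for the same page; evicting failed fetches so they can be retried is a choice of this sketch.

```typescript
type FetchText = (url: string, headers?: Record<string, string>) => Promise<string>

// One instance per run, so a new run always starts with an empty cache.
class PerRunSkillCache {
  private readonly pending = new Map<string, Promise<string>>()
  private readonly fetchText: FetchText

  constructor(fetchText: FetchText) {
    this.fetchText = fetchText
  }

  load(url: string, headers?: Record<string, string>): Promise<string> {
    let request = this.pending.get(url)
    if (request === undefined) {
      request = this.fetchText(url, headers)
      this.pending.set(url, request)
      // Evict failures so a later SkillPage call can try again.
      request.catch(() => this.pending.delete(url))
    }
    return request
  }
}
```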
+### Tools (22 built-in)
+| Tool | Safe? | Description |
+|------|-------|-------------|
+| Bash | No | Shell execution via `/bin/sh -c` (Node.js only) |
+| Read | Yes | File read with line numbers, PDF, images |
+| Write | No | Atomic file write |
+| Edit | No | String replacement with uniqueness check |
+| Glob | Yes | File pattern matching |
+| Grep | Yes | Regex search (ripgrep when available, JS fallback) |
+| WebFetch | Yes | HTTP fetch with HTML-to-text |
+| WebSearch | Yes | DuckDuckGo web search |
+| Agent | No | Spawn subagent (depth-bounded) |
+| SendMessage | No | Inter-agent communication |
+| Sleep | Yes | Delay for rate limiting |
+| ToolSearch | Yes | Search registered tools |
+| Memorize | No | Write to smart memory |
+| Recall | Yes | Read from smart memory |
+| TaskCreate/Get/List/Update | Mixed | Task tracking |
+| NotebookEdit | No | Jupyter notebook editing |
+| ListMcpResources | Yes | MCP resource browsing |
+| ReadMcpResource | Yes | MCP resource reading |
+| SkillPage | Yes | Lazy skill page loading |
+"Safe" = `isConcurrencySafe` — safe tools run in parallel via the StreamingToolExecutor.
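
One simple way to honour "safe tools in parallel, unsafe tools serially" is to group consecutive safe calls into a batch run with `Promise.all`, and give every unsafe call a batch of its own. This sketches the idea only, not the StreamingToolExecutor's actual scheduling; the safe set is copied from the table, with the "Mixed" task tools treated as unsafe.

```typescript
type ToolCall = { id: string; name: string }

// Tools marked "Yes" in the table above.
const CONCURRENCY_SAFE = new Set([
  'Read', 'Glob', 'Grep', 'WebFetch', 'WebSearch', 'Sleep', 'ToolSearch',
  'Recall', 'ListMcpResources', 'ReadMcpResource', 'SkillPage',
])

// Ordered batches: runs of safe calls share a batch, each unsafe call stands alone.
function planBatches(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = []
  for (const call of calls) {
    const last = batches[batches.length - 1]
    if (CONCURRENCY_SAFE.has(call.name) && last && CONCURRENCY_SAFE.has(last[0].name)) {
      last.push(call)
    } else {
      batches.push([call])
    }
  }
  return batches
}

// Batches run one after another; calls inside a batch run concurrently.
async function runBatches<T>(batches: ToolCall[][], run: (call: ToolCall) => Promise<T>): Promise<T[]> {
  const results: T[] = []
  for (const batch of batches) {
    results.push(...(await Promise.all(batch.map(run))))
  }
  return results
}
```

Results come back in the original call order, which matters because each `tool_result` must pair with its `tool_use` block.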
+### Hooks (8 slots)
+| Hook | When | Can block? |
+|------|------|-----------|
+| `preRun` | Before agent loop starts | No |
+| `postRun` | After run completes (always fires) | No |
+| `preTurn` | Before each API call | No |
+| `postTurn` | After each tool dispatch | No |
+| `preToolCall` | Before each tool execution | No |
+| `postToolCall` | After each tool execution | No |
+| `gateBeforeTool` | Before tool dispatch — can pause the run | Yes (pause) |
+| `stopHooks` | After each turn — can stop the run | Yes (stop) |
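
The Node.js example in the Async API section configures `gateBeforeTool` as a function of the tool name that returns `{ allow: true }` or `{ allow: false, reason }`. A small factory for that shape, assuming the one-argument signature used there (`makeGate` is a hypothetical helper, not a package export):

```typescript
type GateDecision = { allow: true } | { allow: false; reason: string }

// Pause before any tool in `gatedTools`; allow everything else.
function makeGate(gatedTools: string[], reason = 'human approval'): (toolName: string) => GateDecision {
  const gated = new Set(gatedTools)
  return (toolName: string): GateDecision =>
    gated.has(toolName) ? { allow: false, reason } : { allow: true }
}

const gateBeforeTool = makeGate(['Write', 'Edit', 'Bash'])
```

In an engine config this would be passed as `hooks: { gateBeforeTool: makeGate(['Write']) }`, mirroring the inline callback in that example.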
+---
+## Async API
+`engine.run()` and `engine.resume()` are **synchronous** in the sense that their promise settles only when the run reaches a terminal state (`done` | `paused` | `failed`). For long-running work (multi-minute tasks, HITL workflows with human wait time, Cloudflare Workers / Durable Object hosts), the engine ships a parallel **async API**.
+The async API is **additive**: sync calls still work exactly as before. Async just adds dispatch, polling, webhooks, and durable state.
+### Methods
+| Method | Purpose |
+|---|---|
+| `engine.start(opts)` | Schedule a run in the background. Returns `{ runId, nodeId, status }` immediately. |
+| `engine.resumeAsync(opts)` | Async version of `resume()`. Same options + optional `webhook`. |
+| `engine.getStatus(runId, nodeId?)` | Read current state. Returns `EngineResponse` (provisional while running, final when terminal). |
+| `engine.waitFor(runId, opts?)` | Poll until terminal. Returns the final `EngineResponse`. Respects `timeoutMs`. |
+| `engine.cancelRun(runId, nodeId?)` | Abort a running run. Marks state as `cancelled`. |
+| `engine.retryWebhook(runId, deliveryId)` | Re-fire a past webhook delivery (useful after downstream downtime). |
+| `engine.recoverOrphanedRuns({ staleThresholdMs })` | Scan `state.json` files on startup and mark stale-heartbeat runs as `failed`. |
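
`engine.waitFor()` already polls in-process. The same pattern is sketched here for callers that poll from elsewhere, for example a service hitting an HTTP status endpoint that wraps `getStatus()`. `pollUntilSettled` is hypothetical, and the settled-status set and default intervals are assumptions.

```typescript
type RunStatus = 'queued' | 'running' | 'paused' | 'done' | 'failed' | 'cancelled' | 'not_found'
type StatusResponse = { runId: string; status: RunStatus }

// Assumption: anything other than queued/running counts as settled,
// including not_found, so a bad runId cannot poll forever.
const SETTLED: ReadonlySet<RunStatus> = new Set<RunStatus>(['paused', 'done', 'failed', 'cancelled', 'not_found'])

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms))

// Poll until the run settles or the deadline passes. Defaults are illustrative.
async function pollUntilSettled(
  getStatus: () => Promise<StatusResponse>,
  opts: { pollIntervalMs?: number; timeoutMs?: number } = {},
): Promise<StatusResponse> {
  const interval = opts.pollIntervalMs ?? 1_000
  const deadline = Date.now() + (opts.timeoutMs ?? 300_000)
  for (;;) {
    const res = await getStatus()
    if (SETTLED.has(res.status)) return res
    if (Date.now() >= deadline) throw new Error(`Timed out waiting for ${res.runId}`)
    await sleep(interval)
  }
}
```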
+### state.json — durable per-run state
+Every async run writes a `state.json` file alongside the transcript:
+```
+projects/{runId}/nodes/{nodeId}/
+├── 000000.jsonl       # transcript shards
+├── meta.json          # transcript metadata
+├── snapshot.json      # pause snapshot (if paused)
+└── state.json         # async run state + full response
+```
+Shape:
+```ts
+{
+  version: 1,
+  runId: 'run_abc',
+  nodeId: 'node_1',
+  status: 'queued' | 'running' | 'paused' | 'done' | 'failed' | 'cancelled' | 'not_found',
+  startedAt: 1700000000000,
+  lastHeartbeat: 1700000012345,
+  progress: {
+    turns: number,                   // advances as the agent loop runs
+    tokensUsed: { input, output },   // cumulative across turns
+    currentActivity:                 // what the loop is doing RIGHT NOW
+      'idle' | 'streaming' | 'tool_dispatch' | 'compacting',
+    lastTool?: string,               // set when currentActivity === 'tool_dispatch'
+  },
+  response: EngineResponse | null,   // populated on terminal; same shape as sync run()
+  webhook?: {
+    url, events, secret?, headers?,
+    deliveries: [{ id, event, attempt, status, httpCode?, error? }, ...]
+  }
+}
+```
+`getStatus()` reads this file and returns:
+- the embedded `response` once the run is terminal,
+- a provisional snapshot with real-time `progress` fields while work is in flight,
+- `status: 'not_found'` with `errors[0].code === 'NOT_FOUND'` if no state file exists.
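
The three cases above map onto a small resolver. This sketch uses hand-written types that mirror the documented `state.json` shape; `statusFromState` is hypothetical, and which progress fields land in a provisional response's `meta` is an assumption.

```typescript
type RunStatus = 'queued' | 'running' | 'paused' | 'done' | 'failed' | 'cancelled' | 'not_found'
type Activity = 'idle' | 'streaming' | 'tool_dispatch' | 'compacting'

// Hand-written approximations of the documented shapes.
interface EngineResponseLite {
  runId: string
  nodeId: string
  status: RunStatus
  data: unknown
  meta: Record<string, unknown>
  errors: { code: string; message: string }[]
}

interface RunState {
  version: 1
  runId: string
  nodeId: string
  status: RunStatus
  startedAt: number
  lastHeartbeat: number
  progress: {
    turns: number
    tokensUsed: { input: number; output: number }
    currentActivity: Activity
    lastTool?: string
  }
  response: EngineResponseLite | null
}

function statusFromState(runId: string, nodeId: string, state: RunState | null): EngineResponseLite {
  // No state file: report not_found instead of throwing.
  if (state === null) {
    return {
      runId, nodeId, status: 'not_found', data: null, meta: {},
      errors: [{ code: 'NOT_FOUND', message: 'No state.json for this run' }],
    }
  }
  // Terminal: the stored response is returned as-is.
  if (state.response !== null) return state.response
  // In flight: provisional response carrying live progress.
  return {
    runId, nodeId, status: state.status, data: null,
    meta: { ...state.progress, lastHeartbeat: state.lastHeartbeat }, errors: [],
  }
}
```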
+**Heartbeat**: the agent loop updates `progress` at each turn boundary (streaming start, tool dispatch, turn end). Writes are throttled to at most one per 500ms AND only when activity changes, so R2 costs stay predictable even on long runs.
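
One reading of that throttling rule, sketched with an injected clock: persist only when the activity differs from the last persisted one *and* at least 500 ms have passed. `HeartbeatThrottle` is hypothetical; in this sketch an activity change inside the window is simply skipped, whereas the real engine may defer it.

```typescript
type Activity = 'idle' | 'streaming' | 'tool_dispatch' | 'compacting'

class HeartbeatThrottle {
  private lastWriteAt = Number.NEGATIVE_INFINITY
  private lastActivity: Activity | undefined
  private readonly minIntervalMs: number

  constructor(minIntervalMs = 500) {
    this.minIntervalMs = minIntervalMs
  }

  // Returns true when the caller should write state.json now.
  shouldWrite(activity: Activity, now: number): boolean {
    if (activity === this.lastActivity) return false               // nothing changed
    if (now - this.lastWriteAt < this.minIntervalMs) return false  // too soon
    this.lastActivity = activity
    this.lastWriteAt = now
    return true
  }
}
```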
+### Webhooks
+Pass a `webhook` object to `start()` / `resumeAsync()` and the engine will POST the final `EngineResponse` to your URL on the configured events.
+```ts
+await engine.start({
+  runId: 'run_abc',
+  nodeId: 'node_1',
+  task: 'long task',
+  webhook: {
+    url: 'https://your-app.com/hooks/la-machina',
+    secret: 'shared-hmac-secret',           // optional — enables X-LaMachina-Signature
+    events: ['paused', 'done', 'failed'],  // default: all three
+    headers: { 'X-Tenant': 'acme' },       // optional — passed through
+  },
+})
+```
+**Request headers:**
+| Header | Value |
+|---|---|
+| `Content-Type` | `application/json` |
+| `X-LaMachina-Event` | `status.paused` \| `status.done` \| `status.failed` |
+| `X-LaMachina-RunId` | Run ID from your `start()` call |
+| `X-LaMachina-Delivery` | Unique UUID per delivery attempt |
+| `X-LaMachina-Timestamp` | Unix ms (used in HMAC input) |
+| `X-LaMachina-Signature` | `sha256=<hex>` — HMAC over `${timestamp}.${body}` (only if `secret` set) |
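
A receiver-side verification sketch for Node.js using `node:crypto`, following the header table: HMAC-SHA256 over `${timestamp}.${body}`, hex-encoded and prefixed with `sha256=`. Assumptions: the secret is used as a UTF-8 string key, `body` is the raw request body exactly as received (re-serialising parsed JSON can change bytes), and the 5-minute replay window is a choice of this example, not something the engine documents.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

function verifyLaMachinaSignature(opts: {
  secret: string
  body: string          // raw request body, exactly as received
  timestamp: string     // X-LaMachina-Timestamp header (Unix ms)
  signature: string     // X-LaMachina-Signature header
  toleranceMs?: number  // assumption: replay window chosen by the receiver
  now?: number
}): boolean {
  const ts = Number(opts.timestamp)
  if (!Number.isFinite(ts)) return false
  const tolerance = opts.toleranceMs ?? 5 * 60_000
  if (Math.abs((opts.now ?? Date.now()) - ts) > tolerance) return false

  const expected =
    'sha256=' + createHmac('sha256', opts.secret).update(`${opts.timestamp}.${opts.body}`).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(opts.signature)
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b)
}
```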
+**Retry schedule** (exponential-ish):
+```
+attempt 1: immediate
+attempt 2: +10s
+attempt 3: +60s
+attempt 4: +5min
+attempt 5: +30min
+then give up
+```
+Retry decisions:
+| HTTP | Retry? |
+|---|---|
+| 2xx | No (delivered) |
+| 408 Request Timeout | Yes |
+| 429 Rate Limited | Yes |
+| 5xx | Yes |
+| 410 Gone | **No** (permanent — resource removed) |
+| Other 4xx | No (client bug — don't retry) |
+| Network error / timeout | Yes |
+Every attempt is appended to `state.webhook.deliveries[]` for audit.
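
The retry schedule and the decision table combine into a single "what next?" function. A sketch: `nextDelayMs` and `DeliveryOutcome` are hypothetical; the delays are copied from the schedule and read as measured from the previous attempt (an assumption), and statuses the table does not mention (for example 3xx) are treated as non-retryable.

```typescript
// Delay before attempts 1..5 (attempt 1 is immediate).
const RETRY_DELAYS_MS = [0, 10_000, 60_000, 5 * 60_000, 30 * 60_000]

type DeliveryOutcome = { kind: 'http'; status: number } | { kind: 'network_error' }

// Retry rules from the table above; network errors include timeouts.
function shouldRetry(outcome: DeliveryOutcome): boolean {
  if (outcome.kind === 'network_error') return true
  const s = outcome.status
  if (s >= 200 && s < 300) return false // delivered
  if (s === 408 || s === 429) return true
  if (s >= 500 && s < 600) return true
  return false                           // 410 and all other 4xx: permanent
}

// `attemptJustMade` is 1-based. Returns the delay before the next attempt, or null to give up.
function nextDelayMs(attemptJustMade: number, outcome: DeliveryOutcome): number | null {
  if (!shouldRetry(outcome)) return null
  return attemptJustMade < RETRY_DELAYS_MS.length ? RETRY_DELAYS_MS[attemptJustMade] : null
}
```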
+### Node.js example — sync HITL and async HITL together
+```ts
+import { initEngine, Engine } from 'la-machina-engine'
+const { config } = initEngine({
+  model: { provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
+  storage: { provider: 'r2', rootPath: 'tenants/acme', workspaceId: 'default', r2: { /* bucket, region, accessKeyId, secretAccessKey, endpoint */ } },
+  hooks: {
+    gateBeforeTool: (toolName) =>
+      toolName === 'Write' ? { allow: false, reason: 'human approval' } : { allow: true },
+  },
+})
+const engine = new Engine(config)
+// Async run with webhook — returns immediately
+const { runId } = await engine.start({
+  runId: 'run_' + Date.now(),
+  nodeId: 'n1',
+  task: 'Refactor the config module.',
+  webhook: { url: 'https://app.example.com/hooks', secret: process.env.HOOK_SECRET },
+})
+// Later: client polls
+const current = await engine.getStatus(runId, 'n1')
+if (current.status === 'paused') {
+  // Human approves → resume asynchronously, passing the decision as gateAnswer
+  await engine.resumeAsync({
+    runId,
+    nodeId: 'n1',
+    snapshot: current.meta.snapshot,
+    gateAnswer: 'Approved by reviewer',
+    webhook: { url: 'https://app.example.com/hooks', secret: process.env.HOOK_SECRET },
+  })
+  const final = await engine.waitFor(runId, { nodeId: 'n1', timeoutMs: 300_000 })
+  console.log('final:', final.status, final.data)
+}
+// Startup: recover any runs that crashed mid-execution
+const orphaned = await engine.recoverOrphanedRuns({ staleThresholdMs: 5 * 60_000 })
+console.log('recovered', orphaned.length, 'orphaned runs')
+```
+### MCP — auth refresh + sampling
+Two opt-in features for MCP server integrations (Plan 018):
+**`headersProvider`** — refresh OAuth tokens between requests:
+```ts
+mcp: {
+  servers: {
+    github: {
+      type: 'http',
+      url: 'https://mcp.github.example.com',
+      headers: { 'X-Tenant': 'acme' },          // static
+      headersProvider: async () => ({           // dynamic, called per send
+        Authorization: `Bearer ${await refreshGithubToken()}`,
+      }),
+    },
+  },
+}
+```
+The provider is called before every MCP request, and its result is merged over the static `headers`. On HTTP 401 the engine calls the provider a second time and retries the request once. Without this hook, a long run fails as soon as its bearer token expires (OAuth access tokens commonly last about an hour).
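
The documented refresh rule, sketched with an injected `send` function standing in for the MCP transport (`sendWithAuthRefresh` and `SendFn` are hypothetical names, not package APIs):

```typescript
type HeaderMap = Record<string, string>
type SendFn = (headers: HeaderMap) => Promise<{ status: number; body: string }>

// Provider headers are merged over the static ones on every request;
// a 401 triggers one more provider call and exactly one retry.
async function sendWithAuthRefresh(
  send: SendFn,
  staticHeaders: HeaderMap,
  headersProvider: () => Promise<HeaderMap>,
): Promise<{ status: number; body: string }> {
  const first = await send({ ...staticHeaders, ...(await headersProvider()) })
  if (first.status !== 401) return first
  // The second provider call should yield a fresh token.
  return send({ ...staticHeaders, ...(await headersProvider()) })
}
```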
+**`allowSampling`** — let an MCP server request LLM completions through the engine:
+```ts
+mcp: {
+  servers: {
+    research_tools: {
+      type: 'http',
+      url: 'https://mcp.research.example.com',
+      allowSampling: true,                      // OFF by default
+    },
+  },
+}
+// Optional — provide a custom handler. Default routes to engine's own ModelAdapter.
+new Engine(config, {
+  samplingHandler: async (request, context) => {
+    // request.messages, request.maxTokens, request.systemPrompt, ...
+    // context.serverName, context.depth, context.runId
+    return {
+      role: 'assistant',
+      model: 'cheap-model-for-mcp',
+      content: { type: 'text', text: '...' },
+      stopReason: 'endTurn',
+    }
+  },
+})
+```
+When `allowSampling: false` (the default), the engine omits the `sampling` capability from its MCP handshake — servers that try to call `sampling/createMessage` get a "method not supported" error from the SDK directly.
+When `allowSampling: true`, the engine installs a request handler that routes to either your custom `samplingHandler` or a built-in default that uses the engine's own model. The default handler refuses recursive sampling past `DEFAULT_SAMPLING_MAX_DEPTH = 3` to prevent loops. Token usage from sampling counts against any `tokenBudget` you've set on the parent run.
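
A sketch of the two guards described above: a recursion-depth cap and charging sampling tokens to the parent run's budget. `SamplingGuard` is hypothetical; whether depth counts from 0 or 1, and exactly when the budget is checked, are assumptions of this example.

```typescript
const DEFAULT_SAMPLING_MAX_DEPTH = 3

type SamplingUsage = { input: number; output: number }

class SamplingGuard {
  private used = 0
  private readonly tokenBudget: number | undefined

  constructor(tokenBudget?: number) {
    this.tokenBudget = tokenBudget
  }

  // Call before forwarding a sampling request to the model.
  check(depth: number): void {
    if (depth > DEFAULT_SAMPLING_MAX_DEPTH) {
      throw new Error(`Sampling depth ${depth} exceeds ${DEFAULT_SAMPLING_MAX_DEPTH}`)
    }
    if (this.tokenBudget !== undefined && this.used >= this.tokenBudget) {
      throw new Error('Parent run token budget exhausted')
    }
  }

  // Charge a completed sampling call to the parent run.
  record(usage: SamplingUsage): void {
    this.used += usage.input + usage.output
  }

  get tokensUsed(): number {
    return this.used
  }
}
```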
+Off-by-default is deliberate: sampling consumes your LLM budget. Opt in per server only when you've vetted the MCP.
+### Cloudflare Workers — three building blocks
+A Worker deployment needs three pieces beyond the standard engine:
+1. **Storage: native R2 binding** via `storage.provider: 'r2-binding'` — avoids the `@aws-sdk/client-s3` bundle and its ListObjectsV2 hang on the Workers runtime.
+2. **Agent loop lifetime: Durable Objects** — the default fire-and-forget executor can't survive a Worker request return, so wrap work in `ctx.waitUntil()` inside a DO, or provide a custom `BackgroundExecutor`.
+3. **MCP transport: `preferBindingTransport: true`** — makes the engine's MCP client use plain POST JSON-RPC instead of the SDK's Streamable-HTTP SSE client (which hangs on Workers after `initialize`).
+#### Storage — R2 binding provider
+```ts
+import { initEngine, Engine } from 'la-machina-engine'
+const { config } = initEngine({
+  model: { provider: 'anthropic', apiKey: env.ANTHROPIC_API_KEY },
+  storage: {
+    provider: 'r2-binding',
+    rootPath: 'tenants/acme',
+    workspaceId: 'default',
+    r2Binding: env.STORAGE,   // the R2Bucket binding from wrangler.toml
+  },
+})
+const engine = new Engine(config)
+```
+No S3 credentials, no endpoint URL — the binding handles auth. Works with `wrangler dev --local` (Miniflare emulates R2 in-memory).
+`wrangler.toml`:
+```toml
+[[r2_buckets]]
+binding = "STORAGE"
+bucket_name = "la-machina"
+preview_bucket_name = "la-machina-preview"
+```
+#### Agent loop lifetime — Durable Objects
+Each `runId` maps to a DO via `idFromName(runId)`. The DO calls `engine.start()` inside `ctx.waitUntil()`, which keeps the isolate alive past the Worker request's return. Resumes route to the same DO so they pick up the paused snapshot.
+```ts
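+import { DurableObject } from 'cloudflare:workers'
+// Env, StartBody and buildEngine() are app-specific helpers defined elsewhere in the example project
+export class RunDurableObject extends DurableObject<Env> {
+  override async fetch(req: Request): Promise<Response> {
+    const body = (await req.json()) as StartBody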
+    this.ctx.waitUntil(this.doRun(body))   // keeps DO alive until done
+    return new Response(null, { status: 202 })
+  }
+  private async doRun(body: StartBody): Promise<void> {
+    const engine = buildEngine(this.env, body.rootPath)
+    await engine.start({
+      runId: body.runId,
+      nodeId: body.nodeId,
+      task: body.task,
+      ...(body.webhook ? { webhook: body.webhook } : {}),
+    })
+    await engine.waitFor(body.runId, { nodeId: body.nodeId, pollIntervalMs: 500 })
+  }
+}
+```
+Alternative (advanced): implement `BackgroundExecutor` and pass it via `EngineInternals.backgroundExecutor` if you want `engine.start()` itself to schedule into a DO from the Worker fetch handler. See `examples/cloudflare-worker-ts/src/runDO.ts` for the common-case pattern.
+#### MCP on Workers — `preferBindingTransport`
+```ts
+initEngine({
+  // ...
+  mcp: {
+    servers: {
+      flow: {
+        type: 'http',
+        url: 'https://your-mcp-server.com/mcp',
+        preferBindingTransport: true,   // ← Workers-safe
+      },
+    },
+  },
+})
+```
+When this flag is set, the engine's MCP client uses `BindingHttpTransport` — a stateless POST-only JSON-RPC transport. No long-lived SSE reader, no streaming notifications (not needed for tool calling).
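
What "stateless POST-only JSON-RPC" means on the wire can be sketched as follows. This is not the engine's `BindingHttpTransport`. It assumes the server answers each POST with a plain JSON body (no SSE). The `tools/call` method name and the `Accept` header follow the MCP specification; a real client must also complete the `initialize` handshake before calling tools.

```typescript
type JsonRpcRequest = { jsonrpc: '2.0'; id: number; method: string; params?: unknown }
type JsonRpcResponse =
  | { jsonrpc: '2.0'; id: number; result: unknown }
  | { jsonrpc: '2.0'; id: number; error: { code: number; message: string } }

// Injected HTTP POST so the sketch runs without a network; global fetch fits this shape.
type PostJson = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

class PostOnlyJsonRpc {
  private nextId = 1
  private readonly url: string
  private readonly post: PostJson

  constructor(url: string, post: PostJson) {
    this.url = url
    this.post = post
  }

  // One request = one POST = one JSON reply. No stream is kept open.
  async request(method: string, params?: unknown): Promise<unknown> {
    const req: JsonRpcRequest = { jsonrpc: '2.0', id: this.nextId++, method, params }
    const res = await this.post(this.url, {
      method: 'POST',
      // The spec requires both types in Accept; this sketch only handles JSON replies.
      headers: { 'Content-Type': 'application/json', Accept: 'application/json, text/event-stream' },
      body: JSON.stringify(req),
    })
    if (!res.ok) throw new Error(`MCP HTTP ${res.status}`)
    const msg = (await res.json()) as JsonRpcResponse
    if ('error' in msg) throw new Error(`MCP error ${msg.error.code}: ${msg.error.message}`)
    return msg.result
  }
}
```

With the global `fetch` (Node 18+, Workers) the client could be created as `new PostOnlyJsonRpc(url, (u, init) => fetch(u, init))` and used as `rpc.request('tools/call', { name, arguments })`.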
+On Node, leave the flag off to keep the full Streamable-HTTP feature set.
+#### Working reference
+A complete TypeScript example is at `examples/cloudflare-worker-ts/`:
+- `src/env.ts` — builds an Engine with `r2-binding` + `preferBindingTransport`
+- `src/runDO.ts` — `RunDurableObject` with `ctx.waitUntil()`
+- `src/index.ts` — `POST /sync`, `POST /async/start`, `GET /async/status/:runId`, `POST /async/resume/:runId`, `POST /demo/webhook` receiver with HMAC verification
+- `mcp-server/server.mjs` — local HTTP MCP server for the memo-pipeline demo
+- `test-client.sh` — end-to-end curl demo
+Run:
+```bash
+cd examples/cloudflare-worker-ts
+cp .dev.vars.example .dev.vars && $EDITOR .dev.vars
+bunx wrangler dev --local
+./test-client.sh
+```
+Everything else (state.json, webhooks, polling, resume, recovery) works the same as on Node.
+### Sync vs. async — when to use which
+| Scenario | Use |
+|---|---|
+| Simple task, < 60s | `engine.run()` (sync) |
+| HITL (human-in-the-loop) where you can block the caller | `engine.run()` + `engine.resume()` |
+| Long task, client can't block | `engine.start()` + `getStatus` / `waitFor` |
+| HITL in a web app (user closes tab) | `engine.start()` + webhook on `paused` |
+| Cloudflare Workers (any non-trivial run) | `storage.provider: 'r2-binding'` + DO + `preferBindingTransport` |
+| Server crash recovery | `engine.recoverOrphanedRuns()` on startup |
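The `engine.start()` + `getStatus` / `waitFor` row boils down to polling persisted status until the run leaves its running state. The helper below sketches that loop generically; `pollUntilSettled`, the `RunStatus` shape, and the `getStatus` callback signature are assumptions rather than the engine's types, so in real code prefer `engine.waitFor()` itself.

```ts
// Assumed status shape; the engine's real getStatus result may differ.
type RunStatus = { state: 'running' | 'paused' | 'done' | 'failed' }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Poll every pollIntervalMs until the run is no longer running,
// or throw once timeoutMs has elapsed.
async function pollUntilSettled(
  getStatus: () => Promise<RunStatus>,
  pollIntervalMs = 500,
  timeoutMs = 60_000,
): Promise<RunStatus> {
  const deadline = Date.now() + timeoutMs
  for (;;) {
    const status = await getStatus()
    if (status.state !== 'running') return status // paused, done, or failed
    if (Date.now() >= deadline) throw new Error(`run still running after ${timeoutMs} ms`)
    await sleep(pollIntervalMs)
  }
}
```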
+---
+## Agent Hierarchy
+Three execution modes:
+### Normal Mode (`engine.run()`)
+- The parent gets all tools; children get all tools except `Agent`
+- Depth bounded by `maxSubagentDepth` (default 5)
+### Coordinator Mode (`coordinator.enabled: true`)
+- **Code-enforced** tool split — coordinator can only delegate
+- Coordinator: Agent, SendMessage, Tasks, Memory, ToolSearch
+- Workers: Bash, Read, Write, Edit, Glob, Grep
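"Code-enforced" means the split happens when each agent's tool list is built, not in the prompt: a tool the coordinator was never handed cannot be called. A minimal sketch of that partitioning, using the tool names listed above and the default `coordinator.workerTools`; `ToolDef` and `splitTools` are hypothetical names, not the engine's API.

```ts
// Only the name matters for the split; a real tool also has a schema and handler.
interface ToolDef {
  name: string
}

const COORDINATOR_TOOLS = new Set(['Agent', 'SendMessage', 'Tasks', 'Memory', 'ToolSearch'])
// Mirrors the default coordinator.workerTools value from the configuration reference
const DEFAULT_WORKER_TOOLS = ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep']

// Partition the full registry: the coordinator can only delegate and coordinate,
// workers can only act. Neither list depends on anything the model says.
function splitTools(all: ToolDef[], workerTools: string[] = DEFAULT_WORKER_TOOLS) {
  const workerSet = new Set(workerTools)
  return {
    coordinator: all.filter((t) => COORDINATOR_TOOLS.has(t.name)),
    worker: all.filter((t) => workerSet.has(t.name)),
  }
}
```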
+### Orchestrator Mode (`engine.orchestrate()`)
+- Deterministic state machine: Plan → Research → Implement → Verify → Finalize
+- Parallel researchers (`maxParallelResearchers`, default 3)
+- Retry policies per phase with file snapshot rollback
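Here "deterministic state machine" means the phase order is fixed in code and the model only does the work inside each phase. The sketch below shows that shape with per-phase retry budgets and a rollback step between attempts. `orchestrateSketch`, the `runPhase` and `restoreSnapshot` callbacks, and the retry numbers are hypothetical placeholders, not the internals of `engine.orchestrate()`; parallel researchers would live inside `runPhase('research', …)`.

```ts
type Phase = 'plan' | 'research' | 'implement' | 'verify' | 'finalize'
const ORDER: Phase[] = ['plan', 'research', 'implement', 'verify', 'finalize']

interface OrchestratorDeps {
  // Runs one phase; resolves true on success, false on a retryable failure.
  runPhase(phase: Phase, attempt: number): Promise<boolean>
  // Restores the files captured before the phase started (rollback).
  restoreSnapshot(phase: Phase): Promise<void>
  // Extra attempts allowed per phase (missing = no retries).
  retries: Partial<Record<Phase, number>>
}

async function orchestrateSketch(deps: OrchestratorDeps): Promise<{ ok: boolean; failedAt?: Phase }> {
  for (const phase of ORDER) {
    const maxAttempts = 1 + (deps.retries[phase] ?? 0)
    let succeeded = false
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      succeeded = await deps.runPhase(phase, attempt)
      if (!succeeded) await deps.restoreSnapshot(phase) // undo partial edits first
    }
    if (!succeeded) return { ok: false, failedAt: phase }
  }
  return { ok: true }
}
```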
+### Features
+| Feature | Description |
+|---------|-------------|
+| **Fork subagent** | Child inherits parent's full message context (placeholder tool_results for cache sharing) |
+| **Background agents** | `run_in_background: true` — fire-and-forget, results drain at turn boundary |
+| **SendMessage** | Inter-agent message queue, routed by name or agentId |
+| **Parallel batching** | Concurrent-safe tools execute via `Promise.all` with semaphore |
+| **Bash error cascading** | Bash error aborts sibling tools via `AbortController` |
+| **Subagent gate propagation** | Opt-in: parent gate threads into child loops; child denial pauses the parent with `pendingSubagent` set |
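The parallel batching and Bash error cascading rows combine into one pattern: run concurrency-safe calls together under a concurrency cap, and when a Bash call fails, abort its siblings through a shared `AbortController`. A minimal sketch, assuming a `ToolCall` shape whose `run(signal)` honors the abort signal; `ToolCall`, `Semaphore`, and `runBatch` are illustrative names, not the engine's `StreamingToolExecutor`.

```ts
// One concurrency-safe tool invocation (assumed shape).
interface ToolCall {
  name: string
  run(signal: AbortSignal): Promise<string>
}

// Minimal counting semaphore: at most `limit` callers hold a slot at once.
class Semaphore {
  private available: number
  private waiters: Array<() => void> = []
  constructor(limit: number) {
    this.available = limit
  }
  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--
      return
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve))
  }
  release(): void {
    const next = this.waiters.shift()
    if (next) next() // hand the slot directly to the next waiter
    else this.available++
  }
}

// Run one batch; Promise.all keeps results in input order.
async function runBatch(calls: ToolCall[], maxConcurrency: number): Promise<string[]> {
  const sem = new Semaphore(maxConcurrency)
  const siblings = new AbortController() // shared by every call in the batch
  return Promise.all(
    calls.map(async (call) => {
      await sem.acquire()
      try {
        if (siblings.signal.aborted) return `${call.name}: aborted`
        return await call.run(siblings.signal)
      } catch (err) {
        // Only a Bash failure cascades; other tool errors stay local.
        if (call.name === 'Bash') siblings.abort()
        return `${call.name}: error: ${(err as Error).message}`
      } finally {
        sem.release()
      }
    }),
  )
}
```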
+### Subagent gate propagation (opt-in)
+By default, `hooks.gateBeforeTool` applies ONLY to the parent's direct tool dispatch. Subagents spawned via the `Agent` tool run their own inner loops with no gate. That's fine when your risky tool calls live on the parent, but it leaves a gap when a subagent itself calls a risky tool: that call runs ungated.
+Set `hooks.propagateGateToSubagents: true` and the parent's gate is threaded into every subagent's loop. When a subagent's gate denies a tool, the child pauses → the engine surfaces a `SubagentPausedError` → the parent's loop catches it and produces its own `paused` result with a new snapshot field:
+```ts
+snapshot.pendingSubagent = {
+  subagentType: 'researcher',          // which agent type ran
+  parentToolUseId: 'agent_abc',        // id of the parent's Agent tool_use
+  childSnapshot,                       // the paused child's own nested RunSnapshot
+}
+```
+`snapshot.pauseReason` becomes `'subagent_gate_required'`, distinct from the plain `'gate_required'` case so clients can tell the two apart. The parent's `pendingToolCall` still points at the Agent invocation, so resuming re-spawns the subagent.
+**Scope in v0.1**: one level of nesting. Deeper trees still propagate the gate into every level, but the snapshot only records the immediate child. Resume after a subagent pause re-runs the subagent from scratch — no mid-subagent continuation yet.
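On the client side, the two pause reasons can be told apart with a small switch that follows nested subagent pauses. The types below are a hypothetical local mirror of the snapshot fields named in this section; the `name` field on `pendingToolCall` is an assumption, and the engine's exported snapshot type should be preferred in real code.

```ts
// Hypothetical mirror of the snapshot fields discussed above.
interface PausedSnapshot {
  pauseReason: 'gate_required' | 'subagent_gate_required'
  pendingToolCall?: { name: string } // assumed shape
  pendingSubagent?: {
    subagentType: string
    parentToolUseId: string
    childSnapshot: PausedSnapshot
  }
}

// Build a human-readable approval prompt, descending into subagent pauses.
function describePause(snapshot: PausedSnapshot): string {
  switch (snapshot.pauseReason) {
    case 'gate_required':
      return `Approve tool call: ${snapshot.pendingToolCall?.name ?? 'unknown'}`
    case 'subagent_gate_required': {
      const sub = snapshot.pendingSubagent
      if (!sub) return 'Subagent paused (no details recorded)'
      return `Subagent "${sub.subagentType}" is waiting: ${describePause(sub.childSnapshot)}`
    }
  }
}
```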
+---
+## Configuration Reference
+15 sections, all optional. Full defaults in `src/config/defaults.ts`.
+```ts
+initEngine({
+  model: {
+    provider: 'anthropic',        // 'anthropic' | 'openai' | 'google' | 'openai-compatible' | 'proxy'
+    modelId: 'claude-opus-4-6',
+    apiKey: '',                   // or ANTHROPIC_API_KEY env
+    baseURL: undefined,
+    maxTokens: 8192,
+    temperature: 1,
+    maxRetries: 2,
+  },
+  storage: {
+    provider: 'local',            // 'local' | 'r2' | 'r2-binding'
+    rootPath: '~/.claude',
+    workspaceId: 'default',
+    r2: { bucket, region, accessKeyId, secretAccessKey, endpoint },   // 'r2' provider only (S3-style credentials)
+  },
+  memory: { mode: 'off', scope: 'workspace' },
+  tools: { enabled: ['*'], disabled: [], custom: [] },
+  agents: { builtins: ['general-purpose'], customPath: undefined },
+  skills: { path: undefined, autoload: false },
+  execution: {
+    maxTurns: 50,
+    maxSubagentDepth: 5,
+    turnTimeoutMs: 300_000,
+    runTimeoutMs: 1_800_000,
+    contextLimit: 200_000,
+    maxToolConcurrency: 10,
+  },
+  transcript: { enabled: true, flushPolicy: 'turn-end', idleFlushMs: 2000 },
+  hooks: { preRun: [], postRun: [], preTurn: [], postTurn: [], preToolCall: [], postToolCall: [], gateBeforeTool: undefined, stopHooks: [] },
+  logging: { level: 'warn', sink: 'stderr' },
+  mcp: { servers: {}, connectTimeoutMs: 10_000, callTimeoutMs: 60_000, shutdownTimeoutMs: 5_000 },
+  permissions: { mode: 'open', rules: [] },
+  compaction: { strategy: 'auto', threshold: 0.85, keepLast: 6, summaryMaxTokens: 4096, microcompact: true, microcompactAgeMs: 300_000 },
+  coordinator: { enabled: false, workerTools: ['Bash','Read','Write','Edit','Glob','Grep'], maxConcurrentWorkers: 5 },
+  orchestrator: { enabled: false, retries: { plan, research, implement, verify, review }, maxParallelResearchers: 3, enableReview: false, enableRollback: true, maxPlanSteps: 20, agentMaxTurns: 15 },
+})
+```
+---
+## What's Implemented
+All features ported 1:1 from La-Machina's production runtime. Pure JS, Workers-compatible.
+### Core Agent Loop
+- [x] Multi-turn agent loop with streaming
+- [x] Reactive recovery — max_tokens 3-stage (escalate 8K→64K → recovery message 3x → fail)
+- [x] 413 prompt-too-long recovery (emergency compact → retry)
+- [x] API retry with exponential backoff (429/500/529 + Retry-After + network errors)
+- [x] Message normalization (merge same-role, strip empty, alternation enforcement, tool pairing)
+- [x] Internal block stripping (advisor/signature/tool_reference/caller)
+- [x] Extended thinking preservation (in transcript, stripped before API)
+- [x] Tool result truncation (100K char cap)
+- [x] Duplicate tool_use ID deduplication
+- [x] Empty assistant response handling
+- [x] Token budget enforcement (graceful stop)
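The retry item above (exponential backoff on 429/500/529, honoring `Retry-After`, retrying network errors) follows a common pattern, sketched below. The base delay, cap, jitter formula, and the `HttpError` class are assumptions for illustration; the engine's retry count comes from `model.maxRetries`, and its actual schedule may differ.

```ts
// Hypothetical error type carrying the HTTP status and raw Retry-After header.
class HttpError extends Error {
  status: number
  retryAfter?: string
  constructor(status: number, retryAfter?: string) {
    super(`HTTP ${status}`)
    this.status = status
    this.retryAfter = retryAfter
  }
}

const RETRYABLE = new Set([429, 500, 529])

// Delay before retry n (0-based): Retry-After seconds if numeric, otherwise
// base * 2^n with "equal jitter" in [exp/2, exp). HTTP-date values fall back to backoff.
function retryDelayMs(attempt: number, retryAfter?: string, baseMs = 500, capMs = 30_000): number {
  const seconds = retryAfter !== undefined ? Number(retryAfter) : NaN
  if (Number.isFinite(seconds) && seconds >= 0) return Math.min(seconds * 1000, capMs)
  const exp = Math.min(baseMs * 2 ** attempt, capMs)
  return exp / 2 + Math.random() * (exp / 2)
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 2,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      // fetch() reports network failures as TypeError in Node and browsers.
      const retryable = err instanceof HttpError ? RETRYABLE.has(err.status) : err instanceof TypeError
      if (!retryable || attempt >= maxRetries) throw err
      await sleep(retryDelayMs(attempt, err instanceof HttpError ? err.retryAfter : undefined))
    }
  }
}
```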
+### Tool Execution
+- [x] StreamingToolExecutor — stateful queue, concurrency control, ordered results
+- [x] Parallel tool batching (`isConcurrencySafe` flag per tool)
+- [x] Bash error cascading (AbortController aborts sibling tools)
+- [x] 22 built-in tools
+- [x] Custom tool registration via `defineTool()`
+- [x] Device path blocking (/dev/zero, /dev/random, /proc/kcore)
+### Agent Hierarchy
+- [x] Subagent spawning with depth tracking (SubagentRegistry)
+- [x] Fork subagent (context inheritance, placeholder tool_results, recursion guard)
+- [x] Background agents (fire-and-forget, result drain at turn boundary)
+- [x] SendMessage (inter-agent message queue)
+- [x] Coordinator mode (code-enforced tool split)
+- [x] Orchestrator state machine (plan → research → implement → verify → finalize)
+- [x] Parallel researchers (configurable count)
+- [x] Agent registry persistence (toJSON/fromJSON, metadata sidecar)
+### Infrastructure
+- [x] Storage adapters (Local + R2) with atomic writes
+- [x] Transcript system (NDJSON shards, snapshot/restore, corruption recovery)
+- [x] Smart memory (profile, rules, lessons, episodes)
+- [x] Skills loader with multi-page lazy loading
+- [x] MCP client (stdio/SSE/HTTP) with instruction delta tracking
+- [x] Compaction (microcompact, summarize, drop-middle)
+- [x] Permissions (open/rules/locked modes)
+- [x] System prompt (12 sections, git injection, memory/skills injection)
+- [x] 8-slot hook system (including stop hooks with continuation control)
+- [x] Enriched PostRunEvent (toolCallCount, transcriptPath)
+- [x] Cross-runtime UUID (Web Crypto API fallback)
+- [x] Workers compatibility (zero top-level node: imports)
+### Testing
+- [x] 1214 tests across 115+ files (see Test Counts below)
+- [x] Live workflow tests W1-W13 against OpenRouter + R2, plus the W14 integration test (see Live Workflow Tests below)
+- [x] Coverage: 81% lines, 85% branches, 91% functions
+- [x] CI pipeline (lint + typecheck + test + coverage gates)
+---
+## What's Deferred
+Features intentionally not ported, because they are Anthropic-only, CLI-specific, or deliberate design choices for a headless library.
+### Design Choices (not bugs)
+| Feature | La-Machina | Engine | Why |
+|---------|-----------|--------|-----|
+| CLAUDE.md loading | Loads project/user/global files | None | Library — caller provides context via task |
+| Full resume reconstruction | Transcript replay + orphaned thinking cleanup | Snapshot-based (Phase 10 TODO) | Simpler model, covers 90% of use cases |
+| Streaming events to caller | Yields per-event for UI | Returns single RunResult | Headless — no UI to stream to |
+| Plugin system | Dynamic marketplace + MCP bundle | Skills + MCP only | Library doesn't need runtime plugins |
+| Multi-turn CLI session | Full conversation persistence | CLI-only history (in cli.mjs) | Not applicable to headless runs |
+### Anthropic-Only (not applicable to multi-provider)
+| Feature | What it does | Why skipped |
+|---------|-------------|-------------|
+| System prompt cache_control | Marks sections with `cache_control: { type: 'ephemeral' }` | Ignored by non-Anthropic providers |
+| Beta headers | Sends prompt-caching, extended-thinking betas | Provider-specific |
+| Tool schema caching | Caches tool definitions for prompt cache stability | Anthropic optimization |
+| Task budget (output_config) | Sends token budget to API so model self-paces | Anthropic-only API field |
+| Fast mode / speed param | `speed: 'fast'` for faster inference | Anthropic-only |
+| Effort param | `output_config.effort` controls reasoning depth | Anthropic-only |
+| Advisor model | Secondary reviewer model | GrowthBook-gated, Anthropic-only |
+### Stubbed in La-Machina Too
+| Feature | Status |
+|---------|--------|
+| Reactive compact | `feature('REACTIVE_COMPACT')` = false, file doesn't exist |
+| Context collapse | `feature('CONTEXT_COLLAPSE')` = false, stub returns `undefined` |
+| Tool use summaries | SDK/mobile UI only, doesn't affect agent behavior |
+| Bash classifier | Stub returns false always (ANT-ONLY) |
+---
+## Architecture
+```
+┌──────────────────────────┐
+│       Engine             │
+│                          │
+│  engine.run()            │
+│                          │
+│  1. storage adapter      │◄── createEngineStorage(config.storage)
+│  2. API client           │◄── createModelAdapter(config.model)
+│  3. tool registry        │◄── buildToolRegistry(config.tools)
+│  4. smart memory         │◄── createSmartMemory(config.memory)
+│  5. system prompt        │◄── buildSystemPrompt(memory + skills + git)
+│  6. transcript writer    │
+│  7. run context          │
+│                          │
+│  agentLoop:              │
+│    ┌──────────────────┐  │
+│    │ normalizeMessages│  │  ← strip blocks, ensure alternation, tool pairing
+│    │ streamMessage    │──┼──► ModelAdapter ──► Anthropic / OpenRouter / AI SDK
+│    │ StreamingToolExec│──┼──► ToolExecutor ──► 22 tools (parallel safe batch)
+│    │ truncateResult   │  │
+│    │ stopHooks        │  │  ← can prevent continuation
+│    │ reactive recovery│  │  ← max_tokens, 413, retry with backoff
+│    └──────────────────┘  │
+│                          │
+│  return RunResult        │
+└──────────────────────────┘
+         │
+         ▼
+  { done | paused | failed }
+```
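The `normalizeMessages` step in the loop above (strip empty content, merge same-role turns, enforce alternation) can be illustrated with a simplified message model. This is a sketch under stated assumptions: messages carry plain string content instead of content blocks, tool-pairing repair is omitted, and the `'(continue)'` placeholder turn is hypothetical. It is not the engine's implementation.

```ts
type Role = 'user' | 'assistant'
interface Msg {
  role: Role
  content: string
}

// Drop empty messages, merge consecutive same-role messages, and ensure the
// conversation opens with a user turn, so roles strictly alternate.
function normalizeMessages(messages: Msg[]): Msg[] {
  const out: Msg[] = []
  for (const m of messages) {
    if (m.content.trim() === '') continue // strip empty
    const last = out[out.length - 1]
    if (last && last.role === m.role) {
      last.content += '\n\n' + m.content // merge same-role neighbours (copies only)
    } else {
      out.push({ ...m }) // copy so the caller's array is never mutated
    }
  }
  const first = out[0]
  if (first && first.role !== 'user') {
    out.unshift({ role: 'user', content: '(continue)' }) // hypothetical placeholder turn
  }
  return out
}
```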
+---
+## Development
+```bash
+npm install
+npm run build          # tsup → dist/ (ESM + CJS + .d.ts)
+npm test               # 1214 tests (~12s with bun)
+npm run test:watch     # watch mode
+npm run test:coverage  # with coverage gates
+npm run typecheck      # TypeScript strict
+npm run lint           # ESLint
+npm run ci             # lint + typecheck + test + coverage
+```
+### Releasing
+Releases are fully automated via `.github/workflows/publish.yml`. The
+workflow runs on every push to `main` and publishes only when
+`package.json#version` differs from the version on npm.
+```bash
+# 1. Make changes + commit
+git commit -am "feat: add X"
+# 2. Bump version (auto-tags + commits)
+npm version patch       # 0.3.0 → 0.3.1
+# or `minor` / `major`, or an explicit version such as `npm version 0.4.0`
+# (add `--no-git-tag-version` to skip the automatic commit and tag)
+# 3. Push
+git push && git push --tags
+# CI now:
+#   - Builds + typechecks
+#   - npm publish --access public --provenance
+#   - Pushes a v{version} tag
+#   - Creates a GitHub Release with auto-generated notes
+```
+Required GitHub repo secret: `NPM_TOKEN` — granular access token with
+publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
+### Test Counts
+| Category | Files | Tests |
+|----------|-------|-------|
+| Unit | 70+ | ~870 |
+| Integration | 15+ | ~130 |
+| E2E | 5 | ~30 |
+| Coverage additions | 20+ | ~130 |
+| **Total** | **115+** | **1214** (current `bun test` count; 8 pre-existing, unrelated failures from Bun timers) |
+### Live Workflow Tests
+| Test | Description | Turns |
+|------|-------------|-------|
+| W1 | Invoice data transformer | 3 |
+| W2 | Cross-run memory learning | 4 |
+| W3 | Gated approval (pause/resume) | 5 |
+| W4 | Permission-locked analyst | 6 |
+| W5 | Research with skills | 5 |
+| W6 | MCP integration | 4 |
+| W7 | Coordinator delegation | 6 |
+| W8 | Orchestrator (plan→verify) | 8 |
+| W9 | Client onboarding (5 phases) | 22 |
+| W10 | SaaS dashboard (long-running) | 9 |
+| W11 | Async HITL (sync + async + webhook) | 6 |
+| W12 | Multi-agent + MCP + skills + HITL (parent gates child's publish) | 4 |
+| W13 | Per-run skill override (inline body + URL + fetch cache) | 4 |
+| W14 | MCP auth refresh + sampling round-trip (stdio + http) | n/a (integration) |
+---
+## License
+MIT
+## Acknowledgments
+- [La-Machina](https://github.com/zahidhasanaunto/La-Machina) — the reference implementation this engine was ported from
+- [Anthropic](https://anthropic.com) for Claude and the Messages API
+- [Vercel AI SDK](https://sdk.vercel.ai) for multi-provider abstraction