la-machina-engine 0.3.0

package/README.md ADDED
@@ -0,0 +1,1239 @@
1
+ # la-machina-engine
2
+
3
+ [![npm version](https://img.shields.io/npm/v/la-machina-engine.svg)](https://www.npmjs.com/package/la-machina-engine)
4
+ [![npm downloads](https://img.shields.io/npm/dm/la-machina-engine.svg)](https://www.npmjs.com/package/la-machina-engine)
5
+ [![CI](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/ci.yml/badge.svg)](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/ci.yml)
6
+ [![Publish](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/publish.yml/badge.svg)](https://github.com/zahidhasanaunto/la-machina-engine/actions/workflows/publish.yml)
7
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
8
+
9
+ **Headless, multi-provider LLM agent engine for workflow automation.**
10
+
11
+ A library, not a CLI. You `import` it, give it a task, and it runs a bounded agent loop — streaming an LLM, dispatching tools, spawning subagents, persisting a durable transcript — until the task is done, paused, or fails. Memory learns across runs. Runs pause mid-execution and resume later with full state. Storage is pluggable — local filesystem in dev, Cloudflare R2 in production, same code.
12
+
13
+ Built for embedding inside a workflow orchestrator (e.g. an n8n-style DAG runner where each node needs an LLM brain). If you want a terminal chatbot, use Claude Code. If you want the brain that runs inside each node of a production workflow, use this.
14
+
15
+ ```bash
16
+ npm install la-machina-engine
17
+ ```
18
+
19
+ ---
20
+
21
+ ## Status
22
+
23
+ **v0.3.0 — published on npm; production-ready core, evolving feature surface.**
24
+
25
+ - **1214** unit + integration tests pass (8 pre-existing Bun-timer failures unrelated)
26
+ - Zero top-level `node:` imports — runs on Node.js AND Cloudflare Workers
27
+ - 14 live workflow tests (W1–W14) verified against OpenRouter, real R2, real MCP servers
28
+ - Pause/resume + async runs + webhooks + state.json + R2 binding storage adapter
29
+ - MCP support: stdio, http (Streamable + Workers-safe binding transport), sse — with auth refresh + sampling
30
+ - Skills: disk-backed default + per-run override (inline body or HTTPS url)
31
+ - Subagent gate propagation (opt-in) — parent can pause when a child's tool is gated
32
+
33
+ ---
34
+
35
+ ## Table of Contents
36
+
37
+ - [Design Principles](#design-principles)
38
+ - [Install](#install)
39
+ - [Quick Start](#quick-start)
40
+ - [CLI Test Harness](#cli-test-harness)
+ - [Response Format](#response-format)
41
+ - [Core Concepts](#core-concepts)
42
+ - [Async API (start / waitFor / webhooks)](#async-api)
43
+ - [Multi-Provider Support](#multi-provider-support)
44
+ - [Agent Hierarchy](#agent-hierarchy)
45
+ - [Configuration Reference](#configuration-reference)
46
+ - [What's Implemented](#whats-implemented)
47
+ - [What's Deferred](#whats-deferred)
48
+ - [Architecture](#architecture)
49
+ - [Development](#development)
50
+ - [License](#license)
51
+
52
+ ---
53
+
54
+ ## Design Principles
55
+
56
+ - **Zero-config works.** `initEngine()` with no arguments runs, given `ANTHROPIC_API_KEY` in the environment.
57
+ - **Every knob has a default.** No config option is required; all are overridable.
58
+ - **Headless.** No terminal UI, no React, no Ink. Plain Node/Workers library.
59
+ - **Cloud-native.** Storage is pluggable — local filesystem in dev, Cloudflare R2 in production.
60
+ - **Pausable.** Runs can suspend mid-turn via a gate callback and resume later with full state.
61
+ - **Workers-compatible.** Zero top-level `node:` imports. All platform-specific code is lazy-loaded.
62
+ - **Multi-provider.** Anthropic native + Vercel AI SDK (OpenAI, Google, OpenRouter, 75+ providers).
63
+ - **Error-isolated.** A misbehaving tool cannot crash the agent loop. The loop never throws.
64
+ - **TypeScript-first.** Strict mode, Zod-inferred types, discriminated unions for all results.
65
+
66
+ ---
67
+
68
+ ## Install
69
+
70
+ ```bash
71
+ npm install la-machina-engine
72
+ ```
73
+
74
+ Requires Node 20+. Also works in Bun and on Cloudflare Workers (with R2 storage).
75
+
76
+ The package is published as a single bundled module (~330 KB ESM, ~340 KB CJS) with full TypeScript types. It is built with [tsup](https://tsup.egoist.dev/), and CI publishes it with [Sigstore provenance](https://docs.npmjs.com/generating-provenance-statements).
77
+
78
+ ---
79
+
80
+ ## Quick Start
81
+
82
+ ```ts
83
+ import { initEngine } from 'la-machina-engine'
84
+
85
+ const engine = initEngine() // uses ANTHROPIC_API_KEY from env
86
+
87
+ const response = await engine.run({
88
+ nodeId: 'node_xyz',
89
+ task: 'Summarize the contents of data.csv and propose 3 insights.',
90
+ // runId optional — auto-generated as run_<uuid> if omitted
91
+ })
92
+
93
+ if (response.status === 'done') {
94
+ console.log(response.data) // text string (or JSON object if outputFormat: 'json')
95
+ console.log(`runId: ${response.runId}`)
96
+ console.log(`${response.meta.turns} turns, ${response.meta.tokensUsed.input + response.meta.tokensUsed.output} tokens`)
97
+ }
98
+ ```
99
+
100
+ ### With OpenRouter (multi-provider)
101
+
102
+ ```ts
103
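+ import { initEngine, Engine } from 'la-machina-engine'
+
+ // OpenRouter key read by the fetch override below (env var name as used by the CLI harness)
+ const apiKey = process.env.OPENROUTER_API_KEY ?? ''
+
+ const engine = new Engine(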
104
+ initEngine({
105
+ model: {
106
+ provider: 'proxy',
107
+ modelId: 'google/gemini-2.5-pro',
108
+ apiKey: 'sk-or-...',
109
+ baseURL: 'https://openrouter.ai/api',
110
+ },
111
+ }).config,
112
+ {
113
+ fetch: async (input, init = {}) => {
114
+ const headers = new Headers(init.headers ?? {})
115
+ headers.delete('x-api-key')
116
+ headers.set('Authorization', 'Bearer ' + apiKey)
117
+ return fetch(input, { ...init, headers })
118
+ },
119
+ },
120
+ )
121
+ ```
122
+
123
+ ### With R2 cloud storage
124
+
125
+ ```ts
126
+ const engine = initEngine({
127
+ storage: {
128
+ provider: 'r2',
129
+ rootPath: 'my-project',
130
+ workspaceId: 'production',
131
+ r2: {
132
+ bucket: 'my-bucket',
133
+ region: 'auto',
134
+ accessKeyId: '...',
135
+ secretAccessKey: '...',
136
+ endpoint: 'https://xxx.r2.cloudflarestorage.com',
137
+ },
138
+ },
139
+ })
140
+ ```
141
+
142
+ ---
143
+
144
+ ## CLI Test Harness
145
+
146
+ A minimal interactive CLI for testing the engine directly:
147
+
148
+ ```bash
149
+ node cli.mjs # interactive REPL
150
+ node cli.mjs "your task here" # one-shot
151
+ node cli.mjs --model anthropic/claude-sonnet-4 "task"
152
+ ```
153
+
154
+ Multi-turn conversation is maintained across prompts. Commands: `/clear` (reset), `/turns` (info).
155
+
156
+ Env vars: `OPENROUTER_API_KEY` (required), `ENGINE_MODEL`, `ENGINE_STORAGE`, `ENGINE_MAX_TURNS`.
157
+
158
+ ---
159
+
160
+ ## Response Format
161
+
162
+ `engine.run()` and `engine.resume()` return `EngineResponse` directly — one flat shape for every status.
163
+
164
+ ```ts
165
+ const response = await engine.run({
166
+ nodeId: 'n1',
167
+ task: '...',
168
+ // runId optional — auto-generated if omitted
169
+ })
170
+ // response.runId, response.status, response.data, response.meta, response.errors, response.timestamp
171
+ ```
172
+
173
+ **Single shape, every status. Client always reads `response.data` and tracks `response.runId`.**
174
+
175
+ ### Done — Text Mode (default)
176
+
177
+ ```json
178
+ {
179
+ "runId": "run_abc",
180
+ "status": "done",
181
+ "data": "The analysis shows revenue grew 15% year-over-year with strongest performance in Q4.",
182
+ "meta": {
183
+ "nodeId": "analyze",
184
+ "turns": 5,
185
+ "tokensUsed": { "input": 12500, "output": 3200 },
186
+ "durationMs": 8500,
187
+ "output": "The analysis shows revenue grew 15% year-over-year with strongest performance in Q4.",
188
+ "transcript": { "path": "projects/run_abc/nodes/analyze", "lastShardIndex": 0 }
189
+ },
190
+ "errors": [],
191
+ "timestamp": 1712966400000
192
+ }
193
+ ```
194
+
195
+ ### Done — JSON Mode with Schema
196
+
197
+ ```ts
198
+ import { z } from 'zod'
+
+ const result = await engine.run({
+ nodeId: 'extract',
199
+ task: 'Fetch example.com and extract pricing tiers',
200
+ outputFormat: 'json',
201
+ outputSchema: z.object({
202
+ tiers: z.array(z.object({ name: z.string(), price: z.number() })),
203
+ }),
204
+ })
205
+ ```
206
+
207
+ ```json
208
+ {
209
+ "runId": "run_abc",
210
+ "status": "done",
211
+ "data": {
212
+ "tiers": [
213
+ { "name": "Starter", "price": 29 },
214
+ { "name": "Pro", "price": 99 },
215
+ { "name": "Enterprise", "price": 299 }
216
+ ]
217
+ },
218
+ "meta": {
219
+ "nodeId": "extract",
220
+ "turns": 3,
221
+ "tokensUsed": { "input": 8000, "output": 1500 },
222
+ "durationMs": 12000,
223
+ "output": "{\"tiers\":[{\"name\":\"Starter\",\"price\":29},{\"name\":\"Pro\",\"price\":99},{\"name\":\"Enterprise\",\"price\":299}]}",
224
+ "transcript": { "path": "projects/run_abc/nodes/extract", "lastShardIndex": 0 }
225
+ },
226
+ "errors": [],
227
+ "timestamp": 1712966400000
228
+ }
229
+ ```
230
+
231
+ ### Done — JSON Mode, Parse Failed
232
+
233
+ When the model doesn't return valid JSON despite instructions:
234
+
235
+ ```json
236
+ {
237
+ "runId": "run_abc",
238
+ "status": "done",
239
+ "data": "Here's the pricing: Starter at $29, Pro at $99, Enterprise at $299.",
240
+ "meta": {
241
+ "nodeId": "extract",
242
+ "turns": 2,
243
+ "tokensUsed": { "input": 5000, "output": 800 },
244
+ "durationMs": 6000,
245
+ "output": "Here's the pricing: Starter at $29, Pro at $99, Enterprise at $299."
246
+ },
247
+ "errors": [],
248
+ "timestamp": 1712966400000
249
+ }
250
+ ```
251
+
252
+ `data` falls back to raw text. Client checks `typeof response.data === 'object'` to verify structured output.
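One caveat with a bare `typeof response.data === 'object'` check is that `typeof null` is also `'object'`, and failed runs carry `data: null`. A minimal sketch of a stricter guard follows; `isStructuredData` is a hypothetical helper, not part of the package, and the sample values only mirror the examples above.

```typescript
// Hypothetical helper: narrows `data` to a non-null object (parsed JSON output).
function isStructuredData(data: unknown): data is object {
  return typeof data === 'object' && data !== null
}

// Sample values shaped like the README examples (not real engine output).
const parsed: unknown = { tiers: [{ name: 'Starter', price: 29 }] }
const rawText: unknown = "Here's the pricing: Starter at $29."
const failedRunData: unknown = null

console.log(isStructuredData(parsed)) // true
console.log(isStructuredData(rawText)) // false
console.log(isStructuredData(failedRunData)) // false
```

Checking `response.status === 'done'` first keeps the guard from ever seeing the `null` case, but the extra condition costs nothing.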
253
+
254
+ ### Paused — Human Approval Needed
255
+
256
+ `data` is the **content** the human reviews — not internal details like paths. Path info is in `meta.pendingToolCall`.
257
+
258
+ ```json
259
+ {
260
+ "runId": "run_abc",
261
+ "status": "paused",
262
+ "data": "# Q4 Revenue Summary\n\nTotal revenue: $705,000...",
263
+ "meta": {
264
+ "nodeId": "write-report",
265
+ "turns": 3,
266
+ "tokensUsed": { "input": 8500, "output": 2100 },
267
+ "durationMs": 9200,
268
+ "snapshot": {
269
+ "version": 1,
270
+ "status": "paused",
271
+ "runId": "run_abc",
272
+ "nodeId": "write-report",
273
+ "pausedAt": "2026-04-13T14:30:00.000Z",
274
+ "pauseReason": "gate_required",
275
+ "messageCount": 6,
276
+ "lastShardIndex": 3,
277
+ "lastMessageUuid": "a1b2c3d4-...",
278
+ "pendingToolCall": {
279
+ "toolName": "Write",
280
+ "toolUseId": "toolu_abc123",
281
+ "input": { "path": "reports/q4-summary.md", "content": "# Q4 Revenue Summary..." },
282
+ "calledAt": "2026-04-13T14:30:00.000Z"
283
+ },
284
+ "tokensUsedSoFar": { "input": 8500, "output": 2100 },
285
+ "turnsUsed": 3
286
+ },
287
+ "pendingToolCall": {
288
+ "toolName": "Write",
289
+ "toolUseId": "toolu_abc123",
290
+ "input": { "path": "reports/q4-summary.md", "content": "..." },
291
+ "calledAt": "2026-04-13T14:30:00.000Z"
292
+ },
293
+ "pauseReason": "gate_required",
294
+ "transcript": { "path": "projects/run_abc/nodes/write-report", "lastShardIndex": 3 }
295
+ },
296
+ "errors": [],
297
+ "timestamp": 1712966400000
298
+ }
299
+ ```
300
+
301
+ - **`data`**: the tool input the agent wanted to execute (what the human reviews).
302
+ - **`meta.snapshot`**: pass to `engine.resume()` to continue.
303
+ - **`meta.pendingToolCall`**: shortcut to see which tool was blocked.
304
+
305
+ ### Paused — Topic Selection (Custom Tool)
306
+
307
+ When the agent tries to Write a JSON file with choices, `data` is the JSON content string:
308
+
309
+ ```json
310
+ {
311
+ "runId": "blog-001",
312
+ "status": "paused",
313
+ "data": "{\"topics\":[{\"title\":\"AI Trade Wars\",\"angle\":\"startup impact\"},{\"title\":\"EU Migration Reform\",\"angle\":\"policy analysis\"},{\"title\":\"Climate Summit\",\"angle\":\"developing nations\"}]}",
314
+ "meta": {
315
+ "nodeId": "research",
316
+ "turns": 2,
317
+ "tokensUsed": { "input": 5000, "output": 1200 },
318
+ "durationMs": 7500,
319
+ "snapshot": { "..." },
320
+ "pendingToolCall": {
321
+ "toolName": "Write",
322
+ "toolUseId": "toolu_xyz",
323
+ "input": { "path": "topics.json", "content": "{\"topics\":[...]}" },
324
+ "calledAt": "2026-04-13T10:00:00.000Z"
325
+ },
326
+ "pauseReason": "gate_required"
327
+ },
328
+ "errors": [],
329
+ "timestamp": 1712966400000
330
+ }
331
+ ```
332
+
333
+ Resume with the user's choice:
334
+ ```ts
335
+ await engine.resume({
336
+ snapshot: response.meta.snapshot,
337
+ gateAnswer: 'Approved. User selected Topic 1: AI Trade Wars.',
338
+ })
339
+ ```
340
+
341
+ ### Failed — Max Turns Exceeded
342
+
343
+ ```json
344
+ {
345
+ "runId": "run_abc",
346
+ "status": "failed",
347
+ "data": null,
348
+ "meta": {
349
+ "nodeId": "complex-task",
350
+ "turns": 0,
351
+ "tokensUsed": { "input": 0, "output": 0 },
352
+ "durationMs": 45000,
353
+ "transcript": { "path": "projects/run_abc/nodes/complex-task", "lastShardIndex": 8 }
354
+ },
355
+ "errors": [
356
+ {
357
+ "code": "ERR_MAX_TURNS",
358
+ "message": "Run exceeded max turns"
359
+ }
360
+ ],
361
+ "timestamp": 1712966400000
362
+ }
363
+ ```
364
+
365
+ ### Failed — API Error After Retries
366
+
367
+ ```json
368
+ {
369
+ "runId": "run_abc",
370
+ "status": "failed",
371
+ "data": null,
372
+ "meta": {
373
+ "nodeId": "task",
374
+ "turns": 0,
375
+ "tokensUsed": { "input": 0, "output": 0 },
376
+ "durationMs": 12000,
377
+ "transcript": { "path": "projects/run_abc/nodes/task", "lastShardIndex": 0 }
378
+ },
379
+ "errors": [
380
+ {
381
+ "code": "ERR_RATE_LIMIT",
382
+ "message": "429 Too Many Requests"
383
+ }
384
+ ],
385
+ "timestamp": 1712966400000
386
+ }
387
+ ```
388
+
389
+ ### Failed — Max Tokens Recovery Exhausted
390
+
391
+ ```json
392
+ {
393
+ "runId": "run_abc",
394
+ "status": "failed",
395
+ "data": null,
396
+ "meta": {
397
+ "nodeId": "long-task",
398
+ "turns": 0,
399
+ "tokensUsed": { "input": 0, "output": 0 },
400
+ "durationMs": 30000,
401
+ "transcript": { "path": "projects/run_abc/nodes/long-task", "lastShardIndex": 5 }
402
+ },
403
+ "errors": [
404
+ {
405
+ "code": "ERR_MAX_TOKENS",
406
+ "message": "max_tokens recovery exhausted after 3 attempts"
407
+ }
408
+ ],
409
+ "timestamp": 1712966400000
410
+ }
411
+ ```
412
+
413
+ ### Error Codes Reference
414
+
415
+ | Code | When | Retryable? |
416
+ |------|------|-----------|
417
+ | `ERR_MAX_TURNS` | Run exceeded `execution.maxTurns` | No — increase maxTurns |
418
+ | `ERR_MAX_TOKENS` | max_tokens recovery failed after 3 attempts | No — reduce task scope |
419
+ | `ERR_RUN_TIMEOUT` | Run exceeded `execution.runTimeoutMs` | No — increase timeout |
420
+ | `ERR_RATE_LIMIT` | 429 after retry backoff exhausted | Yes — wait and retry |
421
+ | `ERR_API` | 500/502/503 after retries | Yes — transient |
422
+ | `ERR_API_OVERLOADED` | 529 five consecutive times | Yes — wait longer |
423
+ | `ERR_AUTH` | 401/403 invalid API key | No — fix credentials |
424
+ | `ERR_CONFIG` | Invalid configuration | No — fix config |
425
+ | `ERR_STREAM_PARSE` | Malformed API response | No — provider issue |
426
+ | `ERR_STREAM_INCOMPLETE` | Stream ended without message_stop | Yes — transient |
427
+ | `ERR_UNEXPECTED_STOP` | Unknown stop reason from API | No — investigate |
428
+ | `SCHEMA_VALIDATION_FAILED` | JSON output doesn't match outputSchema | No — adjust schema or task |
429
+ | `JSON_PARSE_FAILED` | Model didn't return valid JSON | No — adjust task |
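The "Retryable?" column can be turned into a small lookup for a workflow runner. The sketch below is an illustration only: `EngineError`, `RETRYABLE_CODES`, and `isRetryable` are hypothetical names, and the set simply copies the rows marked "Yes" in the table above.

```typescript
// Shape of one entry in response.errors, as shown in the examples above.
interface EngineError {
  code: string
  message: string
}

// Codes whose "Retryable?" column says "Yes".
const RETRYABLE_CODES: ReadonlySet<string> = new Set([
  'ERR_RATE_LIMIT',
  'ERR_API',
  'ERR_API_OVERLOADED',
  'ERR_STREAM_INCOMPLETE',
])

// Retry only when there is at least one error and every error is transient.
function isRetryable(errors: readonly EngineError[]): boolean {
  return errors.length > 0 && errors.every((e) => RETRYABLE_CODES.has(e.code))
}

console.log(isRetryable([{ code: 'ERR_RATE_LIMIT', message: '429 Too Many Requests' }])) // true
console.log(isRetryable([{ code: 'ERR_AUTH', message: '401 Unauthorized' }])) // false
```

Requiring every error to be transient is a conservative choice: a single non-retryable code such as `ERR_CONFIG` means a retry would fail the same way.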
430
+
431
+ ### Workflow Runner Integration
432
+
433
+ ```ts
434
+ // Step 1: Run — runId optional, auto-generated if omitted
435
+ const response = await engine.run({
436
+ nodeId: 'extract',
437
+ task: 'Fetch pricing from example.com',
438
+ outputFormat: 'json',
439
+ outputSchema: pricingSchema,
440
+ })
441
+
442
+ switch (response.status) {
443
+ case 'done':
444
+ // data is already typed/validated per outputSchema (JSON mode)
445
+ // or a plain text string (text mode)
446
+ passToNextNode(response.data)
447
+ break
448
+
449
+ case 'paused':
450
+ // Client only needs to remember the runId
451
+ saveToApprovalQueue({
452
+ runId: response.runId,
453
+ pendingAction: response.meta.pendingToolCall?.toolName,
454
+ data: response.data, // what to show the human
455
+ })
456
+ notifyHuman('Approval needed')
457
+ break
458
+
459
+ case 'failed':
460
+ logErrors(response.errors) // [{ code, message }]
461
+ retryOrEscalate(response)
462
+ break
463
+ }
464
+
465
+ // Step 2: Later, resume — just pass the runId
466
+ const resumed = await engine.resume({
467
+ runId: response.runId,
468
+ gateAnswer: 'Approved by manager',
469
+ })
470
+ ```
471
+
472
+ ---
473
+
474
+ ## Core Concepts
475
+
476
+ ### The Run Lifecycle
477
+
478
+ ```
479
+ engine.run({ runId, nodeId, task })
480
+
481
+ ├─ Build: storage, client, tools, memory, prompt, transcript
482
+ ├─ preRun hook
483
+ ├─ agentLoop:
484
+ │ while (!done) {
485
+ │ normalize messages (strip blocks, ensure alternation, tool pairing)
486
+ │ API streamMessage (with reactive recovery on max_tokens/413)
487
+ │ collect text + thinking + tool_use blocks
488
+ │ dispatch tools via StreamingToolExecutor (parallel safe, serial unsafe)
489
+ │ truncate results > 100K chars
490
+ │ postTurn + stopHooks (can prevent continuation)
491
+ │ }
492
+ ├─ postRun hook (always fires)
493
+ └─ return: done | paused | failed
494
+ ```
495
+
496
+ ### Storage Adapter
497
+
498
+ Three adapters, same interface:
499
+
500
+ | Adapter | Backend | Use |
501
+ |---------|---------|-----|
502
+ | `LocalStorageAdapter` | `node:fs/promises` (lazy import) | Dev, tests |
503
+ | `R2StorageAdapter` | Cloudflare R2 via S3 protocol | Node / anywhere with S3 creds |
504
+ | `R2BindingStorageAdapter` | Cloudflare R2 native binding (`env.BUCKET`) | Cloudflare Workers (`provider: 'r2-binding'`) |
505
+
506
+ ### Smart Memory
507
+
508
+ Per-workspace learning across runs:
509
+
510
+ - **Profile** — agent identity
511
+ - **Rules** — behavioral constraints (always/never/when)
512
+ - **Lessons** — facts learned from prior runs (token-budgeted)
513
+ - **Episodes** — session-level observations (JSONL per session)
514
+
515
+ Modes: `off` (stateless), `read-only` (recall only), `read-write` (self-improving).
516
+
517
+ ### Skills
518
+
519
+ Markdown docs the model can pull on demand via the `SkillPage` tool. Two resolution modes drive the same runtime contract:
520
+
521
+ **1. Disk-backed (default)** — one directory per skill:
522
+
523
+ ```
524
+ {storage-root}/workspaces/{ws}/.claude/skills/
525
+ ├── memo-style/
526
+ │ ├── SKILL.md ← required — name + description + body
527
+ │ └── pages/
528
+ │ └── examples.md ← optional multi-page skill
529
+ └── brand-voice/
530
+ └── SKILL.md
531
+ ```
532
+
533
+ Enable via `config.skills.autoload: true`. The engine lists directory entries at run start, emits `name + description` into the system prompt, and lazy-loads bodies when the model calls `SkillPage`.
534
+
535
+ **2. Per-run override** — bind a specific skill bundle to one `engine.run()` / `engine.resumeAsync()` call without touching storage:
536
+
537
+ ```ts
538
+ await engine.run({
539
+ runId, nodeId, task,
540
+ skills: [
541
+ {
542
+ name: 'memo-style',
543
+ description: 'Internal memo format.',
544
+ body: '# memo-style\n\n## TL;DR\n...', // inline = zero-latency
545
+ },
546
+ {
547
+ name: 'brand-voice',
548
+ description: 'Company tone and voice.',
549
+ url: 'https://cdn.acme.com/skills/brand-voice/v3/SKILL.md', // lazy fetch, cached per run
550
+ headers: { Authorization: 'Bearer ...' },
551
+ pages: {
552
+ examples: { url: 'https://cdn.acme.com/skills/brand-voice/v3/examples.md' },
553
+ },
554
+ },
555
+ ],
556
+ })
557
+ ```
558
+
559
+ Override **replaces** disk discovery for that run — the model sees exactly the skills you list, nothing from `config.skills.path`. Useful in per-node workflow engines where each node needs a different bundle.
560
+
561
+ Security: set `config.skills.allowedHosts` (e.g. `['cdn.acme.com']`) to restrict URL fetches. Undefined = open (dev default). Requests outside the allowlist throw before hitting the network.
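The allowlist rule can be pictured as a host comparison done before any request is made. This is a sketch under assumptions, not the engine's code: `isAllowedSkillUrl` is a hypothetical function, exact hostname matching and the HTTPS-only check are assumptions, and only the "undefined means open" behaviour comes from the text above.

```typescript
// Hypothetical check: returns true when a skill URL may be fetched.
function isAllowedSkillUrl(url: string, allowedHosts?: readonly string[]): boolean {
  let parsed: URL
  try {
    parsed = new URL(url)
  } catch {
    return false // not a valid absolute URL
  }
  if (parsed.protocol !== 'https:') return false // assumption: HTTPS only
  if (allowedHosts === undefined) return true // undefined = open
  // Exact match on the hostname (URL parsing lowercases it).
  return allowedHosts.includes(parsed.hostname)
}

const allow = ['cdn.acme.com']
console.log(isAllowedSkillUrl('https://cdn.acme.com/skills/brand-voice/v3/SKILL.md', allow)) // true
console.log(isAllowedSkillUrl('https://cdn.acme.com.evil.example/SKILL.md', allow)) // false
```

Comparing the parsed hostname, rather than testing the URL string for a prefix or substring, is what rejects look-alike hosts such as the second example.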
562
+
563
+ Caching: within one run, each URL is fetched at most once — subsequent `SkillPage` calls for the same skill/page are served from memory. Cache is per `InlineSkillSource` instance, so a fresh `engine.run()` always re-reads.
564
+
565
+ ### Tools (22 built-in)
566
+
567
+ | Tool | Safe? | Description |
568
+ |------|-------|-------------|
569
+ | Bash | No | Shell execution via `/bin/sh -c` (Node.js only) |
570
+ | Read | Yes | File read with line numbers, PDF, images |
571
+ | Write | No | Atomic file write |
572
+ | Edit | No | String replacement with uniqueness check |
573
+ | Glob | Yes | File pattern matching |
574
+ | Grep | Yes | Regex search (ripgrep when available, JS fallback) |
575
+ | WebFetch | Yes | HTTP fetch with HTML-to-text |
576
+ | WebSearch | Yes | DuckDuckGo web search |
577
+ | Agent | No | Spawn subagent (depth-bounded) |
578
+ | SendMessage | No | Inter-agent communication |
579
+ | Sleep | Yes | Delay for rate limiting |
580
+ | ToolSearch | Yes | Search registered tools |
581
+ | Memorize | No | Write to smart memory |
582
+ | Recall | Yes | Read from smart memory |
583
+ | TaskCreate/Get/List/Update | Mixed | Task tracking |
584
+ | NotebookEdit | No | Jupyter notebook editing |
585
+ | ListMcpResources | Yes | MCP resource browsing |
586
+ | ReadMcpResource | Yes | MCP resource reading |
587
+ | SkillPage | Yes | Lazy skill page loading |
588
+
589
+ "Safe" = `isConcurrencySafe` — safe tools run in parallel via the StreamingToolExecutor.
590
+
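To illustrate what "parallel safe, serial unsafe" can mean in practice, here is one possible batching plan. It is a sketch, not the `StreamingToolExecutor` itself: `ToolCall` and `planBatches` are hypothetical, and grouping consecutive safe calls while isolating each unsafe call is an assumption about ordering.

```typescript
interface ToolCall {
  toolName: string
  isConcurrencySafe: boolean
}

// Groups consecutive safe calls into one batch (could run in parallel);
// every unsafe call gets a batch of its own (runs alone, in order).
function planBatches(calls: readonly ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = []
  let current: ToolCall[] = []
  for (const call of calls) {
    if (call.isConcurrencySafe) {
      current.push(call)
      continue
    }
    if (current.length > 0) {
      batches.push(current)
      current = []
    }
    batches.push([call])
  }
  if (current.length > 0) batches.push(current)
  return batches
}

const plan = planBatches([
  { toolName: 'Read', isConcurrencySafe: true },
  { toolName: 'Grep', isConcurrencySafe: true },
  { toolName: 'Write', isConcurrencySafe: false },
  { toolName: 'Glob', isConcurrencySafe: true },
])
console.log(plan.map((batch) => batch.map((c) => c.toolName)))
// [ [ 'Read', 'Grep' ], [ 'Write' ], [ 'Glob' ] ]
```

Each inner array could then be awaited with `Promise.all`, with the outer array processed sequentially.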
591
+ ### Hooks (8 slots)
592
+
593
+ | Hook | When | Can block? |
594
+ |------|------|-----------|
595
+ | `preRun` | Before agent loop starts | No |
596
+ | `postRun` | After run completes (always fires) | No |
597
+ | `preTurn` | Before each API call | No |
598
+ | `postTurn` | After each tool dispatch | No |
599
+ | `preToolCall` | Before each tool execution | No |
600
+ | `postToolCall` | After each tool execution | No |
601
+ | `gateBeforeTool` | Before tool dispatch — can pause the run | Yes (pause) |
602
+ | `stopHooks` | After each turn — can stop the run | Yes (stop) |
603
+
604
+ ---
605
+
606
+ ## Async API
607
+
608
+ `engine.run()` and `engine.resume()` are **synchronous** in the sense that the promise each returns resolves only once the run reaches a terminal state (`done` | `paused` | `failed`). For long-running work (multi-minute tasks, HITL workflows with human wait time, Cloudflare Workers / Durable Object hosts), the engine ships a parallel **async API**.
609
+
610
+ The async API is **additive**: sync calls still work exactly as before. Async just adds dispatch, polling, webhooks, and durable state.
611
+
612
+ ### Methods
613
+
614
+ | Method | Purpose |
615
+ |---|---|
616
+ | `engine.start(opts)` | Schedule a run in the background. Returns `{ runId, nodeId, status }` immediately. |
617
+ | `engine.resumeAsync(opts)` | Async version of `resume()`. Same options + optional `webhook`. |
618
+ | `engine.getStatus(runId, nodeId?)` | Read current state. Returns `EngineResponse` (provisional while running, final when terminal). |
619
+ | `engine.waitFor(runId, opts?)` | Poll until terminal. Returns the final `EngineResponse`. Respects `timeoutMs`. |
620
+ | `engine.cancelRun(runId, nodeId?)` | Abort a running run. Marks state as `cancelled`. |
621
+ | `engine.retryWebhook(runId, deliveryId)` | Re-fire a past webhook delivery (useful after downstream downtime). |
622
+ | `engine.recoverOrphanedRuns({ staleThresholdMs })` | Scan `state.json` files on startup and mark stale-heartbeat runs as `failed`. |
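As an illustration of what "stale-heartbeat" could mean for `recoverOrphanedRuns`, the sketch below compares a run's last heartbeat with the threshold. `RunState` and `isOrphaned` are hypothetical; the field names follow `state.json`, and treating only `running` runs as candidates is an assumption.

```typescript
// Minimal subset of state.json needed for the check.
interface RunState {
  status: string
  lastHeartbeat: number // Unix ms
}

// A run counts as orphaned when it claims to be running but its heartbeat
// is older than the threshold.
function isOrphaned(state: RunState, now: number, staleThresholdMs: number): boolean {
  return state.status === 'running' && now - state.lastHeartbeat > staleThresholdMs
}

const now = 1_700_000_600_000
const fiveMinutes = 5 * 60_000
console.log(isOrphaned({ status: 'running', lastHeartbeat: now - 10 * 60_000 }, now, fiveMinutes)) // true
console.log(isOrphaned({ status: 'paused', lastHeartbeat: now - 10 * 60_000 }, now, fiveMinutes)) // false
```

A paused run is excluded on purpose in this sketch: it is waiting for a human, so an old heartbeat is expected.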
623
+
624
+ ### state.json — durable per-run state
625
+
626
+ Every async run writes a `state.json` file alongside the transcript:
627
+
628
+ ```
629
+ projects/{runId}/nodes/{nodeId}/
630
+ ├── 000000.jsonl # transcript shards
631
+ ├── meta.json # transcript metadata
632
+ ├── snapshot.json # pause snapshot (if paused)
633
+ └── state.json # async run state + full response
634
+ ```
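For tooling that inspects these files directly, the layout can be expressed as two small path helpers. This is a sketch: `nodeDir` and `shardFileName` are hypothetical names, and the six-digit zero padding is inferred from the single `000000.jsonl` entry shown above.

```typescript
// Directory holding one node's transcript and state, relative to the storage root.
function nodeDir(runId: string, nodeId: string): string {
  return `projects/${runId}/nodes/${nodeId}`
}

// Shard file name for a given shard index (assumed six-digit zero padding).
function shardFileName(index: number): string {
  return `${String(index).padStart(6, '0')}.jsonl`
}

console.log(`${nodeDir('run_abc', 'analyze')}/${shardFileName(0)}`)
// projects/run_abc/nodes/analyze/000000.jsonl
console.log(`${nodeDir('run_abc', 'analyze')}/state.json`)
// projects/run_abc/nodes/analyze/state.json
```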
635
+
636
+ Shape:
637
+
638
+ ```ts
639
+ {
640
+ version: 1,
641
+ runId: 'run_abc',
642
+ nodeId: 'node_1',
643
+ status: 'queued' | 'running' | 'paused' | 'done' | 'failed' | 'cancelled' | 'not_found',
644
+ startedAt: 1700000000000,
645
+ lastHeartbeat: 1700000012345,
646
+ progress: {
647
+ turns: number, // advances as the agent loop runs
648
+ tokensUsed: { input, output }, // cumulative across turns
649
+ currentActivity: // what the loop is doing RIGHT NOW
650
+ 'idle' | 'streaming' | 'tool_dispatch' | 'compacting',
651
+ lastTool?: string, // set when currentActivity === 'tool_dispatch'
652
+ },
653
+ response: EngineResponse | null, // populated on terminal; same shape as sync run()
654
+ webhook?: {
655
+ url, events, secret?, headers?,
656
+ deliveries: [{ id, event, attempt, status, httpCode?, error? }, ...]
657
+ }
658
+ }
659
+ ```
660
+
661
+ `getStatus()` reads this file and returns:
662
+ - the embedded `response` once the run is terminal,
663
+ - a provisional snapshot with real-time `progress` fields while work is in flight,
664
+ - `status: 'not_found'` with `errors[0].code === 'NOT_FOUND'` if no state file exists.
665
+
666
+ **Heartbeat**: the agent loop updates `progress` at each turn boundary (streaming start, tool dispatch, turn end). Writes are throttled to at most one per 500ms AND only when activity changes, so R2 costs stay predictable even on long runs.
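The throttling rule (at most one write per 500 ms, and only when the activity changed) might look like the following. `HeartbeatThrottle` is a hypothetical class written for illustration; the engine's actual bookkeeping is not shown in this README.

```typescript
type Activity = 'idle' | 'streaming' | 'tool_dispatch' | 'compacting'

class HeartbeatThrottle {
  private readonly minIntervalMs: number
  private lastWriteAt = Number.NEGATIVE_INFINITY
  private lastActivity: Activity | null = null

  constructor(minIntervalMs = 500) {
    this.minIntervalMs = minIntervalMs
  }

  // Returns true (and records the write) only when both conditions hold:
  // the activity differs from the last written one AND enough time has passed.
  shouldWrite(now: number, activity: Activity): boolean {
    const changed = activity !== this.lastActivity
    const intervalElapsed = now - this.lastWriteAt >= this.minIntervalMs
    if (!changed || !intervalElapsed) return false
    this.lastWriteAt = now
    this.lastActivity = activity
    return true
  }
}

const throttle = new HeartbeatThrottle()
console.log(throttle.shouldWrite(0, 'streaming')) // true: first write
console.log(throttle.shouldWrite(100, 'tool_dispatch')) // false: within 500 ms
console.log(throttle.shouldWrite(600, 'tool_dispatch')) // true: changed and 600 ms elapsed
console.log(throttle.shouldWrite(2000, 'tool_dispatch')) // false: activity unchanged
```

Passing `now` in as an argument, instead of calling `Date.now()` inside, keeps the rule easy to test.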
667
+
668
+ ### Webhooks
669
+
670
+ Pass a `webhook` object to `start()` / `resumeAsync()` and the engine will POST the final `EngineResponse` to your URL on the configured events.
671
+
672
+ ```ts
673
+ await engine.start({
674
+ runId: 'run_abc',
675
+ nodeId: 'node_1',
676
+ task: 'long task',
677
+ webhook: {
678
+ url: 'https://your-app.com/hooks/la-machina',
679
+ secret: 'shared-hmac-secret', // optional — enables X-LaMachina-Signature
680
+ events: ['paused', 'done', 'failed'], // default: all three
681
+ headers: { 'X-Tenant': 'acme' }, // optional — passed through
682
+ },
683
+ })
684
+ ```
685
+
686
+ **Request headers:**
687
+
688
+ | Header | Value |
689
+ |---|---|
690
+ | `Content-Type` | `application/json` |
691
+ | `X-LaMachina-Event` | `status.paused` \| `status.done` \| `status.failed` |
692
+ | `X-LaMachina-RunId` | Run ID from your `start()` call |
693
+ | `X-LaMachina-Delivery` | Unique UUID per delivery attempt |
694
+ | `X-LaMachina-Timestamp` | Unix ms (used in HMAC input) |
695
+ | `X-LaMachina-Signature` | `sha256=<hex>` — HMAC over `${timestamp}.${body}` (only if `secret` set) |
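On the receiving side, the signature can be checked by recomputing the HMAC over `${timestamp}.${body}` and comparing it with the header. The sketch below targets a Node.js receiver and uses `node:crypto`; `signWebhook` and `verifyWebhook` are hypothetical helper names, and the exact byte encoding of the body (UTF-8 of the raw request text) is an assumption.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Builds the expected header value: "sha256=" + hex HMAC of `${timestamp}.${body}`.
function signWebhook(secret: string, timestamp: string, body: string): string {
  const hex = createHmac('sha256', secret).update(`${timestamp}.${body}`).digest('hex')
  return `sha256=${hex}`
}

// Compares in constant time; a length mismatch is rejected before comparing.
function verifyWebhook(secret: string, timestamp: string, body: string, signatureHeader: string): boolean {
  const expected = Buffer.from(signWebhook(secret, timestamp, body))
  const received = Buffer.from(signatureHeader)
  return expected.length === received.length && timingSafeEqual(expected, received)
}

// `timestamp` comes from X-LaMachina-Timestamp, `body` is the raw request text.
const body = JSON.stringify({ runId: 'run_abc', status: 'done' })
const header = signWebhook('shared-hmac-secret', '1712966400000', body)
console.log(verifyWebhook('shared-hmac-secret', '1712966400000', body, header)) // true
console.log(verifyWebhook('shared-hmac-secret', '1712966400000', body + ' ', header)) // false
```

Verify against the raw body string as received, not a re-serialized object, since any whitespace or key-order change alters the HMAC. Rejecting old timestamps is a common extra step against replay, left out here.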
696
+
697
+ **Retry schedule** (exponential-ish):
698
+
699
+ ```
700
+ attempt 1: immediate
701
+ attempt 2: +10s
702
+ attempt 3: +60s
703
+ attempt 4: +5min
704
+ attempt 5: +30min
705
+ then give up
706
+ ```
707
+
708
+ Retry decisions:
709
+
710
+ | HTTP | Retry? |
711
+ |---|---|
712
+ | 2xx | No (delivered) |
713
+ | 408 Request Timeout | Yes |
714
+ | 429 Rate Limited | Yes |
715
+ | 5xx | Yes |
716
+ | 410 Gone | **No** (permanent — resource removed) |
717
+ | Other 4xx | No (client bug — don't retry) |
718
+ | Network error / timeout | Yes |
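The schedule and the decision table above can be condensed into two functions. This is a sketch rather than the engine's implementation: `RETRY_DELAYS_MS`, `delayBeforeAttempt`, and `shouldRetryDelivery` are hypothetical names, and treating 3xx responses as non-retryable is an assumption, since the table does not list them.

```typescript
// Delay before each 1-based attempt: immediate, +10s, +60s, +5min, +30min.
const RETRY_DELAYS_MS: readonly number[] = [0, 10_000, 60_000, 5 * 60_000, 30 * 60_000]

// Returns the wait before the given attempt, or null once the schedule is exhausted.
function delayBeforeAttempt(attempt: number): number | null {
  return RETRY_DELAYS_MS[attempt - 1] ?? null
}

// `httpCode` undefined stands for a network error or timeout.
function shouldRetryDelivery(httpCode?: number): boolean {
  if (httpCode === undefined) return true
  if (httpCode >= 200 && httpCode < 300) return false // delivered
  if (httpCode === 408 || httpCode === 429) return true
  return httpCode >= 500 && httpCode < 600 // 410 and other 4xx fall through to false
}

console.log(delayBeforeAttempt(2)) // 10000
console.log(delayBeforeAttempt(6)) // null
console.log(shouldRetryDelivery(503), shouldRetryDelivery(410)) // true false
```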
719
+
720
+ Every attempt is appended to `state.webhook.deliveries[]` for audit.
721
+
722
+ ### Node.js example — sync HITL and async HITL together
723
+
724
+ ```ts
725
+ import { initEngine, Engine } from 'la-machina-engine'
726
+
727
+ const { config } = initEngine({
728
+ model: { provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
729
+ storage: { provider: 'r2', rootPath: 'tenants/acme', workspaceId: 'default', r2: { ... } },
730
+ hooks: {
731
+ gateBeforeTool: (toolName) =>
732
+ toolName === 'Write' ? { allow: false, reason: 'human approval' } : { allow: true },
733
+ },
734
+ })
735
+ const engine = new Engine(config)
736
+
737
+ // Async run with webhook — returns immediately
738
+ const { runId } = await engine.start({
739
+ runId: 'run_' + Date.now(),
740
+ nodeId: 'n1',
741
+ task: 'Refactor the config module.',
742
+ webhook: { url: 'https://app.example.com/hooks', secret: process.env.HOOK_SECRET },
743
+ })
744
+
745
+ // Later: client polls
746
+ const current = await engine.getStatus(runId, 'n1')
747
+ if (current.status === 'paused') {
748
+ // Human approves → resume (with no gate) asynchronously
749
+ await engine.resumeAsync({
750
+ runId,
751
+ nodeId: 'n1',
752
+ snapshot: current.meta.snapshot,
753
+ webhook: { url: 'https://app.example.com/hooks', secret: process.env.HOOK_SECRET },
754
+ })
755
+ const final = await engine.waitFor(runId, { nodeId: 'n1', timeoutMs: 300_000 })
756
+ console.log('final:', final.status, final.data)
757
+ }
758
+
759
+ // Startup: recover any runs that crashed mid-execution
760
+ const orphaned = await engine.recoverOrphanedRuns({ staleThresholdMs: 5 * 60_000 })
761
+ console.log('recovered', orphaned.length, 'orphaned runs')
762
+ ```
763
+
764
+ ### MCP — auth refresh + sampling
765
+
766
+ Two opt-in features for MCP server integrations (Plan 018):
767
+
768
+ **`headersProvider`** — refresh OAuth tokens between requests:
769
+
770
+ ```ts
771
+ mcp: {
772
+ servers: {
773
+ github: {
774
+ type: 'http',
775
+ url: 'https://mcp.github.example.com',
776
+ headers: { 'X-Tenant': 'acme' }, // static
777
+ headersProvider: async () => ({ // dynamic, called per send
778
+ Authorization: `Bearer ${await refreshGithubToken()}`,
779
+ }),
780
+ },
781
+ },
782
+ }
783
+ ```
784
+
785
+ The provider is called before every MCP request; its result merges over the static `headers`. On HTTP 401 the engine invokes the provider a second time and retries the request once. Without this hook, a long run dies the moment its bearer token expires (~1 hour for OAuth).
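"Merges over" can be read as: dynamic headers win on conflict. The sketch below shows one way to do that merge; `mergeHeaders` is a hypothetical helper, and lowercasing header names (so `authorization` and `Authorization` collide instead of being sent twice) is an assumption, since this README does not say how the engine handles case.

```typescript
type HeaderMap = Record<string, string>

// Dynamic (provider) headers override static ones; names are compared case-insensitively.
function mergeHeaders(staticHeaders: HeaderMap, dynamicHeaders: HeaderMap): HeaderMap {
  const merged: HeaderMap = {}
  for (const [name, value] of Object.entries(staticHeaders)) {
    merged[name.toLowerCase()] = value
  }
  for (const [name, value] of Object.entries(dynamicHeaders)) {
    merged[name.toLowerCase()] = value
  }
  return merged
}

console.log(
  mergeHeaders(
    { 'X-Tenant': 'acme', authorization: 'Bearer stale' },
    { Authorization: 'Bearer fresh' },
  ),
)
// { 'x-tenant': 'acme', authorization: 'Bearer fresh' }
```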
786
+
787
+ **`allowSampling`** — let an MCP server request LLM completions through the engine:
788
+
789
+ ```ts
790
+ mcp: {
791
+ servers: {
792
+ research_tools: {
793
+ type: 'http',
794
+ url: 'https://mcp.research.example.com',
795
+ allowSampling: true, // OFF by default
796
+ },
797
+ },
798
+ }
799
+
800
+ // Optional — provide a custom handler. Default routes to engine's own ModelAdapter.
801
+ new Engine(config, {
802
+ samplingHandler: async (request, context) => {
803
+ // request.messages, request.maxTokens, request.systemPrompt, ...
804
+ // context.serverName, context.depth, context.runId
805
+ return {
806
+ role: 'assistant',
807
+ model: 'cheap-model-for-mcp',
808
+ content: { type: 'text', text: '...' },
809
+ stopReason: 'endTurn',
810
+ }
811
+ },
812
+ })
813
+ ```
814
+
815
+ When `allowSampling: false` (the default), the engine omits the `sampling` capability from its MCP handshake — servers that try to call `sampling/createMessage` get a "method not supported" error from the SDK directly.
816
+
817
+ When `allowSampling: true`, the engine installs a request handler that routes to either your custom `samplingHandler` or a built-in default that uses the engine's own model. The default handler refuses recursive sampling past `DEFAULT_SAMPLING_MAX_DEPTH = 3` to prevent loops. Token usage from sampling counts against any `tokenBudget` you've set on the parent run.
818
+
819
+ Off-by-default is deliberate: sampling consumes your LLM budget. Opt in per server only when you've vetted the MCP.
820
+
821
+ ### Cloudflare Workers — three building blocks
822
+
823
+ A Worker deployment needs three pieces beyond the standard engine:
824
+
825
+ 1. **Storage: native R2 binding** via `storage.provider: 'r2-binding'` — avoids the `@aws-sdk/client-s3` bundle and its ListObjectsV2 hang on the Workers runtime.
826
+ 2. **Agent loop lifetime: Durable Objects** — the default fire-and-forget executor can't survive a Worker request return, so wrap work in `ctx.waitUntil()` inside a DO, or provide a custom `BackgroundExecutor`.
827
+ 3. **MCP transport: `preferBindingTransport: true`** — makes the engine's MCP client use plain POST JSON-RPC instead of the SDK's Streamable-HTTP SSE client (which hangs on Workers after `initialize`).
828
+
829
+ #### Storage — R2 binding provider
830
+
831
+ ```ts
832
+ import { initEngine, Engine } from 'la-machina-engine'
833
+
834
+ const { config } = initEngine({
835
+ model: { provider: 'anthropic', apiKey: env.ANTHROPIC_API_KEY },
836
+ storage: {
837
+ provider: 'r2-binding',
838
+ rootPath: 'tenants/acme',
839
+ workspaceId: 'default',
840
+ r2Binding: env.STORAGE, // the R2Bucket binding from wrangler.toml
841
+ },
842
+ })
843
+ const engine = new Engine(config)
844
+ ```
845
+
846
+ No S3 credentials, no endpoint URL — the binding handles auth. Works with `wrangler dev --local` (Miniflare emulates R2 in-memory).
847
+
848
+ `wrangler.toml`:
849
+ ```toml
850
+ [[r2_buckets]]
851
+ binding = "STORAGE"
852
+ bucket_name = "la-machina"
853
+ preview_bucket_name = "la-machina-preview"
854
+ ```
855
+
856
+ #### Agent loop lifetime — Durable Objects
857
+
858
+ Each `runId` maps to a DO via `idFromName(runId)`. The DO calls `engine.start()` inside `ctx.waitUntil()`, which keeps the isolate alive past the Worker request's return. Resumes route to the same DO so they pick up the paused snapshot.
859
+
860
+ ```ts
861
+ import { DurableObject } from 'cloudflare:workers'
+
+ export class RunDurableObject extends DurableObject<Env> {
862
+ override async fetch(req: Request): Promise<Response> {
863
+ const body = (await req.json()) as StartBody
864
+ this.ctx.waitUntil(this.doRun(body)) // keeps DO alive until done
865
+ return new Response(null, { status: 202 })
866
+ }
867
+
868
+ private async doRun(body: StartBody): Promise<void> {
869
+ const engine = buildEngine(this.env, body.rootPath)
870
+ await engine.start({
871
+ runId: body.runId,
872
+ nodeId: body.nodeId,
873
+ task: body.task,
874
+ ...(body.webhook ? { webhook: body.webhook } : {}),
875
+ })
876
+ await engine.waitFor(body.runId, { nodeId: body.nodeId, pollIntervalMs: 500 })
877
+ }
878
+ }
879
+ ```
880
+
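The Worker side of this pattern, mapping a `runId` to its Durable Object with `idFromName`, might look like the sketch below. The structural types stand in for the Cloudflare ones so the snippet is self-contained. The `forwardToRun` helper, the `/start` path and the hostname are hypothetical, not taken from the example project.

```typescript
// Structural stand-ins so the sketch compiles without @cloudflare/workers-types.
interface RunStub<R> {
  fetch(url: string, init: { method: string; headers: Record<string, string>; body: string }): Promise<R>
}
interface RunNamespace<Id, R> {
  idFromName(name: string): Id
  get(id: Id): RunStub<R>
}

// Forward a start request to the Durable Object that owns this runId.
// Because idFromName is deterministic, a later resume for the same runId
// reaches the same object.
async function forwardToRun<Id, R>(
  runs: RunNamespace<Id, R>, // e.g. a DO namespace binding on env; its name is an assumption
  runId: string,
  body: Record<string, unknown>,
): Promise<R> {
  const stub = runs.get(runs.idFromName(runId))
  // The URL only needs to be well-formed; delivery is handled by the stub.
  return stub.fetch('https://run.internal/start', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ ...body, runId }),
  })
}
```

In a real Worker the namespace would be the Durable Object binding declared in `wrangler.toml`, and the returned value would be the DO's `202` response.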
881
+ Alternative (advanced): implement `BackgroundExecutor` and pass it via `EngineInternals.backgroundExecutor` if you want `engine.start()` itself to schedule into a DO from the Worker fetch handler. See `examples/cloudflare-worker-ts/src/runDO.ts` for the common-case pattern.
882
+
883
+ #### MCP on Workers — `preferBindingTransport`
884
+
885
+ ```ts
886
+ initEngine({
887
+ // ...
888
+ mcp: {
889
+ servers: {
890
+ flow: {
891
+ type: 'http',
892
+ url: 'https://your-mcp-server.com/mcp',
893
+ preferBindingTransport: true, // ← Workers-safe
894
+ },
895
+ },
896
+ },
897
+ })
898
+ ```
899
+
900
+ When this flag is set, the engine's MCP client uses `BindingHttpTransport` — a stateless POST-only JSON-RPC transport. No long-lived SSE reader, no streaming notifications (not needed for tool calling).
901
+
902
+ On Node, leave the flag off to keep the full Streamable-HTTP feature set.
903
+
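To make "stateless POST-only JSON-RPC" concrete, here is a minimal sketch of a single request/response exchange. It is not the `BindingHttpTransport` source: `postJsonRpc` and `FetchLike` are hypothetical names, and the fetch function is injected so the sketch can be exercised with a fake.

```typescript
// Structural stand-in for the global fetch, so a fake can be passed in tests.
type FetchLike = (
  url: string,
  init: { method: 'POST'; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

interface JsonRpcResponse {
  jsonrpc: '2.0'
  id: number
  result?: unknown
  error?: { code: number; message: string }
}

let nextId = 1

// One request, one response: no session state and no SSE stream kept open.
async function postJsonRpc(fetchFn: FetchLike, url: string, method: string, params: unknown): Promise<unknown> {
  const id = nextId++
  const res = await fetchFn(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ jsonrpc: '2.0', id, method, params }),
  })
  if (!res.ok) throw new Error(`MCP server returned HTTP ${res.status}`)
  const reply = (await res.json()) as JsonRpcResponse
  if (reply.error) throw new Error(`JSON-RPC error ${reply.error.code}: ${reply.error.message}`)
  return reply.result
}
```

Because every call completes within one HTTP round trip, nothing has to stay open after the Worker responds. That is the property the flag relies on.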
904
+ #### Working reference
905
+
906
+ A complete TypeScript example is at `examples/cloudflare-worker-ts/`:
907
+ - `src/env.ts` — builds an Engine with `r2-binding` + `preferBindingTransport`
908
+ - `src/runDO.ts` — `RunDurableObject` with `ctx.waitUntil()`
909
+ - `src/index.ts` — `POST /sync`, `POST /async/start`, `GET /async/status/:runId`, `POST /async/resume/:runId`, `POST /demo/webhook` receiver with HMAC verification
910
+ - `mcp-server/server.mjs` — local HTTP MCP server for the memo-pipeline demo
911
+ - `test-client.sh` — end-to-end curl demo
912
+
913
+ Run:
914
+ ```bash
915
+ cd examples/cloudflare-worker-ts
916
+ cp .dev.vars.example .dev.vars && $EDITOR .dev.vars
917
+ bunx wrangler dev --local
918
+ ./test-client.sh
919
+ ```
920
+
921
+ Everything else (state.json, webhooks, polling, resume, recovery) works unchanged.
922
+
923
+ ### Sync vs. async — when to use which
924
+
925
+ | Scenario | Use |
926
+ |---|---|
927
+ | Simple task, < 60s | `engine.run()` (sync) |
928
+ | HITL where you can block the caller | `engine.run()` + `engine.resume()` |
929
+ | Long task, client can't block | `engine.start()` + `getStatus` / `waitFor` |
930
+ | HITL in a web app (user closes tab) | `engine.start()` + webhook on `paused` |
931
+ | Cloudflare Workers (any non-trivial run) | `storage.provider: 'r2-binding'` + DO + `preferBindingTransport` |
932
+ | Server crash recovery | `engine.recoverOrphanedRuns()` on startup |
933
+
934
+ ---
935
+
936
+ ## Agent Hierarchy
937
+
938
+ Three execution modes:
939
+
940
+ ### Normal Mode (`engine.run()`)
941
+ - Parent gets all tools; children get all except Agent
942
+ - Depth bounded by `maxSubagentDepth` (default 5)
943
+
944
+ ### Coordinator Mode (`coordinator.enabled: true`)
945
+ - **Code-enforced** tool split — coordinator can only delegate
946
+ - Coordinator: Agent, SendMessage, Tasks, Memory, ToolSearch
947
+ - Workers: Bash, Read, Write, Edit, Glob, Grep
948
+
949
+ ### Orchestrator Mode (`engine.orchestrate()`)
950
+ - Deterministic state machine: Plan → Research → Implement → Verify → Finalize
951
+ - Parallel researchers (`maxParallelResearchers`, default 3)
952
+ - Retry policies per phase with file snapshot rollback
953
+
954
+ ### Features
955
+
956
+ | Feature | Description |
957
+ |---------|-------------|
958
+ | **Fork subagent** | Child inherits parent's full message context (placeholder tool_results for cache sharing) |
959
+ | **Background agents** | `run_in_background: true` — fire-and-forget, results drain at turn boundary |
960
+ | **SendMessage** | Inter-agent message queue, routed by name or agentId |
961
+ | **Parallel batching** | Concurrent-safe tools execute via `Promise.all` with semaphore |
962
+ | **Bash error cascading** | Bash error aborts sibling tools via `AbortController` |
963
+ | **Subagent gate propagation** | Opt-in: parent gate threads into child loops; child denial pauses the parent with `pendingSubagent` set |
964
+
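The "parallel batching" row describes bounded concurrency. The sketch below is a generic illustration of that idea, not the engine's executor. It uses a small worker pool, which has the same effect as a counting semaphore around `Promise.all`, and `mapWithLimit` is a hypothetical helper name.

```typescript
// Run tasks concurrently, but never more than `limit` at once.
// Results are returned in input order regardless of completion order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error('limit must be >= 1')
  const results = new Array<R>(items.length)
  let next = 0
  // Each worker claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i] as T, i)
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
  return results
}
```

Here `limit` plays the role that `execution.maxToolConcurrency` plays in the configuration reference.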
965
+ ### Subagent gate propagation (opt-in)
966
+
967
+ By default, `hooks.gateBeforeTool` applies ONLY to the parent's direct tool dispatch. Subagents spawned via the `Agent` tool run their own inner loops with no gate. That's fine when your risky tool calls live on the parent, but fails when a subagent itself wants to call a risky tool.
968
+
969
+ Set `hooks.propagateGateToSubagents: true` and the parent's gate is threaded into every subagent's loop. When a subagent's gate denies a tool, the child pauses → the engine surfaces a `SubagentPausedError` → the parent's loop catches it and produces its own `paused` result with a new snapshot field:
970
+
971
+ ```ts
972
+ snapshot.pendingSubagent = {
973
+ subagentType: 'researcher', // which agent type ran
974
+ parentToolUseId: 'agent_abc', // the parent's Agent tool_use
975
+ childSnapshot: { … nested RunSnapshot … },
976
+ }
977
+ ```
978
+
979
+ `snapshot.pauseReason` becomes `'subagent_gate_required'` (distinct from the plain `'gate_required'` case so clients can distinguish). The parent's `pendingToolCall` still points at the Agent invocation, so resume semantics re-spawn the subagent.
980
+
981
+ **Scope in v0.1**: one level of nesting. Deeper trees still propagate the gate into every level, but the snapshot only records the immediate child. Resume after a subagent pause re-runs the subagent from scratch — no mid-subagent continuation yet.
982
+
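A client that receives a paused result can branch on `pauseReason` to tell the two gate cases apart. The sketch below assumes a reduced snapshot shape: `RunSnapshotLike`, `describePause` and the `name` field on `pendingToolCall` are hypothetical. So is the assumption that the child snapshot carries its own `pendingToolCall`.

```typescript
// Hypothetical snapshot shape, reduced to the fields named in this section.
interface RunSnapshotLike {
  pauseReason?: string
  pendingToolCall?: { id: string; name: string }
  pendingSubagent?: {
    subagentType: string
    parentToolUseId: string
    childSnapshot: RunSnapshotLike
  }
}

type PauseInfo =
  | { kind: 'parent'; tool: string | undefined }
  | { kind: 'subagent'; subagentType: string; tool: string | undefined }

// Report which agent hit the gate and, if known, which tool it wanted to call.
function describePause(snapshot: RunSnapshotLike): PauseInfo {
  if (snapshot.pauseReason === 'subagent_gate_required' && snapshot.pendingSubagent) {
    const { subagentType, childSnapshot } = snapshot.pendingSubagent
    return { kind: 'subagent', subagentType, tool: childSnapshot.pendingToolCall?.name }
  }
  return { kind: 'parent', tool: snapshot.pendingToolCall?.name }
}
```

An approval UI could use the result to show the subagent type and the tool awaiting approval before calling `engine.resume()`.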
983
+ ---
984
+
985
+ ## Configuration Reference
986
+
987
+ 15 sections, all optional. Full defaults in `src/config/defaults.ts`.
988
+
989
+ ```ts
990
+ initEngine({
991
+ model: {
992
+ provider: 'anthropic', // 'anthropic' | 'openai' | 'google' | 'openai-compatible' | 'proxy'
993
+ modelId: 'claude-opus-4-6',
994
+ apiKey: '', // or ANTHROPIC_API_KEY env
995
+ baseURL: undefined,
996
+ maxTokens: 8192,
997
+ temperature: 1,
998
+ maxRetries: 2,
999
+ },
1000
+ storage: {
1001
+ provider: 'local', // 'local' | 'r2' | 'r2-binding'
1002
+ rootPath: '~/.claude',
1003
+ workspaceId: 'default',
1004
+ r2: { bucket, region, accessKeyId, secretAccessKey, endpoint },
1005
+ },
1006
+ memory: { mode: 'off', scope: 'workspace' },
1007
+ tools: { enabled: ['*'], disabled: [], custom: [] },
1008
+ agents: { builtins: ['general-purpose'], customPath: undefined },
1009
+ skills: { path: undefined, autoload: false },
1010
+ execution: {
1011
+ maxTurns: 50,
1012
+ maxSubagentDepth: 5,
1013
+ turnTimeoutMs: 300_000,
1014
+ runTimeoutMs: 1_800_000,
1015
+ contextLimit: 200_000,
1016
+ maxToolConcurrency: 10,
1017
+ },
1018
+ transcript: { enabled: true, flushPolicy: 'turn-end', idleFlushMs: 2000 },
1019
+ hooks: { preRun: [], postRun: [], preTurn: [], postTurn: [], preToolCall: [], postToolCall: [], gateBeforeTool: undefined, stopHooks: [] },
1020
+ logging: { level: 'warn', sink: 'stderr' },
1021
+ mcp: { servers: {}, connectTimeoutMs: 10_000, callTimeoutMs: 60_000, shutdownTimeoutMs: 5_000 },
1022
+ permissions: { mode: 'open', rules: [] },
1023
+ compaction: { strategy: 'auto', threshold: 0.85, keepLast: 6, summaryMaxTokens: 4096, microcompact: true, microcompactAgeMs: 300_000 },
1024
+ coordinator: { enabled: false, workerTools: ['Bash','Read','Write','Edit','Glob','Grep'], maxConcurrentWorkers: 5 },
1025
+ orchestrator: { enabled: false, retries: { plan, research, implement, verify, review }, maxParallelResearchers: 3, enableReview: false, enableRollback: true, maxPlanSteps: 20, agentMaxTurns: 15 },
1026
+ })
1027
+ ```
1028
+
1029
+ ---
1030
+
1031
+ ## What's Implemented
1032
+
1033
+ Every feature listed below is ported 1:1 from La-Machina's production runtime. Pure JS, Workers-compatible.
1034
+
1035
+ ### Core Agent Loop
1036
+ - [x] Multi-turn agent loop with streaming
1037
+ - [x] Reactive recovery — max_tokens 3-stage (escalate 8K→64K → recovery message 3x → fail)
1038
+ - [x] 413 prompt-too-long recovery (emergency compact → retry)
1039
+ - [x] API retry with exponential backoff (429/500/529 + Retry-After + network errors)
1040
+ - [x] Message normalization (merge same-role, strip empty, alternation enforcement, tool pairing)
1041
+ - [x] Internal block stripping (advisor/signature/tool_reference/caller)
1042
+ - [x] Extended thinking preservation (in transcript, stripped before API)
1043
+ - [x] Tool result truncation (100K char cap)
1044
+ - [x] Duplicate tool_use ID detection
1045
+ - [x] Empty assistant response handling
1046
+ - [x] Token budget enforcement (graceful stop)
1047
+
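The retry item in the list above (exponential backoff on 429/500/529, honouring Retry-After) can be sketched generically as follows. This is an illustration of the technique, not the engine's code. `withBackoff`, `RetryableError` and the `baseMs` option are hypothetical, and treating an error without a status as a network error is an assumption.

```typescript
interface RetryableError {
  status?: number
  retryAfterMs?: number // parsed from a Retry-After header, if the server sent one
}

const RETRYABLE_STATUS = new Set([429, 500, 529])

async function withBackoff<T>(
  call: () => Promise<T>,
  opts: { maxRetries: number; baseMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)))
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (err) {
      const e = (err ?? {}) as RetryableError
      // Assumption: an error without a status code is a network failure.
      const retryable = e.status === undefined || RETRYABLE_STATUS.has(e.status)
      if (!retryable || attempt >= opts.maxRetries) throw err
      // Prefer the server's hint; otherwise double the delay on each attempt.
      const delay = e.retryAfterMs ?? opts.baseMs * 2 ** attempt
      await sleep(delay)
    }
  }
}
```

With `maxRetries: 2` (the `model.maxRetries` default in the configuration reference) a call is attempted at most three times.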
1048
+ ### Tool Execution
1049
+ - [x] StreamingToolExecutor — stateful queue, concurrency control, ordered results
1050
+ - [x] Parallel tool batching (`isConcurrencySafe` flag per tool)
1051
+ - [x] Bash error cascading (AbortController aborts sibling tools)
1052
+ - [x] 22 built-in tools
1053
+ - [x] Custom tool registration via `defineTool()`
1054
+ - [x] Device path blocking (/dev/zero, /dev/random, /proc/kcore)
1055
+
1056
+ ### Agent Hierarchy
1057
+ - [x] Subagent spawning with depth tracking (SubagentRegistry)
1058
+ - [x] Fork subagent (context inheritance, placeholder tool_results, recursion guard)
1059
+ - [x] Background agents (fire-and-forget, result drain at turn boundary)
1060
+ - [x] SendMessage (inter-agent message queue)
1061
+ - [x] Coordinator mode (code-enforced tool split)
1062
+ - [x] Orchestrator state machine (plan → research → implement → verify → finalize)
1063
+ - [x] Parallel researchers (configurable count)
1064
+ - [x] Agent registry persistence (toJSON/fromJSON, metadata sidecar)
1065
+
1066
+ ### Infrastructure
1067
+ - [x] Storage adapters (Local + R2) with atomic writes
1068
+ - [x] Transcript system (NDJSON shards, snapshot/restore, corruption recovery)
1069
+ - [x] Smart memory (profile, rules, lessons, episodes)
1070
+ - [x] Skills loader with multi-page lazy loading
1071
+ - [x] MCP client (stdio/SSE/HTTP) with instruction delta tracking
1072
+ - [x] Compaction (microcompact, summarize, drop-middle)
1073
+ - [x] Permissions (open/rules/locked modes)
1074
+ - [x] System prompt (12 sections, git injection, memory/skills injection)
1075
+ - [x] 8-slot hook system (including stop hooks with continuation control)
1076
+ - [x] Enriched PostRunEvent (toolCallCount, transcriptPath)
1077
+ - [x] Cross-runtime UUID (Web Crypto API fallback)
1078
+ - [x] Workers compatibility (zero top-level node: imports)
1079
+
1080
+ ### Testing
1081
+ - [x] 1214 tests across 115+ files
1082
+ - [x] 14 live workflow tests (W1-W14) against OpenRouter + R2
1083
+ - [x] Coverage: 81% lines, 85% branches, 91% functions
1084
+ - [x] CI pipeline (lint + typecheck + test + coverage gates)
1085
+
1086
+ ---
1087
+
1088
+ ## What's Deferred
1089
+
1090
+ Features intentionally not ported: Anthropic-only, CLI-specific, or deliberate design choices for a headless library.
1091
+
1092
+ ### Design Choices (not bugs)
1093
+
1094
+ | Feature | La-Machina | Engine | Why |
1095
+ |---------|-----------|--------|-----|
1096
+ | CLAUDE.md loading | Loads project/user/global files | None | Library — caller provides context via task |
1097
+ | Full resume reconstruction | Transcript replay + orphaned thinking cleanup | Snapshot-based (Phase 10 TODO) | Simpler model, covers 90% of use cases |
1098
+ | Streaming events to caller | Yields per-event for UI | Returns single RunResult | Headless — no UI to stream to |
1099
+ | Plugin system | Dynamic marketplace + MCP bundle | Skills + MCP only | Library doesn't need runtime plugins |
1100
+ | Multi-turn CLI session | Full conversation persistence | CLI-only history (in cli.mjs) | Not applicable to headless runs |
1101
+
1102
+ ### Anthropic-Only (not applicable to multi-provider)
1103
+
1104
+ | Feature | What it does | Why skipped |
1105
+ |---------|-------------|-------------|
1106
+ | System prompt cache_control | Marks sections with `cache_control: { type: 'ephemeral' }` | Ignored by non-Anthropic providers |
1107
+ | Beta headers | Sends prompt-caching, extended-thinking betas | Provider-specific |
1108
+ | Tool schema caching | Caches tool definitions for prompt cache stability | Anthropic optimization |
1109
+ | Task budget (output_config) | Sends token budget to API so model self-paces | Anthropic-only API field |
1110
+ | Fast mode / speed param | `speed: 'fast'` for faster inference | Anthropic-only |
1111
+ | Effort param | `output_config.effort` controls reasoning depth | Anthropic-only |
1112
+ | Advisor model | Secondary reviewer model | GrowthBook-gated, Anthropic-only |
1113
+
1114
+ ### Stubbed in La-Machina Too
1115
+
1116
+ | Feature | Status |
1117
+ |---------|--------|
1118
+ | Reactive compact | `feature('REACTIVE_COMPACT')` = false, file doesn't exist |
1119
+ | Context collapse | `feature('CONTEXT_COLLAPSE')` = false, stub returns `undefined` |
1120
+ | Tool use summaries | SDK/mobile UI only, doesn't affect agent behavior |
1121
+ | Bash classifier | Stub returns false always (ANT-ONLY) |
1122
+
1123
+ ---
1124
+
1125
+ ## Architecture
1126
+
1127
+ ```
1128
+ ┌──────────────────────────┐
1129
+ │ Engine │
1130
+ │ │
1131
+ │ engine.run() │
1132
+ │ │
1133
+ │ 1. storage adapter │◄── createEngineStorage(config.storage)
1134
+ │ 2. API client │◄── createModelAdapter(config.model)
1135
+ │ 3. tool registry │◄── buildToolRegistry(config.tools)
1136
+ │ 4. smart memory │◄── createSmartMemory(config.memory)
1137
+ │ 5. system prompt │◄── buildSystemPrompt(memory + skills + git)
1138
+ │ 6. transcript writer │
1139
+ │ 7. run context │
1140
+ │ │
1141
+ │ agentLoop: │
1142
+ │ ┌──────────────────┐ │
1143
+ │ │ normalizeMessages│ │ ← strip blocks, ensure alternation, tool pairing
1144
+ │ │ streamMessage │──┼──► ModelAdapter ──► Anthropic / OpenRouter / AI SDK
1145
+ │ │ StreamingToolExec│──┼──► ToolExecutor ──► 22 tools (parallel safe batch)
1146
+ │ │ truncateResult │ │
1147
+ │ │ stopHooks │ │ ← can prevent continuation
1148
+ │ │ reactive recovery│ │ ← max_tokens, 413, retry with backoff
1149
+ │ └──────────────────┘ │
1150
+ │ │
1151
+ │ return RunResult │
1152
+ └──────────────────────────┘
1153
+
1154
+
1155
+ { done | paused | failed }
1156
+ ```
1157
+
1158
+ ---
1159
+
1160
+ ## Development
1161
+
1162
+ ```bash
1163
+ npm install
1164
+ npm run build # tsup → dist/ (ESM + CJS + .d.ts)
1165
+ npm test # 1214 tests (~12s with bun)
1166
+ npm run test:watch # watch mode
1167
+ npm run test:coverage # with coverage gates
1168
+ npm run typecheck # TypeScript strict
1169
+ npm run lint # ESLint
1170
+ npm run ci # lint + typecheck + test + coverage
1171
+ ```
1172
+
1173
+ ### Releasing
1174
+
1175
+ Releases are fully automated via `.github/workflows/publish.yml`. The
1176
+ workflow runs on every push to `main` and publishes only when
1177
+ `package.json#version` differs from the version on npm.
1178
+
1179
+ ```bash
1180
+ # 1. Make changes + commit
1181
+ git commit -am "feat: add X"
1182
+
1183
+ # 2. Bump version (auto-tags + commits)
1184
+ npm version patch # 0.3.0 → 0.3.1
1185
+ # or `minor` / `major`, or an explicit version such as `0.4.0 --no-git-tag-version`
1186
+
1187
+ # 3. Push
1188
+ git push && git push --tags
1189
+
1190
+ # CI now:
1191
+ # - Builds + typechecks
1192
+ # - npm publish --access public --provenance
1193
+ # - Pushes a v{version} tag
1194
+ # - Creates a GitHub Release with auto-generated notes
1195
+ ```
1196
+
1197
+ Required GitHub repo secret: `NPM_TOKEN` — granular access token with
1198
+ publish permission on `la-machina-engine` and "Bypass 2FA" enabled.
1199
+
1200
+ ### Test Counts
1201
+
1202
+ | Category | Files | Tests |
1203
+ |----------|-------|-------|
1204
+ | Unit | 70+ | ~870 |
1205
+ | Integration | 15+ | ~130 |
1206
+ | E2E | 5 | ~30 |
1207
+ | Coverage additions | 20+ | ~130 |
1208
+ | **Total** | **115+** | **1214** (current `bun test` count; 8 pre-existing Bun timer failures unrelated) |
1209
+
1210
+ ### Live Workflow Tests
1211
+
1212
+ | Test | Description | Turns |
1213
+ |------|-------------|-------|
1214
+ | W1 | Invoice data transformer | 3 |
1215
+ | W2 | Cross-run memory learning | 4 |
1216
+ | W3 | Gated approval (pause/resume) | 5 |
1217
+ | W4 | Permission-locked analyst | 6 |
1218
+ | W5 | Research with skills | 5 |
1219
+ | W6 | MCP integration | 4 |
1220
+ | W7 | Coordinator delegation | 6 |
1221
+ | W8 | Orchestrator (plan→verify) | 8 |
1222
+ | W9 | Client onboarding (5 phases) | 22 |
1223
+ | W10 | SaaS dashboard (long-running) | 9 |
1224
+ | W11 | Async HITL (sync + async + webhook) | 6 |
1225
+ | W12 | Multi-agent + MCP + skills + HITL (parent gates child's publish) | 4 |
1226
+ | W13 | Per-run skill override (inline body + URL + fetch cache) | 4 |
1227
+ | W14 | MCP auth refresh + sampling round-trip (stdio + http) | n/a (integration) |
1228
+
1229
+ ---
1230
+
1231
+ ## License
1232
+
1233
+ MIT
1234
+
1235
+ ## Acknowledgments
1236
+
1237
+ - [La-Machina](https://github.com/zahidhasanaunto/La-Machina) — the reference implementation this engine was ported from
1238
+ - [Anthropic](https://anthropic.com) for Claude and the Messages API
1239
+ - [Vercel AI SDK](https://sdk.vercel.ai) for multi-provider abstraction