workhorse-agent 0.1.0__tar.gz

@@ -0,0 +1,7 @@
j.json
**/.DS_Store
**/.venv
**/venv
**/.env
tmp
__pycache__
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Gabriel Côté

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,525 @@
Metadata-Version: 2.4
Name: workhorse-agent
Version: 0.1.0
Summary: Fail-soft runner for YAML-defined agent workflows — drives the Claude CLI through a workflow graph unattended for days.
Project-URL: Homepage, https://github.com/GabrielCpp/vigilant-octo
Project-URL: Repository, https://github.com/GabrielCpp/vigilant-octo
Project-URL: Issues, https://github.com/GabrielCpp/vigilant-octo/issues
Author: Gabriel Côté
License-Expression: MIT
License-File: LICENSE
Keywords: agent,automation,claude,llm,orchestration,workflow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Utilities
Requires-Python: >=3.12
Requires-Dist: jinja2>=3.1
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Description-Content-Type: text/markdown

# local-worker

A Dockerized agent controller that runs YAML-defined workflows through an agent CLI (the Claude CLI by default; Codex and Copilot are also supported). Each workflow is a graph of `agent`, `script`, and `branch` nodes that ends at a `terminal` or `fail` node. The controller walks the graph, renders Jinja2 prompts, invokes the agent CLI or shell scripts, extracts JSON outputs, and writes run artifacts.

## Intent

The local-worker exists to run long, multi-step agent workflows **unattended** —
the design target is a single run that survives for a week without a human
babysitting it. That goal drives the two defining properties of this tool:

- **Resilience is the default, not a mode.** A single flaky node (an empty
  Claude response, a rate limit, a spending cap, an unparseable output) must
  never crash the whole run. The runner retries transient failures, reframes the
  prompt, and finally defaults a node's outputs so the graph advances to its
  `next` rather than aborting. See [docs/GUARDRAILS.md](docs/GUARDRAILS.md) for the full
  recovery ladder and its tuning knobs.
- **Reproducibility and isolation.** The agent works against its own clones
  inside the container (never a host working tree), all state lives in persistent
  named volumes, and every step is recorded as a run artifact. A run can be
  resumed from its checkpoint after a crash or reboot.

It is repository-agnostic: the same image runs any workflow against any repo a
workflow's `setup.sh` chooses to clone.

## Prerequisites

- Docker Desktop (or Docker Engine + Compose plugin)
- A logged-in Claude **subscription** on the host (`~/.claude/.credentials.json`
  present — i.e. you have run `claude` and authenticated). This is the default
  auth path and matches what your interactive Claude CLI uses.

No Python, `uv`, or Claude CLI installation is required on the host — everything runs inside the container.

## Authentication

By default the worker uses your **Claude subscription**. At startup
`entrypoint.sh` seeds `~/.claude/.credentials.json` from the host (mounted
read-only) into the persistent `claude-state` volume **once**; the CLI then
refreshes/rotates the token in-volume across runs and reboots. A minimal
`~/.claude.json` onboarding stub is written so headless runs don't prompt.

Alternatives:

- **Long-lived OAuth token** — run `claude setup-token` on the host and export
  `CLAUDE_CODE_OAUTH_TOKEN` before `run.sh` (or put it in a `.env` beside
  `compose.yaml`). This skips the credentials-file seed.
- **Bedrock** — uncomment the `CLAUDE_CODE_USE_BEDROCK`/`AWS_PROFILE` env and the
  `~/.aws` mount in `compose.yaml`.

To re-seed credentials after re-authenticating on the host, clear the
`claude-state` volume (`docker volume rm local-worker_claude-state`).

## Quick start

```bash
# From this directory
./run.sh ../workflows/hello-world
```

`run.sh` resolves the workflow path to an absolute path, validates that `workflow.yaml` exists, and launches the container via `compose.yaml`. Calling it with no arguments prints the available workflows.

## Running any workflow

```bash
./run.sh <path-to-workflow-dir> [docker compose flags]

# Examples
./run.sh ../workflows/story-coder
./run.sh ../workflows/refactor
./run.sh ../workflows/delphi-ci

# Force a full image rebuild
./run.sh ../workflows/hello-world --build

# Workflows installed into a target repo by install.py
./run.sh /path/to/repo/.agents/workflows/story-coder
```

The workflow directory must contain a `workflow.yaml` file. Any `prompts/` and `scripts/` subdirectories are mounted alongside it and are accessible from within the container.

## Environment variables

| Variable | Default | Description |
|---|---|---|
| `WORKFLOW_DIR` | _(required, set by `run.sh`)_ | Absolute path to the workflow directory |
| `CLAUDE_CODE_OAUTH_TOKEN` | _(unset)_ | Optional long-lived OAuth token (`claude setup-token`); skips the credentials-file seed |
| `AGENT_RUNS_DIR` | `/runs` | Where to write run artifacts (set to the persistent `runs` volume by `compose.yaml`) |
| `AGENT_CLI` | `claude` | Which agent CLI drives the run: `claude`, `codex`, or `copilot`. Overridden by `--cli`. See [Choosing the agent CLI backend](#choosing-the-agent-cli-backend) |
| `AGENT_MODEL` | _(unset)_ | Overrides every node's model for the run (a node's own `model:` still wins). Interpreted by the active backend |
| `CODEX_PROFILE` | _(unset)_ | Run-level default codex config profile (e.g. `openrouter`, `local`). A node that names its own profile wins. Codex only |
| `AWS_PROFILE` | `default` | AWS profile — only when using the Bedrock alternative |

## Choosing the agent CLI backend

The controller drives one agent CLI per run, behind a backend facade
(`workhorse/runner/backends.py`). Selection is **per-run**, not per-node:

```bash
./run.sh ../workflows/story-coder                      # claude (default)
AGENT_CLI=codex ./run.sh ../workflows/story-coder
AGENT_CLI=copilot ./run.sh ../workflows/story-coder
# Direct controller invocation also accepts --cli {claude,codex,copilot}
```

| Backend | CLI | Default model | In-place compaction |
|---|---|---|---|
| `claude` | `claude -p` (stream-json) | `sonnet` | yes (`/compact`) |
| `codex` | `codex exec --json` | CLI/profile default | no — ladder reframes on overflow |
| `copilot` | `copilot -p --output-format json` | CLI default | no — ladder reframes on overflow |

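The precedence rules above (`--cli` over `AGENT_CLI` over the `claude` default, and a node's own `model:` over `AGENT_MODEL` over the backend default) can be sketched as two small functions. This is a hypothetical illustration, not the code in `workhorse/runner/backends.py`: the names `select_backend` and `resolve_model` are invented here, and treating an empty variable as unset is an assumption.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Mapping

BACKENDS = ("claude", "codex", "copilot")


def select_backend(argv: list[str], env: Mapping[str, str] | None = None) -> str:
    """Pick the run's backend: --cli wins, then AGENT_CLI, then 'claude'."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--cli", choices=BACKENDS)
    args, _rest = parser.parse_known_args(argv)  # ignore unrelated controller flags
    choice = args.cli or env.get("AGENT_CLI") or "claude"
    if choice not in BACKENDS:
        raise ValueError(f"unsupported backend {choice!r}; expected one of {BACKENDS}")
    return choice


def resolve_model(node_model: str | None, env: Mapping[str, str] | None = None) -> str | None:
    """A node's own model: wins, then AGENT_MODEL; None means 'use the backend default'."""
    env = os.environ if env is None else env
    return node_model or env.get("AGENT_MODEL") or None
```

Because the backend is resolved once per run, every node in the graph is driven by the same CLI; only the model string varies per node.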
### Node model selection

A node's optional `model:` field is interpreted by the active backend. When unset,
the backend's own default applies (so workflows need not hard-code a Claude alias):

```yaml
nodes:
  - id: lead_review
    type: agent
    model: opus   # claude: alias; codex: a config profile (see below)
```

### Codex config profiles (`<profile>@<model-slug>`)

For the `codex` backend, `model:` selects a [codex config profile](https://github.com/openai/codex)
(from `~/.codex/config.toml`) — which bundles provider, auth and a pinned model —
plus an optional model override, written as `<profile>[@<model-slug>]`. `@` is the
delimiter because `/` and `:` already appear inside model slugs:

| `model:` value | Resulting codex flags |
|---|---|
| `local` | `--profile local` (the profile pins the model) |
| `openrouter@deepseek/deepseek-chat-v3.1` | `--profile openrouter -m deepseek/deepseek-chat-v3.1` |
| `openrouter@` | `--profile openrouter` |
| `@gpt-5.5` | `-m gpt-5.5` (no profile; falls back to `CODEX_PROFILE`) |
| _(unset)_ | `CODEX_PROFILE` if set, else codex's own default |

`CODEX_PROFILE` is the run-level default; a node's own `<profile>@…` always wins.
This lets one workflow tier models per node — e.g. a lead node on
`openrouter@anthropic/claude-sonnet-4.5` and bookkeeping nodes on `local` (a local
Qwen server) — the same way Claude nodes tier across `opus`/`sonnet`/`haiku`.

```yaml
nodes:
  - id: lead_review
    type: agent
    model: openrouter@anthropic/claude-sonnet-4.5
  - id: record
    type: agent
    model: local   # the local profile's pinned model
```

> Profiles live in `~/.codex/config.toml`. Each names a `model_provider`
> (`base_url` + `env_key`) and a model; codex 0.128+ requires `wire_api = "responses"`.

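The table's mapping can be expressed as a small parser. The sketch below is an illustration under stated assumptions, not the controller's code: `codex_flags` is a hypothetical name, it splits on the first `@` only, and it returns just the flag list that would be appended to a `codex exec` invocation.

```python
from __future__ import annotations


def codex_flags(model: str | None, codex_profile: str | None = None) -> list[str]:
    """Translate a node's `model:` value (`<profile>[@<model-slug>]`) into codex flags.

    `codex_profile` stands for the run-level CODEX_PROFILE default; it is used
    only when the node itself names no profile.
    """
    profile: str | None = None
    slug: str | None = None
    if model:
        if "@" in model:
            # Split on the first '@' only: slugs may contain '/' and ':'.
            profile, _, slug = model.partition("@")
        else:
            profile = model  # bare profile name: the profile pins the model
    profile = profile or codex_profile or None  # node's profile wins over the run default
    flags: list[str] = []
    if profile:
        flags += ["--profile", profile]
    if slug:  # an empty slug ("openrouter@") adds no -m
        flags += ["-m", slug]
    return flags
```

For example, `codex_flags("openrouter@deepseek/deepseek-chat-v3.1")` yields `["--profile", "openrouter", "-m", "deepseek/deepseek-chat-v3.1"]`, matching the second row of the table.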
## Mounts and volumes

| Source | Target | Type | Purpose |
|---|---|---|---|
| `~/.claude/.credentials.json` | `/mnt/claude-credentials.json` | bind, read-only | Subscription auth — seeded into `claude-state` once at startup |
| `~/.claude/settings.json` | `/mnt/claude-settings.json` | bind, read-only | Optional host Claude config (commented out by default) |
| `$WORKFLOW_DIR` | `/workflow` | bind | Workflow definition (yaml, prompts, scripts) |
| `workspace` volume | `/workspace` | named volume | **Agent working tree** — repo clones, branches, and commits; persists across reboots |
| `claude-state` volume | `/claude-state` | named volume | Claude sessions + seeded credentials + onboarding stub; persists across reboots |
| `runs` volume | `/runs` | named volume | Run artifacts; persists across reboots |

### Persistence across reboots

All three named volumes (`workspace`, `claude-state`, `runs`) persist across
container restarts and host reboots, so the agent's work is never lost when the
container stops:

- **`workspace`** holds the cloned repo and the agent's committed branch (e.g.
  `hrnet-research/auto`). Even if a push out of the container fails, committed
  work survives here. (A workflow's `setup.sh` typically `reset --hard`s the base
  branch on re-run, so commit work to a side branch — as the workflows do.)
- **`claude-state`** keeps Claude session history and the refreshed auth token,
  isolated from your host installation. (Note: each node runs with a *clean
  context* — see "Sessions" under Development — so this is not one growing
  cross-node conversation.)
- **`runs`** keeps all run artifacts.

## Resuming and run identity

The controller is **auto-resume-in-place** by default. Each `(workflow, run-id)`
pair maps to one stable run dir (`<workflow>-<run-id>`, run-id defaults to
`default`). On start the controller looks for a checkpoint there:

- **No checkpoint** → start fresh from the `start` node in that dir.
- **Checkpoint present** → resume from the checkpointed node, restoring the saved
  context. A node that finished but didn't advance the cursor (killed in the gap)
  is fast-forwarded past rather than re-run, so side effects like git commits
  aren't duplicated.

This is what lets an unattended run survive a crash or reboot: relaunching the
same workflow continues where it left off. To start over, delete the run dir (or
the `runs` volume). To keep independent runs of the same workflow side by side,
pass distinct run ids.

Controller flags (passed to `workhorse`; `--resume-*` are manual overrides
of the auto behavior above):

| Flag | Purpose |
|---|---|
| `--run-id <id>` | Name the stable run dir (`<workflow>-<id>`); default `default` |
| `--resume-run <path-or-name>` | Resume a specific run dir from its checkpoint |
| `--resume-latest` | Resume the most recent unfinished run under `--runs-dir` |
| `--params '<json>'` / `--params-file <path>` | Override workflow `vars` on a fresh start |

"Survives reboot" therefore covers both the *work products* (commits, sessions,
artifacts) **and** graph position — an interrupted graph auto-resumes mid-run.

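A minimal sketch of the auto-resume decision, assuming a hypothetical `checkpoint.json` file with `node`, `context`, and `finished` fields; the real `ArtifactWriter` checkpoint format is not documented here and may differ. The static `next_of` map is also a simplification (a branch node's real target depends on its result), and the third return value corresponds to re-entering an interrupted node with its session resumed, as described under Sessions.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Any

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def run_dir_for(runs_dir: Path, workflow: str, run_id: str = "default") -> Path:
    """One stable directory per (workflow, run-id) pair: <workflow>-<run-id>."""
    return runs_dir / f"{workflow}-{run_id}"


def write_checkpoint(run_dir: Path, node_id: str, context: dict[str, Any], finished: bool) -> None:
    """Persist the cursor via write-then-rename so a crash never leaves a torn file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    tmp = run_dir / (CHECKPOINT + ".tmp")
    tmp.write_text(json.dumps({"node": node_id, "context": context, "finished": finished}))
    os.replace(tmp, run_dir / CHECKPOINT)  # atomic rename on POSIX filesystems


def decide_start(run_dir: Path, start: str, next_of: dict[str, str]) -> tuple[str, dict[str, Any], bool]:
    """Return (node to run, context to restore, whether to resume that node's session)."""
    path = run_dir / CHECKPOINT
    if not path.exists():
        return start, {}, False  # no checkpoint: fresh start from `start`
    cp = json.loads(path.read_text())
    if cp["finished"]:
        # Finished but the cursor never advanced: skip it so side effects
        # such as git commits are not repeated.
        return next_of[cp["node"]], cp["context"], False
    # Killed mid-node: re-enter the same node and resume its session.
    return cp["node"], cp["context"], True
```

Deleting the run directory removes the checkpoint, which is why "delete the run dir" is enough to start over.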
## Run artifacts

Each workflow execution writes a timestamped directory:

```
runs/
└── <workflow-name>-<timestamp>-<id>/
    ├── run.json                # start/end time, terminal state
    ├── context.json            # final context snapshot
    ├── <step-id>/
    │   ├── prompt.md           # rendered Jinja2 prompt sent to Claude
    │   ├── output.json         # extracted JSON outputs
    │   └── context_after.json  # context state after this step
    └── <branch-id>/
        └── branch.json         # { path, value, next }
```

`compose.yaml` sets `AGENT_RUNS_DIR=/runs` so artifacts are written to the
persistent `runs` named volume (they survive reboots and don't pollute the
host working tree). To pull them out, copy from the volume — e.g. from the
assembler repo: `make research-artifacts`.

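The per-step part of that layout could be produced by helpers like the following. The function names are hypothetical and this is not `ArtifactWriter`'s API; only the file names (`prompt.md`, `output.json`, `context_after.json`, `branch.json`) and the `{ path, value, next }` shape come from the tree above, and the indented JSON formatting is an assumption.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def write_agent_step(run_dir: Path, step_id: str, prompt: str,
                     outputs: dict[str, Any], context: dict[str, Any]) -> Path:
    """Write an agent step's rendered prompt, extracted outputs, and resulting context."""
    step_dir = run_dir / step_id
    step_dir.mkdir(parents=True, exist_ok=True)
    (step_dir / "prompt.md").write_text(prompt, encoding="utf-8")
    (step_dir / "output.json").write_text(json.dumps(outputs, indent=2), encoding="utf-8")
    (step_dir / "context_after.json").write_text(json.dumps(context, indent=2), encoding="utf-8")
    return step_dir


def write_branch_step(run_dir: Path, branch_id: str, path: str, value: Any, next_id: str) -> Path:
    """Record which way a branch went, as { path, value, next }."""
    step_dir = run_dir / branch_id
    step_dir.mkdir(parents=True, exist_ok=True)
    record = {"path": path, "value": value, "next": next_id}
    (step_dir / "branch.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    return step_dir
```

Keeping one directory per step means a post-mortem can replay exactly what each node saw (`prompt.md`) and produced (`output.json`) without parsing a single combined log.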
## Repository isolation

The local-worker is repository-agnostic. **Never add repo-specific bind mounts to `compose.yaml`** — the agent must work against its own checkout of the target repository, not a host working tree.

If a workflow needs to operate on source code (read, edit, build, test), include a `setup.sh` script in the workflow directory. The script runs as the first node and clones the required repositories into the container at a known path (e.g. `/workspace/<repo>`). This ensures:

- The agent always works from a clean, versioned state
- No host working tree is mutated by accident
- The workflow is reproducible on any machine

See `workflows/case-dev/scripts/setup.sh` for an example.

## Resetting state

```bash
# Wipe Claude session history + seeded credentials (re-seed auth on next run)
docker volume rm local-worker_claude-state

# Wipe all run artifacts in the volume
docker volume rm local-worker_runs

# Wipe the agent's working tree (clones/commits) — only if you want a clean clone
docker volume rm local-worker_workspace

# Wipe everything
docker compose down -v
```

## Writing a workflow

A workflow is a directory with this layout:

```
my-workflow/
├── workflow.yaml     # Graph definition
├── prompts/          # Jinja2 .md templates
│   └── step.md
└── scripts/          # Shell or Python scripts (must output JSON to stdout)
    └── check.sh
```

**`workflow.yaml` schema:**

```yaml
name: my-workflow
vars:
  my_var: "default value"      # Initial context variables

start: first_node

nodes:
  - id: first_node
    type: agent                # agent | script | branch | terminal | fail
    prompt: prompts/step.md
    args:
      key: "{{ my_var }}"      # Jinja2 — rendered against context before sending
    outputs:
      - key: result            # Extract this key from the agent's JSON response
        default: {status: ok}  # Optional: emitted if the node exhausts all retries
                               # (see "Unattended resilience" below). Unset → null.
    next: check_result

  - id: check_result
    type: branch
    path: result.status        # Dot-path into context
    cases:
      ok: done
      error: done
    default: done

  - id: done
    type: terminal
```

**Branch operators** — in addition to `cases` (equality map), you can use `conditions` for numeric comparisons:

```yaml
- id: decide
  type: branch
  path: result.count
  conditions:
    - op: ">="
      value: "10"
      next: bulk_path
  default: single_path
```

Supported operators: `==`, `!=`, `<`, `>`, `<=`, `>=`.

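To make the `cases`/`conditions`/`default` semantics concrete, here is a sketch of a branch evaluator with dot-path lookup. It is not `runner/branch.py`: the function names are hypothetical, and three behaviors are assumptions (checking `cases` before `conditions`, comparing `cases` keys as strings, and coercing values such as `"10"` to numbers for the comparison operators).

```python
from __future__ import annotations

import operator
from typing import Any

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def lookup(context: dict[str, Any], path: str) -> Any:
    """Resolve a dot-path such as 'result.status'; a missing segment yields None."""
    value: Any = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def evaluate_branch(
    context: dict[str, Any],
    path: str,
    cases: dict[str, str] | None = None,
    conditions: list[dict[str, Any]] | None = None,
    default: str | None = None,
) -> str | None:
    """Return the id of the next node."""
    value = lookup(context, path)
    # 1. Equality map (assumed to be checked first), keys compared as strings.
    if cases and value is not None and str(value) in cases:
        return cases[str(value)]
    # 2. Numeric conditions in declared order; the first match wins.
    for cond in conditions or []:
        try:
            if OPS[cond["op"]](float(value), float(cond["value"])):
                return cond["next"]
        except (TypeError, ValueError):
            continue  # value is missing or non-numeric: this condition can't match
    # 3. Nothing matched.
    return default
```
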
**Agent prompts** must output JSON containing the declared output keys:

````markdown
Do the thing.

Output JSON only:

```json
{"result": {"status": "ok", "count": 5}}
```
````

**Scripts** receive Jinja2-rendered args as positional arguments and must print JSON to stdout:

```bash
#!/bin/bash
echo "{\"result\": {\"status\": \"ok\"}}"
```

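Both agent replies and script stdout end up as a JSON object from which the declared output keys are read. A hedged sketch of that extraction step, assuming the controller accepts either a fenced `json` block or a bare object; `extract_json` and `pick_outputs` are hypothetical names, and treating a missing declared key as an error (which would feed the retry ladder) is an assumption.

```python
from __future__ import annotations

import json
import re
from typing import Any

# A fenced block with an optional `json` info string, captured lazily.
_FENCE = re.compile(r"```(?:json)?\s*\n(.*?)\n```", re.DOTALL)


def extract_json(text: str) -> dict[str, Any]:
    """Pull the JSON object out of an agent reply or a script's stdout."""
    candidates = [m.group(1) for m in _FENCE.finditer(text)]
    candidates.append(text.strip())  # bare JSON, e.g. script stdout
    # Last resort: the outermost {...} span, in case prose surrounds the object.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for raw in candidates:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in output")


def pick_outputs(obj: dict[str, Any], keys: list[str]) -> dict[str, Any]:
    """Keep only the declared output keys; a missing key is treated as unusable output."""
    missing = [k for k in keys if k not in obj]
    if missing:
        raise KeyError(f"missing declared outputs: {missing}")
    return {k: obj[k] for k in keys}
```

Raising instead of guessing keeps the decision about fallbacks in one place: the resilience ladder, which knows each output's declared `default`.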
### Unattended resilience (output `default`)

Because runs are meant to survive a week without supervision, the controller
will, as a last resort, **default an agent node's outputs and advance to `next`**
rather than crash when Claude can't be coaxed into a usable answer (after
transient retries and prompt reframing — see [docs/GUARDRAILS.md](docs/GUARDRAILS.md)).

The runner is generic and doesn't know what your outputs mean, so **you** declare
the safe fallback per output via `default`:

```yaml
outputs:
  - key: decision
    default: continue            # branch-safe value if this node never answers
  - key: review
    default: {status: auto_approved}
  - key: notes                   # no default → emitted as null
```

Choose defaults that keep the graph moving sensibly (e.g. a value that, when read
through a downstream branch's `path`, lands on a safe route). An output with no
`default` is emitted as `null`. To disable defaulting entirely and hard-fail
instead, set `AGENT_USE_DEFAULT_OUTPUTS=false`.

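The retry → reframe → default ladder can be reduced to a few lines. This is a simplified sketch, not `run_agent`: `TransientError`, `NodeFailed`, the backoff schedule, and the retry counts are hypothetical, and the real ladder in docs/GUARDRAILS.md also covers spending caps and context overflow. Only the `default`/`null` semantics and the `AGENT_USE_DEFAULT_OUTPUTS=false` switch come from this README.

```python
from __future__ import annotations

import os
import time
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical: a rate limit, empty reply, or other retry-worthy failure."""


class NodeFailed(Exception):
    """Raised only when defaulting is disabled."""


def defaults_enabled() -> bool:
    # AGENT_USE_DEFAULT_OUTPUTS=false turns the last rung into a hard failure.
    return os.environ.get("AGENT_USE_DEFAULT_OUTPUTS", "true").strip().lower() != "false"


def run_with_ladder(
    invoke: Callable[[str], dict[str, Any]],
    prompt: str,
    outputs: list[dict[str, Any]],
    reframe: Callable[[str], str],
    transient_retries: int = 3,
    reframes: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    for round_no in range(reframes + 1):
        # Round 0 sends the original prompt; later rounds send a reframed one.
        attempt_prompt = prompt if round_no == 0 else reframe(prompt)
        for attempt in range(transient_retries):
            try:
                return invoke(attempt_prompt)
            except TransientError:
                sleep(2 ** attempt)  # back off, then retry the same prompt
            except ValueError:
                break  # unusable output: retrying verbatim won't help, so reframe
    if not defaults_enabled():
        raise NodeFailed("node exhausted retries and reframes")
    # Last rung: each output's declared default, or None (null) if it has none.
    return {out["key"]: out.get("default") for out in outputs}
```

Injecting `sleep` lets tests exercise the backoff path instantly, the same reason the existing tests patch sleeping.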
## Development

This section is for working on the **controller itself** (the Python that runs
workflows), not on individual workflows.

### Project layout

```
local-worker/
├── workhorse/                 # The workhorse Python package (entrypoint: workhorse:main)
│   ├── main.py                # CLI + the graph walk loop: checkpoint → run node → advance
│   ├── templates.py           # Jinja2 rendering (resilient: missing vars render empty, not raise)
│   ├── artifacts.py           # ArtifactWriter: run dir, checkpoints, per-step artifacts
│   ├── graph/
│   │   ├── nodes.py           # Pydantic node models (AgentNode/ScriptNode/BranchNode/TerminalNode) + Graph
│   │   ├── loader.py          # Parse + validate workflow.yaml into a Graph
│   │   └── context.py         # WorkflowContext: the key→value bag + dot-path lookup for branches
│   └── runner/
│       ├── agent.py           # Invoke Claude CLI; the retry → reframe → default resilience ladder
│       ├── backends.py        # Backend facade over the agent CLIs (claude / codex / copilot)
│       ├── script.py          # Run a ScriptNode, capture JSON stdout
│       └── branch.py          # Evaluate a BranchNode (cases / numeric conditions / default)
├── tests/                     # Standalone test files (see below)
├── compose.yaml               # Service, env, mounts, named volumes
├── Dockerfile                 # Ubuntu + uv + Claude CLI + the controller package
├── entrypoint.sh              # Auth seeding, perms, exec `workhorse`
├── run.sh                     # Host launcher: resolve workflow dir, `docker compose up`
├── pyproject.toml / uv.lock   # Python deps (jinja2, pyyaml, pydantic); managed with uv
├── README.md                  # This file (usage + development)
├── CLAUDE.md                  # Agent entry point; imports README.md + docs/
└── docs/
    └── GUARDRAILS.md          # The resilience/error-recovery design and env-var reference
```

### How the controller works (the loop)

`main.run()` is a single loop over graph nodes. For each node it:

1. **Checkpoints** the current node id + context (`ArtifactWriter.write_checkpoint`) so a crash here is resumable.
2. **Dispatches** by node type to a runner: `runner/agent.py`, `runner/script.py`, or `runner/branch.py`.
3. **Merges** the node's outputs into the `WorkflowContext`.
4. **Writes** a per-step artifact and advances `current_id` to `node.next` (or the branch target).

A `terminal`/`fail` node ends the loop. The resilience for `agent` nodes lives
entirely in `runner/agent.py::run_agent` — see [docs/GUARDRAILS.md](docs/GUARDRAILS.md).

### Sessions (per-node clean context)

**Each node runs as a fresh prompt with a clean Claude context.** The controller
does *not* chain one node's conversation into the next — node N does not inherit
node N‑1's messages. Concretely, `run_agent` drops any persisted `.session_id`
before a node's first attempt, and a reframed attempt also starts fresh.

The persisted session is `--resume`d in exactly one situation: **continuing the
same node that was interrupted.** When the controller resumes from a checkpoint
and re-enters a node that was killed mid-run (not fast-forwarded), it calls
`run_agent(..., resume_session=True)` for that one node so Claude picks up where
it left off; every node the run then advances to starts clean again.

**Context overflow → compact & continue.** If a node exhausts the model's
context window mid-run (the headless CLI returns instead of auto-compacting),
`run_agent` runs `/compact` on that node's session and retries the *same* prompt
on it, preserving the node's progress (bounded by `AGENT_MAX_COMPACT_ATTEMPTS`;
falls back to a fresh-session reframe if `/compact` can't help). Verified against
Claude Code 2.1.x. See the recovery ladder in [docs/GUARDRAILS.md](docs/GUARDRAILS.md).

> Not yet implemented: a configurable *per-node turn limit* (`--max-turns`) that
> proactively compacts before the window is exhausted. Today compaction is
> reactive — triggered when an overflow is detected.

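A sketch of the compact-and-continue step, assuming hypothetical `send`/`compact` callables and a `ContextOverflow` exception; the fallback of 2 for `AGENT_MAX_COMPACT_ATTEMPTS` is also an assumption (docs/GUARDRAILS.md holds the real default). `send(prompt, None)` stands for starting a fresh session.

```python
from __future__ import annotations

import os
from typing import Callable


class ContextOverflow(Exception):
    """Hypothetical signal that the headless CLI stopped on a full context window."""


def run_with_compaction(
    send: Callable[[str, str | None], str],
    compact: Callable[[str], bool],
    prompt: str,
    session_id: str,
    fresh_reframe: Callable[[str], str],
) -> str:
    max_compacts = int(os.environ.get("AGENT_MAX_COMPACT_ATTEMPTS", "2"))
    compactions = 0
    while True:
        try:
            return send(prompt, session_id)  # same prompt, same (possibly compacted) session
        except ContextOverflow:
            # Stop compacting once the budget is spent or /compact reports no help.
            if compactions >= max_compacts or not compact(session_id):
                break
            compactions += 1
    # Give up on this session: start fresh with a reframed prompt.
    return send(fresh_reframe(prompt), None)
```

Retrying the *same* prompt on the compacted session is what preserves the node's progress; only when that fails does the node lose its conversation and fall back to a reframe.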
### Running tests

Tests live in `tests/` and are **dependency-free**: each file runs standalone
(`python tests/test_x.py` prints PASS/FAIL and exits non-zero on failure) and is
also pytest-compatible. There is no pytest in the venv by default; run them with
the project's Python:

```bash
# One file
.venv/bin/python tests/test_agent_recovery.py

# All of them
for t in tests/test_*.py; do .venv/bin/python "$t"; done
```

If a `.venv` isn't present, create one with `uv sync` (or `uv run python tests/...`).

**Where to put tests.** Add a `tests/test_<area>.py`, mirroring the existing
style: an `if __name__ == "__main__"` runner that iterates `test_*` functions, and
unit tests that patch the CLI boundary (`_run_claude_cli` / `_invoke_claude`) and
any sleeps, so nothing hits the network or waits in real time. Group by concern:
`test_agent_cap.py` (cap/transient handling), `test_agent_recovery.py` (reframe →
default ladder), `test_branch_guardrail.py`, `test_resume_auto.py`,
`test_idempotency.py`, `test_templates_resilient.py`.

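A self-contained sketch of that test style. `run_agent` and `_run_claude_cli` here are simplified stand-ins defined in the same file, not the real functions in `runner/agent.py`; a real test would patch the attribute on the imported module (for example with `mock.patch.object`) rather than the file's own globals. `run_all` is a hypothetical name for the standalone runner.

```python
from __future__ import annotations

import time
from typing import Any
from unittest import mock


class TransientError(Exception):
    pass


def _run_claude_cli(prompt: str) -> str:
    # Stand-in for the real CLI boundary; a unit test must never reach it.
    raise RuntimeError("unit tests must not spawn the real CLI")


def run_agent(prompt: str, attempts: int = 3) -> str:
    # Simplified retry loop standing in for runner/agent.py.
    for attempt in range(attempts):
        try:
            return _run_claude_cli(prompt)
        except TransientError:
            time.sleep(2 ** attempt)
    raise TransientError("retries exhausted")


def test_transient_then_success() -> None:
    replies: list[Any] = [TransientError("rate limited"), '{"result": "ok"}']

    def fake_cli(prompt: str) -> str:
        reply = replies.pop(0)
        if isinstance(reply, Exception):
            raise reply
        return reply

    # Patch the CLI boundary and time.sleep: no network, no real waiting.
    with (
        mock.patch.dict(globals(), {"_run_claude_cli": fake_cli}),
        mock.patch("time.sleep") as fake_sleep,
    ):
        assert run_agent("hi") == '{"result": "ok"}'
    assert fake_sleep.call_count == 1


def run_all(namespace: dict[str, Any]) -> int:
    """Standalone runner: PASS/FAIL per test_* function, non-zero exit code on failure."""
    failed = 0
    for name, fn in sorted(namespace.items()):
        if name.startswith("test_") and callable(fn):
            try:
                fn()
                print(f"PASS {name}")
            except Exception as exc:  # report and keep going
                failed += 1
                print(f"FAIL {name}: {exc!r}")
    return 1 if failed else 0

# A standalone test file would end with:
# if __name__ == "__main__":
#     raise SystemExit(run_all(globals()))
```

Because the runner only needs plain functions and `assert`, the same file also works under pytest without changes.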
### Where docs go

- **Tool/usage + development docs** → this `README.md` (root).
- **Design notes** (resilience/error recovery, and any future deep-dives) →
  `docs/`, e.g. `docs/GUARDRAILS.md`. Put new long-form design docs here rather
  than at the root.
- **`CLAUDE.md`** (root) is the agent entry point and stays at the root so Claude
  Code auto-loads it; it `@`-imports `README.md` and `docs/GUARDRAILS.md`.
- **Per-workflow docs** → inside that workflow's own directory (under
  `../workflows/<name>/`), not here. The controller is workflow-agnostic; keep
  workflow-specific knowledge with the workflow.

Keep these docs current when you change behavior — they are the contract for
operators running week-long jobs, and `CLAUDE.md` imports them, so updating them
keeps agent context accurate too.

### Conventions

- **Python 3.12**, `from __future__ import annotations` at the top of each module.
- **Pydantic** models for anything parsed from YAML (see `graph/nodes.py`); add a
  new node type by extending the discriminated `Node` union and handling it in
  `main.run()` plus a `runner/`.
- **Fail soft for unattended runs.** New failure paths in agent handling should
  slot into the existing retry → reframe → default ladder rather than raising, so
  one bad node can't end a week-long run. Reserve hard raises for genuinely
  unrecoverable, deterministic errors.
- **Comments explain *why*.** Match the existing density — the tricky invariants
  (checkpoint/fast-forward idempotency, cap-vs-transient classification) are
  documented inline; keep them that way.

### Editing the container

The image bundles the Claude CLI and the controller package. After changing
`Dockerfile`, `pyproject.toml`, or anything that affects the image, rebuild:

```bash
./run.sh ../workflows/hello-world --build
```

Pure controller `.py` edits also need a rebuild before the next run picks them
up, since `workhorse/` is `COPY`d into the image (it is not bind-mounted).