workhorse-agent 0.1.0__tar.gz
- workhorse_agent-0.1.0/.gitignore +7 -0
- workhorse_agent-0.1.0/LICENSE +21 -0
- workhorse_agent-0.1.0/PKG-INFO +525 -0
- workhorse_agent-0.1.0/README.md +501 -0
- workhorse_agent-0.1.0/docs/GUARDRAILS.md +173 -0
- workhorse_agent-0.1.0/pyproject.toml +42 -0
- workhorse_agent-0.1.0/workhorse/__init__.py +3 -0
- workhorse_agent-0.1.0/workhorse/artifacts.py +149 -0
- workhorse_agent-0.1.0/workhorse/graph/__init__.py +0 -0
- workhorse_agent-0.1.0/workhorse/graph/context.py +41 -0
- workhorse_agent-0.1.0/workhorse/graph/loader.py +32 -0
- workhorse_agent-0.1.0/workhorse/graph/nodes.py +93 -0
- workhorse_agent-0.1.0/workhorse/main.py +352 -0
- workhorse_agent-0.1.0/workhorse/runner/__init__.py +0 -0
- workhorse_agent-0.1.0/workhorse/runner/agent.py +941 -0
- workhorse_agent-0.1.0/workhorse/runner/backends.py +397 -0
- workhorse_agent-0.1.0/workhorse/runner/branch.py +74 -0
- workhorse_agent-0.1.0/workhorse/runner/script.py +73 -0
- workhorse_agent-0.1.0/workhorse/templates.py +60 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Gabriel Côté
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,525 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: workhorse-agent
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Fail-soft runner for YAML-defined agent workflows — drives the Claude CLI through a workflow graph unattended for days.
|
|
5
|
+
Project-URL: Homepage, https://github.com/GabrielCpp/vigilant-octo
|
|
6
|
+
Project-URL: Repository, https://github.com/GabrielCpp/vigilant-octo
|
|
7
|
+
Project-URL: Issues, https://github.com/GabrielCpp/vigilant-octo/issues
|
|
8
|
+
Author: Gabriel Côté
|
|
9
|
+
License-Expression: MIT
|
|
10
|
+
License-File: LICENSE
|
|
11
|
+
Keywords: agent,automation,claude,llm,orchestration,workflow
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Intended Audience :: Developers
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
17
|
+
Classifier: Topic :: Software Development :: Build Tools
|
|
18
|
+
Classifier: Topic :: Utilities
|
|
19
|
+
Requires-Python: >=3.12
|
|
20
|
+
Requires-Dist: jinja2>=3.1
|
|
21
|
+
Requires-Dist: pydantic>=2.0
|
|
22
|
+
Requires-Dist: pyyaml>=6.0
|
|
23
|
+
Description-Content-Type: text/markdown
|
|
24
|
+
|
|
25
|
+
# local-worker
|
|
26
|
+
|
|
27
|
+
A Dockerized agent controller that runs YAML-defined workflows using the Claude CLI. Each workflow is a graph of `agent`, `script`, and `branch` nodes that ends at a `terminal` or `fail` node. The controller walks the graph, renders Jinja2 prompts, invokes Claude or shell scripts, extracts JSON outputs, and writes run artifacts.
|
|
28
|
+
|
|
29
|
+
## Intent
|
|
30
|
+
|
|
31
|
+
The local-worker exists to run long, multi-step agent workflows **unattended** —
|
|
32
|
+
the design target is a single run that survives for a week without a human
|
|
33
|
+
babysitting it. That goal drives the two defining properties of this tool:
|
|
34
|
+
|
|
35
|
+
- **Resilience is the default, not a mode.** A single flaky node (an empty
|
|
36
|
+
Claude response, a rate limit, a spending cap, an unparseable output) must
|
|
37
|
+
never crash the whole run. The runner retries transient failures, reframes the
|
|
38
|
+
prompt, and finally defaults a node's outputs so the graph advances to its
|
|
39
|
+
`next` rather than aborting. See [docs/GUARDRAILS.md](docs/GUARDRAILS.md) for the full
|
|
40
|
+
recovery ladder and its tuning knobs.
|
|
41
|
+
- **Reproducibility and isolation.** The agent works against its own clones
|
|
42
|
+
inside the container (never a host working tree), all state lives in persistent
|
|
43
|
+
named volumes, and every step is recorded as a run artifact. A run can be
|
|
44
|
+
resumed from its checkpoint after a crash or reboot.
|
|
45
|
+
|
|
46
|
+
It is repository-agnostic: the same image runs any workflow against any repo a
|
|
47
|
+
workflow's `setup.sh` chooses to clone.
|
|
48
|
+
|
|
49
|
+
## Prerequisites
|
|
50
|
+
|
|
51
|
+
- Docker Desktop (or Docker Engine + Compose plugin)
|
|
52
|
+
- A logged-in Claude **subscription** on the host (`~/.claude/.credentials.json`
|
|
53
|
+
present — i.e. you have run `claude` and authenticated). This is the default
|
|
54
|
+
auth path and matches what your interactive Claude CLI uses.
|
|
55
|
+
|
|
56
|
+
No Python, `uv`, or Claude CLI installation is required on the host — everything runs inside the container.
|
|
57
|
+
|
|
58
|
+
## Authentication
|
|
59
|
+
|
|
60
|
+
By default the worker uses your **Claude subscription**. At startup
|
|
61
|
+
`entrypoint.sh` seeds `~/.claude/.credentials.json` from the host (mounted
|
|
62
|
+
read-only) into the persistent `claude-state` volume **once**; the CLI then
|
|
63
|
+
refreshes/rotates the token in-volume across runs and reboots. A minimal
|
|
64
|
+
`~/.claude.json` onboarding stub is written so headless runs don't prompt.
|
|
65
|
+
|
|
66
|
+
Alternatives:
|
|
67
|
+
|
|
68
|
+
- **Long-lived OAuth token** — run `claude setup-token` on the host and export
|
|
69
|
+
`CLAUDE_CODE_OAUTH_TOKEN` before `run.sh` (or put it in a `.env` beside
|
|
70
|
+
`compose.yaml`). This skips the credentials-file seed.
|
|
71
|
+
- **Bedrock** — uncomment the `CLAUDE_CODE_USE_BEDROCK`/`AWS_PROFILE` env and the
|
|
72
|
+
`~/.aws` mount in `compose.yaml`.
|
|
73
|
+
|
|
74
|
+
To re-seed credentials after re-authenticating on the host, clear the
|
|
75
|
+
`claude-state` volume (`docker volume rm local-worker_claude-state`).
|
|
76
|
+
|
|
77
|
+
## Quick start
|
|
78
|
+
|
|
79
|
+
```bash
|
|
80
|
+
# From this directory
|
|
81
|
+
./run.sh ../workflows/hello-world
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
`run.sh` resolves the workflow path to absolute, validates that `workflow.yaml` exists, and launches the container via `compose.yaml`. Calling it with no arguments prints the available workflows.
|
|
85
|
+
|
|
86
|
+
## Running any workflow
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
./run.sh <path-to-workflow-dir> [docker compose flags]
|
|
90
|
+
|
|
91
|
+
# Examples
|
|
92
|
+
./run.sh ../workflows/story-coder
|
|
93
|
+
./run.sh ../workflows/refactor
|
|
94
|
+
./run.sh ../workflows/delphi-ci
|
|
95
|
+
|
|
96
|
+
# Force a full image rebuild
|
|
97
|
+
./run.sh ../workflows/hello-world --build
|
|
98
|
+
|
|
99
|
+
# Workflows installed into a target repo by install.py
|
|
100
|
+
./run.sh /path/to/repo/.agents/workflows/story-coder
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
The workflow directory must contain a `workflow.yaml` file. Any `prompts/` and `scripts/` subdirectories are mounted alongside it and are accessible from within the container.
|
|
104
|
+
|
|
105
|
+
## Environment variables
|
|
106
|
+
|
|
107
|
+
| Variable | Default | Description |
|
|
108
|
+
|---|---|---|
|
|
109
|
+
| `WORKFLOW_DIR` | _(required, set by `run.sh`)_ | Absolute path to the workflow directory |
|
|
110
|
+
| `CLAUDE_CODE_OAUTH_TOKEN` | _(unset)_ | Optional long-lived OAuth token (`claude setup-token`); skips the credentials-file seed |
|
|
111
|
+
| `AGENT_RUNS_DIR` | `/runs` | Where to write run artifacts (set to the persistent `runs` volume by `compose.yaml`) |
|
|
112
|
+
| `AGENT_CLI` | `claude` | Which agent CLI drives the run: `claude`, `codex`, or `copilot`. Overridden by `--cli`. See [Choosing the agent CLI backend](#choosing-the-agent-cli-backend) |
|
|
113
|
+
| `AGENT_MODEL` | _(unset)_ | Run-level default model for nodes that do not set their own `model:` (a node's own `model:` still wins). Interpreted by the active backend |
|
|
114
|
+
| `CODEX_PROFILE` | _(unset)_ | Run-level default codex config profile (e.g. `openrouter`, `local`). A node that names its own profile wins. Codex only |
|
|
115
|
+
| `AWS_PROFILE` | `default` | AWS profile — only when using the Bedrock alternative |
|
|
116
|
+
|
|
117
|
+
## Choosing the agent CLI backend
|
|
118
|
+
|
|
119
|
+
The controller drives one agent CLI per run, behind a backend facade
|
|
120
|
+
(`workhorse/runner/backends.py`). Selection is **per-run**, not per-node:
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
./run.sh ../workflows/story-coder # claude (default)
|
|
124
|
+
AGENT_CLI=codex ./run.sh ../workflows/story-coder
|
|
125
|
+
AGENT_CLI=copilot ./run.sh ../workflows/story-coder
|
|
126
|
+
# Direct controller invocation also accepts --cli {claude,codex,copilot}
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
| Backend | CLI | Default model | In-place compaction |
|
|
130
|
+
|---|---|---|---|
|
|
131
|
+
| `claude` | `claude -p` (stream-json) | `sonnet` | yes (`/compact`) |
|
|
132
|
+
| `codex` | `codex exec --json` | CLI/profile default | no — ladder reframes on overflow |
|
|
133
|
+
| `copilot` | `copilot -p --output-format json` | CLI default | no — ladder reframes on overflow |
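
As a rough illustration of the per-run selection described above, the sketch below resolves the backend name once and maps it to the base command from the table. The names `BACKEND_COMMANDS` and `select_backend` are hypothetical and are not taken from `workhorse/runner/backends.py`; only the precedence (`--cli` over `AGENT_CLI` over the `claude` default) and the base commands come from this README.

```python
from __future__ import annotations

# Base commands copied from the backend table above; the dict itself is illustrative.
BACKEND_COMMANDS: dict[str, list[str]] = {
    "claude": ["claude", "-p"],
    "codex": ["codex", "exec", "--json"],
    "copilot": ["copilot", "-p", "--output-format", "json"],
}


def select_backend(cli_flag: str | None, env: dict[str, str]) -> str:
    """Pick one backend for the whole run: --cli beats AGENT_CLI beats 'claude'."""
    name = cli_flag or env.get("AGENT_CLI") or "claude"
    if name not in BACKEND_COMMANDS:
        raise ValueError(f"unknown agent CLI backend: {name!r}")
    return name
```

Resolving the name once, before the graph walk starts, is what makes the choice per-run rather than per-node.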
|
|
134
|
+
|
|
135
|
+
### Node model selection
|
|
136
|
+
|
|
137
|
+
A node's optional `model:` field is interpreted by the active backend. When unset,
|
|
138
|
+
the backend's own default applies (so workflows need not hard-code a Claude alias):
|
|
139
|
+
|
|
140
|
+
```yaml
|
|
141
|
+
nodes:
|
|
142
|
+
- id: lead_review
|
|
143
|
+
type: agent
|
|
144
|
+
model: opus # claude: alias; codex: a config profile (see below)
|
|
145
|
+
```
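
One way to read the model precedence described here and in the environment variable table is sketched below. This is an assumption about the ordering (node `model:` first, then `AGENT_MODEL`, then the backend default), and `resolve_model` and `BACKEND_DEFAULT_MODEL` are hypothetical names, not the package's API.

```python
from __future__ import annotations

# Only claude has a documented default alias; None means "let the CLI decide".
BACKEND_DEFAULT_MODEL: dict[str, str | None] = {
    "claude": "sonnet",
    "codex": None,
    "copilot": None,
}


def resolve_model(node_model: str | None, backend: str, env: dict[str, str]) -> str | None:
    """Assumed precedence: node `model:` > AGENT_MODEL > backend default."""
    return node_model or env.get("AGENT_MODEL") or BACKEND_DEFAULT_MODEL[backend]
```

Returning `None` leaves the model flag off entirely, so the CLI or its profile picks the model.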
|
|
146
|
+
|
|
147
|
+
### Codex config profiles (`<profile>@<model-slug>`)
|
|
148
|
+
|
|
149
|
+
For the `codex` backend, `model:` selects a [codex config profile](https://github.com/openai/codex)
|
|
150
|
+
(from `~/.codex/config.toml`) — which bundles provider, auth and a pinned model —
|
|
151
|
+
plus an optional model override, written as `<profile>[@<model-slug>]`. `@` is the
|
|
152
|
+
delimiter because `/` and `:` already appear inside model slugs:
|
|
153
|
+
|
|
154
|
+
| `model:` value | Resulting codex flags |
|
|
155
|
+
|---|---|
|
|
156
|
+
| `local` | `--profile local` (the profile pins the model) |
|
|
157
|
+
| `openrouter@deepseek/deepseek-chat-v3.1` | `--profile openrouter -m deepseek/deepseek-chat-v3.1` |
|
|
158
|
+
| `openrouter@` | `--profile openrouter` |
|
|
159
|
+
| `@gpt-5.5` | `-m gpt-5.5` (no profile; falls back to `CODEX_PROFILE`) |
|
|
160
|
+
| _(unset)_ | `CODEX_PROFILE` if set, else codex's own default |
|
|
161
|
+
|
|
162
|
+
`CODEX_PROFILE` is the run-level default; a node's own `<profile>@…` always wins.
|
|
163
|
+
This lets a single workflow tier models per node, e.g. a lead node on
|
|
164
|
+
`openrouter@anthropic/claude-sonnet-4.5` and bookkeeping nodes on `local` (a local
|
|
165
|
+
Qwen server) — the same way Claude nodes tier across `opus`/`sonnet`/`haiku`.
|
|
166
|
+
|
|
167
|
+
```yaml
|
|
168
|
+
nodes:
|
|
169
|
+
- id: lead_review
|
|
170
|
+
type: agent
|
|
171
|
+
model: openrouter@anthropic/claude-sonnet-4.5
|
|
172
|
+
- id: record
|
|
173
|
+
type: agent
|
|
174
|
+
model: local # the local profile's pinned model
|
|
175
|
+
```
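
The `<profile>[@<model-slug>]` mapping in the table above can be sketched as a small parser. The helper name `codex_flags` is hypothetical; only the flag combinations in the table are taken from this README, and the `CODEX_PROFILE` value is passed in as `default_profile`.

```python
from __future__ import annotations


def codex_flags(model: str | None, default_profile: str | None = None) -> list[str]:
    """Map a node's `model:` value to codex flags, following the table above."""
    # Split on the first "@" only; "/" and ":" stay inside the model slug.
    profile, _, slug = (model or "").partition("@")
    profile = profile or (default_profile or "")  # fall back to CODEX_PROFILE
    flags: list[str] = []
    if profile:
        flags += ["--profile", profile]
    if slug:
        flags += ["-m", slug]
    return flags
```

An empty list means neither flag is passed, which corresponds to the last table row when `CODEX_PROFILE` is unset.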
|
|
176
|
+
|
|
177
|
+
> Profiles live in `~/.codex/config.toml`. Each names a `model_provider`
|
|
178
|
+
> (`base_url` + `env_key`) and a model; codex 0.128+ requires `wire_api = "responses"`.
|
|
179
|
+
|
|
180
|
+
## Mounts and volumes
|
|
181
|
+
|
|
182
|
+
| Source | Target | Type | Purpose |
|
|
183
|
+
|---|---|---|---|
|
|
184
|
+
| `~/.claude/.credentials.json` | `/mnt/claude-credentials.json` | bind, read-only | Subscription auth — seeded into `claude-state` once at startup |
|
|
185
|
+
| `~/.claude/settings.json` | `/mnt/claude-settings.json` | bind, read-only | Optional host Claude config (commented out by default) |
|
|
186
|
+
| `$WORKFLOW_DIR` | `/workflow` | bind | Workflow definition (yaml, prompts, scripts) |
|
|
187
|
+
| `workspace` volume | `/workspace` | named volume | **Agent working tree** — repo clones, branches, and commits; persists across reboots |
|
|
188
|
+
| `claude-state` volume | `/claude-state` | named volume | Claude sessions + seeded credentials + onboarding stub; persists across reboots |
|
|
189
|
+
| `runs` volume | `/runs` | named volume | Run artifacts; persists across reboots |
|
|
190
|
+
|
|
191
|
+
### Persistence across reboots
|
|
192
|
+
|
|
193
|
+
All three named volumes (`workspace`, `claude-state`, `runs`) persist across
|
|
194
|
+
container restarts and host reboots, so the agent's work is never lost when the
|
|
195
|
+
container stops:
|
|
196
|
+
|
|
197
|
+
- **`workspace`** holds the cloned repo and the agent's committed branch (e.g.
|
|
198
|
+
`hrnet-research/auto`). Even if a push out of the container fails, committed
|
|
199
|
+
work survives here. (A workflow's `setup.sh` typically `reset --hard`s the base
|
|
200
|
+
branch on re-run, so commit work to a side branch — as the workflows do.)
|
|
201
|
+
- **`claude-state`** keeps Claude session history and the refreshed auth token,
|
|
202
|
+
isolated from your host installation. (Note: each node runs with a *clean
|
|
203
|
+
context* — see "Sessions" under Development — so this is not one growing
|
|
204
|
+
cross-node conversation.)
|
|
205
|
+
- **`runs`** keeps all run artifacts.
|
|
206
|
+
|
|
207
|
+
## Resuming and run identity
|
|
208
|
+
|
|
209
|
+
The controller is **auto-resume-in-place** by default. Each `(workflow, run-id)`
|
|
210
|
+
pair maps to one stable run dir (`<workflow>-<run-id>`, run-id defaults to
|
|
211
|
+
`default`). On start the controller looks for a checkpoint there:
|
|
212
|
+
|
|
213
|
+
- **No checkpoint** → start fresh from the `start` node in that dir.
|
|
214
|
+
- **Checkpoint present** → resume from the checkpointed node, restoring the saved
|
|
215
|
+
context. A node that finished but didn't advance the cursor (killed in the gap)
|
|
216
|
+
is fast-forwarded past rather than re-run, so side effects like git commits
|
|
217
|
+
aren't duplicated.
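
The two cases above can be sketched as a single decision function. Everything here is illustrative: the checkpoint shape (`{"node_id": ...}`), the `finished` set of node ids whose outputs were already recorded, and the `next_of` map are assumptions made for the example, not the controller's actual data structures.

```python
from __future__ import annotations


def run_dir_name(workflow: str, run_id: str = "default") -> str:
    """Stable run dir name, as described above: <workflow>-<run-id>."""
    return f"{workflow}-{run_id}"


def entry_node(
    start: str,
    checkpoint: dict[str, str] | None,
    finished: set[str],
    next_of: dict[str, str],
) -> str:
    """Choose the node to enter when the controller starts."""
    if checkpoint is None:
        return start  # no checkpoint: fresh start
    node_id = checkpoint["node_id"]
    if node_id in finished:
        # Finished but the cursor never advanced: fast-forward, do not re-run.
        return next_of[node_id]
    return node_id  # interrupted mid-node: re-enter it
```

Checking for recorded outputs before re-entering a node is what keeps side effects such as commits from happening twice.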
|
|
218
|
+
|
|
219
|
+
This is what lets an unattended run survive a crash or reboot: relaunching the
|
|
220
|
+
same workflow continues where it left off. To start over, delete the run dir (or
|
|
221
|
+
the `runs` volume). To keep independent runs of the same workflow side by side,
|
|
222
|
+
pass distinct run ids.
|
|
223
|
+
|
|
224
|
+
Controller flags (passed to `workhorse`; `--resume-*` are manual overrides
|
|
225
|
+
of the auto behavior above):
|
|
226
|
+
|
|
227
|
+
| Flag | Purpose |
|
|
228
|
+
|---|---|
|
|
229
|
+
| `--run-id <id>` | Name the stable run dir (`<workflow>-<id>`); default `default` |
|
|
230
|
+
| `--resume-run <path-or-name>` | Resume a specific run dir from its checkpoint |
|
|
231
|
+
| `--resume-latest` | Resume the most recent unfinished run under `--runs-dir` |
|
|
232
|
+
| `--params '<json>'` / `--params-file <path>` | Override workflow `vars` on a fresh start |
|
|
233
|
+
|
|
234
|
+
"Survives reboot" therefore covers both the *work products* (commits, sessions,
|
|
235
|
+
artifacts) **and** graph position — an interrupted graph auto-resumes mid-run.
|
|
236
|
+
|
|
237
|
+
## Run artifacts
|
|
238
|
+
|
|
239
|
+
Each workflow execution writes a timestamped directory:
|
|
240
|
+
|
|
241
|
+
```
|
|
242
|
+
runs/
|
|
243
|
+
└── <workflow-name>-<timestamp>-<id>/
|
|
244
|
+
├── run.json # start/end time, terminal state
|
|
245
|
+
├── context.json # final context snapshot
|
|
246
|
+
├── <step-id>/
|
|
247
|
+
│ ├── prompt.md # rendered Jinja2 prompt sent to Claude
|
|
248
|
+
│ ├── output.json # extracted JSON outputs
|
|
249
|
+
│ └── context_after.json # context state after this step
|
|
250
|
+
└── <branch-id>/
|
|
251
|
+
└── branch.json # { path, value, next }
|
|
252
|
+
```
|
|
253
|
+
|
|
254
|
+
`compose.yaml` sets `AGENT_RUNS_DIR=/runs` so artifacts are written to the
|
|
255
|
+
persistent `runs` named volume (they survive reboots and don't pollute the
|
|
256
|
+
host working tree). To pull them out, copy from the volume — e.g. from the
|
|
257
|
+
assembler repo: `make research-artifacts`.
|
|
258
|
+
|
|
259
|
+
## Repository isolation
|
|
260
|
+
|
|
261
|
+
The local-worker is repository-agnostic. **Never add repo-specific bind mounts to `compose.yaml`** — the agent must work against its own checkout of the target repository, not a host working tree.
|
|
262
|
+
|
|
263
|
+
If a workflow needs to operate on source code (read, edit, build, test), include a `setup.sh` script in the workflow directory. The script runs as the first node and clones the required repositories into the container at a known path (e.g. `/workspace/<repo>`). This ensures:
|
|
264
|
+
|
|
265
|
+
- The agent always works from a clean, versioned state
|
|
266
|
+
- No host working tree is mutated by accident
|
|
267
|
+
- The workflow is reproducible on any machine
|
|
268
|
+
|
|
269
|
+
See `workflows/case-dev/scripts/setup.sh` for an example.
|
|
270
|
+
|
|
271
|
+
## Resetting state
|
|
272
|
+
|
|
273
|
+
```bash
|
|
274
|
+
# Wipe Claude session history + seeded credentials (re-seed auth on next run)
|
|
275
|
+
docker volume rm local-worker_claude-state
|
|
276
|
+
|
|
277
|
+
# Wipe all run artifacts in the volume
|
|
278
|
+
docker volume rm local-worker_runs
|
|
279
|
+
|
|
280
|
+
# Wipe the agent's working tree (clones/commits) — only if you want a clean clone
|
|
281
|
+
docker volume rm local-worker_workspace
|
|
282
|
+
|
|
283
|
+
# Wipe everything
|
|
284
|
+
docker compose down -v
|
|
285
|
+
```
|
|
286
|
+
|
|
287
|
+
## Writing a workflow
|
|
288
|
+
|
|
289
|
+
A workflow is a directory with this layout:
|
|
290
|
+
|
|
291
|
+
```
|
|
292
|
+
my-workflow/
|
|
293
|
+
├── workflow.yaml # Graph definition
|
|
294
|
+
├── prompts/ # Jinja2 .md templates
|
|
295
|
+
│ └── step.md
|
|
296
|
+
└── scripts/ # Shell or Python scripts (must output JSON to stdout)
|
|
297
|
+
└── check.sh
|
|
298
|
+
```
|
|
299
|
+
|
|
300
|
+
**`workflow.yaml` schema:**
|
|
301
|
+
|
|
302
|
+
```yaml
|
|
303
|
+
name: my-workflow
|
|
304
|
+
vars:
|
|
305
|
+
my_var: "default value" # Initial context variables
|
|
306
|
+
|
|
307
|
+
start: first_node
|
|
308
|
+
|
|
309
|
+
nodes:
|
|
310
|
+
- id: first_node
|
|
311
|
+
type: agent # agent | script | branch | terminal | fail
|
|
312
|
+
prompt: prompts/step.md
|
|
313
|
+
args:
|
|
314
|
+
key: "{{ my_var }}" # Jinja2 — rendered against context before sending
|
|
315
|
+
outputs:
|
|
316
|
+
- key: result # Extract this key from the agent's JSON response
|
|
317
|
+
default: {status: ok} # Optional: emitted if the node exhausts all retries
|
|
318
|
+
# (see "Unattended resilience" below). Unset → null.
|
|
319
|
+
next: check_result
|
|
320
|
+
|
|
321
|
+
- id: check_result
|
|
322
|
+
type: branch
|
|
323
|
+
path: result.status # Dot-path into context
|
|
324
|
+
cases:
|
|
325
|
+
ok: done
|
|
326
|
+
error: done
|
|
327
|
+
default: done
|
|
328
|
+
|
|
329
|
+
- id: done
|
|
330
|
+
type: terminal
|
|
331
|
+
```
|
|
332
|
+
|
|
333
|
+
**Branch operators** — in addition to `cases` (equality map), you can use `conditions` for numeric comparisons:
|
|
334
|
+
|
|
335
|
+
```yaml
|
|
336
|
+
- id: decide
|
|
337
|
+
type: branch
|
|
338
|
+
path: result.count
|
|
339
|
+
conditions:
|
|
340
|
+
- op: ">="
|
|
341
|
+
value: "10"
|
|
342
|
+
next: bulk_path
|
|
343
|
+
default: single_path
|
|
344
|
+
```
|
|
345
|
+
|
|
346
|
+
Supported operators: `==`, `!=`, `<`, `>`, `<=`, `>=`.
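
A minimal sketch of how a `conditions` branch could be evaluated follows. It assumes both sides are coerced to numbers before comparing (the YAML example quotes `"10"`), and the names `lookup` and `pick_next` are hypothetical rather than the functions in `runner/branch.py`.

```python
from __future__ import annotations

import operator
from typing import Any

OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def lookup(context: dict[str, Any], path: str) -> Any:
    """Resolve a dot-path such as 'result.count'; missing keys yield None."""
    value: Any = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def pick_next(context: dict[str, Any], path: str,
              conditions: list[dict[str, str]], default: str) -> str:
    """Return the `next` of the first matching condition, else `default`."""
    value = lookup(context, path)
    for cond in conditions:
        try:
            left, right = float(value), float(cond["value"])
        except (TypeError, ValueError):
            continue  # non-numeric or missing value: this condition cannot match
        if OPS[cond["op"]](left, right):
            return cond["next"]
    return default
```

Falling through to `default` on a missing or non-numeric value keeps the branch from raising, in line with the fail-soft goal of the runner.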
|
|
347
|
+
|
|
348
|
+
**Agent prompts** must output JSON containing the declared output keys:
|
|
349
|
+
|
|
350
|
+
````markdown
|
|
351
|
+
Do the thing.
|
|
352
|
+
|
|
353
|
+
Output JSON only:
|
|
354
|
+
|
|
355
|
+
```json
|
|
356
|
+
{"result": {"status": "ok", "count": 5}}
|
|
357
|
+
```
|
|
358
|
+
````
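
To make the output contract concrete, here is a hedged sketch of pulling the declared keys out of a reply like the one above. The regular expression, the fallback to parsing the whole reply, and the name `extract_outputs` are assumptions for illustration; the real extraction in `runner/agent.py` may differ.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Built by multiplication so this example can itself sit inside a fenced block.
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_outputs(response: str, keys: list[str]) -> dict[str, Any]:
    """Return the declared output keys from the first fenced JSON object."""
    match = FENCED_JSON.search(response)
    raw = match.group(1) if match else response.strip()
    data = json.loads(raw)  # raises ValueError when the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Keys the reply omitted come back as None.
    return {key: data.get(key) for key in keys}
```

In this sketch a parse failure surfaces as `ValueError`, which is the kind of unparseable output the recovery ladder is described as handling.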
|
|
359
|
+
|
|
360
|
+
**Scripts** receive Jinja2-rendered args as positional arguments and must print JSON to stdout:
|
|
361
|
+
|
|
362
|
+
```bash
|
|
363
|
+
#!/bin/bash
|
|
364
|
+
echo "{\"result\": {\"status\": \"ok\"}}"
|
|
365
|
+
```
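
Since the layout above also allows Python scripts, a Python counterpart of the shell example might look like the sketch below. Echoing the received arguments back under an `args` key is purely illustrative; the only requirements stated in this README are positional arguments in and JSON on stdout.

```python
#!/usr/bin/env python3
import json
import sys


def main(argv: list[str]) -> str:
    """Build the JSON document this script node reports back to the controller."""
    # json.dumps handles quoting, so no hand-escaped strings as in the shell version.
    return json.dumps({"result": {"status": "ok", "args": argv}})


if __name__ == "__main__":
    # Rendered args arrive as positional arguments.
    print(main(sys.argv[1:]))
```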
|
|
366
|
+
|
|
367
|
+
### Unattended resilience (output `default`)
|
|
368
|
+
|
|
369
|
+
Because runs are meant to survive a week without supervision, the controller
|
|
370
|
+
will, as a last resort, **default an agent node's outputs and advance to `next`**
|
|
371
|
+
rather than crash when Claude can't be coaxed into a usable answer (after
|
|
372
|
+
transient retries and prompt reframing — see [docs/GUARDRAILS.md](docs/GUARDRAILS.md)).
|
|
373
|
+
|
|
374
|
+
The runner is generic and doesn't know what your outputs mean, so **you** declare
|
|
375
|
+
the safe fallback per output via `default`:
|
|
376
|
+
|
|
377
|
+
```yaml
|
|
378
|
+
outputs:
|
|
379
|
+
- key: decision
|
|
380
|
+
default: continue # branch-safe value if this node never answers
|
|
381
|
+
- key: review
|
|
382
|
+
default: {status: auto_approved}
|
|
383
|
+
- key: notes # no default → emitted as null
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
Choose defaults that keep the graph moving sensibly (e.g. a branch `path` that
|
|
387
|
+
lands on a safe route). An output with no `default` is emitted as `null`. To
|
|
388
|
+
disable defaulting entirely and hard-fail instead, set
|
|
389
|
+
`AGENT_USE_DEFAULT_OUTPUTS=false`.
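
The retry, reframe, default sequence can be pictured with the sketch below. It is a simplified illustration under stated assumptions: the exception classes, the retry and reframe limits, the backoff, the reframing text, and the way `AGENT_USE_DEFAULT_OUTPUTS` is parsed are all hypothetical, and the actual ladder and its knobs are documented in `docs/GUARDRAILS.md`.

```python
from __future__ import annotations

import os
import time
from typing import Any, Callable, Mapping


class TransientError(Exception):
    """Hypothetical: rate limit, spending cap, or empty response."""


class UnusableOutput(Exception):
    """Hypothetical: the reply could not be parsed into the declared outputs."""


def run_with_ladder(
    invoke: Callable[[str], dict[str, Any]],
    prompt: str,
    output_specs: list[dict[str, Any]],
    *,
    max_retries: int = 2,
    max_reframes: int = 1,
    sleep: Callable[[float], None] = time.sleep,
    env: Mapping[str, str] | None = None,
) -> dict[str, Any]:
    env = os.environ if env is None else env
    last_error: Exception | None = None
    for reframe in range(max_reframes + 1):
        # Rung 2: a reframed attempt restates the output contract.
        text = prompt if reframe == 0 else prompt + "\n\nReply with JSON only."
        for attempt in range(max_retries + 1):
            try:
                return invoke(text)
            except TransientError as exc:
                last_error = exc
                sleep(2.0 ** attempt)  # rung 1: back off, retry the same prompt
            except UnusableOutput as exc:
                last_error = exc
                break  # same prompt is unlikely to help; move on to a reframe
    # Rung 3: default the outputs unless the operator opted out.
    if env.get("AGENT_USE_DEFAULT_OUTPUTS", "true").strip().lower() == "false":
        raise RuntimeError("agent node exhausted its recovery ladder") from last_error
    return {spec["key"]: spec.get("default") for spec in output_specs}
```

Injecting `sleep` and `env` keeps the sketch testable without waiting in real time, similar in spirit to the patching approach described under "Running tests".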
|
|
390
|
+
|
|
391
|
+
## Development
|
|
392
|
+
|
|
393
|
+
This section is for working on the **controller itself** (the Python that runs
|
|
394
|
+
workflows), not on individual workflows.
|
|
395
|
+
|
|
396
|
+
### Project layout
|
|
397
|
+
|
|
398
|
+
```
|
|
399
|
+
local-worker/
|
|
400
|
+
├── workhorse/ # The workhorse Python package (entrypoint: workhorse:main)
|
|
401
|
+
│ ├── main.py # CLI + the graph walk loop: checkpoint → run node → advance
|
|
402
|
+
│ ├── templates.py # Jinja2 rendering (resilient: missing vars render empty, not raise)
|
|
403
|
+
│ ├── artifacts.py # ArtifactWriter: run dir, checkpoints, per-step artifacts
|
|
404
|
+
│ ├── graph/
|
|
405
|
+
│ │ ├── nodes.py # Pydantic node models (AgentNode/ScriptNode/BranchNode/TerminalNode) + Graph
|
|
406
|
+
│ │ ├── loader.py # Parse + validate workflow.yaml into a Graph
|
|
407
|
+
│ │ └── context.py # WorkflowContext: the key→value bag + dot-path lookup for branches
|
|
408
|
+
│ └── runner/
|
|
409
|
+
│ ├── agent.py # Invoke Claude CLI; the retry → reframe → default resilience ladder
|
|
410
|
+
│ ├── script.py # Run a ScriptNode, capture JSON stdout
|
|
411
|
+
│ └── branch.py # Evaluate a BranchNode (cases / numeric conditions / default)
|
|
412
|
+
├── tests/ # Standalone test files (see below)
|
|
413
|
+
├── compose.yaml # Service, env, mounts, named volumes
|
|
414
|
+
├── Dockerfile # Ubuntu + uv + Claude CLI + the controller package
|
|
415
|
+
├── entrypoint.sh # Auth seeding, perms, exec `workhorse`
|
|
416
|
+
├── run.sh # Host launcher: resolve workflow dir, `docker compose up`
|
|
417
|
+
├── pyproject.toml / uv.lock # Python deps (jinja2, pyyaml, pydantic); managed with uv
|
|
418
|
+
├── README.md # This file (usage + development)
|
|
419
|
+
├── CLAUDE.md # Agent entry point; imports README.md + docs/
|
|
420
|
+
└── docs/
|
|
421
|
+
└── GUARDRAILS.md # The resilience/error-recovery design and env-var reference
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
### How the controller works (the loop)
|
|
425
|
+
|
|
426
|
+
`main.run()` is a single loop over graph nodes. For each node it:
|
|
427
|
+
|
|
428
|
+
1. **Checkpoints** the current node id + context (`ArtifactWriter.write_checkpoint`) so a crash here is resumable.
|
|
429
|
+
2. **Dispatches** by node type to a runner: `runner/agent.py`, `runner/script.py`, or `runner/branch.py`.
|
|
430
|
+
3. **Merges** the node's outputs into the `WorkflowContext`.
|
|
431
|
+
4. **Writes** a per-step artifact and advances `current_id` to `node.next` (or the branch target).
|
|
432
|
+
|
|
433
|
+
A `terminal`/`fail` node ends the loop. The resilience for `agent` nodes lives
|
|
434
|
+
entirely in `runner/agent.py::run_agent` — see [docs/GUARDRAILS.md](docs/GUARDRAILS.md).
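
The loop described above can be sketched in a few lines. This is not the code in `main.py`: nodes are plain dicts rather than Pydantic models, `walk`, `runners`, and `checkpoint` are hypothetical names, per-step artifact writing is left out, and checkpointing before the terminal check is an assumption.

```python
from __future__ import annotations

from typing import Any, Callable

Node = dict[str, Any]
Runner = Callable[[Node, dict[str, Any]], Any]


def walk(
    nodes: dict[str, Node],
    start: str,
    context: dict[str, Any],
    runners: dict[str, Runner],
    checkpoint: Callable[[str, dict[str, Any]], None],
) -> str:
    """Walk the graph from `start`; return the id of the terminal/fail node."""
    current_id = start
    while True:
        node = nodes[current_id]
        checkpoint(current_id, dict(context))  # step 1: make this point resumable
        if node["type"] in ("terminal", "fail"):
            return current_id
        result = runners[node["type"]](node, context)  # step 2: dispatch by type
        if node["type"] == "branch":
            current_id = result  # a branch runner returns the target node id
        else:
            context.update(result)  # step 3: merge outputs into the context
            current_id = node["next"]  # step 4: advance the cursor
```

Keeping the cursor and context as the only loop state is what lets a checkpoint of those two values be enough to resume.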
|
|
435
|
+
|
|
436
|
+
### Sessions (per-node clean context)
|
|
437
|
+
|
|
438
|
+
**Each node runs as a fresh prompt with a clean Claude context.** The controller
|
|
439
|
+
does *not* chain one node's conversation into the next — node N does not inherit
|
|
440
|
+
node N‑1's messages. Concretely, `run_agent` drops any persisted `.session_id`
|
|
441
|
+
before a node's first attempt, and a reframed attempt also starts fresh.
|
|
442
|
+
|
|
443
|
+
The persisted session is `--resume`d in exactly one situation: **continuing the
|
|
444
|
+
same node that was interrupted.** When the controller resumes from a checkpoint
|
|
445
|
+
and re-enters a node that was killed mid-run (not fast-forwarded), it calls
|
|
446
|
+
`run_agent(..., resume_session=True)` for that one node so Claude picks up where
|
|
447
|
+
it left off; every node the run then advances to starts clean again.
|
|
448
|
+
|
|
449
|
+
**Context overflow → compact & continue.** If a node exhausts the model's
|
|
450
|
+
context window mid-run (the headless CLI returns instead of auto-compacting),
|
|
451
|
+
`run_agent` runs `/compact` on that node's session and retries the *same* prompt
|
|
452
|
+
on it, preserving the node's progress (bounded by `AGENT_MAX_COMPACT_ATTEMPTS`;
|
|
453
|
+
falls back to a fresh-session reframe if `/compact` can't help). Verified against
|
|
454
|
+
Claude Code 2.1.x. See the recovery ladder in [docs/GUARDRAILS.md](docs/GUARDRAILS.md).
|
|
455
|
+
|
|
456
|
+
> Not yet implemented: a configurable *per-node turn limit* (`--max-turns`) that
|
|
457
|
+
> proactively compacts before the window is exhausted. Today compaction is
|
|
458
|
+
> reactive — triggered when an overflow is detected.
|
|
459
|
+
|
|
460
|
+
### Running tests
|
|
461
|
+
|
|
462
|
+
Tests live in `tests/` and are **dependency-free**: each file runs standalone
|
|
463
|
+
(`python tests/test_x.py` prints PASS/FAIL and exits non-zero on failure) and is
|
|
464
|
+
also pytest-compatible. There is no pytest in the venv by default; run them with
|
|
465
|
+
the project's Python:
|
|
466
|
+
|
|
467
|
+
```bash
|
|
468
|
+
# One file
|
|
469
|
+
.venv/bin/python tests/test_agent_recovery.py
|
|
470
|
+
|
|
471
|
+
# All of them
|
|
472
|
+
for t in tests/test_*.py; do .venv/bin/python "$t"; done
|
|
473
|
+
```
|
|
474
|
+
|
|
475
|
+
If a `.venv` isn't present, create one with `uv sync` (or `uv run python tests/...`).
|
|
476
|
+
|
|
477
|
+
**Where to put tests.** Add a `tests/test_<area>.py`, mirroring the existing
|
|
478
|
+
style: an `if __name__ == "__main__"` runner that iterates `test_*` functions, and
|
|
479
|
+
unit tests that patch the CLI boundary (`_run_claude_cli` / `_invoke_claude`) and
|
|
480
|
+
sleeping so nothing hits the network or waits in real time. Group by concern:
|
|
481
|
+
`test_agent_cap.py` (cap/transient handling), `test_agent_recovery.py` (reframe →
|
|
482
|
+
default ladder), `test_branch_guardrail.py`, `test_resume_auto.py`,
|
|
483
|
+
`test_idempotency.py`, `test_templates_resilient.py`.
|
|
484
|
+
|
|
485
|
+
### Where docs go
|
|
486
|
+
|
|
487
|
+
- **Tool/usage + development docs** → this `README.md` (root).
|
|
488
|
+
- **Design notes** (resilience/error recovery, and any future deep-dives) →
|
|
489
|
+
`docs/`, e.g. `docs/GUARDRAILS.md`. Put new long-form design docs here rather
|
|
490
|
+
than at the root.
|
|
491
|
+
- **`CLAUDE.md`** (root) is the agent entry point and stays at the root so Claude
|
|
492
|
+
Code auto-loads it; it `@`-imports `README.md` and `docs/GUARDRAILS.md`.
|
|
493
|
+
- **Per-workflow docs** → inside that workflow's own directory (under
|
|
494
|
+
`../workflows/<name>/`), not here. The controller is workflow-agnostic; keep
|
|
495
|
+
workflow-specific knowledge with the workflow.
|
|
496
|
+
|
|
497
|
+
Keep these docs current when you change behavior — they are the contract for
|
|
498
|
+
operators running week-long jobs, and `CLAUDE.md` imports them, so updating them
|
|
499
|
+
keeps agent context accurate too.
|
|
500
|
+
|
|
501
|
+
### Conventions
|
|
502
|
+
|
|
503
|
+
- **Python 3.12**, `from __future__ import annotations` at the top of each module.
|
|
504
|
+
- **Pydantic** models for anything parsed from YAML (see `graph/nodes.py`); add a
|
|
505
|
+
new node type by extending the discriminated `Node` union and handling it in
|
|
506
|
+
`main.run()` plus a `runner/`.
|
|
507
|
+
- **Fail soft for unattended runs.** New failure paths in agent handling should
|
|
508
|
+
slot into the existing retry → reframe → default ladder rather than raising, so
|
|
509
|
+
one bad node can't end a week-long run. Reserve hard raises for genuinely
|
|
510
|
+
unrecoverable, deterministic errors.
|
|
511
|
+
- **Comments explain *why*.** Match the existing density — the tricky invariants
|
|
512
|
+
(checkpoint/fast-forward idempotency, cap-vs-transient classification) are
|
|
513
|
+
documented inline; keep them that way.
|
|
514
|
+
|
|
515
|
+
### Editing the container
|
|
516
|
+
|
|
517
|
+
The image bundles the Claude CLI and the controller package. After changing
|
|
518
|
+
`Dockerfile`, `pyproject.toml`, or anything that affects the image, rebuild:
|
|
519
|
+
|
|
520
|
+
```bash
|
|
521
|
+
./run.sh ../workflows/hello-world --build
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
Pure controller `.py` edits also require a rebuild before the next run picks them
|
|
525
|
+
up, since `workhorse/` is `COPY`d into the image (it is not bind-mounted).
|