agent-conveyor 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (85)
  1. package/README.md +1123 -0
  2. package/dist/cli/main.d.ts +2 -0
  3. package/dist/cli/main.js +19 -0
  4. package/dist/cli/main.js.map +1 -0
  5. package/dist/cli/program-name.d.ts +2 -0
  6. package/dist/cli/program-name.js +12 -0
  7. package/dist/cli/program-name.js.map +1 -0
  8. package/dist/cli/typescript-runtime.d.ts +52 -0
  9. package/dist/cli/typescript-runtime.js +18009 -0
  10. package/dist/cli/typescript-runtime.js.map +1 -0
  11. package/dist/index.d.ts +37 -0
  12. package/dist/index.js +20 -0
  13. package/dist/index.js.map +1 -0
  14. package/dist/runtime/audit.d.ts +96 -0
  15. package/dist/runtime/audit.js +298 -0
  16. package/dist/runtime/audit.js.map +1 -0
  17. package/dist/runtime/classify.d.ts +8 -0
  18. package/dist/runtime/classify.js +128 -0
  19. package/dist/runtime/classify.js.map +1 -0
  20. package/dist/runtime/codex-session.d.ts +103 -0
  21. package/dist/runtime/codex-session.js +408 -0
  22. package/dist/runtime/codex-session.js.map +1 -0
  23. package/dist/runtime/commands.d.ts +92 -0
  24. package/dist/runtime/commands.js +408 -0
  25. package/dist/runtime/commands.js.map +1 -0
  26. package/dist/runtime/dispatch.d.ts +74 -0
  27. package/dist/runtime/dispatch.js +669 -0
  28. package/dist/runtime/dispatch.js.map +1 -0
  29. package/dist/runtime/export.d.ts +22 -0
  30. package/dist/runtime/export.js +77 -0
  31. package/dist/runtime/export.js.map +1 -0
  32. package/dist/runtime/ingest.d.ts +28 -0
  33. package/dist/runtime/ingest.js +177 -0
  34. package/dist/runtime/ingest.js.map +1 -0
  35. package/dist/runtime/loop-evidence.d.ts +87 -0
  36. package/dist/runtime/loop-evidence.js +448 -0
  37. package/dist/runtime/loop-evidence.js.map +1 -0
  38. package/dist/runtime/manager-config.d.ts +20 -0
  39. package/dist/runtime/manager-config.js +34 -0
  40. package/dist/runtime/manager-config.js.map +1 -0
  41. package/dist/runtime/manager-permissions.d.ts +7 -0
  42. package/dist/runtime/manager-permissions.js +85 -0
  43. package/dist/runtime/manager-permissions.js.map +1 -0
  44. package/dist/runtime/notifications.d.ts +89 -0
  45. package/dist/runtime/notifications.js +208 -0
  46. package/dist/runtime/notifications.js.map +1 -0
  47. package/dist/runtime/replay.d.ts +29 -0
  48. package/dist/runtime/replay.js +331 -0
  49. package/dist/runtime/replay.js.map +1 -0
  50. package/dist/runtime/tasks.d.ts +54 -0
  51. package/dist/runtime/tasks.js +195 -0
  52. package/dist/runtime/tasks.js.map +1 -0
  53. package/dist/runtime/tmux.d.ts +61 -0
  54. package/dist/runtime/tmux.js +189 -0
  55. package/dist/runtime/tmux.js.map +1 -0
  56. package/dist/runtime/visual-diff.d.ts +23 -0
  57. package/dist/runtime/visual-diff.js +234 -0
  58. package/dist/runtime/visual-diff.js.map +1 -0
  59. package/dist/state/database.d.ts +21 -0
  60. package/dist/state/database.js +142 -0
  61. package/dist/state/database.js.map +1 -0
  62. package/dist/state/files.d.ts +38 -0
  63. package/dist/state/files.js +73 -0
  64. package/dist/state/files.js.map +1 -0
  65. package/dist/state/schema-v22.d.ts +1 -0
  66. package/dist/state/schema-v22.js +566 -0
  67. package/dist/state/schema-v22.js.map +1 -0
  68. package/dist/state/sqlite-contract.d.ts +4 -0
  69. package/dist/state/sqlite-contract.js +78 -0
  70. package/dist/state/sqlite-contract.js.map +1 -0
  71. package/dist/state/status.d.ts +12 -0
  72. package/dist/state/status.js +40 -0
  73. package/dist/state/status.js.map +1 -0
  74. package/docs/typescript-migration/cli-contract.md +147 -0
  75. package/docs/typescript-migration/dashboard-contract.md +76 -0
  76. package/docs/typescript-migration/package-install-contract.md +98 -0
  77. package/docs/typescript-migration/qa-gate-matrix.md +103 -0
  78. package/docs/typescript-migration/sqlite-state-contract.md +92 -0
  79. package/docs/typescript-migration/t005-runtime-parity.md +47 -0
  80. package/package.json +88 -0
  81. package/scripts/capture-static-html-screenshot.mjs +88 -0
  82. package/skills/codex-review/SKILL.md +116 -0
  83. package/skills/codex-review/scripts/codex-review +344 -0
  84. package/skills/manage-codex-workers/SKILL.md +696 -0
  85. package/skills/manage-codex-workers/agents/openai.yaml +5 -0
package/README.md ADDED
@@ -0,0 +1,1123 @@
1
+ # Codex Terminal Manager
2
+
3
+ A Mac-first prototype for letting one Codex session supervise and gently steer
4
+ another Codex session running in a terminal.
5
+
6
+ The goal is not full autonomy. The goal is lightweight supervision for Codex
7
+ tasks that mostly need progress checks, occasional nudges, test reruns, or
8
+ clean stop/resume handling.
9
+
10
+ ## Motivating Principles
11
+
12
+ - **Project workflows often need nudging.** Some Codex tasks do not need a
13
+ second implementer; they need a manager that keeps watching, asks for status
14
+ at the right time, unblocks predictable terminal prompts, and gently steers
15
+ the worker back toward the user's goal.
16
+ - **Useful acceptance criteria often emerge during the work.** Even when the
17
+ user starts with a plan, implementation reveals new edge cases, missing tests,
18
+ unclear polish requirements, and follow-up decisions. The manager should help
19
+ discover, record, defer, satisfy, and audit these emergent acceptance criteria
20
+ instead of assuming the whole checklist can be known up front.
21
+ - **Supervision should be durable and replayable.** Manager observations,
22
+ nudges, interrupts, handoffs, compaction requests, and final decisions should
23
+ leave enough structured history to understand why the worker was pushed and
24
+ what evidence supported finishing or continuing the task.
25
+
26
+ ## Burden Of Proof
27
+
28
+ Before declaring work complete, try to disprove the change. Identify the
29
+ strongest realistic failure mode, verify it with a command, test, trace,
30
+ screenshot, audit record, diff, or direct inspection, and include that evidence
31
+ in the final handoff. Treat `done`, `tests passed`, worker claims, passing
32
+ happy-path tests, generated summaries, and optimistic UI as claims, not proof.
33
+ Treat unverified assumptions as blockers or explicit follow-ups.
34
+
35
+ See `docs/agent-evidence-playbook.md` for the repo-specific evidence ladder,
36
+ receipt options, and final handoff shape agents should use when closing out
37
+ work.
38
+
39
+ ## Architecture
40
+
41
+ Supervision is built on three primitives: **sessions**, **tasks**, and
42
+ **bindings**.
43
+
44
+ - **Worker session.** A Codex session running inside a named `tmux` session.
45
+ Workers own a rollout JSONL on disk (`~/.codex/sessions/.../rollout-*.jsonl`)
46
+ which Agent Conveyor ingests for state inference.
47
+ - **Manager session.** A Codex session running anywhere — Ghostty, iTerm2,
48
+ Terminal.app, or a web terminal. The manager does not need to run inside
49
+ tmux. Its job is to call `conveyor` commands, read their JSON output, and
50
+ decide whether to nudge, interrupt, finish, or wait.
51
+ - **Task.** A unit of supervised work. A task has a goal and optional
52
+ summary/manager instructions.
53
+ - **Binding.** A row that ties one worker session and one manager session to
54
+ one task. Bindings are explicit and durable.
55
+
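The three primitives above can be pictured as a small data model. This is a minimal sketch only: the class and field names (`Session`, `Task`, `Binding`, `role`, `tmux_session`) are illustrative assumptions and are not taken from the package's actual SQLite schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """A registered Codex session (hypothetical shape, not the real schema)."""
    name: str
    role: str  # "worker" or "manager"
    tmux_session: Optional[str] = None  # managers may run outside tmux


@dataclass(frozen=True)
class Task:
    """A unit of supervised work with a goal and an optional summary."""
    name: str
    goal: str
    summary: Optional[str] = None


@dataclass(frozen=True)
class Binding:
    """Ties exactly one worker and one manager to one task."""
    task: Task
    worker: Session
    manager: Session

    def __post_init__(self) -> None:
        # Reject sessions placed in the wrong slot so a binding stays explicit.
        if self.worker.role != "worker":
            raise ValueError("worker slot requires a worker session")
        if self.manager.role != "manager":
            raise ValueError("manager slot requires a manager session")
```

Making the binding a separate record, rather than a field on the task, mirrors the README's point that bindings are explicit and durable.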
56
+ The manager Codex drives the supervision loop by calling
57
+ `conveyor cycle <task>` repeatedly. Each cycle ingests new events from the
58
+ worker's rollout, captures the worker's tmux pane as a shadow signal, and
59
+ returns structured JSON. The manager reads that JSON and decides what to do.
60
+
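The supervision loop described above could be driven from a script roughly as follows. This is a hedged sketch, not the package's implementation: it assumes only that `conveyor cycle <task>` prints JSON on stdout, as stated above. The helper names (`run_conveyor`, `observe`, `supervise`) and the `"finish"` return value are hypothetical, and the decision logic is left to a caller-supplied `decide` function because the JSON fields are not documented in this section.

```python
import json
import subprocess
from typing import Callable, List, Sequence


def run_conveyor(args: Sequence[str]) -> str:
    """Run the conveyor CLI and return stdout; raises on a non-zero exit."""
    result = subprocess.run(
        ["conveyor", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def observe(task: str, run: Callable[[Sequence[str]], str] = run_conveyor) -> dict:
    """One observation cycle: call `conveyor cycle <task>` and parse its JSON."""
    return json.loads(run(["cycle", task]))


def supervise(
    task: str,
    decide: Callable[[dict], str],
    run: Callable[[Sequence[str]], str] = run_conveyor,
    max_cycles: int = 3,
) -> List[str]:
    """Cycle up to max_cycles times, stopping early when decide() says 'finish'."""
    decisions = []
    for _ in range(max_cycles):
        decision = decide(observe(task, run))
        decisions.append(decision)
        if decision == "finish":
            break
    return decisions
```

The `run` parameter is injectable so the loop can be exercised without a live worker. A real manager would also pace its cycles rather than calling them back to back.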
61
+ `tmux` owns the worker PTY. Ghostty, iTerm2, Terminal.app, or `ttyd` can be
62
+ viewers, but they should not be the source of truth for orchestration.
63
+
64
+ ```text
65
+ manager terminal
66
+ Codex manager session
67
+ |
68
+ | runs conveyor commands (cycle, session-nudge, ...)
69
+ v
70
+ tmux session: codex-worker-a
71
+ pane 1: Codex worker session
72
+ pane 2: optional dev server, tests, or logs
73
+
74
+ .codex-workers/
75
+ workerctl.db <- authoritative SQLite control plane
76
+ worker-a/ <- ignored runtime artifacts
77
+ status.json
78
+ transcript.txt
79
+ events.jsonl
80
+ ```
81
+
82
+ ## Non-Goals
83
+
84
+ - Cross-platform support.
85
+ - Browser-first orchestration.
86
+ - Full terminal emulator automation.
87
+ - Autonomous merging or destructive git actions.
88
+ - Managing many workers at once.
89
+
90
+ ## Install
91
+
92
+ For users, install the published Agent Conveyor package with npm:
93
+
94
+ ```bash
95
+ npm install -g agent-conveyor
96
+ conveyor install-skills
97
+ conveyor doctor
98
+ ```
99
+
100
+ The package name is `agent-conveyor`. The installed CLI exposes both
101
+ `conveyor` and the compatibility command `workerctl`. `conveyor install-skills`
102
+ installs the `manage-codex-workers` and `codex-review` skills into
103
+ `$CODEX_HOME/skills` or `~/.codex/skills`. The `codex-review` install includes
104
+ the guarded review helper used by the QA and PR closeout flows.
105
+
106
+ For contributors working from this checkout, use the local installer instead:
107
+
108
+ ```bash
109
+ scripts/install-local --write
110
+ export PATH="$PWD/bin:$PATH"
111
+ ```
112
+
113
+ To test unreleased packaging changes before publish, install a local npm
114
+ tarball into a temporary prefix:
115
+
116
+ ```bash
117
+ npm run build
118
+ npm pack
119
+ tmp_prefix="$(mktemp -d)"
120
+ npm install -g --prefix "$tmp_prefix" ./agent-conveyor-*.tgz
121
+ PATH="$tmp_prefix/bin:$PATH" conveyor --help
122
+ PATH="$tmp_prefix/bin:$PATH" workerctl --help
123
+ ```
124
+
125
+ `conveyor doctor` reports local dependency health (tmux, codex, etc.).
126
+ `conveyor db-doctor` initializes and checks the SQLite control-plane
127
+ database.
128
+ Before publishing `agent-conveyor` to npm, use
129
+ [`docs/package-release.md`](docs/package-release.md).
130
+
131
+ After install, the intended Codex app entry point is natural language. Open a
132
+ new Codex app session in the target repo and say:
133
+
134
+ ```text
135
+ Use the manage-codex-workers skill.
136
+
137
+ Set up a Codex app Ralph loop for issue CTL.
138
+ Require adversarial proof before another worker iteration.
139
+ ```
140
+
141
+ The installed skill should call the `conveyor` CLI, choose names, create the
142
+ no-tmux binding with `create-disposable-binding`, point the worker at
143
+ `worker-inbox`, and use `loop-status` plus telemetry receipts before reporting
144
+ that the loop is ready.
145
+
146
+ Dispatch is core infrastructure for supervised worker/manager pairs. The
147
+ `pair` workflow starts a detached Dispatch watch process by default so worker
148
+ completion is routed to the bound manager mechanically. For manually bound
149
+ pairs, run Dispatch in a separate shell:
150
+
151
+ ```bash
152
+ conveyor dispatch --watch --dispatcher-id dispatch-local
153
+ ```
154
+
155
+ Use `conveyor qa-plan dispatch-completion` for a bounded verification flow, or
156
+ `conveyor qa-plan ralph-loop` for the repeated PR/CI/merge/context-clear
157
+ dogfood loop.
158
+ Use `conveyor qa-plan adversarial-triggers` to verify natural-language
159
+ manager prompts activate Ralph-loop adversarial gates.
160
+ Use `conveyor qa-plan goalbuddy-conveyor` when a broad request should become
161
+ sequential GoalBuddy child boards with PR/CI/merge receipts.
162
+ For manual QA, launch the dashboard with Dispatch enforcement so the page can
163
+ show live proof:
164
+
165
+ ```bash
166
+ conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard
167
+ ```
168
+
169
+ ## Quickstart
170
+
171
+ The fastest way to start a worker and register it is a single command:
172
+
173
+ ```bash
174
+ # One command: spawn codex in tmux, wait for it to come up, register as worker
175
+ conveyor start-worker --name foo --cwd "$PWD" --task "Refactor auth"
176
+
177
+ # Register a manager. Managers do not need to run inside tmux.
178
+ MGR_PID=$$ # if your current shell is the manager; otherwise find its pid
179
+ conveyor register-manager --name foo-mgr --pid "$MGR_PID" --cwd "$PWD"
180
+
181
+ # Create a task and bind the pair to it.
182
+ conveyor tasks --create my-task --goal "Refactor auth"
183
+ conveyor bind --task my-task --worker foo --manager foo-mgr
184
+
185
+ # Start Dispatch in another shell so worker completion wakes the manager.
186
+ conveyor dispatch --watch --dispatcher-id dispatch-local
187
+
188
+ # One observation cycle. Returns JSON.
189
+ conveyor cycle my-task
190
+
191
+ # Optionally nudge the worker through its tmux pane.
192
+ conveyor session-nudge foo "What's your current state?"
193
+
194
+ # When the task is complete:
195
+ conveyor finish-task my-task --reason "auth refactor merged" --capture-transcript-before-stop --stop-manager --stop-worker
196
+ conveyor unbind --task my-task
197
+ conveyor deregister foo
198
+ conveyor deregister foo-mgr
199
+ ```
200
+
201
+ For manual registration of a pre-existing Codex session:
202
+
203
+ ```bash
204
+ # Start a Codex worker inside a fresh tmux session.
205
+ tmux new-session -d -s codex-foo
206
+ tmux send-keys -t codex-foo "codex" Enter
207
+ # (Wait a moment for codex to come up.)
208
+ WORKER_PID=$(pgrep -f "codex.*--sandbox" | head -1)
209
+
210
+ # Register the worker. lsof auto-discovers the rollout JSONL from the pid.
211
+ conveyor register-worker --name foo --pid "$WORKER_PID" \
212
+ --cwd "$PWD" --tmux-session codex-foo
213
+ ```
214
+
215
+ If `lsof` discovery fails (e.g. the codex session was started ephemerally),
216
+ pass the rollout path explicitly with `--codex-session
217
+ ~/.codex/sessions/.../rollout-...-<uuid>.jsonl`.
218
+
219
+ To register a manager session that's already running:
220
+
221
+ ```bash
222
+ # If the codex is already running and you know its pid:
223
+ conveyor register-manager --name my-mgr --pid 28975
224
+
225
+ # register-manager runs `lsof -p <pid>` to find the rollout JSONL.
226
+ # If the codex hasn't written its rollout yet (no input typed),
227
+ # you'll get a hint asking you to type something in the codex prompt and retry.
228
+
229
+ # Or pass --codex-session explicitly to bypass the lsof probe:
230
+ conveyor register-manager --name my-mgr --pid 28975 \
231
+ --codex-session /path/to/rollout.jsonl
232
+ ```
233
+
234
+ Note: `lsof` is the canonical pid→rollout lookup. `find -newermt` is unreliable because
235
+ filesystem mtime resolution and the parsing of "X minutes ago" vary, whereas `lsof` reads the open fd directly.
236
+
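To illustrate why reading open file descriptors is dependable, the sketch below filters `lsof` field output (as produced by `lsof -p <pid> -Fn`, where each file name appears on a line prefixed with `n`) for rollout files. It is an assumption-laden illustration, not the lookup `conveyor` itself performs: the function name `rollout_paths_from_lsof` is hypothetical, and the `rollout-*.jsonl` pattern is taken from the path shape shown earlier in this README.

```python
import fnmatch
from typing import List


def rollout_paths_from_lsof(field_output: str) -> List[str]:
    """Return open rollout JSONL paths found in `lsof -Fn` style output."""
    paths = []
    for line in field_output.splitlines():
        # In lsof field output, file names are carried on lines starting with "n".
        if not line.startswith("n"):
            continue
        path = line[1:]
        filename = path.rsplit("/", 1)[-1]
        # Case-sensitive match on the rollout file name pattern.
        if fnmatch.fnmatchcase(filename, "rollout-*.jsonl"):
            paths.append(path)
    return paths
```

Because the function takes the `lsof` output as a string, it can be checked against captured output without a running Codex process.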
237
+ For low-risk verification without a real task, `conveyor start-test
238
+ <name>` creates a worker, asks it to update only its ignored
239
+ `status.json`, and leaves the tmux session attached:
240
+
241
+ ```bash
242
+ conveyor start-test live-test --cwd "$PWD" --accept-trust --open
243
+ tmux attach -t codex-live-test
244
+ ```
245
+
246
+ ## Commands
247
+
248
+ ### Sessions and binding
249
+
250
+ - `start-worker --name N [--cwd D] [--task "..."] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N]` —
251
+ Spawn Codex in a fresh tmux session and register it as a worker in one call.
252
+ The fastest way to start a supervised worker. Internally: `tmux new-session`
253
+ + `codex` + poll for rollout + `register-worker`.
254
+ - `start-manager --name N [--cwd D] [--task T] [--task-goal G] [--worker W] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N]` —
255
+ Spawn Codex in a fresh tmux session and register it as a manager in one call.
256
+ Mirrors `start-worker` but uses a manager bootstrap prompt instead of a worker
257
+ task prompt. When `--task`, `--task-goal`, and `--worker` are supplied, the
258
+ bootstrap is ready for late attach: it names the task, goal, worker session,
259
+ and concrete `manager-config`, `cycle`, `manager-ack`, and `worker-ack`
260
+ commands. If manager config has already been recorded for the task, the
261
+ bootstrap tells the manager to start with `cycle` instead of asking setup
262
+ questions again. Without those flags, the bootstrap asks the manager to
263
+ collect the missing supervision details before cycling.
264
+ - `pair --task T --worker-name W --manager-name M [--cwd D] [--task-prompt PROMPT] [--task-goal GOAL] [--task-summary S] [--manager-objective O] [--manager-guideline G ...] [--manager-acceptance A ...] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N] [--dispatcher-id ID] [--no-dispatch]` —
265
+ One-shot: spawn worker + manager and bind to a task in a single command. Combines
266
+ `start-worker` + `start-manager` + `bind`. The task is looked up or created (if
267
+ `--task-goal` is provided); if the task does not exist and no goal is given, an
268
+ error is raised with a hint. The worker receives the optional `--task-prompt` as
269
+ its initial Codex prompt; the manager receives a manager bootstrap prompt with
270
+ the task, goal, worker name, manager configuration status, and `cycle` commands.
271
+ `pair` records a default guided manager config before launching the manager, so
272
+ retries against an existing task do not fall back into setup-question mode.
273
+ If manager config flags are supplied (`--manager-mode`,
274
+ `--manager-objective`, repeated `--manager-guideline`,
275
+ `--manager-acceptance`, `--manager-reference`, or manager permission flags),
276
+ those values are merged into the seeded config and the bootstrap tells the
277
+ manager to start supervising with `cycle` instead of asking setup questions
278
+ first. Manager acceptance entries are also seeded into the living
279
+ acceptance criteria ledger when they do not already exist for the task. By
280
+ default `pair` starts a detached `dispatch --watch` process after successful
281
+ worker/manager setup, bind, and run creation. Use `--dispatcher-id` to set its
282
+ identity or `--no-dispatch` for isolated/manual workflows. A live dispatch
283
+ heartbeat is reused only when it has the same dispatcher id; otherwise `pair`
284
+ starts the requested dispatcher so audit receipts keep the configured
285
+ identity.
286
+ If the manager or bind fails after the worker is spawned, the worker remains
287
+ registered and can be cleaned up with `conveyor deregister`.
288
+ Use `--accept-trust` only for directories you intentionally trust; it retries
289
+ Enter during startup discovery so fresh workspaces do not stall before
290
+ registration.
291
+ - `register-worker --name N [--pid P | --codex-session PATH] [--cwd D] [--tmux-session S]` —
292
+ Register an already-running Codex session as a worker. Rollout JSONL is
293
+ auto-discovered from the pid via `lsof` unless `--codex-session` is given.
294
+ - `register-manager --name N ...` — Same arguments; tmux is not required.
295
+ Both registration commands print a `communication` object. When
296
+ `--tmux-session` is present, `communication.session_kind='tmux'`,
297
+ `receive_style='push'`, and `delivery_mode='push'`; without tmux but with a
298
+ Codex rollout identity, `session_kind='codex_app'`, `receive_style='pull'`,
299
+ and `delivery_mode='pull_required'`, with the role-specific inbox polling
300
+ command template.
301
+ - `deregister <name>` — Mark a session gone. Refuses if the session is bound
302
+ to an active task.
303
+ - `sessions [--role worker|manager] [--state active|gone|all] [--include-legacy]
304
+ [--name N ...] [--redact-identity-token]` — List registered sessions.
305
+ By default, `sessions` shows active registered sessions and hides Phase 1 backfill rows (legacy pre-redesign workers/managers, identified by `pid IS NULL`) plus rows marked `state='gone'`. Pass `--state all` to show every row, or `--state gone` to inspect only gone rows:
306
+ ```bash
307
+ conveyor sessions # active registered sessions only
308
+ conveyor sessions --state active # explicit equivalent of the default
309
+ conveyor sessions --state gone # gone sessions only
310
+ conveyor sessions --state all # active, gone, and legacy rows
311
+ conveyor sessions --name <session> --redact-identity-token
312
+ ```
313
+ For shareable QA evidence, prefer repeating `--name` for just the sessions in
314
+ scope and include `--redact-identity-token`; unfiltered output can include
315
+ unrelated active sessions and their registration tokens.
316
+ Each session row includes the same `communication` block emitted by
317
+ registration, so managers can detect whether a worker or manager is
318
+ tmux-push capable or must poll its mailbox.
319
+ - `tasks [--create NAME --goal G --summary S]` — List or create tasks.
320
+ - `create-disposable-binding TASK [--worker NAME] [--manager NAME] [--template TEMPLATE | --required-before-continue TYPE] [--adversarial]` —
321
+ Create a no-tmux manager/worker binding for real Ralph-loop slices. The
322
+ helper creates the task when missing, marks it managed, writes valid Codex
323
+ rollout JSONL files, registers worker and manager sessions with
324
+ `tmux_session=null`, binds them, optionally creates a template-backed or
325
+ custom Ralph-loop policy run, and prints replay commands for Dispatch,
326
+ `loop-status`, per-session `communication` metadata, plus a `worker_handoff`
327
+ prompt that tells Codex app workers to keep polling their worker inbox
328
+ through the bounded loop.
329
+ - `discover [QUERY] [--all] [--limit N]` / `search [QUERY]` — Search tasks,
330
+ registered sessions, active bindings, and recent telemetry in one JSON result.
331
+ Use this for conversational setup when a manager or Codex session needs to
332
+ present likely worker/manager/task connection options instead of asking the
333
+ user for generated names:
334
+ ```bash
335
+ conveyor discover dashboard
336
+ conveyor search "auth refactor"
337
+ ```
338
+ The output includes `tasks`, `sessions`, `bindings`, `telemetry`, and
339
+ `suggestions`; suggestions may include a ready-to-run `conveyor bind`
340
+ command or next-step prompts to register the missing worker or manager.
341
+ - `handoff <task> --summary S [--next-step N ...] [--payload-json JSON]` —
342
+ Persist a compact worker handoff for the task. Use this when a worker is
343
+ becoming managed or before a long context transition so the manager can read
344
+ progress and likely next steps from SQLite.
345
+ - `manager-config <task> [--mode light|guided|strict] [--objective O]
346
+ [--guideline G ...] [--acceptance A ...] [--reference R ...]
347
+ [--permit CATEGORY.ACTION ...] [--tool TOOL ...]
348
+ [--epilogue STEP ...] [--nudge-on-completion MODE] [--require-acks]
349
+ [--allow-pr] [--allow-merge-green] [--allow-worker-compact-clear]` —
350
+ Persist the manager's supervision contract: what to check against, how
351
+ structured the loop should be, acceptance criteria, source references, and
352
+ categorized permissions. With no recorded config it creates the default
353
+ guided config; with no mutating flags after that it prints the current config.
354
+ Use `--questions` from a manager Codex session to get a stable JSON question
355
+ schema to ask the user in chat, then save the answers with noninteractive
356
+ flags. Use `--interactive` only as a terminal fallback when a human is
357
+ running `conveyor` directly.
358
+ `--permit` grants taxonomy permissions such as `repo.open_pr`,
359
+ `verification.run_pytest`, `context.spawn_reviewer`,
360
+ `communication.notify_operator`, or `worker_session.compact`. Use `--tool`
361
+ to record expected verification/context tools, `--epilogue` for required
362
+ built-in finish steps (`run-tools`, `draft-pr`, `subagent-review`,
363
+ `record-handoff`), `--nudge-on-completion` for continuation review behavior
364
+ (`off`, `ask-operator`, `auto-review`, `auto-proceed`), and `--require-acks`
365
+ when `cycle`/`finish-task` should fail closed until both sides acknowledge.
366
+ Legacy flat flags and `--permissions-json` keys (`create_pr`,
367
+ `merge_green_pr`, `worker_compact_clear`, plus older `allow_*` aliases) are
368
+ still accepted and normalized into the categorized taxonomy.
369
+ - `criteria <task>` — Track emergent acceptance criteria discovered during
370
+ supervision. Managers should add useful proposed criteria, accept must-have
371
+ items, defer follow-ups, and mark criteria satisfied only when worker
372
+ receipts and verification cover them.
373
+ ```bash
374
+ conveyor criteria my-task --list
375
+ conveyor criteria my-task --list --status accepted
376
+ conveyor criteria my-task --add --criterion "..." --source worker_proposed --status proposed
377
+ conveyor criteria my-task --add --criterion "..." --source manager_inferred --status accepted
378
+ conveyor criteria my-task --accept 12 --rationale "Must-have for this task"
379
+ conveyor criteria my-task --satisfy <id> --evidence-json '{"command":"...","status":"pass"}'
380
+ conveyor criteria my-task --defer 13 --rationale "Follow-up after this task"
381
+ conveyor criteria my-task --reject 14 --rationale "Duplicate or out of scope"
382
+ ```
383
+ Replace placeholder `...` values with the actual criterion and verification
384
+ command. Use `worker_proposed` for criteria proposed by the worker. Use
385
+ `manager_inferred` for criteria inferred from manager config, cycle evidence,
386
+ or manager inspection; `manager_config` is not a valid criteria source.
387
+ To add a criterion and satisfy that same row after verification:
388
+ ```bash
389
+ criterion_id=$(conveyor criteria my-task --add --criterion "Targeted prompt tests pass" --source worker_proposed --status proposed | python3 -c 'import json,sys; print(json.load(sys.stdin)["affected_criterion"]["id"])')
390
+ conveyor criteria my-task --satisfy "$criterion_id" --evidence-json '{"command":"python3 -m unittest tests.test_workerctl.ManagerBootstrapPromptTests -v","status":"pass"}'
391
+ ```
392
+ For mutation responses, treat `affected_criterion` as the authoritative
393
+ receipt for the row changed by that command. When a manager applies multiple
394
+ criteria changes, run `criteria <task> --list` before final audit or other
395
+ decisions; the list command is the canonical task-level criteria state.
396
+ - `criteria-plan <task> --from-text ...|--from-worker-response PATH|--from-stdin
397
+ [--json]` — Draft reviewed `criteria --add` commands from a worker response
398
+ that separates must-have current-task criteria from deferred follow-ups. This
399
+ helper is read-only: it resolves the task and prints suggestions, but does not
400
+ mutate acceptance criteria, events, or commands.
401
+ ```bash
402
+ conveyor criteria-plan my-task --from-worker-response response.md --json
403
+ ```
404
+ - `manager-permission <task> <CATEGORY.ACTION|CATEGORY> [--list]
405
+ [--require] [--require-handoff]` — Check and audit whether the saved manager
406
+ config allows a categorized action, or list granted actions in a category.
407
+ Use `--require` when a manager command should fail closed. Use
408
+ `--require-handoff` before worker compact/clear style instructions so visible
409
+ context is persisted first.
410
+ - `worker-ack <task> --from-stdin|--json [--correlation-id ID]` /
411
+ `manager-ack <task> --from-stdin|--json [--correlation-id ID]` — Persist or
412
+ read the latest structured acknowledgement from the worker or manager. Acks
413
+ are revisioned and exposed to `cycle`, `replay`, and `audit` so startup
414
+ contract drift can be distinguished from later drift.
415
+ - `continuation <task> --submit worker|manager --from-stdin
416
+ [--correlation-id ID]` — Record independent worker/manager "what's next"
417
+ proposals for a completion turn. The worker proposal must be written first,
418
+ and manager-side reads are redacted until the manager submits its own
419
+ proposal.
420
+ - `continuation <task> --review --from-stdin [--correlation-id ID]` — Record a
421
+ structured reviewer verdict over the paired continuation proposals. This
422
+ requires `context.spawn_reviewer` permission and reviewer separation metadata
423
+ (`subagent_run.reviewer_session_id` distinct from the manager and
424
+ `manager_rollout_access=false`). Divergent reviews are routed for operator
425
+ attention unless `--nudge-on-completion auto-proceed` is configured.
426
+ - `continuation-reviewer <task> --correlation-id ID --reviewer-session-id ID
427
+ --manager-session-id ID --reviewer-command ...` — Run a reviewer command with
428
+ the allowed read-only context on stdin, capture reviewer metadata, and persist
429
+ the structured review. The context includes paired proposals, acceptance
430
+ criteria, manager config summary, diff metadata, and recent PR metadata; it
431
+ does not include manager rollout context. Reviewer commands run from an
432
+ isolated temporary cwd with a stripped environment and, on macOS, through
433
+ `sandbox-exec`. The sandbox keeps the targeted denial of bound
434
+ worker/manager rollout files plus the active control database and sidecars,
435
+ and also denies direct reads of the active `.codex-workers` state root so
436
+ legacy session files, transcripts, capture metadata, task state, and exports
437
+ are not available through filesystem reads. The allowed reviewer context still
438
+ arrives on stdin, and replay/audit/export commands outside this reviewer
439
+ subprocess are unchanged. Sandbox setup failures, reviewer command failures,
440
+ timeouts, or invalid JSON are recorded as `verdict=stop`, not silent approvals.
441
+ Use `--dry-run` to inspect the exact context without running the command.
442
+ - `continuation <task> --list [--as-role all|worker|manager|reviewer]
443
+ [--include-payload]` — List continuation proposals and reviews with
444
+ role-aware payload redaction.
445
+ - `record-decision <task> <wait|nudge|interrupt|escalate|stop|inspect>
446
+ --reason R [--cycle-id N] [--payload-json JSON]` — Persist a manager
447
+ decision and print its id. Use this before strict mutating commands that
448
+ require `--decision-id`.
449
+ - `compact-worker <task> --reason R [--clear] [--prompt-only]` — Convenience
450
+ wrapper that records a `nudge` manager decision, then sends Codex `/compact`
451
+ to the worker through the same strict audited path as
452
+ `request-worker-compact`. Use `--clear` to send `/clear`.
453
+ - `request-worker-compact <task> --decision-id N --strict-decisions` — Send
454
+ Codex `/compact` to the worker through the audited path. Use `--clear` to
455
+ send `/clear`, or `--prompt-only` to send an explanatory prompt instead.
456
+ Fails closed unless `worker_compact_clear` is enabled in manager config and
457
+ a worker handoff exists. Records a durable command and audit events
458
+ before/after sending the worker instruction. `--dry-run` still records the
459
+ command in `commands`, `replay`, and `mutation-audit` with `dry_run: true`
460
+ and `sent: false`.
461
+ - `bind --task T --worker W --manager M` — Create the task binding.
462
+ - `unbind --task T` — End the active binding for a task.
463
+ - `finish-task <task> [--reason R] [--require-criteria-audit]
464
+ [--require-acks] [--require-epilogue] [--require-adversarial-proof]
465
+ [--stop-manager] [--stop-worker]
466
+ [--capture-transcript-before-stop]` — Mark a task done.
467
+ Leaves the manager terminal open by default for review. With
468
+ `--require-criteria-audit`, fails before finishing if any acceptance criteria
469
+ for the task are still `accepted`; `proposed`, `satisfied`, `deferred`, and
470
+ `rejected` criteria do not block. With `--require-acks`, fails if worker or
471
+ manager acknowledgement is missing. With `--require-epilogue`, fails if any
472
+ configured epilogue step is not succeeded. With `--require-adversarial-proof`,
473
+ fails before finishing unless the task has at least one satisfied criterion
474
+ with `evidence_type=adversarial_check` and non-empty `failure_mode`, `check`,
475
+ and `result` fields; use this when `tests passed` is not enough by itself.
476
+ With `--capture-transcript-before-stop`, captures transcript segments for any
477
+ worker/manager sessions being stopped before killing tmux sessions; capture
478
+ failure fails before stop side effects.
479
+ - `stop-task <task> [--reason R] [--stop-worker]` — Force-stop a task's
480
+ manager (and optionally the worker), recording the reason in the audit
481
+ payload.
482
+ - `stop <session>` — Stop a tmux-backed worker or manager session by name. This
483
+ works for both legacy worker records and session-table workers/managers. For a
484
+ completed task with an active binding, prefer an idempotent cleanup pass with
485
+ `finish-task <task> --stop-manager --stop-worker` so the task audit records
486
+ the cleanup against the binding.
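The `finish-task` gates above (`--require-criteria-audit` and `--require-adversarial-proof`) can be read as two predicates over a task's criteria. The following is a minimal sketch under stated assumptions, not the package's implementation: the dict shape and the function names `criteria_audit_blockers` and `has_adversarial_proof` are hypothetical; only the status names and the three proof fields come from the text.

```python
from typing import Iterable, List, Mapping

# Field names taken from the README; the dict layout itself is an assumption.
PROOF_FIELDS = ("failure_mode", "check", "result")


def criteria_audit_blockers(criteria: Iterable[Mapping]) -> List[Mapping]:
    """Return the criteria that would block --require-criteria-audit.

    Only `accepted` criteria block; `proposed`, `satisfied`, `deferred`,
    and `rejected` criteria do not.
    """
    return [c for c in criteria if c.get("status") == "accepted"]


def has_adversarial_proof(criteria: Iterable[Mapping]) -> bool:
    """True if at least one satisfied criterion carries structured proof."""
    for c in criteria:
        if c.get("status") != "satisfied":
            continue
        if c.get("evidence_type") != "adversarial_check":
            continue
        # All three proof fields must be present and non-empty.
        if all(str(c.get(field) or "").strip() for field in PROOF_FIELDS):
            return True
    return False
```

A caller would refuse to finish when `criteria_audit_blockers(...)` is non-empty or `has_adversarial_proof(...)` is false, which mirrors the "fails before finishing" wording above.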
487
+
488
+ ### Observation
489
+
490
+ - `dashboard [--task T] [--ensure-dispatch] [--dispatcher-id ID]
491
+ [--host 127.0.0.1] [--port 8797]` — Launch the
492
+ local live supervision cockpit. The dashboard binds to loopback by default,
493
+ uses the TypeScript backend to shell out to `conveyor` JSON commands, and
494
+ attaches interactive terminals to tmux-backed worker/manager sessions through
495
+ a WebSocket PTY bridge. It includes browser bootstrap controls for creating a
496
+ task, starting a worker/manager pair with `conveyor pair`, auto-attaching the
497
+ terminals, attach/bind controls, and audited action receipts for cycle,
498
+ nudge, interrupt, finish, and export. With `--ensure-dispatch`, launch also
499
+ ensures a Dispatch watch process using the supplied `--dispatcher-id` when
500
+ provided, reusing only a fresh heartbeat from that same dispatcher id. Use
501
+ `--dry-run --json` to inspect the launch command.
502
+ - `cycle <task> [--busy-wait-seconds N]` — One observation cycle. Idempotent. Runs `ingest`, computes
503
+ worker state from the JSON event stream, captures the tmux pane as a shadow
504
+ signal, writes a `manager_cycles` row, and returns a JSON dict the manager
505
+ Codex consumes. The `status_payload` includes:
506
+ - `worker_alive` / `manager_alive` — booleans computed by probing the registered session pids (`os.kill(pid, 0)`). `false` when the session's pid is `NULL` (legacy backfill) or the process has exited, which is useful for detecting silently-dead workers between cycles.
507
+ - `last_event_subtype` — the subtype of the most recent `codex_events` row for the worker, or `null` if no events exist.
508
+ - `task_completed` — `true` iff `last_event_subtype` is `"task_complete"`. Disambiguates "worker finished cleanly" from "worker idle but never started."
509
+ - `manager_context` — the latest `manager-config`, worker/manager
510
+ acknowledgements, `handoff`, and `acceptance_criteria` records for the
511
+ task, so each manager loop can reference the saved objective, living
512
+ acceptance criteria, categorized permissions, expected tools, acked
513
+ contract, worker progress, and next steps.
514
+ `manager_context.acceptance_criteria`
515
+ groups criteria by status, includes summary counts, and exposes `open` as
516
+ accepted criteria that still need proof before finishing.
517
+ `manager_context.criteria_negotiation` is advisory: when `needed` is true,
518
+ the manager should ask the worker for must-have current-task criteria versus
519
+ follow-up criteria, then record the result with `conveyor criteria`. The
520
+ field does not send nudges or mutate criteria automatically.
521
+
522
+ The `cycle` subcommand accepts `--busy-wait-seconds N` (default: 90) to tune the pane-signal classifier's stuck-busy threshold. Lower values flag stalls faster but increase false positives on long-running real work:
523
+ ```bash
524
+ conveyor cycle my-task # default 90s threshold
525
+ conveyor cycle my-task --busy-wait-seconds 30 # tighter detection
526
+ ```
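The `worker_alive` / `manager_alive` fields in the `status_payload` are described as a signal-0 probe of the registered pid. A minimal sketch of that technique on a POSIX system follows; the function name `pid_alive` and the handling of permission errors are assumptions, not taken from the package.

```python
import os


def pid_alive(pid):
    """Probe a pid with signal 0 (POSIX only).

    A missing pid (legacy backfill rows) is treated as not alive.
    """
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering a signal
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True
```

Note that a recycled pid can make this probe report a false positive, so it is best treated as a liveness hint rather than proof.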
527
+ - `ingest <session>` — Pull new events from a session's rollout JSONL into
528
+ the `codex_events` table. Tracks a byte offset, so subsequent runs only
529
+ pick up new events.
530
+ - `tail <session> [--limit N] [--subtype T] [--include-content]` — Print the
531
+ most recent events for a session, newest first. Text payload fields are
532
+ redacted by default; use `--include-content` only when stdout is redirected
533
+ or verbatim text is intentionally needed.
534
+ - `divergences <task> [--limit N]` — Cycles whose shadow pane signal flagged
535
+ a notable pattern (trust prompt, rate-limit prompt, approval prompt, etc.).
536
+ Useful for auditing the shadow signal against the JSON state.
537
+ - `dispatch [--once|--watch] [--limit N] [--interval SECONDS]
538
+ [--dispatcher-id ID] [--type notify_manager|nudge_worker|worker_task_complete]
539
+ [--watch-iterations N] [--lease-seconds N] [--dry-run] [--json]` — Run
540
+ Dispatch, the mechanical routing/actuation role.
541
+ `worker_task_complete` routing reads from `codex_events`, records
542
+ deduplicated `routed_notifications` keyed by the source event, and notifies
543
+ the bound manager without deciding task success. Explicit `notify_manager`
544
+ and `nudge_worker` command rows are atomically claimed, executed, and recorded
545
+ through `command_attempts` with conservative side-effect metadata. `--watch`
546
+ repeats polling with heartbeat telemetry; `--watch-iterations` bounds a watch
547
+ run for scripts and verification; `--lease-seconds` tunes command claim
548
+ recovery; `--once` performs one pass.
549
+ - `enqueue-notify-manager <task> --message "..." [--correlation-id C]
550
+ [--required-permission P] [--idempotency-key K] [--json]` — Queue a `notify_manager` command row for
551
+ Dispatch to claim and deliver to the bound manager.
552
+ - `enqueue-nudge-worker <task> --message "..." [--correlation-id C]
553
+ [--required-permission P] [--idempotency-key K] [--json]` — Queue a `nudge_worker` command row for
554
+ Dispatch to claim and deliver to the bound worker. Use this dispatcher-backed
555
+ route instead of `session-nudge` when the worker is registered without tmux;
556
+ the worker then receives the message through `worker-inbox`.
557
+ - `session-inbox <session> [--consume-next] [--wait] [--timeout N]
558
+ [--interval N] [--limit N] [--json]` — List or consume unconsumed routed
559
+ notifications addressed to a registered session. Text output includes the
560
+ pending count, signal type, delivery mode, source/target sessions, delivered
561
+ timestamp, and correlation id. Use `--consume-next --wait --json` for Codex
562
+ app long-polling; consumed items emit `dispatch_inbox_consumed` telemetry.
563
+ - `manager-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N]
564
+ [--limit N] [--json]` — Resolve the task's bound manager session and read its
565
+ dispatcher inbox.
566
+ - `worker-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N]
567
+ [--limit N] [--json]` — Resolve the task's bound worker session and read its
568
+ dispatcher inbox.
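The `ingest` entry above says it tracks a byte offset so later runs only pick up new events. A minimal sketch of offset-based JSONL reading, assuming a plain append-only file; the function name `ingest_new_events` and the return shape are hypothetical, and the real command also writes rows to `codex_events`, which is omitted here.

```python
import json


def ingest_new_events(path, offset):
    """Read complete JSONL lines after `offset`.

    Returns (events, new_offset, skipped_lines).
    """
    events, skipped = [], 0
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # EOF or a partially written line: leave it for the next run
            offset += len(line)
            if not line.strip():
                continue
            try:
                events.append(json.loads(line))
            except ValueError:
                skipped += 1  # malformed JSON or bad encoding: count it and move on
    return events, offset, skipped
```

Persisting `new_offset` between calls is what makes repeated runs pick up only new bytes; stopping at an unterminated final line avoids parsing an event that is still being written.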
569
+
570
+ ### Actuation
571
+
572
+ - `session-nudge <name> "<text>" [--dry-run]` — Send text plus Enter to the
573
+ session's tmux pane. Requires the session to have been registered with
574
+ `--tmux-session`. Managers running outside tmux cannot receive nudges; only
575
+ workers do.
576
+ - `session-interrupt <name> [--key K] [--followup T] [--dry-run]` — Send an
577
+ interrupt key (default `C-c`). Optional `--followup` text after the
578
+ interrupt.
579
+
580
+ ### Audit
581
+
582
+ - `audit <task>` — Events history for a task. Lists `events`-table rows only.
583
+ `audit --json` redacts stored terminal/transcript content unless
584
+ `--include-content` is passed.
585
+ - `replay <task> [--format compact|timeline|transcript|full-transcript]
586
+ [--role all|worker|manager] [--limit N] [--include-content]` — Render a
587
+ chronological, human-readable reconstruction of the task. Cycle entries
588
+ include `[pane pattern: <pattern_id>]` when the shadow signal flagged
589
+ something. `full-transcript` is blocked unless `--include-content` is passed.
590
+ - `mutation-audit <task>` — Manager decisions and their consequences.
591
+ - `events <name>` — Worker events log.
592
+ - `commands [--task T] [--type T] [--state S] [--attempts]` — Durable
593
+ side-effect commands log. Use `--attempts` to include per-dispatcher attempt
594
+ history.
595
+ - `epilogue <task> --step run-tools|draft-pr|subagent-review|record-handoff
596
+ [--json] [--correlation-id ID]` — Run one configured epilogue step and record
597
+ its durable state. Use `--list` or `--status` to inspect configured steps and
598
+ latest run results.
599
+ - `telemetry [--run RUN] [--task TASK] [--search QUERY] [--summary] [--json]`
600
+ — Query local structured telemetry events, search them with SQLite FTS, or
601
+ print aggregate counts for a run/task. `telemetry snapshot --task <task>
602
+ --json` prints the task-scoped dashboard overview contract.
603
+ - `telemetry task <task> --json` — Print a task-scoped telemetry triage view:
604
+ recent cycle history, last successful cycle, worker/manager liveness,
605
+ decisions, commands, failed cycles/commands, ingest skipped/error summaries,
606
+ pane capture failures and notable patterns, open criteria counts, telemetry
607
+ counts, and retained storage counts. Raw transcript, pane, prompt, criterion,
608
+ command payload, and command result bodies are not included.
609
+ - `telemetry failures --json` — Print an operator failure triage view across
610
+ tasks: recent failed cycles, failed commands, ingest errors/skipped lines,
611
+ pane capture failures, open accepted criteria, active task/session health, and
612
+ retained storage counts without raw transcript or prompt content. Use `--task`,
613
+ `--run`, `--active-only`, or `--window 2h` to narrow the failure view for
614
+ recency or active-task triage.
615
+ - `telemetry metrics --window 24h --json` — Print bounded JSON rollups for
616
+ local telemetry and related tables: active tasks/sessions, cycle and command
617
+ success/failure counts, ingest/skipped-line totals, criteria counts,
618
+ reconcile drift counts, export counts, and retained capture/transcript bytes.
619
+ - `export-task <task> [--zip]` — Dump task status, audit, prompts, and
620
+ transcript metadata into an export bundle. Exports include
621
+ `telemetry-events.json`, `telemetry-summary.json`, and
622
+ `telemetry-report.md`; see `docs/local-telemetry-workflow.md`.
623
+
624
+ ### Administration
625
+
626
+ - `doctor` — Local dependency and tmux health check.
627
+ - `doctor-self` — Verify the current Codex session can self-register.
628
+ - `db-doctor` — SQLite schema health check.
629
+ - `reconcile [--apply] [--stale-cycles-seconds N]` — Report (and optionally
630
+ fix) dead-pid sessions, dangling bindings, and stuck tasks. Default
631
+ stale-cycle threshold is 3600 seconds (1h); override with
632
+ `--stale-cycles-seconds N` to catch tasks where the manager has been silent
633
+ for shorter intervals. JSON output.
634
+ - `prune [--keep-latest N] [--dry-run]` — Drop old transcript content while
635
+ preserving metadata.
636
+ - `transcript-prune <task> [--keep-latest N]` — Same, scoped to a task.
637
+ - `transcript-capture <task> [--role R] [--mode M]` — Capture deduplicated
638
+ transcript segments. JSON output redacts raw captured terminal output by
639
+ default.
640
+ - `transcript-show <task> [--role R] [--include-content]` — Show stored
641
+ transcript segment metadata. Segment text is redacted unless
642
+ `--include-content` is passed.
643
+ - `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor>` — Print a
644
+ repeatable manual QA checklist.
645
+ - `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB]` —
646
+ Run a deterministic no-tmux QA harness and save a JSON receipt.
647
+ `ralph-loop-guardrails` proves max-iteration cutoff, missing-evidence
648
+ cutoff, fresh retry delivery after structured `adversarial_check` evidence,
649
+ and the `pr_ci_merge_loop` preset evidence gate. `generic-loop-template`
650
+ proves the `visual_diff_loop` template blocks before visual evidence,
651
+ rejects unstructured adversarial evidence, and delivers only after required
652
+ visual receipts plus structured adversarial proof exist.
653
+ `generic-loop-template-browser` runs the same `visual_diff_loop` gate proof
654
+ with a browser-rendered static HTML candidate screenshot, recording browser
655
+ backend, viewport, candidate HTML, screenshot, visual diff, and structured
656
+ adversarial evidence in the saved receipt. It uses the repo's Node
657
+ Playwright dependency and requires Chromium to be installed and launchable;
658
+ when unavailable, it fails with the browser-backed QA helper message.
659
+ `test-coverage-loop` proves the `test_coverage_loop` template blocks before
660
+ coverage evidence, rejects malformed adversarial evidence, and delivers only
661
+ after a structured coverage receipt plus adversarial proof exist.
662
+ `build-clear-loop` proves the non-coverage `build_then_clear` template
663
+ blocks before `build_passed` and `cleanup` receipts, still blocks after build
664
+ evidence alone, and delivers only after both build and cleanup evidence exist.
665
+ - `loop-triggers --list|--classify PROMPT [--json]` — List the controlled
666
+ natural-language loop triggers or classify a manager/operator prompt before
667
+ creating a loop policy or continuation gate. Approved trigger phrases include
668
+ the adversarial Ralph-loop, iteration, finish, worker-directed proof, and
669
+ manager-created adversarial criteria gates; generic caution like "be careful
670
+ and run tests" intentionally does not arm those gates.
671
+ - `loop-templates --list|--show TEMPLATE|--create-run TASK --template TEMPLATE` —
672
+ List generic loop templates or create a template-backed loop policy run.
673
+ Template-backed runs use the same Dispatch guardrails as Ralph-loop presets:
674
+ `max_iterations` prevents over-looping, and `required_before_continue`
675
+ evidence blocks a manager continuation before worker delivery until matching
676
+ satisfied criterion evidence exists. `ralph-loop-presets` remains as a
677
+ compatibility alias for the current Ralph-loop QA flows. The built-in
678
+ `visual_diff_loop` template requires `reference_artifact`,
679
+ `candidate_screenshot`, `visual_diff_report`, `diff_below_threshold`, and
680
+ `adversarial_check` evidence before a manager-requested next visual pass can
681
+ reach the worker. Quality-oriented templates (`pr_ci_merge_loop`,
682
+ `test_coverage_loop`, and `visual_diff_loop`) also expose an
683
+ `artifact_requirements["adversarial_check"]` object requiring
684
+ `failure_mode`, `check`, and `result` fields.
685
+ - `loop-status TASK --run RUN [--json]` — Summarize a Ralph-loop run for manager
686
+ review: policy template, iteration bounds, command states, routed
687
+ notifications, worker inbox backlog, evidence types, consumed-inbox
688
+ telemetry, failure counts, and a recommendation.
689
+
690
+ For real vertical slices, start with the Ralph loop operator guide in
691
+ `docs/qa/ralph-loop-operator-guide.md`. It explains the controlled
692
+ natural-language triggers, Dispatch authority model, worker inbox polling,
693
+ required evidence, adversarial proof, `loop-status`, and telemetry review pass
694
+ bar.
695
+ Use `create-disposable-binding` when the manager and worker are Codex app or
696
+ other no-tmux sessions and you want the same Dispatch rails without manual
697
+ task/session/bind setup.
698
+ - `enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N` —
699
+ Queue a manager-requested next loop pass for Dispatch. The command refuses
700
+ same/current iteration requests before they become pending queue rows, while
701
+ Dispatch also blocks any stale same/current iteration command that reaches
702
+ the queue. Max-iteration and missing-evidence refusals remain Dispatch policy receipts.
703
+ JSON output includes `loop_policy`; delivered manager/worker inbox payloads
704
+ include the same `loop_policy` plus enriched `ralph_loop` metadata so tmux and
705
+ Codex app sessions can see the template, cleanup policy, required evidence,
706
+ artifact requirements, and recommended tools.
707
+ - `loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE` —
708
+ Record a run-qualified evidence receipt for a loop policy. Use
709
+ `loop-evidence visual-diff` to compare PNG screenshots, write an optional
710
+ diff/report artifact, and record `visual_diff_report` plus
711
+ `diff_below_threshold` as satisfied only when the computed score is within
712
+ threshold.
713
+ - `loop-evidence adversarial-check TASK --loop-run RUN --iteration N --failure-mode F --check C --result R` —
714
+ Record first-class adversarial proof for a loop iteration. Use it when a
715
+ manager or worker tried to disprove the iteration before continuing. The
716
+ receipt is stored as `evidence_type=adversarial_check` with structured
717
+ `failure_mode`, `check`, and `result` metadata and can satisfy Ralph-loop
718
+ continuation policy. See `docs/qa/adversarial-proof.md` for the receipt
719
+ shape and how it maps to manager prompts, Ralph-loop evidence, Dispatch
720
+ blocking, and audited finish.
721
+ - `qa-plan goalbuddy-conveyor` — Print the reusable natural-language starter
722
+ prompt and QA contract for autonomous GoalBuddy conveyor runs. Use it when a
723
+ manager should split broad work into one parent board plus sequential
724
+ vertical-slice child boards with PR/CI/merge receipts, satisfied-on-main
725
+ proof, and adversarial review gates. See `docs/qa/goalbuddy-conveyor.md`.
726
+ - `ralph-loop-presets --list|--show PRESET|--create-run TASK --preset PRESET` —
727
+ List saved Ralph-loop guardrail templates or create a preset-backed
728
+ `ralph_loop` policy run.
729
+ - `import-compat` — Dry-run or import existing `.codex-workers/<worker>/`
730
+ artifacts into SQLite.
731
+
732
+ ### Worker setup
733
+
734
+ - `create <name> --cwd D --task "..."` — Full worker creation: spawn a tmux
735
+ session, start Codex, send the initial worker contract.
736
+ - `start <name> --cwd D` — Start a plain Codex session inside tmux without
737
+ registering a worker. Useful when you want to register it manually later.
738
+ - `start-test <name>` — Low-risk verification worker that only updates its
739
+ ignored `status.json`.
740
+
741
+ ### QA Plans
742
+
743
+ Print repeatable live QA checklists from the CLI:
744
+
745
+ ```bash
746
+ conveyor qa-plan self-management
747
+ conveyor qa-plan emergent-criteria
748
+ conveyor qa-plan emergent-criteria --json
749
+ conveyor qa-plan tmux-errors
750
+ conveyor qa-plan dispatch-completion
751
+ conveyor qa-plan ralph-loop
752
+ conveyor qa-plan adversarial-triggers
753
+ conveyor qa-plan goalbuddy-conveyor
754
+ conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
755
+ conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
756
+ conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
757
+ conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
758
+ conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
759
+ conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
760
+ conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
761
+ conveyor loop-templates --list --json
762
+ conveyor loop-templates --show visual_diff_loop --json
763
+ conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
764
+ conveyor ralph-loop-presets --list --json
765
+ ```
766
+
767
+ General loop templates let operators create policy-backed runs without adding
768
+ bespoke Dispatch behavior for each loop shape. For example,
769
+ `conveyor loop-templates --create-run qa-task --template visual_diff_loop`
770
+ creates a visual-diff loop run whose `required_before_continue` evidence must
771
+ be recorded before the manager's next visual pass can reach the worker.
772
+ Existing `ralph-loop-presets` commands remain compatible aliases over the same
773
+ template-backed guardrails.
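The guardrails named here (`max_iterations`, `required_before_continue`, and the refusal of same/current iteration requests) amount to a small gate function. The sketch below is an illustration under assumptions: the policy dict layout, the function name `continuation_block_reason`, and the reason strings are hypothetical, and whether the bound is inclusive is a guess.

```python
def continuation_block_reason(policy, current_iteration, requested_iteration,
                              satisfied_evidence):
    """Return a reason string if the next loop pass must be blocked, else None."""
    if requested_iteration <= current_iteration:
        return "same_or_stale_iteration"
    # Assumption: requests beyond max_iterations are refused (inclusive bound).
    if requested_iteration > policy["max_iterations"]:
        return "max_iterations_reached"
    missing = [
        evidence for evidence in policy["required_before_continue"]
        if evidence not in satisfied_evidence
    ]
    if missing:
        return "missing_evidence: " + ", ".join(missing)
    return None


# Evidence types listed for the built-in visual_diff_loop template.
VISUAL_DIFF_POLICY = {
    "max_iterations": 3,  # hypothetical value
    "required_before_continue": [
        "reference_artifact", "candidate_screenshot", "visual_diff_report",
        "diff_below_threshold", "adversarial_check",
    ],
}
```

With this shape, a continuation request reaches the worker only when the function returns `None`.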
774
+
775
+ For natural-language control, run `conveyor loop-triggers --classify
776
+ "<manager prompt>" --json` before automatically creating policy or
777
+ continuation gates. Only a matched controlled trigger should create a
778
+ Ralph-loop policy, require `adversarial_check` before continuation, require
779
+ `finish-task --require-adversarial-proof`, or record worker-proposed
780
+ adversarial proof. The executable receipt check is
781
+ `conveyor qa-run adversarial-triggers --receipt-output
782
+ /tmp/adversarial-triggers-receipt.json --json`.
783
+
784
+ The `emergent-criteria` scenario covers a real worker/manager pair, criteria
785
+ negotiation, audited finish gating, replay/export evidence, and
786
+ `--stop-manager --stop-worker` cleanup verification. It also includes an
787
+ optional `criteria-plan` step for drafting reviewed criteria commands from the
788
+ worker's separated must-have and follow-up response.
789
+
790
+ The `tmux-errors` scenario covers read-only JSON degradation, mutating command
791
+ failures, pane capture degradation, stop failures, and reconcile recovery when
792
+ tmux is unavailable or a disposable tmux target disappears.
793
+
794
+ The `dispatch-completion` scenario covers the issue #113 completion-routing
795
+ flow: a worker `task_complete` signal is read from `codex_events`, Dispatch
796
+ records and deduplicates a routed notification, the bound manager receives a
797
+ mechanical wake-up, duplicate-route races emit suppressed telemetry without an
798
+ extra send, and audit/replay/dashboard surfaces show readable, chronological
799
+ dispatch evidence.
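The deduplication step in this scenario can be illustrated with a unique key on the source event. This is a minimal sketch, assuming a simplified `routed_notifications` table; the column names and the function `route_once` are hypothetical and the package's real schema is likely richer.

```python
import sqlite3

# Simplified, hypothetical schema: one routed notification per source event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS routed_notifications (
    id INTEGER PRIMARY KEY,
    source_event_id INTEGER NOT NULL UNIQUE,
    target_session TEXT NOT NULL
)
"""


def route_once(conn, source_event_id, target_session):
    """Record a routed notification; return False if the event was already routed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO routed_notifications "
        "(source_event_id, target_session) VALUES (?, ?)",
        (source_event_id, target_session),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

A dispatcher built this way would send the manager wake-up only when `route_once` returns `True`, and emit suppressed telemetry otherwise, so a duplicate-route race never produces an extra send.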
800
+
801
+ The `ralph-loop` scenario covers the issue #152 managed delivery loop: the
802
+ manager runs the same seed prompt through at least two iterations, requires
803
+ criteria and epilogue evidence, gates PR creation, CI monitoring/fixing, green
804
+ merge, handoff, and worker clear on explicit permissions, and proves the second
805
+ iteration starts after audited clear in fresh-worker isolation. Replay iterations
806
+ start with an inspect-first guard: if the previous iteration's work is already
807
+ merged, the worker records that state and stops without making replacement edits
808
+ or opening another PR unless something is actually missing. The same QA plan
809
+ also covers preset-backed guardrails such as `pr_ci_merge_loop`, where Dispatch
810
+ blocks another worker iteration until required `pr_url`, `ci_green`, `merge`,
811
+ and `adversarial_check` evidence exists.
812
+
813
+ ### Terminal helpers
814
+
815
+ - `open <name>` — Open a macOS terminal window attached to a registered
816
+ worker.
817
+ - `open-worker <task>` — Open a terminal window for a task's worker without
818
+ spelling raw tmux session names.
819
+ - `open-manager <task>` — Same, for the task's manager.
820
+
821
+ ### Low-level worker actions (legacy worker-name-keyed)
822
+
823
+ These commands operate against workers by name and predate the manual-binding
824
+ path. They remain useful for direct access against backfilled workers and for
825
+ debugging.
826
+
827
+ - `list` — List known workers.
828
+ - `status <name>` — Print worker status as JSON.
829
+ - `idle-check <name>` — Classify worker freshness and recommend an action.
830
+ - `capture <name> [--include-content]` — Capture recent terminal output.
831
+ Default output is metadata only; pass `--include-content` only when verbatim
832
+ pane text is intentionally needed.
833
+ - `nudge <name> "<text>"` — Legacy worker-directory nudge. For managed
834
+ session pairs, prefer `session-nudge <name> "<text>"`; `nudge` falls back to
835
+ session-name delivery when no legacy worker directory exists.
836
+ - `interrupt <name>` — Send an explicit interrupt key.
837
+ - `stop <name>` — Stop a worker tmux session.
838
+ - `update-status <name>` — Update a worker status contract.
839
+ - `classify --text "..."` — Debug the busy-wait pattern classifier.
840
+
841
+ ## Manager Loop Pattern
842
+
843
+ A manager Codex drives supervision by calling `conveyor cycle <task>`
844
+ repeatedly. Each call:
845
+
846
+ 1. Runs `ingest` against the worker's rollout JSONL (idempotent; picks up only
847
+ new bytes since the last cycle).
848
+ 2. Computes the worker's current state from the JSON event stream
849
+ (`busy`, `idle`, or `unknown`). `task_started`/`user_message` set the
850
+ session to `busy`; `task_complete` sets it to `idle`; everything else does
851
+ not change state.
852
+ 3. Captures the worker's tmux pane (if attached) and runs the legacy
853
+ pattern detector — surfaces `trust_prompt`, `rate_limit_prompt`,
854
+ `enter_to_confirm`, etc. as `notable_pane_pattern`. This is the shadow
855
+ signal: best-effort and supplementary.
856
+ 4. Writes a row to `manager_cycles` so the full observation history is
857
+ replayable via `conveyor replay <task>`.
858
+ 5. Returns a structured JSON dict.
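Step 2 above is a small fold over event subtypes. A minimal sketch, assuming the events arrive as an ordered list of subtype strings; the function name `derive_state` is hypothetical.

```python
BUSY_SUBTYPES = {"task_started", "user_message"}
IDLE_SUBTYPES = {"task_complete"}


def derive_state(subtypes, initial="unknown"):
    """Fold event subtypes, oldest first, into busy / idle / unknown."""
    state = initial
    for subtype in subtypes:
        if subtype in BUSY_SUBTYPES:
            state = "busy"
        elif subtype in IDLE_SUBTYPES:
            state = "idle"
        # Every other subtype leaves the state unchanged.
    return state
```

Passing the previous cycle's state as `initial` lets each cycle fold only the newly ingested events.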
859
+
860
+ The manager parses the JSON, decides whether to act, and optionally calls
861
+ `conveyor session-nudge` / `session-interrupt`. Then it loops.
862
+
863
+ ```bash
864
+ conveyor cycle auth-refactor
865
+ # {
866
+ # "kind": "session_cycle",
867
+ # "task": "auth-refactor",
868
+ # "worker_session": "auth-worker",
869
+ # "manager_session": "auth-mgr",
870
+ # "ingest": { "new_events": 3, "new_offset": 12345 },
871
+ # "state": "busy",
872
+ # "last_state_event_at": "2026-05-11T14:32:11Z",
873
+ # "staleness_seconds": 4.2,
874
+ # "notable_pane_pattern": "trust_prompt",
875
+ # "pane_signal": {
876
+ # "captured": true,
877
+ # "classifier": { "pattern": "trust_prompt", ... },
878
+ # "notable_pattern": "trust_prompt"
879
+ # },
880
+ # "manager_context": {
881
+ # "manager_config": {...},
882
+ # "worker_handoff": {...},
883
+ # "acceptance_criteria": {
884
+ # "summary": {"proposed": 1, "accepted": 2, "satisfied": 0, "deferred": 1, "rejected": 0},
885
+ # "open": [...],
886
+ # "proposed": [...],
887
+ # "satisfied": [...],
888
+ # "deferred": [...],
889
+ # "rejected": [...]
890
+ # },
891
+ # "criteria_negotiation": {
892
+ # "needed": true,
893
+ # "reason": "no_criteria",
894
+ # "prompt": "Please propose 2-4 acceptance criteria for the current slice...",
895
+ # "suggested_actions": [...]
896
+ # }
897
+ # },
898
+ # "cycle_id": 17,
899
+ # ...
900
+ # }
901
+ ```
902
+
903
+ If `notable_pane_pattern` is set the manager can branch on it directly —
904
+ e.g., on `trust_prompt` send Enter via `session-nudge` rather than waiting on
905
+ `staleness_seconds`. On `IngestError` (rollout missing or rotated), `cycle`
906
+ records a `state='failed'` row before re-raising so the audit trail still
907
+ captures the attempt.
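The branching described above can be sketched as a pure function over one parsed `cycle` payload. The policy below is illustrative only: the function name `choose_action`, the 300-second staleness cutoff, and the mapping onto `record-decision` actions are assumptions, not documented behavior.

```python
def choose_action(cycle, stale_after_seconds=300.0):
    """Map one parsed `conveyor cycle` payload to a manager decision."""
    if cycle.get("notable_pane_pattern"):
        # The pane shows a prompt (for example trust_prompt): act now
        # instead of waiting for staleness to accumulate.
        return "nudge"
    if cycle.get("state") == "idle":
        return "inspect"  # the worker may have finished; review before deciding
    if (cycle.get("staleness_seconds") or 0.0) > stale_after_seconds:
        return "inspect"  # busy but silent for longer than the hypothetical cutoff
    return "wait"
```

Keeping the decision separate from the side effect makes it straightforward to record the choice with `record-decision` before calling `session-nudge` or `session-interrupt`.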
908
+
909
+ **Audit convention.** Mutating commands (`session-nudge`,
910
+ `session-interrupt`) write to the `events` table. Observation and
911
+ dedicated-table commands (`cycle`, `ingest`) write to their own tables
912
+ (`manager_cycles`, `codex_events`) — those tables ARE the audit trail. The
913
+ plain-text `conveyor audit <task>` lists `events` rows only; cycle
914
+ observations show up via `conveyor replay <task>` and the `manager_cycles`
915
+ table.
916
+
917
+ ## Phase 6 Polish
918
+
919
+ Recent additions to streamline worker setup and observability:
920
+
921
+ - `start-worker` convenience command for spawn-and-register in one call.
922
+ - `reconcile --stale-cycles-seconds N` to customize the stale-cycle threshold.
923
+ - Observability: `terminal_capture_error` / `terminal_fresh` fields in
924
+ status/idle JSON; `rollback_error` in nudge/interrupt audit payloads;
925
+ `skipped_lines` in `cycle` output's `ingest` field; stderr warnings on
926
+ malformed event lines and audit-insert failures.
927
+
928
+ ## Phase 7 polish (2026-05-11)
929
+
930
+ Three quality-of-life additions following Phase 6 dogfooding:
931
+
932
+ - **`sessions --state`** — by default, `conveyor sessions` now hides Phase 1 backfill rows (`pid IS NULL`) and rows marked `state='gone'`. Use `--state all` to inspect every row, `--state gone` for completed/dead registrations, or `--state active` for the default view.
933
+ - **`worker_alive` / `manager_alive` in cycle output** — every `conveyor cycle` JSON now includes these booleans, computed by `os.kill(pid, 0)` against the registered session pids. Surfaces silently-dead workers between cycles.
934
+ - **`cycle --busy-wait-seconds N`** — exposes the pane-signal classifier's stuck-busy threshold (previously hard-coded at 90s) as a per-cycle flag.
935
+
936
+ ## Phase 8 classifier improvements (2026-05-12)
937
+
938
+ - **Recent event suppression for `long_running_interruptible`** — the classifier now weighs `recent_event_count` (from `ingest.new_events`) alongside `status_age_seconds`. When a worker is actively emitting events (>= 10 per cycle), the `long_running_interruptible` flag is suppressed, because the worker is healthy even though its `status.json` is stale. This stops false positives on long-running tools (e.g. test suites, large file reads) that stay busy but quiet on status updates.
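The suppression rule can be sketched as a two-input predicate. This is a simplified illustration: the function name, the reuse of the 90-second busy-wait threshold for `status_age_seconds`, and the `>=` comparison on age are assumptions; only the 10-events-per-cycle cutoff comes from the text.

```python
def long_running_interruptible(status_age_seconds, recent_event_count,
                               busy_wait_seconds=90, active_event_threshold=10):
    """Return True when the worker looks stuck rather than quietly busy."""
    if recent_event_count >= active_event_threshold:
        # The worker is emitting events, so a stale status file is not a stall.
        return False
    # Assumption: the flag otherwise fires once the status age passes the threshold.
    return status_age_seconds >= busy_wait_seconds
```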
939
+
940
+ ## Dispatch and completion contracts
941
+
942
+ Dispatch is the mechanical core infrastructure between workers and managers. It
943
+ routes facts and executes queued side effects; it does not decide whether work
944
+ is correct, finish tasks, satisfy criteria, choose strategy, merge PRs, or route
945
+ to human operators.
946
+
947
+ Current dispatch state:
948
+
949
+ - `dispatch --once` routes bound worker `task_complete` signals from
950
+ `codex_events`, not the pane classifier.
951
+ - Routed completion notifications are deduplicated by source event id, recorded
952
+ in `routed_notifications`, and threaded with `correlation_id`.
953
+ - The session inbox is the same `routed_notifications` stream addressed by
954
+ `target_session_id`: tmux push is optional transport. Codex app-based sessions
955
+ should long-poll with `manager-inbox --consume-next --wait --json` or
956
+ `worker-inbox --consume-next --wait --json`. For disposable Ralph loops, use
957
+ the generated `worker_handoff` prompt so the worker keeps polling until no
958
+ inbox item remains or the loop reaches `max_iterations`.
959
+ - `register-worker`, `register-manager`, `sessions`, `discover`, and
960
+ `create-disposable-binding --json` expose a `communication` block per
961
+ session. Treat `session_kind='tmux'` plus `receive_style='push'` as direct
962
+ tmux-delivery capable; treat `session_kind='codex_app'` plus
963
+ `receive_style='pull'` as mailbox polling required for that worker or
964
+ manager.
965
+ - Template-backed `continue_iteration` deliveries include `loop_policy` in the
966
+ inbox payload, with template name, current/max iteration, cleanup policy,
967
+ required evidence, artifact requirements, and recommended tools. Codex
968
+ app-based workers receive by polling the same policy context that tmux
969
+ workers receive by push.
+ - A target with a tmux session records `delivery_mode='push'` after successful
+ tmux delivery. A target without tmux records `delivery_mode='pull_required'`
+ and remains unconsumed until the addressed session polls and consumes it.
+ - Consuming a mailbox item records `dispatch_inbox_consumed` telemetry with the
+ notification id, signal type, delivery mode, target session role, and poll
+ count, so manager/worker dispatcher handoffs are visible in audit evidence.
+ - If `doctor-self --json` reports `workerctl_on_path=false` inside a Codex app
+ session, run `conveyor ...` from the repository root or install the
+ local wrapper with `scripts/install-local --write`. Its `inside_tmux` check
+ describes the shell running `doctor-self`; for Codex app evidence, prefer the
+ rollout JSONL path, `lsof` lookup, and the registration role.
+ - When a live drill ingests a whole rollout, Dispatch may route older completion
+ signals before the target proof turn. Either ingest after the target worker
+ turn or have the manager consume/review older completion signals before
+ deciding on the current one.
+ - Explicit `notify_manager` and `nudge_worker` command rows can be processed by
+ Dispatch with atomic claim/lease metadata, durable `command_attempts`,
+ invalid-payload failure before side effects, and conservative tmux
+ side-effect started/completed flags.
+ - `dispatch --watch` continuously repeats the same mechanical polling loop with
+ dispatcher identity and heartbeat telemetry; `--watch-iterations N` bounds the
+ run and `--lease-seconds N` controls when attempted command claims become
+ recoverable.
+ - Replay/audit surfaces include routed notifications, command attempts, and
+ correlation chains where the data exists. Routed notification replay includes
+ delivery mode, source/target sessions, delivered timestamp, consumed-by
+ session, and consumed timestamp. The dashboard groups bound-task dispatch
+ correlation chains with command state, attempt counts, notification counts,
+ inbox pending/consumed counts, decision/cycle ids, source event ids,
+ suppressed-signal visibility, chronological ordering, and side-effect risk.
+ - Dashboard manual QA should use
+ `conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard`
+ and visually confirm the Dispatch active banner, dispatcher id, heartbeat age,
+ iteration, processed count, dry-run/live state, completion/routing/cycle
+ conversation lane entries, command claim/attempt/delivery entries, inbox
+ pending/consumed counts, pull-required notification evidence where applicable,
+ and stale or not-observed warnings.
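
The routing rules in the list above (one notification per source event id, `push` when a tmux session exists, `pull_required` otherwise, unconsumed until the addressed session polls) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `Target`, `Router`, and their methods are hypothetical names rather than the package's API, the dictionary stands in for the `routed_notifications` table, and tmux delivery is assumed to succeed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Target:
    session_id: str
    tmux_session: Optional[str] = None  # None models a Codex app session


@dataclass
class Router:
    # Hypothetical in-memory stand-in for the routed_notifications table,
    # keyed by source event id so duplicates are easy to detect.
    routed: dict = field(default_factory=dict)

    def route(self, source_event_id: str, target: Target, correlation_id: str) -> bool:
        """Record one notification per source event; return False on a duplicate."""
        if source_event_id in self.routed:
            return False
        # Assumption: tmux delivery succeeds whenever a tmux session is present.
        mode = "push" if target.tmux_session else "pull_required"
        self.routed[source_event_id] = {
            "target_session_id": target.session_id,
            "correlation_id": correlation_id,
            "delivery_mode": mode,
            "consumed": False,
        }
        return True

    def consume_next(self, session_id: str) -> Optional[dict]:
        """Return and mark the oldest unconsumed item addressed to session_id."""
        for row in self.routed.values():  # dicts preserve insertion order
            if row["target_session_id"] == session_id and not row["consumed"]:
                row["consumed"] = True
                return row
        return None
```

Routing the same `source_event_id` twice leaves a single row, and a target without a tmux session keeps its row unconsumed until `consume_next` is called for its session id.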
+
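As a rough illustration of how a lease makes an attempted command claim recoverable, the sketch below models the rule "a claim can be taken over once it is at least `lease_seconds` old". It is a single-process simplification under stated assumptions: `Command` and `try_claim` are hypothetical names, and the atomic claim against the `commands` table is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    command_id: str
    claimed_by: Optional[str] = None
    claimed_at: Optional[float] = None
    done: bool = False


def try_claim(cmd: Command, dispatcher_id: str, now: float, lease_seconds: float) -> bool:
    """Claim cmd if it is unclaimed or its previous lease has expired."""
    if cmd.done:
        return False
    lease_expired = (
        cmd.claimed_at is not None and now - cmd.claimed_at >= lease_seconds
    )
    if cmd.claimed_by is None or lease_expired:
        # Take (or take over) the claim and restart the lease clock.
        cmd.claimed_by = dispatcher_id
        cmd.claimed_at = now
        return True
    return False
```

With a 30-second lease, a second dispatcher is refused at `now=10` and succeeds at `now=30`, which is the behaviour a crashed dispatcher's claim would need in order to be picked up again.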
+ The adjacent completion-contract surfaces are separate from Dispatch:
+
+ - Worker and manager acknowledgements persist the startup contract and can gate
+ `cycle`/`finish-task`.
+ - Epilogues are named post-completion steps that can gate `finish-task`.
+ - Continuations persist worker-first and manager-independent "what's next"
+ proposals plus a recorded reviewer verdict. The CLI enforces ordering,
+ redaction, permission checks, and reviewer separation metadata, and can run an
+ independent restricted-context reviewer command through
+ `continuation-reviewer`. Reviewer execution is additionally isolated with a
+ temporary cwd, stripped environment, and macOS `sandbox-exec` denial of bound
+ rollout/database reads plus direct reads under the active `.codex-workers`
+ state root. That broader state-root denial applies only to the
+ `continuation-reviewer` subprocess; normal replay, audit, export, and
+ telemetry generation paths remain outside that sandbox.
+
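Two of the isolation measures named above, a temporary cwd and a stripped environment, can be approximated with the standard library alone. The sketch below is not the `continuation-reviewer` implementation: `run_isolated` is a hypothetical helper, it assumes a POSIX host, and it does not attempt the macOS `sandbox-exec` read denial.

```python
import os
import subprocess
import sys
import tempfile


def run_isolated(argv: list[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run argv in a fresh temporary cwd with a near-empty environment."""
    with tempfile.TemporaryDirectory() as scratch:
        # Keep only PATH so the executable can be resolved; drop everything else.
        env = {"PATH": os.environ.get("PATH", os.defpath)}
        return subprocess.run(
            argv,
            cwd=scratch,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )


if __name__ == "__main__":
    # The child sees neither the parent's variables nor the parent's cwd.
    probe = "import os; print(sorted(os.environ)); print(os.getcwd())"
    print(run_isolated([sys.executable, "-c", probe]).stdout)
```

Passing an explicit `env` mapping replaces the inherited environment rather than extending it, which is what keeps parent-only variables out of the child.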
+ ## Schema
+
+ SQLite database at `.codex-workers/workerctl.db`. Key tables:
+
+ - `sessions` — Unified worker/manager registration.
+ - `bindings` — Task ↔ worker session ↔ manager session.
+ - `tasks` — Task records.
+ - `codex_events` — Per-session JSONL events ingested from rollout files.
+ - `manager_cycles` — One row per `cycle` invocation, with the full JSON
+ payload as `status_json`.
+ - `events` — Actuation audit log (`session_nudged`, `session_interrupted`,
+ etc.).
+ - `commands` — Durable side-effect command log, including Dispatch claim/lease
+ metadata for queued command execution.
+ - `command_attempts` — Per-dispatcher command execution attempts with
+ side-effect started/completed flags and result/error payloads.
+ - `routed_notifications` — Mechanical worker/manager routed facts and command
+ delivery records, deduped and linked by `correlation_id`.
+ - `task_acknowledgements` — Revisioned worker/manager startup contract
+ acknowledgements.
+ - `epilogue_runs` — Durable state for configured post-completion epilogue
+ steps.
+ - `task_continuations` / `continuation_reviews` — Worker/manager continuation
+ proposals and reviewer verdicts for "what's next" review flows.
+ - `workers`, `managers` — Legacy tables retained for read-only history.
+
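For a quick manual look at which of the tables listed above are present, a read-only connection avoids touching live state. This is a minimal sketch rather than a replacement for `conveyor db-doctor`: `missing_tables` and `KEY_TABLES` are hypothetical names, and only table names are checked because this section does not document column layouts.

```python
import sqlite3
from pathlib import Path

# Table names copied from the list above.
KEY_TABLES = [
    "sessions", "bindings", "tasks", "codex_events", "manager_cycles",
    "events", "commands", "command_attempts", "routed_notifications",
    "task_acknowledgements", "epilogue_runs", "task_continuations",
    "continuation_reviews", "workers", "managers",
]


def missing_tables(db_path: str) -> list[str]:
    """Return the key tables absent from the database, opened read-only."""
    # mode=ro makes SQLite refuse writes and refuse to create a missing file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        present = {name for (name,) in rows}
    finally:
        conn.close()
    return [table for table in KEY_TABLES if table not in present]


if __name__ == "__main__":
    print(missing_tables(".codex-workers/workerctl.db"))
```

If the database file does not exist, the read-only open raises `sqlite3.OperationalError` instead of silently creating an empty database.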
+ `conveyor db-doctor` reports schema health. `conveyor reconcile` reports
+ runtime drift (dead-pid sessions, dangling bindings, stuck tasks); add
+ `--apply` to fix.
+
+ ## Migration from the Legacy Path
+
+ Earlier prototypes used a worker-first promotion flow where a worker was
+ created first and a manager was then spawned to supervise it. Those legacy
+ commands have been retired. The new path inverts the model: register two
+ already-running Codex sessions, create a task, bind them, and let the manager
+ Codex drive observation via `cycle`.
+
+ The legacy database tables (`workers`, `managers`) remain readable via
+ `audit`, `replay`, and `export-task` for historical reference, but no
+ remaining CLI command writes to them. To resume work on a legacy task, call
+ `finish-task` on it and start fresh via `register-worker` +
+ `register-manager` + `bind`.
+
+ ## Tests
+
+ Release-candidate deterministic gate:
+
+ ```bash
+ scripts/rc-check --skip-live-smoke-repeat
+ ```
+
+ Full local release-candidate gate:
+
+ ```bash
+ scripts/rc-check --with-live-smoke-repeat
+ ```
+
+ Underlying deterministic checks:
+
+ ```bash
+ python3 -m unittest discover -s tests -v
+ scripts/check-resource-warnings
+ python3 -m py_compile scripts/workerctl scripts/check-resource-warnings workerctl/*.py
+ npm run migration:audit:final
+ scripts/package-smoke
+ scripts/release-check
+ ```
+
+ For local parallel experiments, prefer:
+
+ ```bash
+ scripts/run-unittests-isolated
+ ```
+
+ This gives the process a temporary `WORKERCTL_STATE_ROOT` and a test namespace.
+ The standard CI job remains serial.
+
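The state-root half of that isolation can be pictured as pointing a child test process at a throwaway directory. This is only a sketch of the idea, not the contents of `scripts/run-unittests-isolated`: `isolated_env` and `run_tests_isolated` are hypothetical helpers, and the test namespace is left out because its mechanism is not described here.

```python
import os
import subprocess
import sys
import tempfile


def isolated_env(state_root: str) -> dict[str, str]:
    """Copy the current environment and point WORKERCTL_STATE_ROOT at state_root."""
    env = dict(os.environ)
    env["WORKERCTL_STATE_ROOT"] = state_root
    return env


def run_tests_isolated() -> int:
    """Run unittest discovery against a throwaway state root; return its exit code."""
    with tempfile.TemporaryDirectory(prefix="workerctl-test-") as state_root:
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
            env=isolated_env(state_root),
        )
        return result.returncode


if __name__ == "__main__":
    sys.exit(run_tests_isolated())
```

Because each invocation gets its own temporary directory, parallel runs do not share a state root, and the directory is removed when the run ends.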
+ GitHub Actions runs `scripts/rc-check --skip-live-smoke-repeat` and
+ `scripts/package-smoke` on every push and pull request. The live smoke repeat
+ remains local/manual because hosted runners may not have `codex`.
+ The ResourceWarning gate intentionally fails on any `ResourceWarning` text in
+ test output so finalization-time resource warnings cannot be hidden by a zero
+ `unittest` exit status.
+
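The idea behind that gate, failing on warning text even when the test runner exits zero, fits in a few lines. The sketch below is an assumption about the general approach, not the contents of `scripts/check-resource-warnings`; `gate` is a hypothetical name.

```python
import subprocess
import sys


def gate(argv: list[str]) -> int:
    """Return a non-zero code if argv fails or prints any ResourceWarning text."""
    result = subprocess.run(argv, capture_output=True, text=True)
    # unittest writes its report to stderr, so scan both streams.
    output = result.stdout + result.stderr
    if "ResourceWarning" in output:
        return 1
    return result.returncode


if __name__ == "__main__":
    sys.exit(gate([sys.executable, "-m", "unittest", "discover", "-s", "tests", "-v"]))
```

Scanning the captured output is what catches warnings emitted during interpreter finalization, which would otherwise leave the exit status at zero.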
+ Live local smoke gate:
+
+ ```bash
+ scripts/live-smoke
+ ```
+
+ The live smoke requires macOS, `tmux`, `codex`, and `rg`. It starts disposable
+ Codex worker/manager sessions, exercises `pair`, `cycle`, `session-nudge`,
+ criteria mutation, transcript capture before stop, replay, mutation audit, and
+ export, then verifies cleanup with `sessions --state active` and `reconcile`.
+ It writes evidence under `docs/live-qa-artifacts/` and should leave no active
+ smoke sessions, tmux panes, dangling bindings, or stuck tasks.
+
+ For the focused manual coverage pass, use
+ [docs/manual-qa-checklist.md](docs/manual-qa-checklist.md).