agent-conveyor 0.1.0
- package/README.md +1123 -0
- package/dist/cli/main.d.ts +2 -0
- package/dist/cli/main.js +19 -0
- package/dist/cli/main.js.map +1 -0
- package/dist/cli/program-name.d.ts +2 -0
- package/dist/cli/program-name.js +12 -0
- package/dist/cli/program-name.js.map +1 -0
- package/dist/cli/typescript-runtime.d.ts +52 -0
- package/dist/cli/typescript-runtime.js +18009 -0
- package/dist/cli/typescript-runtime.js.map +1 -0
- package/dist/index.d.ts +37 -0
- package/dist/index.js +20 -0
- package/dist/index.js.map +1 -0
- package/dist/runtime/audit.d.ts +96 -0
- package/dist/runtime/audit.js +298 -0
- package/dist/runtime/audit.js.map +1 -0
- package/dist/runtime/classify.d.ts +8 -0
- package/dist/runtime/classify.js +128 -0
- package/dist/runtime/classify.js.map +1 -0
- package/dist/runtime/codex-session.d.ts +103 -0
- package/dist/runtime/codex-session.js +408 -0
- package/dist/runtime/codex-session.js.map +1 -0
- package/dist/runtime/commands.d.ts +92 -0
- package/dist/runtime/commands.js +408 -0
- package/dist/runtime/commands.js.map +1 -0
- package/dist/runtime/dispatch.d.ts +74 -0
- package/dist/runtime/dispatch.js +669 -0
- package/dist/runtime/dispatch.js.map +1 -0
- package/dist/runtime/export.d.ts +22 -0
- package/dist/runtime/export.js +77 -0
- package/dist/runtime/export.js.map +1 -0
- package/dist/runtime/ingest.d.ts +28 -0
- package/dist/runtime/ingest.js +177 -0
- package/dist/runtime/ingest.js.map +1 -0
- package/dist/runtime/loop-evidence.d.ts +87 -0
- package/dist/runtime/loop-evidence.js +448 -0
- package/dist/runtime/loop-evidence.js.map +1 -0
- package/dist/runtime/manager-config.d.ts +20 -0
- package/dist/runtime/manager-config.js +34 -0
- package/dist/runtime/manager-config.js.map +1 -0
- package/dist/runtime/manager-permissions.d.ts +7 -0
- package/dist/runtime/manager-permissions.js +85 -0
- package/dist/runtime/manager-permissions.js.map +1 -0
- package/dist/runtime/notifications.d.ts +89 -0
- package/dist/runtime/notifications.js +208 -0
- package/dist/runtime/notifications.js.map +1 -0
- package/dist/runtime/replay.d.ts +29 -0
- package/dist/runtime/replay.js +331 -0
- package/dist/runtime/replay.js.map +1 -0
- package/dist/runtime/tasks.d.ts +54 -0
- package/dist/runtime/tasks.js +195 -0
- package/dist/runtime/tasks.js.map +1 -0
- package/dist/runtime/tmux.d.ts +61 -0
- package/dist/runtime/tmux.js +189 -0
- package/dist/runtime/tmux.js.map +1 -0
- package/dist/runtime/visual-diff.d.ts +23 -0
- package/dist/runtime/visual-diff.js +234 -0
- package/dist/runtime/visual-diff.js.map +1 -0
- package/dist/state/database.d.ts +21 -0
- package/dist/state/database.js +142 -0
- package/dist/state/database.js.map +1 -0
- package/dist/state/files.d.ts +38 -0
- package/dist/state/files.js +73 -0
- package/dist/state/files.js.map +1 -0
- package/dist/state/schema-v22.d.ts +1 -0
- package/dist/state/schema-v22.js +566 -0
- package/dist/state/schema-v22.js.map +1 -0
- package/dist/state/sqlite-contract.d.ts +4 -0
- package/dist/state/sqlite-contract.js +78 -0
- package/dist/state/sqlite-contract.js.map +1 -0
- package/dist/state/status.d.ts +12 -0
- package/dist/state/status.js +40 -0
- package/dist/state/status.js.map +1 -0
- package/docs/typescript-migration/cli-contract.md +147 -0
- package/docs/typescript-migration/dashboard-contract.md +76 -0
- package/docs/typescript-migration/package-install-contract.md +98 -0
- package/docs/typescript-migration/qa-gate-matrix.md +103 -0
- package/docs/typescript-migration/sqlite-state-contract.md +92 -0
- package/docs/typescript-migration/t005-runtime-parity.md +47 -0
- package/package.json +88 -0
- package/scripts/capture-static-html-screenshot.mjs +88 -0
- package/skills/codex-review/SKILL.md +116 -0
- package/skills/codex-review/scripts/codex-review +344 -0
- package/skills/manage-codex-workers/SKILL.md +696 -0
- package/skills/manage-codex-workers/agents/openai.yaml +5 -0
package/README.md
ADDED
|
@@ -0,0 +1,1123 @@
|
|
|
1
|
+
# Codex Terminal Manager
|
|
2
|
+
|
|
3
|
+
A Mac-first prototype for letting one Codex session supervise and gently steer
|
|
4
|
+
another Codex session running in a terminal.
|
|
5
|
+
|
|
6
|
+
The goal is not full autonomy. The goal is lightweight supervision for Codex
|
|
7
|
+
tasks that mostly need progress checks, occasional nudges, test reruns, or
|
|
8
|
+
clean stop/resume handling.
|
|
9
|
+
|
|
10
|
+
## Motivating Principles
|
|
11
|
+
|
|
12
|
+
- **Project workflows often need nudging.** Some Codex tasks do not need a
|
|
13
|
+
second implementer; they need a manager that keeps watching, asks for status
|
|
14
|
+
at the right time, unblocks predictable terminal prompts, and gently steers
|
|
15
|
+
the worker back toward the user's goal.
|
|
16
|
+
- **Useful acceptance criteria often emerge during the work.** Even when the
|
|
17
|
+
user starts with a plan, implementation reveals new edge cases, missing tests,
|
|
18
|
+
unclear polish requirements, and follow-up decisions. The manager should help
|
|
19
|
+
discover, record, defer, satisfy, and audit these emergent acceptance criteria
|
|
20
|
+
instead of assuming the whole checklist can be known up front.
|
|
21
|
+
- **Supervision should be durable and replayable.** Manager observations,
|
|
22
|
+
nudges, interrupts, handoffs, compaction requests, and final decisions should
|
|
23
|
+
leave enough structured history to understand why the worker was pushed and
|
|
24
|
+
what evidence supported finishing or continuing the task.
|
|
25
|
+
|
|
26
|
+
## Burden Of Proof
|
|
27
|
+
|
|
28
|
+
Before declaring work complete, try to disprove the change. Identify the
|
|
29
|
+
strongest realistic failure mode, verify it with a command, test, trace,
|
|
30
|
+
screenshot, audit record, diff, or direct inspection, and include that evidence
|
|
31
|
+
in the final handoff. Treat `done`, `tests passed`, worker claims, passing
|
|
32
|
+
happy-path tests, generated summaries, and optimistic UI as claims, not proof.
|
|
33
|
+
Treat unverified assumptions as blockers or explicit follow-ups.
|
|
34
|
+
|
|
35
|
+
See `docs/agent-evidence-playbook.md` for the repo-specific evidence ladder,
|
|
36
|
+
receipt options, and final handoff shape agents should use when closing out
|
|
37
|
+
work.
|
|
38
|
+
|
|
39
|
+
## Architecture
|
|
40
|
+
|
|
41
|
+
Supervision is built on three primitives: **sessions**, **tasks**, and
|
|
42
|
+
**bindings**.
|
|
43
|
+
|
|
44
|
+
- **Worker session.** A Codex session running inside a named `tmux` session.
|
|
45
|
+
Workers own a rollout JSONL on disk (`~/.codex/sessions/.../rollout-*.jsonl`)
|
|
46
|
+
which Agent Conveyor ingests for state inference.
|
|
47
|
+
- **Manager session.** A Codex session running anywhere — Ghostty, iTerm2,
|
|
48
|
+
Terminal.app, or a web terminal. The manager does not need to run inside
|
|
49
|
+
tmux. Its job is to call `conveyor` commands, read their JSON output, and
|
|
50
|
+
decide whether to nudge, interrupt, finish, or wait.
|
|
51
|
+
- **Task.** A unit of supervised work. A task has a goal and optional
|
|
52
|
+
summary/manager instructions.
|
|
53
|
+
- **Binding.** A row that ties one worker session and one manager session to
|
|
54
|
+
one task. Bindings are explicit and durable.
|
|
55
|
+
|
|
56
|
+
The manager Codex drives the supervision loop by calling
|
|
57
|
+
`conveyor cycle <task>` repeatedly. Each cycle ingests new events from the
|
|
58
|
+
worker's rollout, captures the worker's tmux pane as a shadow signal, and
|
|
59
|
+
returns structured JSON. The manager reads that JSON and decides what to do.
|
|
60
|
+
|
|
61
|
+
`tmux` owns the worker PTY. Ghostty, iTerm2, Terminal.app, or `ttyd` can be
|
|
62
|
+
viewers, but they should not be the source of truth for orchestration.
|
|
63
|
+
|
|
64
|
+
```text
|
|
65
|
+
manager terminal
|
|
66
|
+
Codex manager session
|
|
67
|
+
|
|
|
68
|
+
| runs conveyor commands (cycle, session-nudge, ...)
|
|
69
|
+
v
|
|
70
|
+
tmux session: codex-worker-a
|
|
71
|
+
pane 1: Codex worker session
|
|
72
|
+
pane 2: optional dev server, tests, or logs
|
|
73
|
+
|
|
74
|
+
.codex-workers/
|
|
75
|
+
workerctl.db <- authoritative SQLite control plane
|
|
76
|
+
worker-a/ <- ignored runtime artifacts
|
|
77
|
+
status.json
|
|
78
|
+
transcript.txt
|
|
79
|
+
events.jsonl
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
## Non-Goals
|
|
83
|
+
|
|
84
|
+
- Cross-platform support.
|
|
85
|
+
- Browser-first orchestration.
|
|
86
|
+
- Full terminal emulator automation.
|
|
87
|
+
- Autonomous merging or destructive git actions.
|
|
88
|
+
- Managing many workers at once.
|
|
89
|
+
|
|
90
|
+
## Install
|
|
91
|
+
|
|
92
|
+
For users, install the published Agent Conveyor package with npm:
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
npm install -g agent-conveyor
|
|
96
|
+
conveyor install-skills
|
|
97
|
+
conveyor doctor
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
The package name is `agent-conveyor`. The installed CLI exposes both
|
|
101
|
+
`conveyor` and the compatibility command `workerctl`. `conveyor install-skills`
|
|
102
|
+
installs the `manage-codex-workers` and `codex-review` skills into
|
|
103
|
+
`$CODEX_HOME/skills` or `~/.codex/skills`. The `codex-review` install includes
|
|
104
|
+
the guarded review helper used by the QA and PR closeout flows.
|
|
105
|
+
|
|
106
|
+
For contributors working from this checkout, use the local installer instead:
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
scripts/install-local --write
|
|
110
|
+
export PATH="$PWD/bin:$PATH"
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
To test unreleased packaging changes before publish, install a local npm
|
|
114
|
+
tarball into a temporary prefix:
|
|
115
|
+
|
|
116
|
+
```bash
|
|
117
|
+
npm run build
|
|
118
|
+
npm pack
|
|
119
|
+
tmp_prefix="$(mktemp -d)"
|
|
120
|
+
npm install -g --prefix "$tmp_prefix" ./agent-conveyor-*.tgz
|
|
121
|
+
PATH="$tmp_prefix/bin:$PATH" conveyor --help
|
|
122
|
+
PATH="$tmp_prefix/bin:$PATH" workerctl --help
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
`conveyor doctor` reports local dependency health (tmux, codex, etc.).
|
|
126
|
+
`conveyor db-doctor` initializes and checks the SQLite control-plane
|
|
127
|
+
database.
|
|
128
|
+
Before publishing `agent-conveyor` to npm, follow
|
|
129
|
+
[`docs/package-release.md`](docs/package-release.md).
|
|
130
|
+
|
|
131
|
+
After install, the intended Codex app entry point is natural language. Open a
|
|
132
|
+
new Codex app session in the target repo and say:
|
|
133
|
+
|
|
134
|
+
```text
|
|
135
|
+
Use the manage-codex-workers skill.
|
|
136
|
+
|
|
137
|
+
Set up a Codex app Ralph loop for issue CTL.
|
|
138
|
+
Require adversarial proof before another worker iteration.
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
The installed skill should call the `conveyor` CLI, choose names, create the
|
|
142
|
+
no-tmux binding with `create-disposable-binding`, point the worker at
|
|
143
|
+
`worker-inbox`, and use `loop-status` plus telemetry receipts before reporting
|
|
144
|
+
that the loop is ready.
|
|
145
|
+
|
|
146
|
+
Dispatch is core infrastructure for supervised worker/manager pairs. The
|
|
147
|
+
`pair` workflow starts a detached Dispatch watch process by default so worker
|
|
148
|
+
completion is routed to the bound manager mechanically. For manually bound
|
|
149
|
+
pairs, run Dispatch in a separate shell:
|
|
150
|
+
|
|
151
|
+
```bash
|
|
152
|
+
conveyor dispatch --watch --dispatcher-id dispatch-local
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
Use `conveyor qa-plan dispatch-completion` for a bounded verification flow, or
|
|
156
|
+
`conveyor qa-plan ralph-loop` for the repeated PR/CI/merge/context-clear
|
|
157
|
+
dogfood loop.
|
|
158
|
+
Use `conveyor qa-plan adversarial-triggers` to verify that natural-language
|
|
159
|
+
manager prompts activate Ralph-loop adversarial gates.
|
|
160
|
+
Use `conveyor qa-plan goalbuddy-conveyor` when a broad request should become
|
|
161
|
+
sequential GoalBuddy child boards with PR/CI/merge receipts.
|
|
162
|
+
For manual QA, launch the dashboard with Dispatch enforcement so the page can
|
|
163
|
+
show live proof:
|
|
164
|
+
|
|
165
|
+
```bash
|
|
166
|
+
conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
## Quickstart
|
|
170
|
+
|
|
171
|
+
The fastest way to start a worker and register it is a single command:
|
|
172
|
+
|
|
173
|
+
```bash
|
|
174
|
+
# One command: spawn codex in tmux, wait for it to come up, register as worker
|
|
175
|
+
conveyor start-worker --name foo --cwd "$PWD" --task "Refactor auth"
|
|
176
|
+
|
|
177
|
+
# Register a manager. Managers do not need to run inside tmux.
|
|
178
|
+
MGR_PID=$$ # if your current shell is the manager; otherwise find its pid
|
|
179
|
+
conveyor register-manager --name foo-mgr --pid $MGR_PID --cwd "$PWD"
|
|
180
|
+
|
|
181
|
+
# Create a task and bind the pair to it.
|
|
182
|
+
conveyor tasks --create my-task --goal "Refactor auth"
|
|
183
|
+
conveyor bind --task my-task --worker foo --manager foo-mgr
|
|
184
|
+
|
|
185
|
+
# Start Dispatch in another shell so worker completion wakes the manager.
|
|
186
|
+
conveyor dispatch --watch --dispatcher-id dispatch-local
|
|
187
|
+
|
|
188
|
+
# One observation cycle. Returns JSON.
|
|
189
|
+
conveyor cycle my-task
|
|
190
|
+
|
|
191
|
+
# Optionally nudge the worker through its tmux pane.
|
|
192
|
+
conveyor session-nudge foo "What's your current state?"
|
|
193
|
+
|
|
194
|
+
# When the task is complete:
|
|
195
|
+
conveyor finish-task my-task --reason "auth refactor merged" --capture-transcript-before-stop --stop-manager --stop-worker
|
|
196
|
+
conveyor unbind --task my-task
|
|
197
|
+
conveyor deregister foo
|
|
198
|
+
conveyor deregister foo-mgr
|
|
199
|
+
```
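
The observation step in the quickstart can also be scripted. The following is a minimal Python sketch, assuming only what the text above states: `conveyor cycle <task>` prints JSON on stdout. The helper names `run_cycle` and `observe`, and the `interval_seconds` pacing knob, are hypothetical and not part of the CLI. The shape of the returned JSON is not assumed here.

```python
import json
import subprocess
import time
from typing import Callable, Iterator


def run_cycle(task: str) -> dict:
    """Run `conveyor cycle <task>` once and return its parsed JSON output."""
    result = subprocess.run(
        ["conveyor", "cycle", task],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(result.stdout)


def observe(
    task: str,
    cycles: int,
    interval_seconds: float = 0.0,  # hypothetical pacing knob, not a conveyor option
    runner: Callable[[str], dict] = run_cycle,
) -> Iterator[dict]:
    """Yield the parsed result of a bounded number of observation cycles."""
    for index in range(cycles):
        if index > 0 and interval_seconds > 0:
            time.sleep(interval_seconds)
        yield runner(task)
```

For example, `for snapshot in observe("my-task", 3, interval_seconds=30): ...` would run three cycles. The `runner` parameter exists so the loop can be exercised without a live worker.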
|
|
200
|
+
|
|
201
|
+
For manual registration of a pre-existing Codex session:
|
|
202
|
+
|
|
203
|
+
```bash
|
|
204
|
+
# Start a Codex worker inside a fresh tmux session.
|
|
205
|
+
tmux new-session -d -s codex-foo
|
|
206
|
+
tmux send-keys -t codex-foo "codex" Enter
|
|
207
|
+
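# (Wait a moment for codex to come up. The pgrep pattern below assumes the
# codex command line includes --sandbox; adjust it if yours does not.)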
|
|
208
|
+
WORKER_PID=$(pgrep -f "codex.*--sandbox" | head -1)
|
|
209
|
+
|
|
210
|
+
# Register the worker. lsof auto-discovers the rollout JSONL from the pid.
|
|
211
|
+
conveyor register-worker --name foo --pid "$WORKER_PID" \
|
|
212
|
+
--cwd "$PWD" --tmux-session codex-foo
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
If `lsof` discovery fails (e.g. the Codex session was started ephemerally),
|
|
216
|
+
pass the rollout path explicitly with `--codex-session
|
|
217
|
+
~/.codex/sessions/.../rollout-...-<uuid>.jsonl`.
|
|
218
|
+
|
|
219
|
+
To register a manager session that's already running:
|
|
220
|
+
|
|
221
|
+
```bash
|
|
222
|
+
# If Codex is already running and you know its pid:
|
|
223
|
+
conveyor register-manager --name my-mgr --pid 28975
|
|
224
|
+
|
|
225
|
+
# register-manager runs `lsof -p <pid>` to find the rollout JSONL.
|
|
226
|
+
# If Codex hasn't written its rollout yet (no input typed),
|
|
227
|
+
# you'll get a hint asking you to type something in the Codex prompt and retry.
|
|
228
|
+
|
|
229
|
+
# Or pass --codex-session explicitly to bypass the lsof probe:
|
|
230
|
+
conveyor register-manager --name my-mgr --pid 28975 \
|
|
231
|
+
--codex-session /path/to/rollout.jsonl
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
Note: `lsof` is the canonical pid→rollout lookup. `find -newermt` is unreliable because
|
|
235
|
+
filesystem mtime resolution and parsing of "X minutes ago" vary; `lsof` reads the open fd directly.
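
To illustrate the pid-to-rollout lookup, here is a rough Python sketch that scans `lsof -p <pid> -Fn` field output for an open `rollout-*.jsonl` file. It is not the package's implementation. The function names are hypothetical, and the `rollout-` prefix and `.jsonl` suffix are taken from the path pattern shown earlier in this README.

```python
import subprocess
from typing import Optional


def rollout_from_lsof_output(output: str) -> Optional[str]:
    """Return the first open rollout JSONL path found in `lsof -Fn` output."""
    for line in output.splitlines():
        # In -F field output, each file name sits on its own line prefixed with "n".
        if not line.startswith("n"):
            continue
        path = line[1:]
        name = path.rsplit("/", 1)[-1]
        if name.startswith("rollout-") and name.endswith(".jsonl"):
            return path
    return None


def find_rollout(pid: int) -> Optional[str]:
    """Ask lsof which files the process has open and pick out the rollout."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-Fn"],
        capture_output=True,
        text=True,
    )
    return rollout_from_lsof_output(result.stdout)
```

A `None` result corresponds to the case described above where the rollout has not been written yet, and `--codex-session` is the explicit fallback.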
|
|
236
|
+
|
|
237
|
+
For low-risk verification without a real task, `conveyor start-test
|
|
238
|
+
<name>` creates a worker, asks it to update only its ignored
|
|
239
|
+
`status.json`, and leaves the tmux session running so you can attach:
|
|
240
|
+
|
|
241
|
+
```bash
|
|
242
|
+
conveyor start-test live-test --cwd "$PWD" --accept-trust --open
|
|
243
|
+
tmux attach -t codex-live-test
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
## Commands
|
|
247
|
+
|
|
248
|
+
### Sessions and binding
|
|
249
|
+
|
|
250
|
+
- `start-worker --name N [--cwd D] [--task "..."] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N]` —
|
|
251
|
+
Spawn Codex in a fresh tmux session and register it as a worker in one call.
|
|
252
|
+
The fastest way to start a supervised worker. Internally: `tmux new-session`
|
|
253
|
+
+ `codex` + poll for rollout + `register-worker`.
|
|
254
|
+
- `start-manager --name N [--cwd D] [--task T] [--task-goal G] [--worker W] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N]` —
|
|
255
|
+
Spawn Codex in a fresh tmux session and register it as a manager in one call.
|
|
256
|
+
Mirrors `start-worker` but uses a manager bootstrap prompt instead of a worker
|
|
257
|
+
task prompt. When `--task`, `--task-goal`, and `--worker` are supplied, the
|
|
258
|
+
bootstrap is ready for late attach: it names the task, goal, worker session,
|
|
259
|
+
and concrete `manager-config`, `cycle`, `manager-ack`, and `worker-ack`
|
|
260
|
+
commands. If manager config has already been recorded for the task, the
|
|
261
|
+
bootstrap tells the manager to start with `cycle` instead of asking setup
|
|
262
|
+
questions again. Without those flags, the bootstrap asks the manager to
|
|
263
|
+
collect the missing supervision details before cycling.
|
|
264
|
+
- `pair --task T --worker-name W --manager-name M [--cwd D] [--task-prompt PROMPT] [--task-goal GOAL] [--task-summary S] [--manager-objective O] [--manager-guideline G ...] [--manager-acceptance A ...] [--sandbox SANDBOX] [--ask-for-approval ASK_FOR_APPROVAL] [--accept-trust] [--timeout-seconds N] [--dispatcher-id ID] [--no-dispatch]` —
|
|
265
|
+
One-shot: spawn worker + manager and bind to a task in a single command. Combines
|
|
266
|
+
`start-worker` + `start-manager` + `bind`. The task is looked up or created (if
|
|
267
|
+
`--task-goal` is provided); if the task does not exist and no goal is given, an
|
|
268
|
+
error is raised with a hint. The worker receives the optional `--task-prompt` as
|
|
269
|
+
its initial Codex prompt; the manager receives a manager bootstrap prompt with
|
|
270
|
+
the task, goal, worker name, manager configuration status, and `cycle` commands.
|
|
271
|
+
`pair` records a default guided manager config before launching the manager, so
|
|
272
|
+
retries against an existing task do not fall back into setup-question mode.
|
|
273
|
+
If manager config flags are supplied (`--manager-mode`,
|
|
274
|
+
`--manager-objective`, repeated `--manager-guideline`,
|
|
275
|
+
`--manager-acceptance`, `--manager-reference`, or manager permission flags),
|
|
276
|
+
those values are merged into the seeded config and the bootstrap tells the
|
|
277
|
+
manager to start supervising with `cycle` instead of asking setup questions
|
|
278
|
+
first. Manager acceptance entries are also seeded into the living
|
|
279
|
+
acceptance criteria ledger when they do not already exist for the task. By
|
|
280
|
+
default `pair` starts a detached `dispatch --watch` process after successful
|
|
281
|
+
worker/manager setup, bind, and run creation. Use `--dispatcher-id` to set its
|
|
282
|
+
identity or `--no-dispatch` for isolated/manual workflows. A live dispatch
|
|
283
|
+
heartbeat is reused only when it has the same dispatcher id; otherwise `pair`
|
|
284
|
+
starts the requested dispatcher so audit receipts keep the configured
|
|
285
|
+
identity.
|
|
286
|
+
If manager startup or the bind fails after the worker is spawned, the worker remains
|
|
287
|
+
registered and can be cleaned up with `conveyor deregister`.
|
|
288
|
+
Use `--accept-trust` only for directories you intentionally trust; it retries
|
|
289
|
+
Enter during startup discovery so fresh workspaces do not stall before
|
|
290
|
+
registration.
|
|
291
|
+
- `register-worker --name N [--pid P | --codex-session PATH] [--cwd D] [--tmux-session S]` —
|
|
292
|
+
Register an already-running Codex session as a worker. Rollout JSONL is
|
|
293
|
+
auto-discovered from the pid via `lsof` unless `--codex-session` is given.
|
|
294
|
+
- `register-manager --name N ...` — Same arguments; tmux is not required.
|
|
295
|
+
Both registration commands print a `communication` object. When
|
|
296
|
+
`--tmux-session` is present, `communication.session_kind='tmux'`,
|
|
297
|
+
`receive_style='push'`, and `delivery_mode='push'`; without tmux but with a
|
|
298
|
+
Codex rollout identity, `session_kind='codex_app'`, `receive_style='pull'`,
|
|
299
|
+
and `delivery_mode='pull_required'`, with the role-specific inbox polling
|
|
300
|
+
command template.
|
|
301
|
+
- `deregister <name>` — Mark a session gone. Refuses if the session is bound
|
|
302
|
+
to an active task.
|
|
303
|
+
- `sessions [--role worker|manager] [--state active|gone|all] [--include-legacy]
|
|
304
|
+
[--name N ...] [--redact-identity-token]` — List registered sessions.
|
|
305
|
+
By default, `sessions` shows active registered sessions and hides Phase 1 backfill rows (legacy pre-redesign workers/managers, identified by `pid IS NULL`) plus rows marked `state='gone'`. Pass `--state all` to show every row, or `--state gone` to inspect only gone rows:
|
|
306
|
+
```bash
|
|
307
|
+
conveyor sessions # active registered sessions only
|
|
308
|
+
conveyor sessions --state active # explicit equivalent of the default
|
|
309
|
+
conveyor sessions --state gone # gone sessions only
|
|
310
|
+
conveyor sessions --state all # active, gone, and legacy rows
|
|
311
|
+
conveyor sessions --name <session> --redact-identity-token
|
|
312
|
+
```
|
|
313
|
+
For shareable QA evidence, prefer repeating `--name` for just the sessions in
|
|
314
|
+
scope and include `--redact-identity-token`; unfiltered output can include
|
|
315
|
+
unrelated active sessions and their registration tokens.
|
|
316
|
+
Each session row includes the same `communication` block emitted by
|
|
317
|
+
registration, so managers can detect whether a worker or manager is
|
|
318
|
+
tmux-push capable or must poll its mailbox.
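
A manager script could branch on the `communication` block roughly as follows. This Python sketch uses only the field values documented above (`session_kind`, `delivery_mode`). The function names are hypothetical, and the `name` key on a session row is an assumption about the JSON shape.

```python
def delivery_plan(communication: dict) -> str:
    """Classify a session's `communication` block as 'push', 'pull', or 'unknown'."""
    kind = communication.get("session_kind")
    mode = communication.get("delivery_mode")
    if kind == "tmux" and mode == "push":
        return "push"
    if kind == "codex_app" and mode == "pull_required":
        return "pull"
    return "unknown"


def sessions_that_must_poll(rows: list[dict]) -> list[str]:
    """Names of session rows whose communication block requires inbox polling."""
    return [
        row.get("name", "<unnamed>")  # "name" key is assumed, not confirmed
        for row in rows
        if delivery_plan(row.get("communication", {})) == "pull"
    ]
```

Any combination not described in this README is reported as `unknown`, so the caller does not guess at a delivery mode.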
|
|
319
|
+
- `tasks [--create NAME --goal G --summary S]` — List or create tasks.
|
|
320
|
+
- `create-disposable-binding TASK [--worker NAME] [--manager NAME] [--template TEMPLATE | --required-before-continue TYPE] [--adversarial]` —
|
|
321
|
+
Create a no-tmux manager/worker binding for real Ralph-loop slices. The
|
|
322
|
+
helper creates the task when missing, marks it managed, writes valid Codex
|
|
323
|
+
rollout JSONL files, registers worker and manager sessions with
|
|
324
|
+
`tmux_session=null`, binds them, optionally creates a template-backed or
|
|
325
|
+
custom Ralph-loop policy run, and prints replay commands for Dispatch,
|
|
326
|
+
`loop-status`, per-session `communication` metadata, plus a `worker_handoff`
|
|
327
|
+
prompt that tells Codex app workers to keep polling their worker inbox
|
|
328
|
+
through the bounded loop.
|
|
329
|
+
- `discover [QUERY] [--all] [--limit N]` / `search [QUERY]` — Search tasks,
|
|
330
|
+
registered sessions, active bindings, and recent telemetry in one JSON result.
|
|
331
|
+
Use this for conversational setup when a manager or Codex session needs to
|
|
332
|
+
present likely worker/manager/task connection options instead of asking the
|
|
333
|
+
user for generated names:
|
|
334
|
+
```bash
|
|
335
|
+
conveyor discover dashboard
|
|
336
|
+
conveyor search "auth refactor"
|
|
337
|
+
```
|
|
338
|
+
The output includes `tasks`, `sessions`, `bindings`, `telemetry`, and
|
|
339
|
+
`suggestions`; suggestions may include a ready-to-run `conveyor bind`
|
|
340
|
+
command or next-step prompts to register the missing worker or manager.
|
|
341
|
+
- `handoff <task> --summary S [--next-step N ...] [--payload-json JSON]` —
|
|
342
|
+
Persist a compact worker handoff for the task. Use this when a worker is
|
|
343
|
+
becoming managed or before a long context transition so the manager can read
|
|
344
|
+
progress and likely next steps from SQLite.
|
|
345
|
+
- `manager-config <task> [--mode light|guided|strict] [--objective O]
|
|
346
|
+
[--guideline G ...] [--acceptance A ...] [--reference R ...]
|
|
347
|
+
[--permit CATEGORY.ACTION ...] [--tool TOOL ...]
|
|
348
|
+
[--epilogue STEP ...] [--nudge-on-completion MODE] [--require-acks]
|
|
349
|
+
[--allow-pr] [--allow-merge-green] [--allow-worker-compact-clear]` —
|
|
350
|
+
Persist the manager's supervision contract: what to check against, how
|
|
351
|
+
structured the loop should be, acceptance criteria, source references, and
|
|
352
|
+
categorized permissions. With no recorded config it creates the default
|
|
353
|
+
guided config; with no mutating flags after that it prints the current config.
|
|
354
|
+
Use `--questions` from a manager Codex session to get a stable JSON question
|
|
355
|
+
schema to ask the user in chat, then save the answers with noninteractive
|
|
356
|
+
flags. Use `--interactive` only as a terminal fallback when a human is
|
|
357
|
+
running `conveyor` directly.
|
|
358
|
+
`--permit` grants taxonomy permissions such as `repo.open_pr`,
|
|
359
|
+
`verification.run_pytest`, `context.spawn_reviewer`,
|
|
360
|
+
`communication.notify_operator`, or `worker_session.compact`. Use `--tool`
|
|
361
|
+
to record expected verification/context tools, `--epilogue` for required
|
|
362
|
+
built-in finish steps (`run-tools`, `draft-pr`, `subagent-review`,
|
|
363
|
+
`record-handoff`), `--nudge-on-completion` for continuation review behavior
|
|
364
|
+
(`off`, `ask-operator`, `auto-review`, `auto-proceed`), and `--require-acks`
|
|
365
|
+
when `cycle`/`finish-task` should fail closed until both sides acknowledge.
|
|
366
|
+
Legacy flat flags and `--permissions-json` keys (`create_pr`,
|
|
367
|
+
`merge_green_pr`, `worker_compact_clear`, plus older `allow_*` aliases) are
|
|
368
|
+
still accepted and normalized into the categorized taxonomy.
|
|
369
|
+
- `criteria <task>` — Track emergent acceptance criteria discovered during
|
|
370
|
+
supervision. Managers should add useful proposed criteria, accept must-have
|
|
371
|
+
items, defer follow-ups, and mark criteria satisfied only when worker
|
|
372
|
+
receipts and verification cover them.
|
|
373
|
+
```bash
|
|
374
|
+
conveyor criteria my-task --list
|
|
375
|
+
conveyor criteria my-task --list --status accepted
|
|
376
|
+
conveyor criteria my-task --add --criterion "..." --source worker_proposed --status proposed
|
|
377
|
+
conveyor criteria my-task --add --criterion "..." --source manager_inferred --status accepted
|
|
378
|
+
conveyor criteria my-task --accept 12 --rationale "Must-have for this task"
|
|
379
|
+
conveyor criteria my-task --satisfy <id> --evidence-json '{"command":"...","status":"pass"}'
|
|
380
|
+
conveyor criteria my-task --defer 13 --rationale "Follow-up after this task"
|
|
381
|
+
conveyor criteria my-task --reject 14 --rationale "Duplicate or out of scope"
|
|
382
|
+
```
|
|
383
|
+
Replace placeholder `...` values with the actual criterion and verification
|
|
384
|
+
command. Use `worker_proposed` for criteria proposed by the worker. Use
|
|
385
|
+
`manager_inferred` for criteria inferred from manager config, cycle evidence,
|
|
386
|
+
or manager inspection; `manager_config` is not a valid criteria source.
|
|
387
|
+
To add a criterion and satisfy that same row after verification:
|
|
388
|
+
```bash
|
|
389
|
+
criterion_id=$(conveyor criteria my-task --add --criterion "Targeted prompt tests pass" --source worker_proposed --status proposed | python3 -c 'import json,sys; print(json.load(sys.stdin)["affected_criterion"]["id"])')
|
|
390
|
+
conveyor criteria my-task --satisfy "$criterion_id" --evidence-json '{"command":"python3 -m unittest tests.test_workerctl.ManagerBootstrapPromptTests -v","status":"pass"}'
|
|
391
|
+
```
|
|
392
|
+
For mutation responses, treat `affected_criterion` as the authoritative
|
|
393
|
+
receipt for the row changed by that command. When a manager applies multiple
|
|
394
|
+
criteria changes, run `criteria <task> --list` before final audit or other
|
|
395
|
+
decisions; the list command is the canonical task-level criteria state.
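
The add-then-satisfy flow above can be sketched in Python instead of a shell one-liner. The sketch relies only on the `affected_criterion.id` receipt and the `{"command", "status"}` evidence shape shown in the examples. The helper names are hypothetical, and nothing is executed here: `satisfy_argv` only builds the argument list.

```python
import json


def affected_criterion_id(mutation_output: str):
    """Read the id of the row changed by a `criteria` mutation from its JSON output."""
    receipt = json.loads(mutation_output)
    return receipt["affected_criterion"]["id"]


def evidence_json(command: str, status: str) -> str:
    """Serialize evidence in the shape used by the `--evidence-json` examples."""
    return json.dumps({"command": command, "status": status})


def satisfy_argv(task: str, criterion_id, command: str, status: str) -> list[str]:
    """Build the argv for `conveyor criteria <task> --satisfy <id> --evidence-json ...`."""
    return [
        "conveyor", "criteria", task,
        "--satisfy", str(criterion_id),
        "--evidence-json", evidence_json(command, status),
    ]
```

Passing the argv list to `subprocess.run` avoids the shell quoting that a JSON argument otherwise needs.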
|
|
396
|
+
- `criteria-plan <task> --from-text ...|--from-worker-response PATH|--from-stdin
|
|
397
|
+
[--json]` — Draft reviewed `criteria --add` commands from a worker response
|
|
398
|
+
that separates must-have current-task criteria from deferred follow-ups. This
|
|
399
|
+
helper is read-only: it resolves the task and prints suggestions, but does not
|
|
400
|
+
mutate acceptance criteria, events, or commands.
|
|
401
|
+
```bash
|
|
402
|
+
conveyor criteria-plan my-task --from-worker-response response.md --json
|
|
403
|
+
```
|
|
404
|
+
- `manager-permission <task> <CATEGORY.ACTION|CATEGORY> [--list]
|
|
405
|
+
[--require] [--require-handoff]` — Check and audit whether the saved manager
|
|
406
|
+
config allows a categorized action, or list granted actions in a category.
|
|
407
|
+
Use `--require` when a manager command should fail closed. Use
|
|
408
|
+
`--require-handoff` before worker compact/clear style instructions so visible
|
|
409
|
+
context is persisted first.
|
|
410
|
+
- `worker-ack <task> --from-stdin|--json [--correlation-id ID]` /
|
|
411
|
+
`manager-ack <task> --from-stdin|--json [--correlation-id ID]` — Persist or
|
|
412
|
+
read the latest structured acknowledgement from the worker or manager. Acks
|
|
413
|
+
are revisioned and exposed to `cycle`, `replay`, and `audit` so startup
|
|
414
|
+
contract drift can be distinguished from later drift.
|
|
415
|
+
- `continuation <task> --submit worker|manager --from-stdin
|
|
416
|
+
[--correlation-id ID]` — Record independent worker/manager "what's next"
|
|
417
|
+
proposals for a completion turn. The worker proposal must be written first,
|
|
418
|
+
and manager-side reads are redacted until the manager submits its own
|
|
419
|
+
proposal.
|
|
420
|
+
- `continuation <task> --review --from-stdin [--correlation-id ID]` — Record a
|
|
421
|
+
structured reviewer verdict over the paired continuation proposals. This
|
|
422
|
+
requires `context.spawn_reviewer` permission and reviewer separation metadata
|
|
423
|
+
(`subagent_run.reviewer_session_id` distinct from the manager and
|
|
424
|
+
`manager_rollout_access=false`). Divergent reviews are routed for operator
|
|
425
|
+
attention unless `--nudge-on-completion auto-proceed` is configured.
|
|
426
|
+
- `continuation-reviewer <task> --correlation-id ID --reviewer-session-id ID
|
|
427
|
+
--manager-session-id ID --reviewer-command ...` — Run a reviewer command with
|
|
428
|
+
the allowed read-only context on stdin, capture reviewer metadata, and persist
|
|
429
|
+
the structured review. The context includes paired proposals, acceptance
|
|
430
|
+
criteria, manager config summary, diff metadata, and recent PR metadata; it
|
|
431
|
+
does not include manager rollout context. Reviewer commands run from an
|
|
432
|
+
isolated temporary cwd with a stripped environment and, on macOS, through
|
|
433
|
+
`sandbox-exec`. The sandbox keeps the targeted denial of bound
|
|
434
|
+
worker/manager rollout files plus the active control database and sidecars,
|
|
435
|
+
and also denies direct reads of the active `.codex-workers` state root so
|
|
436
|
+
legacy session files, transcripts, capture metadata, task state, and exports
|
|
437
|
+
are not available through filesystem reads. The allowed reviewer context still
|
|
438
|
+
arrives on stdin, and replay/audit/export commands outside this reviewer
|
|
439
|
+
subprocess are unchanged. Sandbox setup failures, reviewer command failures,
|
|
440
|
+
timeouts, or invalid JSON are recorded as `verdict=stop`, not silent approvals.
|
|
441
|
+
Use `--dry-run` to inspect the exact context without running the command.
|
|
442
|
+
- `continuation <task> --list [--as-role all|worker|manager|reviewer]
|
|
443
|
+
[--include-payload]` — List continuation proposals and reviews with
|
|
444
|
+
role-aware payload redaction.
|
|
445
|
+
- `record-decision <task> <wait|nudge|interrupt|escalate|stop|inspect>
|
|
446
|
+
--reason R [--cycle-id N] [--payload-json JSON]` — Persist a manager
|
|
447
|
+
decision and print its id. Use this before strict mutating commands that
|
|
448
|
+
require `--decision-id`.
|
|
449
|
+
- `compact-worker <task> --reason R [--clear] [--prompt-only]` — Convenience
|
|
450
|
+
wrapper that records a `nudge` manager decision, then sends Codex `/compact`
|
|
451
|
+
to the worker through the same strict audited path as
|
|
452
|
+
`request-worker-compact`. Use `--clear` to send `/clear`.
|
|
453
|
+
- `request-worker-compact <task> --decision-id N --strict-decisions` — Send
|
|
454
|
+
Codex `/compact` to the worker through the audited path. Use `--clear` to
|
|
455
|
+
send `/clear`, or `--prompt-only` to send an explanatory prompt instead.
|
|
456
|
+
Fails closed unless `worker_compact_clear` is enabled in manager config and
|
|
457
|
+
a worker handoff exists. Records a durable command and audit events
|
|
458
|
+
before/after sending the worker instruction. `--dry-run` still records the
|
|
459
|
+
command in `commands`, `replay`, and `mutation-audit` with `dry_run: true`
|
|
460
|
+
and `sent: false`.
|
|
461
|
+
- `bind --task T --worker W --manager M` — Create the task binding.
|
|
462
|
+
- `unbind --task T` — End the active binding for a task.
|
|
463
|
+
- `finish-task <task> [--reason R] [--require-criteria-audit]
|
|
464
|
+
[--require-acks] [--require-epilogue] [--require-adversarial-proof]
|
|
465
|
+
[--stop-manager] [--stop-worker]
|
|
466
|
+
[--capture-transcript-before-stop]` — Mark a task done.
|
|
467
|
+
Leaves the manager terminal open by default for review. With
|
|
468
|
+
`--require-criteria-audit`, fails before finishing if any acceptance criteria
|
|
469
|
+
for the task are still `accepted`; `proposed`, `satisfied`, `deferred`, and
|
|
470
|
+
`rejected` criteria do not block. With `--require-acks`, fails if worker or
|
|
471
|
+
manager acknowledgement is missing. With `--require-epilogue`, fails if any
|
|
472
|
+
configured epilogue step has not succeeded. With `--require-adversarial-proof`,
|
|
473
|
+
fails before finishing unless the task has at least one satisfied criterion
|
|
474
|
+
with `evidence_type=adversarial_check` and non-empty `failure_mode`, `check`,
|
|
475
|
+
and `result` fields; use this when `tests passed` is not enough by itself.
|
|
476
|
+
With `--capture-transcript-before-stop`, captures transcript segments for any
|
|
477
|
+
worker/manager sessions being stopped before killing tmux sessions; capture
|
|
478
|
+
failure fails before stop side effects.
|
|
479
|
+
- `stop-task <task> [--reason R] [--stop-worker]` — Force-stop a task's
|
|
480
|
+
manager (and optionally the worker), recording the reason in the audit
|
|
481
|
+
payload.
|
|
482
|
+
- `stop <session>` — Stop a tmux-backed worker or manager session by name. This
|
|
483
|
+
works for both legacy worker records and session-table workers/managers. For a
|
|
484
|
+
completed task with an active binding, prefer an idempotent cleanup pass with
|
|
485
|
+
`finish-task <task> --stop-manager --stop-worker` so the task audit records
|
|
486
|
+
the cleanup against the binding.
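The `finish-task` gates described above can be read as a pure check over the task's criteria. The following is a minimal TypeScript sketch, not the package's implementation: the `Criterion` shape, the `FinishFlags` names, and `finishBlockers` are hypothetical, and it assumes the adversarial-proof fields are stored on the criterion record itself.

```typescript
type CriterionStatus = "proposed" | "accepted" | "satisfied" | "deferred" | "rejected";

// Hypothetical record shape; field names mirror the README, placement is assumed.
interface Criterion {
  status: CriterionStatus;
  evidence_type?: string;
  failure_mode?: string;
  check?: string;
  result?: string;
}

// Hypothetical flag names standing in for the CLI switches.
interface FinishFlags {
  requireCriteriaAudit?: boolean;
  requireAdversarialProof?: boolean;
}

const nonEmpty = (s?: string): boolean => typeof s === "string" && s.trim().length > 0;

// Returns the reasons finishing should fail; an empty list means "may finish".
function finishBlockers(criteria: Criterion[], flags: FinishFlags): string[] {
  const blockers: string[] = [];
  // Only `accepted` criteria block the criteria audit; other statuses do not.
  if (flags.requireCriteriaAudit && criteria.some((c) => c.status === "accepted")) {
    blockers.push("accepted criteria still open");
  }
  // Adversarial proof: one satisfied criterion with all three fields non-empty.
  const hasProof = criteria.some(
    (c) =>
      c.status === "satisfied" &&
      c.evidence_type === "adversarial_check" &&
      nonEmpty(c.failure_mode) &&
      nonEmpty(c.check) &&
      nonEmpty(c.result),
  );
  if (flags.requireAdversarialProof && !hasProof) {
    blockers.push("missing adversarial proof");
  }
  return blockers;
}
```

Returning a list of blockers rather than a boolean keeps the failure reason available for an audit payload.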
|
|
487
|
+
|
|
488
|
+
### Observation
|
|
489
|
+
|
|
490
|
+
- `dashboard [--task T] [--ensure-dispatch] [--dispatcher-id ID]
|
|
491
|
+
[--host 127.0.0.1] [--port 8797]` — Launch the
|
|
492
|
+
local live supervision cockpit. The dashboard binds to loopback by default,
|
|
493
|
+
uses the TypeScript backend to shell out to `conveyor` JSON commands, and
|
|
494
|
+
attaches interactive terminals to tmux-backed worker/manager sessions through
|
|
495
|
+
a WebSocket PTY bridge. It includes browser bootstrap controls for creating a
|
|
496
|
+
task, starting a worker/manager pair with `conveyor pair`, auto-attaching the
|
|
497
|
+
terminals, attach/bind controls, and audited action receipts for cycle,
|
|
498
|
+
nudge, interrupt, finish, and export. With `--ensure-dispatch`, launch also
|
|
499
|
+
ensures a Dispatch watch process using the supplied `--dispatcher-id` when
|
|
500
|
+
provided, reusing only a fresh heartbeat from that same dispatcher id. Use
|
|
501
|
+
`--dry-run --json` to inspect the launch command.
|
|
502
|
+
- `cycle <task> [--busy-wait-seconds N]` — One observation cycle. Idempotent. Runs `ingest`, computes
|
|
503
|
+
worker state from the JSON event stream, captures the tmux pane as a shadow
|
|
504
|
+
signal, writes a `manager_cycles` row, and returns a JSON object the manager
|
|
505
|
+
Codex consumes. The `status_payload` includes:
|
|
506
|
+
- `worker_alive` / `manager_alive` — booleans computed by probing the registered session pids (`os.kill(pid, 0)`). `false` when the session's pid is `NULL` (legacy backfill) or the process has exited, which helps detect silently dead workers between cycles.
|
|
507
|
+
- `last_event_subtype` — the subtype of the most recent `codex_events` row for the worker, or `null` if no events exist.
|
|
508
|
+
- `task_completed` — `true` iff `last_event_subtype` is `"task_complete"`. Disambiguates "worker finished cleanly" from "worker idle but never started."
|
|
509
|
+
- `manager_context` — the latest `manager-config`, worker/manager
|
|
510
|
+
acknowledgements, `handoff`, and `acceptance_criteria` records for the
|
|
511
|
+
task, so each manager loop can reference the saved objective, living
|
|
512
|
+
acceptance criteria, categorized permissions, expected tools, acked
|
|
513
|
+
contract, worker progress, and next steps.
|
|
514
|
+
`manager_context.acceptance_criteria`
|
|
515
|
+
groups criteria by status, includes summary counts, and exposes `open` as
|
|
516
|
+
accepted criteria that still need proof before finishing.
|
|
517
|
+
`manager_context.criteria_negotiation` is advisory: when `needed` is true,
|
|
518
|
+
the manager should ask the worker for must-have current-task criteria versus
|
|
519
|
+
follow-up criteria, then record the result with `conveyor criteria`. The
|
|
520
|
+
field does not send nudges or mutate criteria automatically.
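The grouping that `manager_context.acceptance_criteria` performs can be sketched as follows. This is an illustrative TypeScript sketch under assumptions: the `Criterion` shape and `groupCriteria` are hypothetical names, and only the status values and the `open` = accepted rule come from the description above.

```typescript
type CriterionStatus = "proposed" | "accepted" | "satisfied" | "deferred" | "rejected";

// Hypothetical minimal criterion record.
interface Criterion {
  id: number;
  status: CriterionStatus;
}

const STATUSES: CriterionStatus[] = ["proposed", "accepted", "satisfied", "deferred", "rejected"];

// Group criteria by status, count each group, and expose accepted criteria as `open`.
function groupCriteria(criteria: Criterion[]) {
  const byStatus = (status: CriterionStatus) => criteria.filter((c) => c.status === status);
  const summary = {} as Record<CriterionStatus, number>;
  for (const status of STATUSES) {
    summary[status] = byStatus(status).length;
  }
  return {
    summary,
    open: byStatus("accepted"), // accepted criteria still need proof before finishing
    proposed: byStatus("proposed"),
    satisfied: byStatus("satisfied"),
    deferred: byStatus("deferred"),
    rejected: byStatus("rejected"),
  };
}
```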
|
|
521
|
+
|
|
522
|
+
The `cycle` subcommand accepts `--busy-wait-seconds N` (default: 90) to tune the pane-signal classifier's stuck-busy threshold. Lower values flag stalls faster but increase false positives on long-running real work:
|
|
523
|
+
```bash
|
|
524
|
+
conveyor cycle my-task # default 90s threshold
|
|
525
|
+
conveyor cycle my-task --busy-wait-seconds 30 # tighter detection
|
|
526
|
+
```
|
|
527
|
+
- `ingest <session>` — Pull new events from a session's rollout JSONL into
|
|
528
|
+
the `codex_events` table. Tracks a byte offset, so subsequent runs only
|
|
529
|
+
pick up new events.
|
|
530
|
+
- `tail <session> [--limit N] [--subtype T] [--include-content]` — Print the
|
|
531
|
+
most recent events for a session, newest first. Text payload fields are
|
|
532
|
+
redacted by default; use `--include-content` only when stdout is redirected
|
|
533
|
+
or verbatim text is intentionally needed.
|
|
534
|
+
- `divergences <task> [--limit N]` — Cycles whose shadow pane signal flagged
|
|
535
|
+
a notable pattern (trust prompt, rate-limit prompt, approval prompt, etc.).
|
|
536
|
+
Useful for auditing the shadow signal against the JSON state.
|
|
537
|
+
- `dispatch [--once|--watch] [--limit N] [--interval SECONDS]
|
|
538
|
+
[--dispatcher-id ID] [--type notify_manager|nudge_worker|worker_task_complete]
|
|
539
|
+
[--watch-iterations N] [--lease-seconds N] [--dry-run] [--json]` — Run
|
|
540
|
+
Dispatch, the mechanical routing/actuation role.
|
|
541
|
+
`worker_task_complete` routing reads from `codex_events`, records
|
|
542
|
+
deduplicated `routed_notifications` keyed by the source event, and notifies
|
|
543
|
+
the bound manager without deciding task success. Explicit `notify_manager`
|
|
544
|
+
and `nudge_worker` command rows are atomically claimed, executed, and recorded
|
|
545
|
+
through `command_attempts` with conservative side-effect metadata. `--watch`
|
|
546
|
+
repeats polling with heartbeat telemetry; `--watch-iterations` bounds a watch
|
|
547
|
+
run for scripts and verification; `--lease-seconds` tunes command claim
|
|
548
|
+
recovery; `--once` performs one pass.
|
|
549
|
+
- `enqueue-notify-manager <task> --message "..." [--correlation-id C]
|
|
550
|
+
[--required-permission P] [--idempotency-key K] [--json]` — Queue a `notify_manager` command row for
|
|
551
|
+
Dispatch to claim and deliver to the bound manager.
|
|
552
|
+
- `enqueue-nudge-worker <task> --message "..." [--correlation-id C]
|
|
553
|
+
[--required-permission P] [--idempotency-key K] [--json]` — Queue a `nudge_worker` command row for
|
|
554
|
+
Dispatch to claim and deliver to the bound worker. Use this dispatcher-backed
|
|
555
|
+
route instead of `session-nudge` when the worker is registered without tmux;
|
|
556
|
+
the worker then receives the message through `worker-inbox`.
|
|
557
|
+
- `session-inbox <session> [--consume-next] [--wait] [--timeout N]
|
|
558
|
+
[--interval N] [--limit N] [--json]` — List or consume unconsumed routed
|
|
559
|
+
notifications addressed to a registered session. Text output includes the
|
|
560
|
+
pending count, signal type, delivery mode, source/target sessions, delivered
|
|
561
|
+
timestamp, and correlation id. Use `--consume-next --wait --json` for Codex
|
|
562
|
+
app long-polling; consumed items emit `dispatch_inbox_consumed` telemetry.
|
|
563
|
+
- `manager-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N]
|
|
564
|
+
[--limit N] [--json]` — Resolve the task's bound manager session and read its
|
|
565
|
+
dispatcher inbox.
|
|
566
|
+
- `worker-inbox <task> [--consume-next] [--wait] [--timeout N] [--interval N]
|
|
567
|
+
[--limit N] [--json]` — Resolve the task's bound worker session and read its
|
|
568
|
+
dispatcher inbox.
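The byte-offset tracking that `ingest` relies on can be illustrated with a small TypeScript sketch. It is not the package's code: `ingestFromOffset` and `IngestResult` are hypothetical names, the rollout is passed in as an in-memory byte array rather than read from disk, and treating an unterminated trailing line as "not yet written" is an assumption.

```typescript
interface IngestResult {
  events: unknown[];     // parsed JSONL records found after the offset
  newOffset: number;     // byte offset to persist for the next run
  skippedLines: number;  // complete lines that were not valid JSON
}

// Parse complete JSONL lines starting at `offset`; leave a partial tail untouched.
function ingestFromOffset(rollout: Uint8Array, offset: number): IngestResult {
  const decoder = new TextDecoder();
  const events: unknown[] = [];
  let skippedLines = 0;
  let cursor = offset;
  while (cursor < rollout.length) {
    const newline = rollout.indexOf(10, cursor); // 10 is the byte for "\n"
    if (newline === -1) break; // partial line: wait for the writer to finish it
    const line = decoder.decode(rollout.subarray(cursor, newline)).trim();
    cursor = newline + 1;
    if (line.length === 0) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      skippedLines += 1; // malformed line: count it and move on
    }
  }
  return { events, newOffset: cursor, skippedLines };
}
```

Because the offset only advances past complete lines, calling the function again with the returned `newOffset` picks up nothing until more bytes arrive.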
|
|
569
|
+
|
|
570
|
+
### Actuation
|
|
571
|
+
|
|
572
|
+
- `session-nudge <name> "<text>" [--dry-run]` — Send text plus Enter to the
|
|
573
|
+
session's tmux pane. Requires the session to have been registered with
|
|
574
|
+
`--tmux-session`. Managers running outside tmux cannot receive nudges; only
|
|
575
|
+
workers do.
|
|
576
|
+
- `session-interrupt <name> [--key K] [--followup T] [--dry-run]` — Send an
|
|
577
|
+
interrupt key (default `C-c`). Optional `--followup` text after the
|
|
578
|
+
interrupt.
|
|
579
|
+
|
|
580
|
+
### Audit
|
|
581
|
+
|
|
582
|
+
- `audit <task>` — Event history for a task. Lists `events`-table rows only.
|
|
583
|
+
`audit --json` redacts stored terminal/transcript content unless
|
|
584
|
+
`--include-content` is passed.
|
|
585
|
+
- `replay <task> [--format compact|timeline|transcript|full-transcript]
|
|
586
|
+
[--role all|worker|manager] [--limit N] [--include-content]` — Render a
|
|
587
|
+
chronological, human-readable reconstruction of the task. Cycle entries
|
|
588
|
+
include `[pane pattern: <pattern_id>]` when the shadow signal flagged
|
|
589
|
+
something. `full-transcript` is blocked unless `--include-content` is passed.
|
|
590
|
+
- `mutation-audit <task>` — Manager decisions and their consequences.
|
|
591
|
+
- `events <name>` — Worker events log.
|
|
592
|
+
- `commands [--task T] [--type T] [--state S] [--attempts]` — Durable
|
|
593
|
+
side-effect commands log. Use `--attempts` to include per-dispatcher attempt
|
|
594
|
+
history.
|
|
595
|
+
- `epilogue <task> --step run-tools|draft-pr|subagent-review|record-handoff
|
|
596
|
+
[--json] [--correlation-id ID]` — Run one configured epilogue step and record
|
|
597
|
+
its durable state. Use `--list` or `--status` to inspect configured steps and
|
|
598
|
+
latest run results.
|
|
599
|
+
- `telemetry [--run RUN] [--task TASK] [--search QUERY] [--summary] [--json]`
|
|
600
|
+
— Query local structured telemetry events, search them with SQLite FTS, or
|
|
601
|
+
print aggregate counts for a run/task. `telemetry snapshot --task <task>
|
|
602
|
+
--json` prints the task-scoped dashboard overview contract.
|
|
603
|
+
- `telemetry task <task> --json` — Print a task-scoped telemetry triage view:
|
|
604
|
+
recent cycle history, last successful cycle, worker/manager liveness,
|
|
605
|
+
decisions, commands, failed cycles/commands, ingest skipped/error summaries,
|
|
606
|
+
pane capture failures and notable patterns, open criteria counts, telemetry
|
|
607
|
+
counts, and retained storage counts. Raw transcript, pane, prompt, criterion,
|
|
608
|
+
command payload, and command result bodies are not included.
|
|
609
|
+
- `telemetry failures --json` — Print an operator failure triage view across
|
|
610
|
+
tasks: recent failed cycles, failed commands, ingest errors/skipped lines,
|
|
611
|
+
pane capture failures, open accepted criteria, active task/session health, and
|
|
612
|
+
retained storage counts without raw transcript or prompt content. Use `--task`,
|
|
613
|
+
`--run`, `--active-only`, or `--window 2h` to narrow the failure view for
|
|
614
|
+
recency or active-task triage.
|
|
615
|
+
- `telemetry metrics --window 24h --json` — Print bounded JSON rollups for
|
|
616
|
+
local telemetry and related tables: active tasks/sessions, cycle and command
|
|
617
|
+
success/failure counts, ingest/skipped-line totals, criteria counts,
|
|
618
|
+
reconcile drift counts, export counts, and retained capture/transcript bytes.
|
|
619
|
+
- `export-task <task> [--zip]` — Dump task status, audit, prompts, and
|
|
620
|
+
transcript metadata into an export bundle. Exports include
|
|
621
|
+
`telemetry-events.json`, `telemetry-summary.json`, and
|
|
622
|
+
`telemetry-report.md`; see `docs/local-telemetry-workflow.md`.
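Several commands above redact stored text unless `--include-content` is passed. A rough TypeScript sketch of that default-redacted behavior follows. Everything in it is an assumption for illustration: the `TEXT_FIELDS` list, the placeholder format, and the `redactPayload` helper do not come from the package.

```typescript
// Hypothetical list of payload keys that carry verbatim text.
const TEXT_FIELDS = ["text", "content", "message"];

// Return a copy with text fields replaced by a placeholder unless content is requested.
function redactPayload(
  payload: Record<string, unknown>,
  includeContent = false,
): Record<string, unknown> {
  if (includeContent) return { ...payload };
  const redacted: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    const value = payload[key];
    redacted[key] =
      TEXT_FIELDS.includes(key) && typeof value === "string"
        ? `[redacted ${value.length} chars]` // keep only the length as metadata
        : value;
  }
  return redacted;
}
```

Making redaction the default and disclosure the opt-in matches the fail-closed posture the commands describe.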
|
|
623
|
+
|
|
624
|
+
### Administration
|
|
625
|
+
|
|
626
|
+
- `doctor` — Local dependency and tmux health check.
|
|
627
|
+
- `doctor-self` — Verify the current Codex session can self-register.
|
|
628
|
+
- `db-doctor` — SQLite schema health check.
|
|
629
|
+
- `reconcile [--apply] [--stale-cycles-seconds N]` — Report (and optionally
|
|
630
|
+
fix) dead-pid sessions, dangling bindings, and stuck tasks. Default
|
|
631
|
+
stale-cycle threshold is 3600 seconds (1h); override with
|
|
632
|
+
`--stale-cycles-seconds N` to catch tasks where the manager has been silent
|
|
633
|
+
for shorter intervals. JSON output.
|
|
634
|
+
- `prune [--keep-latest N] [--dry-run]` — Drop old transcript content while
|
|
635
|
+
preserving metadata.
|
|
636
|
+
- `transcript-prune <task> [--keep-latest N]` — Same, scoped to a task.
|
|
637
|
+
- `transcript-capture <task> [--role R] [--mode M]` — Capture deduplicated
|
|
638
|
+
transcript segments. JSON output redacts raw captured terminal output by
|
|
639
|
+
default.
|
|
640
|
+
- `transcript-show <task> [--role R] [--include-content]` — Show stored
|
|
641
|
+
transcript segment metadata. Segment text is redacted unless
|
|
642
|
+
`--include-content` is passed.
|
|
643
|
+
- `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor>` — Print a
|
|
644
|
+
repeatable manual QA checklist.
|
|
645
|
+
- `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB]` —
|
|
646
|
+
Run a deterministic no-tmux QA harness and save a JSON receipt.
|
|
647
|
+
`ralph-loop-guardrails` proves max-iteration cutoff, missing-evidence
|
|
648
|
+
cutoff, fresh retry delivery after structured `adversarial_check` evidence,
|
|
649
|
+
and the `pr_ci_merge_loop` preset evidence gate. `generic-loop-template`
|
|
650
|
+
proves the `visual_diff_loop` template blocks before visual evidence,
|
|
651
|
+
rejects unstructured adversarial evidence, and delivers only after required
|
|
652
|
+
visual receipts plus structured adversarial proof exist.
|
|
653
|
+
`generic-loop-template-browser` runs the same `visual_diff_loop` gate proof
|
|
654
|
+
with a browser-rendered static HTML candidate screenshot, recording browser
|
|
655
|
+
backend, viewport, candidate HTML, screenshot, visual diff, and structured
|
|
656
|
+
adversarial evidence in the saved receipt. It uses the repo's Node
|
|
657
|
+
Playwright dependency and requires Chromium to be installed and launchable;
|
|
658
|
+
when unavailable, it fails with the browser-backed QA helper message.
|
|
659
|
+
`test-coverage-loop` proves the `test_coverage_loop` template blocks before
|
|
660
|
+
coverage evidence, rejects malformed adversarial evidence, and delivers only
|
|
661
|
+
after a structured coverage receipt plus adversarial proof exist.
|
|
662
|
+
`build-clear-loop` proves the non-coverage `build_then_clear` template
|
|
663
|
+
blocks before `build_passed` and `cleanup` receipts, still blocks after build
|
|
664
|
+
evidence alone, and delivers only after both build and cleanup evidence exist.
|
|
665
|
+
- `loop-triggers --list|--classify PROMPT [--json]` — List the controlled
|
|
666
|
+
natural-language loop triggers or classify a manager/operator prompt before
|
|
667
|
+
creating a loop policy or continuation gate. Approved trigger phrases include
|
|
668
|
+
the adversarial Ralph-loop, iteration, finish, worker-directed proof, and
|
|
669
|
+
manager-created adversarial criteria gates; generic caution like "be careful
|
|
670
|
+
and run tests" intentionally does not arm those gates.
|
|
671
|
+
- `loop-templates --list|--show TEMPLATE|--create-run TASK --template TEMPLATE` —
|
|
672
|
+
List generic loop templates or create a template-backed loop policy run.
|
|
673
|
+
Template-backed runs use the same Dispatch guardrails as Ralph-loop presets:
|
|
674
|
+
`max_iterations` prevents over-looping, and `required_before_continue`
|
|
675
|
+
evidence blocks a manager continuation before worker delivery until matching
|
|
676
|
+
satisfied criterion evidence exists. `ralph-loop-presets` remains as a
|
|
677
|
+
compatibility alias for the current Ralph-loop QA flows. The built-in
|
|
678
|
+
`visual_diff_loop` template requires `reference_artifact`,
|
|
679
|
+
`candidate_screenshot`, `visual_diff_report`, `diff_below_threshold`, and
|
|
680
|
+
`adversarial_check` evidence before a manager-requested next visual pass can
|
|
681
|
+
reach the worker. Quality-oriented templates (`pr_ci_merge_loop`,
|
|
682
|
+
`test_coverage_loop`, and `visual_diff_loop`) also expose an
|
|
683
|
+
`artifact_requirements["adversarial_check"]` object requiring
|
|
684
|
+
`failure_mode`, `check`, and `result` fields.
|
|
685
|
+
- `loop-status TASK --run RUN [--json]` — Summarize a Ralph-loop run for manager
|
|
686
|
+
review: policy template, iteration bounds, command states, routed
|
|
687
|
+
notifications, worker inbox backlog, evidence types, consumed-inbox
|
|
688
|
+
telemetry, failure counts, and a recommendation.
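The two template guardrails named above, `max_iterations` and `required_before_continue`, can be sketched as a single blocking check. This is a hedged TypeScript illustration: `continuationBlockers` and the `EvidenceReceipt` shape are hypothetical, and the sketch assumes evidence must be satisfied for the iteration that just ran.

```typescript
interface LoopPolicy {
  max_iterations: number;
  required_before_continue: string[];
}

// Hypothetical receipt shape for recorded loop evidence.
interface EvidenceReceipt {
  iteration: number;
  evidence_type: string;
  satisfied: boolean;
}

// Reasons a continuation from `currentIteration` to the next one should be blocked.
function continuationBlockers(
  policy: LoopPolicy,
  currentIteration: number,
  evidence: EvidenceReceipt[],
): string[] {
  const blockers: string[] = [];
  if (currentIteration + 1 > policy.max_iterations) {
    blockers.push("max_iterations reached");
  }
  for (const required of policy.required_before_continue) {
    const found = evidence.some(
      (e) => e.iteration === currentIteration && e.evidence_type === required && e.satisfied,
    );
    if (!found) blockers.push(`missing evidence: ${required}`);
  }
  return blockers;
}
```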
|
|
689
|
+
|
|
690
|
+
For real vertical slices, start with the Ralph loop operator guide in
|
|
691
|
+
`docs/qa/ralph-loop-operator-guide.md`. It explains the controlled
|
|
692
|
+
natural-language triggers, Dispatch authority model, worker inbox polling,
|
|
693
|
+
required evidence, adversarial proof, `loop-status`, and telemetry review pass
|
|
694
|
+
bar.
|
|
695
|
+
Use `create-disposable-binding` when the manager and worker are Codex app or
|
|
696
|
+
other no-tmux sessions and you want the same Dispatch rails without manual
|
|
697
|
+
task/session/bind setup.
|
|
698
|
+
- `enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N` —
|
|
699
|
+
Queue a manager-requested next loop pass for Dispatch. The command refuses
|
|
700
|
+
same/current iteration requests before they become pending queue rows, while
|
|
701
|
+
Dispatch also blocks any stale same/current iteration command that reaches
|
|
702
|
+
the queue. Max-iteration and missing-evidence refusals remain Dispatch policy receipts.
|
|
703
|
+
JSON output includes `loop_policy`; delivered manager/worker inbox payloads
|
|
704
|
+
include the same `loop_policy` plus enriched `ralph_loop` metadata so tmux and
|
|
705
|
+
Codex app sessions can see the template, cleanup policy, required evidence,
|
|
706
|
+
artifact requirements, and recommended tools.
|
|
707
|
+
- `loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE` —
|
|
708
|
+
Record a run-qualified evidence receipt for a loop policy. Use
|
|
709
|
+
`loop-evidence visual-diff` to compare PNG screenshots, write an optional
|
|
710
|
+
diff/report artifact, and record `visual_diff_report` plus
|
|
711
|
+
`diff_below_threshold` as satisfied only when the computed score is within
|
|
712
|
+
threshold.
|
|
713
|
+
- `loop-evidence adversarial-check TASK --loop-run RUN --iteration N --failure-mode F --check C --result R` —
|
|
714
|
+
Record first-class adversarial proof for a loop iteration. Use it when a
|
|
715
|
+
manager or worker tried to disprove the iteration before continuing. The
|
|
716
|
+
receipt is stored as `evidence_type=adversarial_check` with structured
|
|
717
|
+
`failure_mode`, `check`, and `result` metadata and can satisfy Ralph-loop
|
|
718
|
+
continuation policy. See `docs/qa/adversarial-proof.md` for the receipt
|
|
719
|
+
shape and how it maps to manager prompts, Ralph-loop evidence, Dispatch
|
|
720
|
+
blocking, and audited finish.
|
|
721
|
+
- `qa-plan goalbuddy-conveyor` — Print the reusable natural-language starter
|
|
722
|
+
prompt and QA contract for autonomous GoalBuddy conveyor runs. Use it when a
|
|
723
|
+
manager should split broad work into one parent board plus sequential
|
|
724
|
+
vertical-slice child boards with PR/CI/merge receipts, satisfied-on-main
|
|
725
|
+
proof, and adversarial review gates. See `docs/qa/goalbuddy-conveyor.md`.
|
|
726
|
+
- `ralph-loop-presets --list|--show PRESET|--create-run TASK --preset PRESET` —
|
|
727
|
+
List saved Ralph-loop guardrail templates or create a preset-backed
|
|
728
|
+
`ralph_loop` policy run.
|
|
729
|
+
- `import-compat` — Dry-run or import existing `.codex-workers/<worker>/`
|
|
730
|
+
artifacts into SQLite.
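The threshold rule behind `loop-evidence visual-diff` can be illustrated with a small TypeScript sketch. It is an assumption-heavy illustration: the score here is the fraction of differing pixels between two already-decoded, same-sized RGBA buffers, "within threshold" is read as `score <= threshold`, and `diffScore` and `visualDiffReceipts` are hypothetical names. The package's real scoring method is not described in this section.

```typescript
// Fraction of pixels that differ between two same-sized RGBA buffers (assumed metric).
function diffScore(reference: Uint8Array, candidate: Uint8Array): number {
  if (reference.length !== candidate.length || reference.length % 4 !== 0) {
    throw new Error("buffers must be same-sized RGBA data");
  }
  const pixels = reference.length / 4;
  if (pixels === 0) return 0;
  let differing = 0;
  for (let p = 0; p < pixels; p++) {
    const base = p * 4;
    for (let channel = 0; channel < 4; channel++) {
      if (reference[base + channel] !== candidate[base + channel]) {
        differing += 1;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixels;
}

// Both receipts are marked satisfied only when the score is within the threshold.
function visualDiffReceipts(score: number, threshold: number) {
  const satisfied = score <= threshold;
  return ["visual_diff_report", "diff_below_threshold"].map((evidence_type) => ({
    evidence_type,
    satisfied,
    score,
    threshold,
  }));
}
```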
|
|
731
|
+
|
|
732
|
+
### Worker setup
|
|
733
|
+
|
|
734
|
+
- `create <name> --cwd D --task "..."` — Full worker creation: spawn a tmux
|
|
735
|
+
session, start Codex, send the initial worker contract.
|
|
736
|
+
- `start <name> --cwd D` — Start a plain Codex session inside tmux without
|
|
737
|
+
registering a worker. Useful when you want to register it manually later.
|
|
738
|
+
- `start-test <name>` — Low-risk verification worker that only updates its
|
|
739
|
+
ignored `status.json`.
|
|
740
|
+
|
|
741
|
+
### QA Plans
|
|
742
|
+
|
|
743
|
+
Print repeatable live QA checklists from the CLI:
|
|
744
|
+
|
|
745
|
+
```bash
|
|
746
|
+
conveyor qa-plan self-management
|
|
747
|
+
conveyor qa-plan emergent-criteria
|
|
748
|
+
conveyor qa-plan emergent-criteria --json
|
|
749
|
+
conveyor qa-plan tmux-errors
|
|
750
|
+
conveyor qa-plan dispatch-completion
|
|
751
|
+
conveyor qa-plan ralph-loop
|
|
752
|
+
conveyor qa-plan adversarial-triggers
|
|
753
|
+
conveyor qa-plan goalbuddy-conveyor
|
|
754
|
+
conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
|
|
755
|
+
conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
|
|
756
|
+
conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
|
|
757
|
+
conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
|
|
758
|
+
conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
|
|
759
|
+
conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
|
|
760
|
+
conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
|
|
761
|
+
conveyor loop-templates --list --json
|
|
762
|
+
conveyor loop-templates --show visual_diff_loop --json
|
|
763
|
+
conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
|
|
764
|
+
conveyor ralph-loop-presets --list --json
|
|
765
|
+
```
|
|
766
|
+
|
|
767
|
+
Generic loop templates let operators create policy-backed runs without adding
|
|
768
|
+
bespoke Dispatch behavior for each loop shape. For example,
|
|
769
|
+
`conveyor loop-templates --create-run qa-task --template visual_diff_loop`
|
|
770
|
+
creates a visual-diff loop run whose `required_before_continue` evidence must
|
|
771
|
+
be recorded before the manager's next visual pass can reach the worker.
|
|
772
|
+
Existing `ralph-loop-presets` commands remain compatible aliases over the same
|
|
773
|
+
template-backed guardrails.
|
|
774
|
+
|
|
775
|
+
For natural-language control, run `conveyor loop-triggers --classify
|
|
776
|
+
"<manager prompt>" --json` before automatically creating policy or
|
|
777
|
+
continuation gates. Only a matched controlled trigger should create a
|
|
778
|
+
Ralph-loop policy, require `adversarial_check` before continuation, require
|
|
779
|
+
`finish-task --require-adversarial-proof`, or record worker-proposed
|
|
780
|
+
adversarial proof. The executable receipt check is
|
|
781
|
+
`conveyor qa-run adversarial-triggers --receipt-output
|
|
782
|
+
/tmp/adversarial-triggers-receipt.json --json`.
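The "classify first, then arm gates" rule above can be sketched as an allow-list match. This TypeScript sketch is purely illustrative: the trigger id and regular expression are hypothetical stand-ins, since the actual approved phrases are listed by `conveyor loop-triggers --list` and are not reproduced here.

```typescript
interface TriggerMatch {
  matched: boolean;
  trigger: string | null;
}

// Hypothetical allow-list; real controlled triggers come from the CLI, not this sketch.
const CONTROLLED_TRIGGERS: { id: string; pattern: RegExp }[] = [
  { id: "adversarial_ralph_loop", pattern: /adversarially gated ralph loop/i },
];

// Only an explicit controlled phrase matches; generic caution does not arm a gate.
function classifyPrompt(prompt: string): TriggerMatch {
  for (const candidate of CONTROLLED_TRIGGERS) {
    if (candidate.pattern.test(prompt)) {
      return { matched: true, trigger: candidate.id };
    }
  }
  return { matched: false, trigger: null };
}
```

An allow-list keeps gate creation deliberate: a prompt that merely sounds cautious leaves policy untouched.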
|
|
783
|
+
|
|
784
|
+
The `emergent-criteria` scenario covers a real worker/manager pair, criteria
|
|
785
|
+
negotiation, audited finish gating, replay/export evidence, and
|
|
786
|
+
`--stop-manager --stop-worker` cleanup verification. It also includes an
|
|
787
|
+
optional `criteria-plan` step for drafting reviewed criteria commands from the
|
|
788
|
+
worker's separated must-have and follow-up response.
|
|
789
|
+
|
|
790
|
+
The `tmux-errors` scenario covers read-only JSON degradation, mutating command
|
|
791
|
+
failures, pane capture degradation, stop failures, and reconcile recovery when
|
|
792
|
+
tmux is unavailable or a disposable tmux target disappears.
|
|
793
|
+
|
|
794
|
+
The `dispatch-completion` scenario covers the issue #113 completion-routing
|
|
795
|
+
flow: a worker `task_complete` signal is read from `codex_events`, Dispatch
|
|
796
|
+
records and deduplicates a routed notification, the bound manager receives a
|
|
797
|
+
mechanical wake-up, duplicate-route races emit suppressed telemetry without an
|
|
798
|
+
extra send, and audit/replay/dashboard surfaces show readable, chronological
|
|
799
|
+
dispatch evidence.
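The deduplication step in this flow can be sketched as a keyed route-once check. The TypeScript below is a minimal in-memory illustration, assuming a key built from the source event id; `NotificationRouter`, its fields, and the telemetry labels are hypothetical and stand in for the durable `routed_notifications` records.

```typescript
interface RouteResult {
  routed: boolean;
  telemetry: "routed" | "suppressed_duplicate";
}

// In-memory stand-in for deduplicated routing keyed by the source event.
class NotificationRouter {
  private seen = new Set<string>();
  public sent: string[] = []; // manager sessions that were actually notified

  route(sourceEventId: string, managerSession: string): RouteResult {
    const key = `worker_task_complete:${sourceEventId}`;
    if (this.seen.has(key)) {
      // Duplicate route attempt: record suppression, do not send again.
      return { routed: false, telemetry: "suppressed_duplicate" };
    }
    this.seen.add(key);
    this.sent.push(managerSession);
    return { routed: true, telemetry: "routed" };
  }
}
```

Note that the router only wakes the manager; it makes no judgment about whether the task succeeded.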
|
|
800
|
+
|
|
801
|
+
The `ralph-loop` scenario covers the issue #152 managed delivery loop: the
|
|
802
|
+
manager runs the same seed prompt through at least two iterations, requires
|
|
803
|
+
criteria and epilogue evidence, gates PR creation, CI monitoring/fixing, green
|
|
804
|
+
merge, handoff, and worker clear on explicit permissions, and proves the second
|
|
805
|
+
iteration starts after audited clear in fresh-worker isolation. Replay iterations
|
|
806
|
+
start with an inspect-first guard: if the previous iteration's work is already
|
|
807
|
+
merged, the worker records that state and stops without making replacement edits
|
|
808
|
+
or opening another PR unless something is actually missing. The same QA plan
|
|
809
|
+
also covers preset-backed guardrails such as `pr_ci_merge_loop`, where Dispatch
|
|
810
|
+
blocks another worker iteration until required `pr_url`, `ci_green`, `merge`,
|
|
811
|
+
and `adversarial_check` evidence exists.
|
|
812
|
+
|
|
813
|
+
### Terminal helpers
|
|
814
|
+
|
|
815
|
+
- `open <name>` — Open a macOS terminal window attached to a registered
|
|
816
|
+
worker.
|
|
817
|
+
- `open-worker <task>` — Open a terminal window for a task's worker without
|
|
818
|
+
spelling raw tmux session names.
|
|
819
|
+
- `open-manager <task>` — Same, for the task's manager.
|
|
820
|
+
|
|
821
|
+
### Low-level worker actions (legacy worker-name-keyed)
|
|
822
|
+
|
|
823
|
+
These commands operate against workers by name and predate the manual-binding
|
|
824
|
+
path. They remain useful for direct access against backfilled workers and for
|
|
825
|
+
debugging.
|
|
826
|
+
|
|
827
|
+
- `list` — List known workers.
|
|
828
|
+
- `status <name>` — Print worker status as JSON.
|
|
829
|
+
- `idle-check <name>` — Classify worker freshness and recommend an action.
|
|
830
|
+
- `capture <name> [--include-content]` — Capture recent terminal output.
|
|
831
|
+
Default output is metadata only; pass `--include-content` only when verbatim
|
|
832
|
+
pane text is intentionally needed.
|
|
833
|
+
- `nudge <name> "<text>"` — Legacy worker-directory nudge. For managed
|
|
834
|
+
session pairs, prefer `session-nudge <name> "<text>"`; `nudge` falls back to
|
|
835
|
+
session-name delivery when no legacy worker directory exists.
|
|
836
|
+
- `interrupt <name>` — Send an explicit interrupt key.
|
|
837
|
+
- `stop <name>` — Stop a worker tmux session.
|
|
838
|
+
- `update-status <name>` — Update a worker status contract.
|
|
839
|
+
- `classify --text "..."` — Debug the busy-wait pattern classifier.
|
|
840
|
+
|
|
841
|
+
## Manager Loop Pattern
|
|
842
|
+
|
|
843
|
+
A manager Codex drives supervision by calling `conveyor cycle <task>`
|
|
844
|
+
repeatedly. Each call:
|
|
845
|
+
|
|
846
|
+
1. Runs `ingest` against the worker's rollout JSONL (idempotent; picks up only
|
|
847
|
+
new bytes since the last cycle).
|
|
848
|
+
2. Computes the worker's current state from the JSON event stream
|
|
849
|
+
(`busy`, `idle`, or `unknown`). `task_started`/`user_message` set the
|
|
850
|
+
session to `busy`; `task_complete` sets it to `idle`; everything else does
|
|
851
|
+
not change state.
|
|
852
|
+
3. Captures the worker's tmux pane (if attached) and runs the legacy
|
|
853
|
+
pattern detector — surfaces `trust_prompt`, `rate_limit_prompt`,
|
|
854
|
+
`enter_to_confirm`, etc. as `notable_pane_pattern`. This is the shadow
|
|
855
|
+
signal: best-effort and supplementary.
|
|
856
|
+
4. Writes a row to `manager_cycles` so the full observation history is
|
|
857
|
+
replayable via `conveyor replay <task>`.
|
|
858
|
+
5. Returns a structured JSON object.
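Step 2 above amounts to folding the event stream into a single state. A minimal TypeScript sketch of that fold follows; `computeWorkerState` and the `CodexEvent` shape are hypothetical names, and starting from `unknown` when no state-changing event has been seen is an assumption.

```typescript
type WorkerState = "busy" | "idle" | "unknown";

// Hypothetical minimal event shape: only the subtype matters for state.
interface CodexEvent {
  subtype: string;
}

// task_started/user_message -> busy, task_complete -> idle, anything else: no change.
function computeWorkerState(events: CodexEvent[]): WorkerState {
  let state: WorkerState = "unknown";
  for (const event of events) {
    if (event.subtype === "task_started" || event.subtype === "user_message") {
      state = "busy";
    } else if (event.subtype === "task_complete") {
      state = "idle";
    }
  }
  return state;
}
```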
|
|
859
|
+
|
|
860
|
+
The manager parses the JSON, decides whether to act, and optionally calls
|
|
861
|
+
`conveyor session-nudge` / `session-interrupt`. Then it loops.

```bash
conveyor cycle auth-refactor
# {
#   "kind": "session_cycle",
#   "task": "auth-refactor",
#   "worker_session": "auth-worker",
#   "manager_session": "auth-mgr",
#   "ingest": { "new_events": 3, "new_offset": 12345 },
#   "state": "busy",
#   "last_state_event_at": "2026-05-11T14:32:11Z",
#   "staleness_seconds": 4.2,
#   "notable_pane_pattern": "trust_prompt",
#   "pane_signal": {
#     "captured": true,
#     "classifier": { "pattern": "trust_prompt", ... },
#     "notable_pattern": "trust_prompt"
#   },
#   "manager_context": {
#     "manager_config": {...},
#     "worker_handoff": {...},
#     "acceptance_criteria": {
#       "summary": {"proposed": 1, "accepted": 2, "satisfied": 0, "deferred": 1, "rejected": 0},
#       "open": [...],
#       "proposed": [...],
#       "satisfied": [...],
#       "deferred": [...],
#       "rejected": [...]
#     },
#     "criteria_negotiation": {
#       "needed": true,
#       "reason": "no_criteria",
#       "prompt": "Please propose 2-4 acceptance criteria for the current slice...",
#       "suggested_actions": [...]
#     }
#   },
#   "cycle_id": 17,
#   ...
# }
```

If `notable_pane_pattern` is set, the manager can branch on it directly:
for example, on `trust_prompt`, send Enter via `session-nudge` rather than
waiting on `staleness_seconds`. On `IngestError` (rollout missing or rotated),
`cycle` records a `state='failed'` row before re-raising so the audit trail
still captures the attempt.
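A manager-side branch on the cycle JSON might look like the sketch below. It only reads fields shown in the example output above; the action names (`nudge_enter`, `investigate`, `wait`) and the `stale_after_seconds` threshold are hypothetical and not part of the CLI.

```python
import json


def choose_action(cycle_json: str, stale_after_seconds: float = 120.0) -> str:
    """Pick a next step from one `conveyor cycle` JSON payload."""
    cycle = json.loads(cycle_json)

    # The pane pattern is checked first so a blocking prompt is handled
    # without waiting for staleness to accumulate.
    if cycle.get("notable_pane_pattern") == "trust_prompt":
        return "nudge_enter"

    # Otherwise fall back to the staleness signal (illustrative threshold).
    staleness = cycle.get("staleness_seconds") or 0.0
    if cycle.get("state") == "busy" and staleness > stale_after_seconds:
        return "investigate"

    return "wait"
```

The caller would then map `nudge_enter` to a `conveyor session-nudge` invocation; that mapping is left out because its exact arguments are not shown here.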

**Audit convention.** Mutating commands (`session-nudge`,
`session-interrupt`) write to the `events` table. Observation and
dedicated-table commands (`cycle`, `ingest`) write to their own tables
(`manager_cycles`, `codex_events`); those tables *are* the audit trail. The
plain-text `conveyor audit <task>` lists `events` rows only; cycle
observations show up via `conveyor replay <task>` and the `manager_cycles`
table.

## Phase 6 Polish

Recent additions to streamline worker setup and observability:

- `start-worker` convenience command for spawn-and-register in one call.
- `reconcile --stale-cycles-seconds N` to customize the stale-cycle threshold.
- Observability: `terminal_capture_error` / `terminal_fresh` fields in
  status/idle JSON; `rollback_error` in nudge/interrupt audit payloads;
  `skipped_lines` in `cycle` output's `ingest` field; stderr warnings on
  malformed event lines and audit-insert failures.

## Phase 7 polish (2026-05-11)

Three quality-of-life additions following Phase 6 dogfooding:

- **`sessions --state`**: by default, `conveyor sessions` now hides Phase 1 backfill rows (`pid IS NULL`) and rows marked `state='gone'`. Use `--state all` to inspect every row, `--state gone` for completed/dead registrations, or `--state active` for the default view.
- **`worker_alive` / `manager_alive` in cycle output**: every `conveyor cycle` JSON now includes these booleans, computed by `os.kill(pid, 0)` against the registered session pids. This surfaces workers that died silently between cycles.
- **`cycle --busy-wait-seconds N`**: exposes the pane-signal classifier's stuck-busy threshold (previously hard-coded at 90s) as a per-cycle flag.
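The `os.kill(pid, 0)` probe behind `worker_alive` / `manager_alive` can be sketched as follows. This is a POSIX-only illustration; the helper name `pid_alive` and its handling of missing pids are assumptions, not the package's code.

```python
import os
from typing import Optional


def pid_alive(pid: Optional[int]) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid is None or pid <= 0:
        # No registered pid; 0 and negatives address process groups, not one pid.
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A pid can be reused by an unrelated process after the original exits, so a `True` result is evidence of liveness rather than proof.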

## Phase 8 classifier improvements (2026-05-12)

- **Recent event suppression for `long_running_interruptible`**: the classifier now weighs `recent_event_count` (from `ingest.new_events`) alongside `status_age_seconds`. When a worker is actively emitting events (>= 10 per cycle), the `long_running_interruptible` flag is suppressed, because the worker is healthy despite a stale `status.json`. This stops false positives on long-running tools (e.g. test suites, large file reads) that stay busy but quiet on status updates.
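The suppression rule can be sketched as a small predicate. Only the two inputs and the `>= 10` events-per-cycle cutoff come from the text above; the function name, the `stale_after_seconds` default, and the reduction of the classifier to a single age comparison are assumptions made for illustration.

```python
def long_running_interruptible(
    status_age_seconds: float,
    recent_event_count: int,
    stale_after_seconds: float = 90.0,  # hypothetical default
    active_event_threshold: int = 10,   # ">= 10/cycle" from the note above
) -> bool:
    """Flag a worker as long-running only when it is both stale and quiet."""
    if recent_event_count >= active_event_threshold:
        # The worker is actively emitting events, so the flag is suppressed.
        return False
    return status_age_seconds > stale_after_seconds
```

With these defaults, a worker whose status is 600 seconds old but which emitted 25 events in the last cycle is not flagged, while the same worker with 2 events is.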

## Dispatch and completion contracts

Dispatch is the mechanical core that sits between workers and managers. It
routes facts and executes queued side effects; it does not decide whether work
is correct, finish tasks, satisfy criteria, choose strategy, merge PRs, or route
to human operators.

Current dispatch state:

- `dispatch --once` routes bound worker `task_complete` signals from
  `codex_events`, not the pane classifier.
- Routed completion notifications are deduplicated by source event id, recorded
  in `routed_notifications`, and threaded with `correlation_id`.
- The session inbox is the same `routed_notifications` stream addressed by
  `target_session_id`; tmux push is an optional transport. Codex app-based
  sessions should long-poll with `manager-inbox --consume-next --wait --json` or
  `worker-inbox --consume-next --wait --json`. For disposable Ralph loops, use
  the generated `worker_handoff` prompt so the worker keeps polling until no
  inbox item remains or the loop reaches `max_iterations`.
- `register-worker`, `register-manager`, `sessions`, `discover`, and
  `create-disposable-binding --json` expose a `communication` block per
  session. Treat `session_kind='tmux'` plus `receive_style='push'` as direct
  tmux-delivery capable; treat `session_kind='codex_app'` plus
  `receive_style='pull'` as mailbox polling required for that worker or
  manager.
- Template-backed `continue_iteration` deliveries include `loop_policy` in the
  inbox payload, with template name, current/max iteration, cleanup policy,
  required evidence, artifact requirements, and recommended tools. Codex
  app-based workers receive by polling the same policy context that tmux
  workers receive by push.
- A target with a tmux session records `delivery_mode='push'` after successful
  tmux delivery. A target without tmux records `delivery_mode='pull_required'`
  and remains unconsumed until the addressed session polls and consumes it.
- Consuming a mailbox item records `dispatch_inbox_consumed` telemetry with the
  notification id, signal type, delivery mode, target session role, and poll
  count, so manager/worker dispatcher handoffs are visible in audit evidence.
- If `doctor-self --json` reports `workerctl_on_path=false` inside a Codex app
  session, run `conveyor ...` from the repository root or install the
  local wrapper with `scripts/install-local --write`. Its `inside_tmux` check
  describes the shell running `doctor-self`; for Codex app evidence, prefer the
  rollout JSONL path, `lsof` lookup, and the registration role.
- When a live drill ingests a whole rollout, Dispatch may route older completion
  signals before the target proof turn. Either ingest after the target worker
  turn or have the manager consume/review older completion signals before
  deciding on the current one.
- Explicit `notify_manager` and `nudge_worker` command rows can be processed by
  Dispatch with atomic claim/lease metadata, durable `command_attempts`,
  invalid-payload failure before side effects, and conservative tmux
  side-effect started/completed flags.
- `dispatch --watch` continuously repeats the same mechanical polling loop with
  dispatcher identity and heartbeat telemetry; `--watch-iterations N` bounds the
  run and `--lease-seconds N` controls when attempted command claims become
  recoverable.
- Replay/audit surfaces include routed notifications, command attempts, and
  correlation chains where the data exists. Routed notification replay includes
  delivery mode, source/target sessions, delivered timestamp, consumed-by
  session, and consumed timestamp. The dashboard groups bound-task dispatch
  correlation chains with command state, attempt counts, notification counts,
  inbox pending/consumed counts, decision/cycle ids, source event ids,
  suppressed-signal visibility, chronological ordering, and side-effect risk.
- Dashboard manual QA should use
  `conveyor dashboard --task <task> --ensure-dispatch --dispatcher-id qa-dispatch-dashboard`
  and visually confirm the Dispatch active banner, dispatcher id, heartbeat age,
  iteration, processed count, dry-run/live state, completion/routing/cycle
  conversation lane entries, command claim/attempt/delivery entries, inbox
  pending/consumed counts, pull-required notification evidence where applicable,
  and stale or not-observed warnings.
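Deduplication by source event id, as described in the list above, is commonly done with a uniqueness constraint and an idempotent insert. The sketch below shows that technique against an in-memory SQLite database. The column set is a hypothetical stand-in for the real `routed_notifications` schema, and `route_completion` is an illustrative name.

```python
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE routed_notifications (
            id INTEGER PRIMARY KEY,
            source_event_id INTEGER NOT NULL UNIQUE,
            target_session_id TEXT NOT NULL,
            correlation_id TEXT NOT NULL
        )
        """
    )
    return conn


def route_completion(
    conn: sqlite3.Connection, source_event_id: int, target_session_id: str
) -> bool:
    """Record a routed completion once; return False if it was already routed."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO routed_notifications "
        "(source_event_id, target_session_id, correlation_id) VALUES (?, ?, ?)",
        (source_event_id, target_session_id, str(uuid.uuid4())),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the UNIQUE constraint skipped it.
    return cursor.rowcount == 1
```

Re-running the router over the same events is then safe, which matters when a whole rollout is re-ingested.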

The adjacent completion-contract surfaces are separate from Dispatch:

- Worker and manager acknowledgements persist the startup contract and can gate
  `cycle`/`finish-task`.
- Epilogues are named post-completion steps that can gate `finish-task`.
- Continuations persist worker-first and manager-independent "what's next"
  proposals plus a recorded reviewer verdict. The CLI enforces ordering,
  redaction, permission checks, and reviewer separation metadata, and can run an
  independent restricted-context reviewer command through
  `continuation-reviewer`. Reviewer execution is additionally isolated with a
  temporary cwd, stripped environment, and macOS `sandbox-exec` denial of bound
  rollout/database reads plus direct reads under the active `.codex-workers`
  state root. That broader state-root denial applies only to the
  `continuation-reviewer` subprocess; normal replay, audit, export, and
  telemetry generation paths remain outside that sandbox.
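Two of the reviewer isolation measures, the temporary cwd and the stripped environment, can be illustrated with the standard library. The sketch below is not the `continuation-reviewer` implementation and does not reproduce the macOS `sandbox-exec` read denial; `run_isolated` and its parameters are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def run_isolated(
    command: Sequence[str], timeout_seconds: float = 60.0
) -> "subprocess.CompletedProcess[str]":
    """Run a command in a throwaway directory with a near-empty environment."""
    # Only PATH is passed through; every other variable is dropped.
    minimal_env = {"PATH": os.environ.get("PATH", os.defpath)}
    with tempfile.TemporaryDirectory(prefix="reviewer-") as scratch_dir:
        return subprocess.run(
            list(command),
            cwd=scratch_dir,        # the child starts in the temporary directory
            env=minimal_env,        # replaces, rather than extends, the parent env
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False,
        )
```

Passing `env=` replaces the inherited environment entirely, so secrets exported in the parent shell are not visible to the child unless they are listed explicitly.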

## Schema

SQLite database at `.codex-workers/workerctl.db`. Key tables:

- `sessions`: Unified worker/manager registration.
- `bindings`: Task ↔ worker session ↔ manager session.
- `tasks`: Task records.
- `codex_events`: Per-session JSONL events ingested from rollout files.
- `manager_cycles`: One row per `cycle` invocation, with the full JSON
  payload as `status_json`.
- `events`: Actuation audit log (`session_nudged`, `session_interrupted`,
  etc.).
- `commands`: Durable side-effect command log, including Dispatch claim/lease
  metadata for queued command execution.
- `command_attempts`: Per-dispatcher command execution attempts with
  side-effect started/completed flags and result/error payloads.
- `routed_notifications`: Mechanical worker/manager routed facts and command
  delivery records, deduped and linked by `correlation_id`.
- `task_acknowledgements`: Revisioned worker/manager startup contract
  acknowledgements.
- `epilogue_runs`: Durable state for configured post-completion epilogue
  steps.
- `task_continuations` / `continuation_reviews`: Worker/manager continuation
  proposals and reviewer verdicts for "what's next" review flows.
- `workers`, `managers`: Legacy tables retained for read-only history.

`conveyor db-doctor` reports schema health. `conveyor reconcile` reports
runtime drift (dead-pid sessions, dangling bindings, stuck tasks); add
`--apply` to fix.

## Migration from the Legacy Path

Earlier prototypes used a worker-first promotion flow where a worker was
created first and a manager was then spawned to supervise it. Those legacy
commands have been retired. The new path inverts the model: register two
already-running Codex sessions, create a task, bind them, and let the manager
Codex drive observation via `cycle`.

The legacy database tables (`workers`, `managers`) remain readable via
`audit`, `replay`, and `export-task` for historical reference, but no
remaining CLI command writes to them. To resume work on a legacy task, call
`finish-task` on it and start fresh via `register-worker` +
`register-manager` + `bind`.

## Tests

Release-candidate deterministic gate:

```bash
scripts/rc-check --skip-live-smoke-repeat
```

Full local release-candidate gate:

```bash
scripts/rc-check --with-live-smoke-repeat
```

Underlying deterministic checks:

```bash
python3 -m unittest discover -s tests -v
scripts/check-resource-warnings
python3 -m py_compile scripts/workerctl scripts/check-resource-warnings workerctl/*.py
npm run migration:audit:final
scripts/package-smoke
scripts/release-check
```

For local parallel experiments, prefer:

```bash
scripts/run-unittests-isolated
```

This gives the process a temporary `WORKERCTL_STATE_ROOT` and a test namespace.
The standard CI job remains serial.

GitHub Actions runs `scripts/rc-check --skip-live-smoke-repeat` and
`scripts/package-smoke` on every push and pull request. The live smoke repeat
remains local/manual because hosted runners may not have `codex`.

The ResourceWarning gate intentionally fails on any `ResourceWarning` text in
test output so finalization-time resource warnings cannot be hidden by a zero
`unittest` exit status.
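The gate's decision rule is small enough to state as code. The function below is a sketch of that rule only, not the contents of `scripts/check-resource-warnings`; its name and signature are assumptions.

```python
def gate_exit_code(test_output: str, unittest_exit_code: int) -> int:
    """Combine the unittest exit status with a scan for ResourceWarning text."""
    if unittest_exit_code != 0:
        return unittest_exit_code  # real test failures pass straight through
    if "ResourceWarning" in test_output:
        return 1  # a clean exit status does not excuse a leaked resource
    return 0
```

Scanning the combined stdout and stderr matters here, because `ResourceWarning` messages emitted during finalization typically arrive on stderr after the test summary has already been printed.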

Live local smoke gate:

```bash
scripts/live-smoke
```

The live smoke requires macOS, `tmux`, `codex`, and `rg`. It starts disposable
Codex worker/manager sessions, exercises `pair`, `cycle`, `session-nudge`,
criteria mutation, transcript capture before stop, replay, mutation audit, and
export, then verifies cleanup with `sessions --state active` and `reconcile`.
It writes evidence under `docs/live-qa-artifacts/` and should leave no active
smoke sessions, tmux panes, dangling bindings, or stuck tasks.

For the focused manual coverage pass, use
[docs/manual-qa-checklist.md](docs/manual-qa-checklist.md).