agent-conveyor 0.1.12 → 0.1.14

package/README.md CHANGED
@@ -139,8 +139,9 @@ release version that is not already on npm.
  For common manager setups, start with
  [`docs/manager-recipes.md`](docs/manager-recipes.md). It maps natural-language
  requests such as GoalBuddy conveyor runs, test coverage loops, UX polish loops,
- what-next nudging, and PR/CI/merge Ralph loops to concrete `manager-config`
- settings, permissions, evidence gates, cleanup behavior, and example
+ what-next nudging, PR/CI/merge Ralph loops, and autonomous ship-it loops to
+ concrete `manager-config` settings, permissions, evidence gates, cleanup
+ behavior, and example
  manager/Dispatch/worker interactions. Use `conveyor manager-recipes --list`
  or `conveyor manager-recipes --show goalbuddy-conveyor --json` for a
  machine-readable setup preview.
@@ -174,7 +175,16 @@ thread identity through `--worker-codex-app-thread-id` and
  deliver the generated `worker_handoff` bootstrap prompt. The raw terminal
  `conveyor` CLI does not create Codex app threads by itself; if app thread tools
  are unavailable, open a separate Codex app worker manually and paste the
- `worker_handoff` prompt.
+ `worker_handoff` prompt. After a worker consumes a manager instruction, its
+ completion or blocker report must go back through the generated
+ `enqueue-notify-manager` command and a bounded Dispatch watch tick; a direct
+ Codex app final answer is not a durable manager receipt.
+ The live manager and worker sessions should also be readable as the primary
+ operator transcript: after consuming an inbox item, the consuming session must
+ print `CONVEYOR POLL`, `CONVEYOR RECEIVED`, `WORK`, `CONVEYOR SEND`, and
+ `DISPATCH` sections while the turn is happening. SQLite/replay/status output is
+ audit proof, not a replacement for the live session story. Idle polls may be a
+ single `CONVEYOR IDLE` line.
 
  Dispatch is core infrastructure for supervised worker/manager pairs. The
  `pair` workflow starts a detached Dispatch watch process by default so worker
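Reviewer note on the hunk above: the new transcript rule names five sections that a consuming session must print, and allows an idle poll to be a single `CONVEYOR IDLE` line. The diff does not show how those blocks are formatted, so the following is only a minimal sketch of an operator-side check, assuming each section header begins its own line. The function name `missing_transcript_sections` is hypothetical and not part of Conveyor.

```python
import re
from typing import List

# Section headers in the order the README hunk lists them.
REQUIRED_SECTIONS = [
    "CONVEYOR POLL",
    "CONVEYOR RECEIVED",
    "WORK",
    "CONVEYOR SEND",
    "DISPATCH",
]
IDLE_MARKER = "CONVEYOR IDLE"


def _starts_with_header(line: str, header: str) -> bool:
    # \b keeps "WORK" from matching a line such as "WORKING on it".
    return re.match(rf"{re.escape(header)}\b", line) is not None


def missing_transcript_sections(text: str) -> List[str]:
    """Return required headers that are absent or out of order in one turn.

    Hypothetical helper: assumes every section header starts a line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # An idle poll may consist of a single CONVEYOR IDLE line.
    if len(lines) == 1 and _starts_with_header(lines[0], IDLE_MARKER):
        return []
    missing = []
    position = 0  # headers must appear in order, so search forward only
    for header in REQUIRED_SECTIONS:
        found = next(
            (i for i in range(position, len(lines))
             if _starts_with_header(lines[i], header)),
            None,
        )
        if found is None:
            missing.append(header)
        else:
            position = found + 1
    return missing
```

A check like this only confirms that the headers are present and ordered; it says nothing about whether the durable inbox receipts match what the session printed.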
@@ -192,6 +202,9 @@ Use `conveyor qa-plan adversarial-triggers` to verify natural-language
  manager prompts activate Ralph-loop adversarial gates.
  Use `conveyor qa-plan goalbuddy-conveyor` when a broad request should become
  sequential GoalBuddy child boards with PR/CI/merge receipts.
+ Use `conveyor qa-plan ship-it-loop` when a manager is allowed to push a branch,
+ open a PR, monitor CI, resolve bounded conflicts, and merge only after explicit
+ manager-owned merge evidence.
  Before cutting a manager loose, have it resolve the freeform setup request to a
  named recipe from `docs/manager-recipes.md` or an explicit `custom` setup, then
  show the saved mode, permissions, evidence gates, cleanup policy, and disallowed
@@ -371,14 +384,25 @@ tmux attach -t codex-live-test
  Codex app sessions, the JSON output also includes
  `heartbeat_recommendations` with role-specific poll prompts; Dispatch can
  deliver into those inboxes, but a heartbeat or operator wake-up is still
- required to make an idle app thread poll autonomously. Those recommendations
- also include `wakeup_dispatch_command` and `delivery_receipt_commands` for
+ required to make an idle app thread poll autonomously. The worker handoff and
+ worker heartbeat prompt also include the exact durable
+ `enqueue-notify-manager` and one-iteration `dispatch --watch` commands that a
+ worker must run after completing or blocking on a consumed item. Those prompts
+ require live session transcript blocks for consumed items, so the operator can
+ inspect the actual Codex app sessions and see the same flow the durable inbox
+ later proves. Those
+ recommendations also include `wakeup_dispatch_command` and
+ `delivery_receipt_commands` for
  app-thread wake recovery. Use them to record sent, skipped, and blocked wake
  outcomes after `app-wakeup-dispatch`; an app-thread send is not task
  completion. The recommendations include a `teardown_policy`: an idle poll is
  only a quiet interval, not a reason to delete or pause heartbeat automation;
  heartbeat teardown belongs to the manager/operator after terminal closeout or
- explicit operator instruction.
+ explicit operator instruction. For same-thread Codex app visible-session
+ dogfood, prefer `--template app_visible_build_loop` or a custom adversarial
+ gate; reserve cleanup-gated templates such as `build_then_clear` for flows
+ that create a fresh worker context or can record a real cleanup receipt
+ between iterations.
  The optional
  Codex app thread metadata is normally supplied after a Codex app manager has
  used `create_thread` and `set_thread_title`; terminal-only users can omit it
@@ -405,7 +429,8 @@ tmux attach -t codex-live-test
  a matching `ready_to_send` action with `send_ready=true` and the same thread
  id; healthy and blocked roles must be recorded as `skipped` or `blocked`.
  - `app-autopilot start|stop|status TASK [--dispatcher-id ID]
- [--interval SECONDS] [--watch-iterations N] [--stale-after N] [--json]` —
+ [--interval SECONDS] [--watch-iterations N] [--stale-after N]
+ [--quiet-after N] [--json]` —
  Manage the pair-level app-native heartbeat policy for the active
  manager/worker binding. `start` and `stop` write telemetry receipts and emit
  the exact manager/worker Codex app heartbeat automation specs plus the
@@ -413,6 +438,11 @@ tmux attach -t codex-live-test
  thread tools, so create/pause those heartbeat automations from a Codex app
  operator session using the emitted specs; Conveyor remains the durable source
  of truth through Dispatch, inboxes, wake receipts, and app heartbeat status.
+ `status` also reports `plan.quiescence`: when the loop is healthy, has no
+ `next_actions`, and both roles have produced `--quiet-after` paired
+ heartbeats since the last command or inbox-consumption receipt, it recommends
+ `stop_autopilot` so operators can quiesce blocked/no-progress loops instead
+ of repeating idle pulses.
  - `discover [QUERY] [--all] [--limit N]` / `search [QUERY]` — Search tasks,
  registered sessions, active bindings, and recent telemetry in one JSON result.
  Use this for conversational setup when a manager or Codex session needs to
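Reviewer note on the hunk above: the `plan.quiescence` rule combines three conditions before recommending `stop_autopilot`. The sketch below restates that rule as a pure function so the interaction with `--quiet-after` is easier to reason about. It is an illustration under assumptions, not Conveyor's implementation: the `LoopSnapshot` type and its field names are hypothetical, and "paired heartbeats" is read here as the smaller of the two per-role counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopSnapshot:
    """Hypothetical inputs; the real `plan.quiescence` payload is not shown in the diff."""

    healthy: bool
    next_actions: List[str] = field(default_factory=list)
    # Heartbeats seen per role since the last command or inbox-consumption receipt.
    manager_heartbeats: int = 0
    worker_heartbeats: int = 0


def recommend_quiescence(snapshot: LoopSnapshot, quiet_after: int) -> Optional[str]:
    """Return "stop_autopilot" when all three README conditions hold, else None."""
    if not snapshot.healthy:
        return None
    if snapshot.next_actions:
        return None
    # Assumption: a "paired" heartbeat needs one beat from each role,
    # so the pair count is limited by the quieter role.
    paired = min(snapshot.manager_heartbeats, snapshot.worker_heartbeats)
    return "stop_autopilot" if paired >= quiet_after else None
```

Under this reading, a loop where only one role keeps pulsing never reaches the threshold, which matches the requirement that both roles have gone quiet.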
@@ -443,7 +473,8 @@ tmux attach -t codex-live-test
  flags. Use `--interactive` only as a terminal fallback when a human is
  running `conveyor` directly.
  `--permit` grants taxonomy permissions such as `repo.open_pr`,
- `verification.run_pytest`, `context.spawn_reviewer`,
+ `repo.push_branch`, `repo.monitor_ci`, `repo.resolve_conflicts`,
+ `repo.merge_green_pr`, `verification.run_pytest`, `context.spawn_reviewer`,
  `communication.notify_operator`, or `worker_session.compact`. Use `--tool`
  to record expected verification/context tools, `--epilogue` for required
  built-in finish steps (`run-tools`, `draft-pr`, `subagent-review`,
@@ -484,7 +515,13 @@ tmux attach -t codex-live-test
  [--json]` — Draft reviewed `criteria --add` commands from a worker response
  that separates must-have current-task criteria from deferred follow-ups. This
  helper is read-only: it resolves the task and prints suggestions, but does not
- mutate acceptance criteria, events, or commands.
+ mutate acceptance criteria, events, or commands. If a proposed criterion
+ appears to describe manager closeout mechanics such as `finish-task`,
+ `--require-criteria-audit`, heartbeat teardown, or final manager reporting,
+ the helper emits a non-blocking warning and classifies that suggestion as
+ manager closeout proof. Keep that proof in the manager final report, audit,
+ replay, or epilogue evidence instead of accepted worker/task criteria unless
+ the task is explicitly Conveyor closeout QA.
  ```bash
  conveyor criteria-plan my-task --from-worker-response response.md --json
  ```
@@ -640,7 +677,10 @@ tmux attach -t codex-live-test
  recovery; `--once` performs one pass.
  - `enqueue-notify-manager <task> --message "..." [--correlation-id C]
  [--required-permission P] [--idempotency-key K] [--json]` — Queue a `notify_manager` command row for
- Dispatch to claim and deliver to the bound manager.
+ Dispatch to claim and deliver to the bound manager. Codex app/no-tmux
+ workers must use this route for completion and blocker reports after
+ consuming a manager instruction; direct app-thread final answers are local
+ text, not manager inbox receipts.
  - `enqueue-nudge-worker <task> --message "..." [--correlation-id C]
  [--required-permission P] [--idempotency-key K] [--json]` — Queue a `nudge_worker` command row for
  Dispatch to claim and deliver to the bound worker. Use this dispatcher-backed
@@ -734,9 +774,9 @@ tmux attach -t codex-live-test
  - `transcript-show <task> [--role R] [--include-content]` — Show stored
  transcript segment metadata. Segment text is redacted unless
  `--include-content` is passed.
- - `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor>` — Print a
+ - `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor|ship-it-loop>` — Print a
  repeatable manual QA checklist.
- - `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB]` —
+ - `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop|ship-it-loop> --receipt-output RECEIPT.json [--path DB]` —
  Run a deterministic no-tmux QA harness and save a JSON receipt.
  `ralph-loop-guardrails` proves max-iteration cutoff, missing-evidence
  cutoff, fresh retry delivery after structured `adversarial_check` evidence,
@@ -756,6 +796,10 @@ tmux attach -t codex-live-test
  `build-clear-loop` proves the non-coverage `build_then_clear` template
  blocks before `build_passed` and `cleanup` receipts, still blocks after build
  evidence alone, and delivers only after both build and cleanup evidence exist.
+ `ship-it-loop` proves push, PR, and merge commands fail closed until their
+ permissions are granted, then proves the `ship_it_loop` lifecycle blocks
+ before branch, PR, CI, mergeability, manager decision, merge, post-merge, and
+ adversarial receipts exist.
  - `loop-triggers --list|--classify PROMPT [--json]` — List the controlled
  natural-language loop triggers or classify a manager/operator prompt before
  creating a loop policy or continuation gate. Approved trigger phrases include
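Reviewer note on the hunk above: "fail closed" means a push, PR, or merge is denied unless its permission was explicitly granted, and anything unrecognized is denied too. The sketch below shows that pattern using permission names that appear elsewhere in this diff (`repo.push_branch`, `repo.open_pr`, `repo.merge_green_pr`). The action names, the action-to-permission mapping, and the `is_allowed` helper are assumptions made for illustration; the diff does not show how Conveyor wires them together.

```python
from typing import Dict, Set

# Assumed mapping for illustration; only the permission strings come from the README.
ACTION_PERMISSIONS: Dict[str, str] = {
    "push": "repo.push_branch",
    "open_pr": "repo.open_pr",
    "merge": "repo.merge_green_pr",
}


def is_allowed(action: str, granted: Set[str]) -> bool:
    """Fail closed: deny unknown actions and actions whose permission is not granted."""
    required = ACTION_PERMISSIONS.get(action)
    if required is None:
        return False  # unknown action, so deny rather than guess
    return required in granted
```

The default answer is "no": adding a new action to a system built this way changes nothing until someone both maps it to a permission and grants that permission.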
@@ -769,11 +813,16 @@ tmux attach -t codex-live-test
  evidence blocks a manager continuation before worker delivery until matching
  satisfied criterion evidence exists. `ralph-loop-presets` remains as a
  compatibility alias for the current Ralph-loop QA flows. The built-in
+ `app_visible_build_loop` template requires `build_passed` plus structured
+ `adversarial_check` evidence, but no cleanup evidence, so visible Codex app
+ threads can continue without pretending that same-thread context was cleared.
+ The built-in
  `visual_diff_loop` template requires `reference_artifact`,
  `candidate_screenshot`, `visual_diff_report`, `diff_below_threshold`, and
  `adversarial_check` evidence before a manager-requested next visual pass can
- reach the worker. Quality-oriented templates (`pr_ci_merge_loop`,
- `test_coverage_loop`, and `visual_diff_loop`) also expose an
+ reach the worker. Quality-oriented templates (`app_visible_build_loop`,
+ `pr_ci_merge_loop`, `ship_it_loop`, `test_coverage_loop`, and
+ `visual_diff_loop`) also expose an
  `artifact_requirements["adversarial_check"]` object requiring
  `failure_mode`, `check`, and `result` fields.
  - `loop-status TASK --run RUN [--json]` — Summarize a Ralph-loop run for manager
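Reviewer note on the hunk above: the template text describes a gate with two layers, a list of required evidence types per template and a field check on the `adversarial_check` artifact. The sketch below models that gate for the two templates whose requirements are spelled out in this hunk. It is a rough illustration only: the `continuation_blockers` function and the receipt shape (a dict keyed by evidence type) are hypothetical, and the evidence lists for `ship_it_loop`, `pr_ci_merge_loop`, and `test_coverage_loop` are omitted because the diff does not enumerate them.

```python
from typing import Dict, List

# Evidence types per template, copied from the README text in this hunk.
REQUIRED_EVIDENCE: Dict[str, List[str]] = {
    "app_visible_build_loop": ["build_passed", "adversarial_check"],
    "visual_diff_loop": [
        "reference_artifact",
        "candidate_screenshot",
        "visual_diff_report",
        "diff_below_threshold",
        "adversarial_check",
    ],
}
# Fields the README says an adversarial_check artifact must carry.
ADVERSARIAL_FIELDS = ("failure_mode", "check", "result")


def continuation_blockers(template: str, receipts: Dict[str, dict]) -> List[str]:
    """List reasons the next iteration should stay blocked; empty means deliver.

    Hypothetical shape: `receipts` maps an evidence type to its artifact payload.
    """
    blockers = []
    for evidence_type in REQUIRED_EVIDENCE[template]:
        payload = receipts.get(evidence_type)
        if payload is None:
            blockers.append(f"missing {evidence_type}")
        elif evidence_type == "adversarial_check":
            for name in ADVERSARIAL_FIELDS:
                if not payload.get(name):
                    blockers.append(f"adversarial_check missing {name}")
    return blockers
```

Note that `app_visible_build_loop` has no `cleanup` entry, which is the point of the new template: a same-thread session can continue without a cleanup receipt it could not honestly produce.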
@@ -793,7 +842,12 @@ same-project `create_thread` worker plus `set_thread_title` before creating the
  binding, then pass the worker thread id/title into Conveyor. Use `fork_thread`
  only when the user explicitly asks to fork or resume this conversation. If app
  thread tools are unavailable, create the binding anyway and paste the returned
- `worker_handoff` prompt into a manually opened worker session.
+ `worker_handoff` prompt into a manually opened worker session. The handoff
+ requires a worker to report completion/blockers through
+ `enqueue-notify-manager` plus a bounded Dispatch watch run before treating the
+ manager as notified, and to print the live `CONVEYOR POLL` / `CONVEYOR
+ RECEIVED` / `WORK` / `CONVEYOR SEND` / `DISPATCH` transcript in the worker
+ session for any consumed item.
  - `enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N` —
  Queue a manager-requested next loop pass for Dispatch. The command refuses
  same/current iteration requests before they become pending queue rows, while
@@ -805,6 +859,8 @@ thread tools are unavailable, create the binding anyway and paste the returned
  artifact requirements, and recommended tools.
  - `loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE` —
  Record a run-qualified evidence receipt for a loop policy. Use
+ `loop-evidence build-passed TASK --loop-run RUN --iteration N` as the
+ friendly alias for the common `evidence_type=build_passed` receipt. Use
  `loop-evidence visual-diff` to compare PNG screenshots, write an optional
  diff/report artifact, and record `visual_diff_report` plus
  `diff_below_threshold` as satisfied only when the computed score is within
@@ -850,14 +906,17 @@ conveyor qa-plan dispatch-completion
  conveyor qa-plan ralph-loop
  conveyor qa-plan adversarial-triggers
  conveyor qa-plan goalbuddy-conveyor
+ conveyor qa-plan ship-it-loop
  conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
  conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
  conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
  conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
  conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
  conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
+ conveyor qa-run ship-it-loop --receipt-output /tmp/ship-it-loop-receipt.json --json
  conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
  conveyor loop-templates --list --json
+ conveyor loop-templates --show ship_it_loop --json
  conveyor loop-templates --show visual_diff_loop --json
  conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
  conveyor ralph-loop-presets --list --json