agent-conveyor 0.1.13 → 0.1.14

package/README.md CHANGED
@@ -139,8 +139,9 @@ release version that is not already on npm.
  For common manager setups, start with
  [`docs/manager-recipes.md`](docs/manager-recipes.md). It maps natural-language
  requests such as GoalBuddy conveyor runs, test coverage loops, UX polish loops,
- what-next nudging, and PR/CI/merge Ralph loops to concrete `manager-config`
- settings, permissions, evidence gates, cleanup behavior, and example
+ what-next nudging, PR/CI/merge Ralph loops, and autonomous ship-it loops to
+ concrete `manager-config` settings, permissions, evidence gates, cleanup
+ behavior, and example
  manager/Dispatch/worker interactions. Use `conveyor manager-recipes --list`
  or `conveyor manager-recipes --show goalbuddy-conveyor --json` for a
  machine-readable setup preview.
@@ -178,6 +179,12 @@ are unavailable, open a separate Codex app worker manually and paste the
  completion or blocker report must go back through the generated
  `enqueue-notify-manager` command and a bounded Dispatch watch tick; a direct
  Codex app final answer is not a durable manager receipt.
+ The live manager and worker sessions should also be readable as the primary
+ operator transcript: after consuming an inbox item, the consuming session must
+ print `CONVEYOR POLL`, `CONVEYOR RECEIVED`, `WORK`, `CONVEYOR SEND`, and
+ `DISPATCH` sections while the turn is happening. SQLite/replay/status output is
+ audit proof, not a replacement for the live session story. Idle polls may be a
+ single `CONVEYOR IDLE` line.

  Dispatch is core infrastructure for supervised worker/manager pairs. The
  `pair` workflow starts a detached Dispatch watch process by default so worker
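The live transcript contract added in this hunk is simple enough to check mechanically. The sketch below is a hypothetical helper, not a `conveyor` command: it assumes each section header (`CONVEYOR POLL`, `CONVEYOR RECEIVED`, `WORK`, `CONVEYOR SEND`, `DISPATCH`) sits alone on its own line of a saved session log, and the name `check_transcript` is invented for illustration.

```bash
# Hypothetical helper (not part of the conveyor CLI): verify that a saved
# session transcript is either a single idle line or shows the five
# consumed-item sections in the documented order.
check_transcript() {
  file=$1
  # An idle poll may be reported as exactly one `CONVEYOR IDLE` line.
  if [ "$(grep -c . "$file")" -eq 1 ] && grep -qx 'CONVEYOR IDLE' "$file"; then
    echo idle
    return 0
  fi
  # Each required header must appear on a line after the previous header.
  last=0
  for section in 'CONVEYOR POLL' 'CONVEYOR RECEIVED' 'WORK' 'CONVEYOR SEND' 'DISPATCH'; do
    line=$(grep -nx "$section" "$file" | awk -F: -v after="$last" '$1 > after { print $1; exit }')
    if [ -z "$line" ]; then
      echo "missing: $section"
      return 1
    fi
    last=$line
  done
  echo ok
}
```

For example, `check_transcript worker-session.log` prints `ok`, `idle`, or the first missing section. It only inspects the live session story and says nothing about the SQLite/replay audit trail.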
@@ -195,6 +202,9 @@ Use `conveyor qa-plan adversarial-triggers` to verify natural-language
  manager prompts activate Ralph-loop adversarial gates.
  Use `conveyor qa-plan goalbuddy-conveyor` when a broad request should become
  sequential GoalBuddy child boards with PR/CI/merge receipts.
+ Use `conveyor qa-plan ship-it-loop` when a manager is allowed to push a branch,
+ open a PR, monitor CI, resolve bounded conflicts, and merge only after explicit
+ manager-owned merge evidence.
  Before cutting a manager loose, have it resolve the freeform setup request to a
  named recipe from `docs/manager-recipes.md` or an explicit `custom` setup, then
  show the saved mode, permissions, evidence gates, cleanup policy, and disallowed
@@ -377,7 +387,10 @@ tmux attach -t codex-live-test
  required to make an idle app thread poll autonomously. The worker handoff and
  worker heartbeat prompt also include the exact durable
  `enqueue-notify-manager` and one-iteration `dispatch --watch` commands that a
- worker must run after completing or blocking on a consumed item. Those
+ worker must run after completing or blocking on a consumed item. Those prompts
+ require live session transcript blocks for consumed items, so the operator can
+ inspect the actual Codex app sessions and see the same flow the durable inbox
+ later proves. Those
  recommendations also include `wakeup_dispatch_command` and
  `delivery_receipt_commands` for
  app-thread wake recovery. Use them to record sent, skipped, and blocked wake
@@ -385,7 +398,11 @@ tmux attach -t codex-live-test
  completion. The recommendations include a `teardown_policy`: an idle poll is
  only a quiet interval, not a reason to delete or pause heartbeat automation;
  heartbeat teardown belongs to the manager/operator after terminal closeout or
- explicit operator instruction.
+ explicit operator instruction. For same-thread Codex app visible-session
+ dogfood, prefer `--template app_visible_build_loop` or a custom adversarial
+ gate; reserve cleanup-gated templates such as `build_then_clear` for flows
+ that create a fresh worker context or can record a real cleanup receipt
+ between iterations.
  The optional
  Codex app thread metadata is normally supplied after a Codex app manager has
  used `create_thread` and `set_thread_title`; terminal-only users can omit it
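The template guidance in this hunk reduces to one question: can the flow honour a cleanup gate? A minimal sketch, assuming two yes/no inputs that are not real `conveyor` flags; `pick_template` is a hypothetical name, and the custom adversarial gate option is deliberately left out.

```bash
# Hypothetical decision helper mirroring the README guidance. A cleanup-gated
# template is only chosen when the flow starts a fresh worker context or can
# record a real cleanup receipt; otherwise it falls back to the visible-session
# template, which needs no cleanup evidence.
pick_template() {
  fresh_context=$1    # yes if each iteration starts a new worker context
  cleanup_receipt=$2  # yes if a real cleanup receipt can be recorded
  if [ "$fresh_context" = yes ] || [ "$cleanup_receipt" = yes ]; then
    echo build_then_clear
  else
    echo app_visible_build_loop
  fi
}
```

Defaulting to `app_visible_build_loop` avoids starting a loop whose `cleanup` gate a same-thread session could never honestly satisfy.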
@@ -456,7 +473,8 @@ tmux attach -t codex-live-test
  flags. Use `--interactive` only as a terminal fallback when a human is
  running `conveyor` directly.
  `--permit` grants taxonomy permissions such as `repo.open_pr`,
- `verification.run_pytest`, `context.spawn_reviewer`,
+ `repo.push_branch`, `repo.monitor_ci`, `repo.resolve_conflicts`,
+ `repo.merge_green_pr`, `verification.run_pytest`, `context.spawn_reviewer`,
  `communication.notify_operator`, or `worker_session.compact`. Use `--tool`
  to record expected verification/context tools, `--epilogue` for required
  built-in finish steps (`run-tools`, `draft-pr`, `subagent-review`,
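The new `repo.*` permissions support the fail-closed behaviour that the `ship-it-loop` QA harness checks. The sketch models that idea only: the action names (`push`, `open-pr`, `monitor-ci`, `resolve-conflicts`, `merge`), the space-separated list of granted permissions, and both function names are assumptions, while the permission strings are the ones listed above.

```bash
# Map a hypothetical action name to the taxonomy permission it needs.
required_permit() {
  case $1 in
    push)              echo repo.push_branch ;;
    open-pr)           echo repo.open_pr ;;
    monitor-ci)        echo repo.monitor_ci ;;
    resolve-conflicts) echo repo.resolve_conflicts ;;
    merge)             echo repo.merge_green_pr ;;
    *)                 return 1 ;;  # unknown actions are never permitted
  esac
}

# Fail closed: allow an action only when its permission is in the granted list.
check_action() {
  action=$1
  granted=$2  # space-separated permissions, e.g. "repo.open_pr repo.push_branch"
  need=$(required_permit "$action") || { echo "blocked: unknown action $action"; return 1; }
  case " $granted " in
    *" $need "*) echo "allowed: $action ($need)" ;;
    *)           echo "blocked: $action needs $need"; return 1 ;;
  esac
}
```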
@@ -497,7 +515,13 @@ tmux attach -t codex-live-test
  [--json]` — Draft reviewed `criteria --add` commands from a worker response
  that separates must-have current-task criteria from deferred follow-ups. This
  helper is read-only: it resolves the task and prints suggestions, but does not
- mutate acceptance criteria, events, or commands.
+ mutate acceptance criteria, events, or commands. If a proposed criterion
+ appears to describe manager closeout mechanics such as `finish-task`,
+ `--require-criteria-audit`, heartbeat teardown, or final manager reporting,
+ the helper emits a non-blocking warning and classifies that suggestion as
+ manager closeout proof. Keep that proof in the manager final report, audit,
+ replay, or epilogue evidence instead of accepted worker/task criteria unless
+ the task is explicitly Conveyor closeout QA.
  ```bash
  conveyor criteria-plan my-task --from-worker-response response.md --json
  ```
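To illustrate the new closeout warning, here is a rough approximation of the classification. It is hypothetical: the keyword list comes from the text above, but the real helper's matching rules are not documented, and `classify_criterion` is an invented name. Like the described helper, it only warns (on stderr) and never fails.

```bash
# Hypothetical approximation: flag criteria that describe manager closeout
# mechanics rather than worker/task acceptance. Always exits 0 (non-blocking).
classify_criterion() {
  text=$1
  if printf '%s\n' "$text" | grep -qiE -e 'finish-task|--require-criteria-audit|heartbeat teardown|final manager report'; then
    echo "warning: manager closeout proof (keep in final report/audit/replay/epilogue)" >&2
    echo manager_closeout_proof
  else
    echo task_criterion
  fi
}
```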
@@ -750,9 +774,9 @@ tmux attach -t codex-live-test
  - `transcript-show <task> [--role R] [--include-content]` — Show stored
  transcript segment metadata. Segment text is redacted unless
  `--include-content` is passed.
- - `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor>` — Print a
+ - `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor|ship-it-loop>` — Print a
  repeatable manual QA checklist.
- - `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB]` —
+ - `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop|ship-it-loop> --receipt-output RECEIPT.json [--path DB]` —
  Run a deterministic no-tmux QA harness and save a JSON receipt.
  `ralph-loop-guardrails` proves max-iteration cutoff, missing-evidence
  cutoff, fresh retry delivery after structured `adversarial_check` evidence,
@@ -772,6 +796,10 @@ tmux attach -t codex-live-test
  `build-clear-loop` proves the non-coverage `build_then_clear` template
  blocks before `build_passed` and `cleanup` receipts, still blocks after build
  evidence alone, and delivers only after both build and cleanup evidence exist.
+ `ship-it-loop` proves push, PR, and merge commands fail closed until their
+ permissions are granted, then proves the `ship_it_loop` lifecycle blocks
+ before branch, PR, CI, mergeability, manager decision, merge, post-merge, and
+ adversarial receipts exist.
  - `loop-triggers --list|--classify PROMPT [--json]` — List the controlled
  natural-language loop triggers or classify a manager/operator prompt before
  creating a loop policy or continuation gate. Approved trigger phrases include
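The `ship_it_loop` lifecycle described in this hunk behaves like an ordered checklist that blocks delivery until every receipt exists. In this sketch only `adversarial_check` is a documented evidence name; the other receipt names and the `ship_it_gate` function are placeholders chosen for illustration.

```bash
# Placeholder receipt names, in the lifecycle order the README lists
# (branch, PR, CI, mergeability, manager decision, merge, post-merge, adversarial).
SHIP_IT_RECEIPTS='branch_pushed pr_opened ci_green mergeable manager_merge_decision merged post_merge_check adversarial_check'

# Report the first missing receipt, or "deliver" once all of them exist.
ship_it_gate() {
  have=" $* "  # receipts recorded so far, passed as separate arguments
  for receipt in $SHIP_IT_RECEIPTS; do
    case $have in
      *" $receipt "*) ;;
      *) echo "blocked: missing $receipt"; return 1 ;;
    esac
  done
  echo deliver
}
```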
@@ -785,11 +813,16 @@ tmux attach -t codex-live-test
  evidence blocks a manager continuation before worker delivery until matching
  satisfied criterion evidence exists. `ralph-loop-presets` remains as a
  compatibility alias for the current Ralph-loop QA flows. The built-in
+ `app_visible_build_loop` template requires `build_passed` plus structured
+ `adversarial_check` evidence, but no cleanup evidence, so visible Codex app
+ threads can continue without pretending that same-thread context was cleared.
+ The built-in
  `visual_diff_loop` template requires `reference_artifact`,
  `candidate_screenshot`, `visual_diff_report`, `diff_below_threshold`, and
  `adversarial_check` evidence before a manager-requested next visual pass can
- reach the worker. Quality-oriented templates (`pr_ci_merge_loop`,
- `test_coverage_loop`, and `visual_diff_loop`) also expose an
+ reach the worker. Quality-oriented templates (`app_visible_build_loop`,
+ `pr_ci_merge_loop`, `ship_it_loop`, `test_coverage_loop`, and
+ `visual_diff_loop`) also expose an
  `artifact_requirements["adversarial_check"]` object requiring
  `failure_mode`, `check`, and `result` fields.
  - `loop-status TASK --run RUN [--json]` — Summarize a Ralph-loop run for manager
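The evidence lists in this hunk can be read as a lookup table. The sketch encodes the documented evidence names for `app_visible_build_loop`, `build_then_clear`, and `visual_diff_loop`, reports what is still missing, and checks that an `adversarial_check` record carries `failure_mode`, `check`, and `result`. The function names and the `key=value` record format are assumptions; conveyor's actual storage format is not shown here.

```bash
# Evidence types each template needs before a continuation reaches the worker
# (names as documented in this README; not the conveyor implementation).
template_requires() {
  case $1 in
    app_visible_build_loop) echo 'build_passed adversarial_check' ;;
    build_then_clear)       echo 'build_passed cleanup' ;;
    visual_diff_loop)       echo 'reference_artifact candidate_screenshot visual_diff_report diff_below_threshold adversarial_check' ;;
    *)                      return 1 ;;
  esac
}

# Print the still-missing evidence types; exit 0 only when nothing is missing.
missing_evidence() {
  template=$1; shift
  required=$(template_requires "$template") || { echo "unknown template: $template"; return 2; }
  have=" $* "
  missing=''
  for e in $required; do
    case $have in *" $e "*) ;; *) missing="$missing $e" ;; esac
  done
  echo "${missing# }"
  [ -z "$missing" ]
}

# An adversarial_check record (assumed: key=value lines) needs all three fields, non-empty.
adversarial_check_ok() {
  for field in failure_mode check result; do
    printf '%s\n' "$1" | grep -q "^$field=." || { echo "missing field: $field"; return 1; }
  done
  echo ok
}
```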
@@ -812,7 +845,9 @@ thread tools are unavailable, create the binding anyway and paste the returned
  `worker_handoff` prompt into a manually opened worker session. The handoff
  requires a worker to report completion/blockers through
  `enqueue-notify-manager` plus a bounded Dispatch watch run before treating the
- manager as notified.
+ manager as notified, and to print the live `CONVEYOR POLL` / `CONVEYOR
+ RECEIVED` / `WORK` / `CONVEYOR SEND` / `DISPATCH` transcript in the worker
+ session for any consumed item.
  - `enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N` —
  Queue a manager-requested next loop pass for Dispatch. The command refuses
  same/current iteration requests before they become pending queue rows, while
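Before queueing a next pass, a manager script might mirror the documented refusal locally. This is a hypothetical pre-check named `check_requested_iteration`: it refuses a request equal to the current iteration, as the README describes, and additionally refuses lower numbers, which is an assumption.

```bash
# Refuse same/current iteration requests (documented) and earlier ones (assumed).
check_requested_iteration() {
  current=$1
  requested=$2
  if [ "$requested" -le "$current" ]; then
    echo "refused: iteration $requested is not after current iteration $current"
    return 1
  fi
  echo "queue: iteration $requested"
}
```

Only when it prints `queue:` would the manager go on to run `conveyor enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N`.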
@@ -824,6 +859,8 @@ manager as notified.
  artifact requirements, and recommended tools.
  - `loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE` —
  Record a run-qualified evidence receipt for a loop policy. Use
+ `loop-evidence build-passed TASK --loop-run RUN --iteration N` as the
+ friendly alias for the common `evidence_type=build_passed` receipt. Use
  `loop-evidence visual-diff` to compare PNG screenshots, write an optional
  diff/report artifact, and record `visual_diff_report` plus
  `diff_below_threshold` as satisfied only when the computed score is within
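The `diff_below_threshold` decision amounts to comparing the computed score with the `--threshold` value. The sketch assumes "within" means `score <= threshold`; `diff_within_threshold` is a hypothetical helper, and it says nothing about how conveyor derives the score from the PNG screenshots.

```bash
# Hypothetical threshold check: prints the decision and exits 0 when satisfied.
# Assumes "within" means score <= threshold (inclusive).
diff_within_threshold() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    ok = (s + 0 <= t + 0)
    print (ok ? "satisfied" : "not satisfied")
    exit !ok
  }'
}
```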
@@ -869,14 +906,17 @@ conveyor qa-plan dispatch-completion
  conveyor qa-plan ralph-loop
  conveyor qa-plan adversarial-triggers
  conveyor qa-plan goalbuddy-conveyor
+ conveyor qa-plan ship-it-loop
  conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
  conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
  conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
  conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
  conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
  conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
+ conveyor qa-run ship-it-loop --receipt-output /tmp/ship-it-loop-receipt.json --json
  conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
  conveyor loop-templates --list --json
+ conveyor loop-templates --show ship_it_loop --json
  conveyor loop-templates --show visual_diff_loop --json
  conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
  conveyor ralph-loop-presets --list --json
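The `qa-run` lines in this hunk follow one naming pattern, so they can be driven from a loop. This sketch prints the commands by default and only invokes `conveyor` when `DRY_RUN=0`; the `DRY_RUN` switch and `run_all_qa` are illustrative additions, not conveyor features.

```bash
# Print (default) or run every deterministic qa-run harness listed in this
# README, using the same /tmp receipt naming as the commands above.
DRY_RUN=${DRY_RUN:-1}

run_all_qa() {
  for harness in ralph-loop-guardrails generic-loop-template generic-loop-template-browser \
                 test-coverage-loop adversarial-triggers build-clear-loop ship-it-loop; do
    receipt="/tmp/$harness-receipt.json"
    if [ "$DRY_RUN" = 1 ]; then
      echo "conveyor qa-run $harness --receipt-output $receipt --json"
    else
      # Stop at the first harness that exits non-zero.
      conveyor qa-run "$harness" --receipt-output "$receipt" --json || return 1
    fi
  done
}
```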