npm - agent-conveyor - Versions diffs - 0.1.12 → 0.1.14 - Mend

agent-conveyor 0.1.12 → 0.1.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +74 -15
package/dist/cli/typescript-runtime.js +598 -13
package/dist/cli/typescript-runtime.js.map +1 -1
package/dist/runtime/app-autonomy.d.ts +12 -0
package/dist/runtime/app-autonomy.js +119 -0
package/dist/runtime/app-autonomy.js.map +1 -1
package/dist/runtime/manager-permissions.js +1 -1
package/dist/runtime/manager-permissions.js.map +1 -1
package/docs/manager-recipes.md +90 -0
package/package.json +1 -1
package/skills/manage-codex-workers/SKILL.md +93 -10

package/README.md CHANGED

@@ -139,8 +139,9 @@ release version that is not already on npm.
 For common manager setups, start with
 [`docs/manager-recipes.md`](docs/manager-recipes.md). It maps natural-language
 requests such as GoalBuddy conveyor runs, test coverage loops, UX polish loops,
-what-next nudging, and PR/CI/merge Ralph loops to concrete `manager-config`
-settings, permissions, evidence gates, cleanup behavior, and example
+what-next nudging, PR/CI/merge Ralph loops, and autonomous ship-it loops to
+concrete `manager-config` settings, permissions, evidence gates, cleanup
+behavior, and example
 manager/Dispatch/worker interactions. Use `conveyor manager-recipes --list`
 or `conveyor manager-recipes --show goalbuddy-conveyor --json` for a
 machine-readable setup preview.
@@ -174,7 +175,16 @@ thread identity through `--worker-codex-app-thread-id` and
 deliver the generated `worker_handoff` bootstrap prompt. The raw terminal
 `conveyor` CLI does not create Codex app threads by itself; if app thread tools
 are unavailable, open a separate Codex app worker manually and paste the
-`worker_handoff` prompt.
+`worker_handoff` prompt. After a worker consumes a manager instruction, its
+completion or blocker report must go back through the generated
+`enqueue-notify-manager` command and a bounded Dispatch watch tick; a direct
+Codex app final answer is not a durable manager receipt.
+The live manager and worker sessions should also be readable as the primary
+operator transcript: after consuming an inbox item, the consuming session must
+print `CONVEYOR POLL`, `CONVEYOR RECEIVED`, `WORK`, `CONVEYOR SEND`, and
+`DISPATCH` sections while the turn is happening. SQLite/replay/status output is
+audit proof, not a replacement for the live session story. Idle polls may be a
+single `CONVEYOR IDLE` line.
 Dispatch is core infrastructure for supervised worker/manager pairs. The
 `pair` workflow starts a detached Dispatch watch process by default so worker
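
Note on the hunk above: the added text requires a consuming session to print `CONVEYOR POLL`, `CONVEYOR RECEIVED`, `WORK`, `CONVEYOR SEND`, and `DISPATCH` sections, and allows an idle poll to be a single `CONVEYOR IDLE` line. A minimal sketch of that shape follows. Only the section names and the idle line come from the README. The `ConsumedTurn` type, the `render_turn` helper, and the blank-line layout between sections are assumptions made for illustration, not Conveyor's actual output format.

```python
from dataclasses import dataclass
from typing import Optional

# Section headings quoted from the README diff; their order follows the README.
SECTIONS = ("CONVEYOR POLL", "CONVEYOR RECEIVED", "WORK", "CONVEYOR SEND", "DISPATCH")


@dataclass
class ConsumedTurn:
    """Hypothetical container for what one session did with one inbox item."""
    poll: str
    received: str
    work: str
    send: str
    dispatch: str


def render_turn(turn: Optional[ConsumedTurn]) -> str:
    """Return the transcript text for one poll (layout is an assumption)."""
    if turn is None:
        # Nothing was consumed, so a single idle line is enough.
        return "CONVEYOR IDLE"
    bodies = (turn.poll, turn.received, turn.work, turn.send, turn.dispatch)
    blocks = [f"{name}\n{body}" for name, body in zip(SECTIONS, bodies)]
    return "\n\n".join(blocks)
```

Calling `render_turn(None)` yields the idle line, while a populated `ConsumedTurn` yields all five sections in the order the README lists them.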
@@ -192,6 +202,9 @@ Use `conveyor qa-plan adversarial-triggers` to verify natural-language
 manager prompts activate Ralph-loop adversarial gates.
 Use `conveyor qa-plan goalbuddy-conveyor` when a broad request should become
 sequential GoalBuddy child boards with PR/CI/merge receipts.
+Use `conveyor qa-plan ship-it-loop` when a manager is allowed to push a branch,
+open a PR, monitor CI, resolve bounded conflicts, and merge only after explicit
+manager-owned merge evidence.
 Before cutting a manager loose, have it resolve the freeform setup request to a
 named recipe from `docs/manager-recipes.md` or an explicit `custom` setup, then
 show the saved mode, permissions, evidence gates, cleanup policy, and disallowed
@@ -371,14 +384,25 @@ tmux attach -t codex-live-test
   Codex app sessions, the JSON output also includes
   `heartbeat_recommendations` with role-specific poll prompts; Dispatch can
   deliver into those inboxes, but a heartbeat or operator wake-up is still
-  required to make an idle app thread poll autonomously. Those recommendations
-  also include `wakeup_dispatch_command` and `delivery_receipt_commands` for
+  required to make an idle app thread poll autonomously. The worker handoff and
+  worker heartbeat prompt also include the exact durable
+  `enqueue-notify-manager` and one-iteration `dispatch --watch` commands that a
+  worker must run after completing or blocking on a consumed item. Those prompts
+  require live session transcript blocks for consumed items, so the operator can
+  inspect the actual Codex app sessions and see the same flow the durable inbox
+  later proves. Those
+  recommendations also include `wakeup_dispatch_command` and
+  `delivery_receipt_commands` for
   app-thread wake recovery. Use them to record sent, skipped, and blocked wake
   outcomes after `app-wakeup-dispatch`; an app-thread send is not task
   completion. The recommendations include a `teardown_policy`: an idle poll is
   only a quiet interval, not a reason to delete or pause heartbeat automation;
   heartbeat teardown belongs to the manager/operator after terminal closeout or
-  explicit operator instruction.
+  explicit operator instruction. For same-thread Codex app visible-session
+  dogfood, prefer `--template app_visible_build_loop` or a custom adversarial
+  gate; reserve cleanup-gated templates such as `build_then_clear` for flows
+  that create a fresh worker context or can record a real cleanup receipt
+  between iterations.
   The optional
   Codex app thread metadata is normally supplied after a Codex app manager has
   used `create_thread` and `set_thread_title`; terminal-only users can omit it
@@ -405,7 +429,8 @@ tmux attach -t codex-live-test
   a matching `ready_to_send` action with `send_ready=true` and the same thread
   id; healthy and blocked roles must be recorded as `skipped` or `blocked`.
 - `app-autopilot start|stop|status TASK [--dispatcher-id ID]
-  [--interval SECONDS] [--watch-iterations N] [--stale-after N] [--json]` —
+  [--interval SECONDS] [--watch-iterations N] [--stale-after N]
+  [--quiet-after N] [--json]` —
   Manage the pair-level app-native heartbeat policy for the active
   manager/worker binding. `start` and `stop` write telemetry receipts and emit
   the exact manager/worker Codex app heartbeat automation specs plus the
@@ -413,6 +438,11 @@ tmux attach -t codex-live-test
   thread tools, so create/pause those heartbeat automations from a Codex app
   operator session using the emitted specs; Conveyor remains the durable source
   of truth through Dispatch, inboxes, wake receipts, and app heartbeat status.
+  `status` also reports `plan.quiescence`: when the loop is healthy, has no
+  `next_actions`, and both roles have produced `--quiet-after` paired
+  heartbeats since the last command or inbox-consumption receipt, it recommends
+  `stop_autopilot` so operators can quiesce blocked/no-progress loops instead
+  of repeating idle pulses.
 - `discover [QUERY] [--all] [--limit N]` / `search [QUERY]` — Search tasks,
   registered sessions, active bindings, and recent telemetry in one JSON result.
   Use this for conversational setup when a manager or Codex session needs to
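
Note on the hunk above: the `plan.quiescence` rule combines three conditions before recommending `stop_autopilot`. The sketch below restates that rule under stated assumptions. The `LoopStatus` type, its field names, and the role keys `manager` and `worker` are hypothetical, and "produced `--quiet-after` paired heartbeats" is read here as "at least that many", which the README does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LoopStatus:
    """Hypothetical snapshot of the values the quiescence rule looks at."""
    healthy: bool
    next_actions: List[str] = field(default_factory=list)
    # Paired heartbeats per role since the last command or
    # inbox-consumption receipt (counting method is an assumption).
    quiet_heartbeats: Dict[str, int] = field(default_factory=dict)


def quiescence_recommendation(status: LoopStatus, quiet_after: int) -> Optional[str]:
    """Return "stop_autopilot" when all three README conditions hold, else None."""
    roles = ("manager", "worker")
    both_quiet = all(status.quiet_heartbeats.get(role, 0) >= quiet_after for role in roles)
    if status.healthy and not status.next_actions and both_quiet:
        return "stop_autopilot"
    return None
```

If either role is below the threshold, or any `next_actions` remain, the sketch returns `None` and makes no recommendation.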
@@ -443,7 +473,8 @@ tmux attach -t codex-live-test
   flags. Use `--interactive` only as a terminal fallback when a human is
   running `conveyor` directly.
   `--permit` grants taxonomy permissions such as `repo.open_pr`,
-  `verification.run_pytest`, `context.spawn_reviewer`,
+  `repo.push_branch`, `repo.monitor_ci`, `repo.resolve_conflicts`,
+  `repo.merge_green_pr`, `verification.run_pytest`, `context.spawn_reviewer`,
   `communication.notify_operator`, or `worker_session.compact`. Use `--tool`
   to record expected verification/context tools, `--epilogue` for required
   built-in finish steps (`run-tools`, `draft-pr`, `subagent-review`,
@@ -484,7 +515,13 @@ tmux attach -t codex-live-test
   [--json]` — Draft reviewed `criteria --add` commands from a worker response
   that separates must-have current-task criteria from deferred follow-ups. This
   helper is read-only: it resolves the task and prints suggestions, but does not
-  mutate acceptance criteria, events, or commands.
+  mutate acceptance criteria, events, or commands. If a proposed criterion
+  appears to describe manager closeout mechanics such as `finish-task`,
+  `--require-criteria-audit`, heartbeat teardown, or final manager reporting,
+  the helper emits a non-blocking warning and classifies that suggestion as
+  manager closeout proof. Keep that proof in the manager final report, audit,
+  replay, or epilogue evidence instead of accepted worker/task criteria unless
+  the task is explicitly Conveyor closeout QA.
   ```bash
   conveyor criteria-plan my-task --from-worker-response response.md --json
   ```
@@ -640,7 +677,10 @@ tmux attach -t codex-live-test
   recovery; `--once` performs one pass.
 - `enqueue-notify-manager <task> --message "..." [--correlation-id C]
   [--required-permission P] [--idempotency-key K] [--json]` — Queue a `notify_manager` command row for
-  Dispatch to claim and deliver to the bound manager.
+  Dispatch to claim and deliver to the bound manager. Codex app/no-tmux
+  workers must use this route for completion and blocker reports after
+  consuming a manager instruction; direct app-thread final answers are local
+  text, not manager inbox receipts.
 - `enqueue-nudge-worker <task> --message "..." [--correlation-id C]
   [--required-permission P] [--idempotency-key K] [--json]` — Queue a `nudge_worker` command row for
   Dispatch to claim and deliver to the bound worker. Use this dispatcher-backed
@@ -734,9 +774,9 @@ tmux attach -t codex-live-test
 - `transcript-show <task> [--role R] [--include-content]` — Show stored
   transcript segment metadata. Segment text is redacted unless
   `--include-content` is passed.
-- `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor>` — Print a
+- `qa-plan <self-management|emergent-criteria|tmux-errors|dispatch-completion|ralph-loop|adversarial-triggers|goalbuddy-conveyor|ship-it-loop>` — Print a
   repeatable manual QA checklist.
-- `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop> --receipt-output RECEIPT.json [--path DB]` —
+- `qa-run <ralph-loop-guardrails|generic-loop-template|generic-loop-template-browser|test-coverage-loop|adversarial-triggers|build-clear-loop|ship-it-loop> --receipt-output RECEIPT.json [--path DB]` —
   Run a deterministic no-tmux QA harness and save a JSON receipt.
   `ralph-loop-guardrails` proves max-iteration cutoff, missing-evidence
   cutoff, fresh retry delivery after structured `adversarial_check` evidence,
@@ -756,6 +796,10 @@ tmux attach -t codex-live-test
   `build-clear-loop` proves the non-coverage `build_then_clear` template
   blocks before `build_passed` and `cleanup` receipts, still blocks after build
   evidence alone, and delivers only after both build and cleanup evidence exist.
+  `ship-it-loop` proves push, PR, and merge commands fail closed until their
+  permissions are granted, then proves the `ship_it_loop` lifecycle blocks
+  before branch, PR, CI, mergeability, manager decision, merge, post-merge, and
+  adversarial receipts exist.
 - `loop-triggers --list|--classify PROMPT [--json]` — List the controlled
   natural-language loop triggers or classify a manager/operator prompt before
   creating a loop policy or continuation gate. Approved trigger phrases include
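
Note on the hunk above: the `ship-it-loop` harness is described as proving two fail-closed behaviors, permission checks on push, PR, and merge commands, and a lifecycle that blocks until every required receipt exists. A rough sketch of both checks follows. The permission names come from the `--permit` list in this diff, but the mapping from command to permission, the receipt labels other than `adversarial_check`, and both function names are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Permission names appear in the README; pairing them with these command
# labels is an assumption.
REQUIRED_PERMISSION: Dict[str, str] = {
    "push": "repo.push_branch",
    "open_pr": "repo.open_pr",
    "merge": "repo.merge_green_pr",
}

# Hypothetical labels for the receipts the README lists in prose.
REQUIRED_RECEIPTS = (
    "branch", "pr", "ci", "mergeability",
    "manager_decision", "merge", "post_merge", "adversarial_check",
)


def command_allowed(command: str, granted: Iterable[str]) -> bool:
    """Fail closed: unknown commands and missing permissions are both denied."""
    required = REQUIRED_PERMISSION.get(command)
    return required is not None and required in set(granted)


def missing_receipts(recorded: Iterable[str]) -> List[str]:
    """Return the receipts still missing; an empty list means the gate may open."""
    seen = set(recorded)
    return [name for name in REQUIRED_RECEIPTS if name not in seen]
```

Denying by default means a newly added command stays blocked until someone maps it to a permission explicitly.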
@@ -769,11 +813,16 @@ tmux attach -t codex-live-test
   evidence blocks a manager continuation before worker delivery until matching
   satisfied criterion evidence exists. `ralph-loop-presets` remains as a
   compatibility alias for the current Ralph-loop QA flows. The built-in
+  `app_visible_build_loop` template requires `build_passed` plus structured
+  `adversarial_check` evidence, but no cleanup evidence, so visible Codex app
+  threads can continue without pretending that same-thread context was cleared.
+  The built-in
   `visual_diff_loop` template requires `reference_artifact`,
   `candidate_screenshot`, `visual_diff_report`, `diff_below_threshold`, and
   `adversarial_check` evidence before a manager-requested next visual pass can
-  reach the worker. Quality-oriented templates (`pr_ci_merge_loop`,
-  `test_coverage_loop`, and `visual_diff_loop`) also expose an
+  reach the worker. Quality-oriented templates (`app_visible_build_loop`,
+  `pr_ci_merge_loop`, `ship_it_loop`, `test_coverage_loop`, and
+  `visual_diff_loop`) also expose an
   `artifact_requirements["adversarial_check"]` object requiring
   `failure_mode`, `check`, and `result` fields.
 - `loop-status TASK --run RUN [--json]` — Summarize a Ralph-loop run for manager
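
Note on the hunk above: quality-oriented templates expose an `artifact_requirements["adversarial_check"]` object requiring `failure_mode`, `check`, and `result` fields. A small sketch of validating an evidence payload against those three fields is shown below. The field names come from the README. The function name, the dictionary payload shape, and treating empty values as missing are assumptions.

```python
from typing import Any, List, Mapping

# Field names quoted from the README's artifact_requirements description.
REQUIRED_FIELDS = ("failure_mode", "check", "result")


def missing_adversarial_fields(evidence: Mapping[str, Any]) -> List[str]:
    """Return required fields that are absent or empty in the evidence payload."""
    return [name for name in REQUIRED_FIELDS if not evidence.get(name)]


# Example payload; the values are made up for illustration.
example = {
    "failure_mode": "stale screenshot reused",
    "check": "compared file hashes across iterations",
    "result": "hashes differ",
}
```

A gate built on this helper would treat any non-empty return value as a reason to keep blocking the continuation.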
@@ -793,7 +842,12 @@ same-project `create_thread` worker plus `set_thread_title` before creating the
 binding, then pass the worker thread id/title into Conveyor. Use `fork_thread`
 only when the user explicitly asks to fork or resume this conversation. If app
 thread tools are unavailable, create the binding anyway and paste the returned
-`worker_handoff` prompt into a manually opened worker session.
+`worker_handoff` prompt into a manually opened worker session. The handoff
+requires a worker to report completion/blockers through
+`enqueue-notify-manager` plus a bounded Dispatch watch run before treating the
+manager as notified, and to print the live `CONVEYOR POLL` / `CONVEYOR
+RECEIVED` / `WORK` / `CONVEYOR SEND` / `DISPATCH` transcript in the worker
+session for any consumed item.
 - `enqueue-continue-iteration TASK --loop-run RUN --requested-iteration N` —
   Queue a manager-requested next loop pass for Dispatch. The command refuses
   same/current iteration requests before they become pending queue rows, while
@@ -805,6 +859,8 @@ thread tools are unavailable, create the binding anyway and paste the returned
   artifact requirements, and recommended tools.
 - `loop-evidence add TASK --loop-run RUN --iteration N --evidence-type TYPE` —
   Record a run-qualified evidence receipt for a loop policy. Use
+  `loop-evidence build-passed TASK --loop-run RUN --iteration N` as the
+  friendly alias for the common `evidence_type=build_passed` receipt. Use
   `loop-evidence visual-diff` to compare PNG screenshots, write an optional
   diff/report artifact, and record `visual_diff_report` plus
   `diff_below_threshold` as satisfied only when the computed score is within
@@ -850,14 +906,17 @@ conveyor qa-plan dispatch-completion
 conveyor qa-plan ralph-loop
 conveyor qa-plan adversarial-triggers
 conveyor qa-plan goalbuddy-conveyor
+conveyor qa-plan ship-it-loop
 conveyor qa-run ralph-loop-guardrails --receipt-output /tmp/ralph-loop-guardrails-receipt.json --json
 conveyor qa-run generic-loop-template --receipt-output /tmp/generic-loop-template-receipt.json --json
 conveyor qa-run generic-loop-template-browser --receipt-output /tmp/generic-loop-template-browser-receipt.json --json
 conveyor qa-run test-coverage-loop --receipt-output /tmp/test-coverage-loop-receipt.json --json
 conveyor qa-run adversarial-triggers --receipt-output /tmp/adversarial-triggers-receipt.json --json
 conveyor qa-run build-clear-loop --receipt-output /tmp/build-clear-loop-receipt.json --json
+conveyor qa-run ship-it-loop --receipt-output /tmp/ship-it-loop-receipt.json --json
 conveyor loop-triggers --classify "Run this as an adversarially gated Ralph loop." --json
 conveyor loop-templates --list --json
+conveyor loop-templates --show ship_it_loop --json
 conveyor loop-templates --show visual_diff_loop --json
 conveyor loop-evidence visual-diff qa-task --loop-run "$RUN_ID" --iteration 1 --reference reference.png --candidate candidate.png --threshold 0.02 --report-output visual-diff.json --diff-output visual-diff.png
 conveyor ralph-loop-presets --list --json