npm - @yemi33/minions - Versions diffs - 0.1.1633 → 0.1.1635 - Mend

@yemi33/minions 0.1.1633 → 0.1.1635

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27) hide show

package/CHANGELOG.md +10 -0
package/README.md +11 -11
package/dashboard.js +46 -0
package/docs/auto-discovery.md +17 -15
package/docs/blog-first-successful-dispatch.md +7 -10
package/docs/engine-restart.md +8 -11
package/docs/human-vs-automated.md +3 -4
package/docs/pr-review-fix-loop.md +1 -1
package/docs/rfc-completion-json.md +5 -5
package/engine/copilot-models.json +1 -1
package/engine/lifecycle.js +1 -1
package/engine/playbook.js +2 -1
package/engine/queries.js +4 -4
package/engine/shared.js +4 -12
package/engine/timeout.js +59 -168
package/engine.js +11 -42
package/package.json +1 -1
package/playbooks/build-and-test.md +22 -139
package/playbooks/docs.md +113 -0
package/playbooks/fix.md +1 -1
package/playbooks/implement-shared.md +1 -1
package/playbooks/implement.md +3 -7
package/playbooks/shared-rules.md +4 -45
package/playbooks/test.md +17 -40
package/playbooks/verify.md +29 -141
package/playbooks/work-item.md +1 -0
package/prompts/cc-system.md +2 -0

package/playbooks/docs.md ADDED Viewed

@@ -0,0 +1,113 @@
+# Docs Playbook
+> Agent: {{agent_name}} ({{agent_role}}) | Task: {{item_name}} | ID: {{item_id}}
+## Context
+Repo: {{repo_name}} | Org: {{ado_org}} | Project: {{ado_project}}
+Team root: {{team_root}}
+Project path: {{project_path}}
+## Mission
+Update, expand, or rewrite project documentation. Targets include READMEs, CLAUDE.md,
+files under `docs/`, JSDoc/TSDoc on exported APIs, and inline comments where they add
+real WHY value (not WHAT — the code already says what). Keep voice consistent with the
+project's existing docs.
+## Task
+**{{item_name}}**
+{{item_description}}
+{{additional_context}}
+{{references}}
+{{acceptance_criteria}}
+## Steps
+### 1. Read the doc(s) and the code they describe
+- Open the doc(s) being changed end-to-end before writing.
+- Read the source they describe — function signatures, exported symbols, config keys,
+  CLI flags, file paths. Don't trust the existing doc; trust the code.
+- For project-level docs (README, CLAUDE.md, /docs/*.md), skim adjacent docs so your
+  voice and structure match.
+### 2. Confirm doc reflects current code
+For every claim in the doc you're touching, verify it against current code:
+- Does the function still exist with that signature?
+- Are the file paths correct?
+- Are the listed flags / config keys still accepted?
+- Are removed features still being documented?
+- Are new features (visible in code) missing from the doc?
+If the doc describes vapor, delete the section. If real features are missing, add them.
+### 3. Write or update concisely
+- Match the project's existing voice — read 2-3 nearby docs to calibrate tone.
+- Prefer concrete examples over abstract description.
+- For code comments: follow the project's "Default to writing no comments" rule
+  (CLAUDE.md). Add comments only where they explain WHY a non-obvious choice was made,
+  never to restate WHAT the code does.
+- For project-level docs: the bar is "would a new contributor understand this?"
+- Keep tables, lists, and code blocks formatted consistently with surrounding docs.
+### 4. Verify
+- Re-read the changed doc end-to-end after editing — does it still flow?
+- If the project has doc-validation tests (lint, link-check, snippet-execution), run
+  them. Otherwise run `npm test` (or the project's documented test command) to make
+  sure nothing else broke.
+- For docs with embedded code samples, mentally execute each sample against current
+  code — stale samples are worse than missing ones.
+## Acceptance
+- Doc accurately reflects current code (no vapor, no missing features).
+- Voice and structure match the rest of the project's docs.
+- For inline code comments: follow project conventions; add comments only where they
+  explain WHY, never WHAT.
+- For project-level docs: a new contributor could read it and understand the topic.
+- Existing tests still pass; any doc-validation tests pass.
+## Git Workflow
+You are already running in a git worktree on branch `{{branch_name}}`. Do NOT create
+additional worktrees — the engine pre-created one for you. Do NOT remove the worktree —
+the engine handles cleanup automatically.
+Commit only the doc files (and any helper assets they reference). Do not bundle
+unrelated code changes into a docs PR.
+```bash
+git add <doc files>
+git commit -m "{{commit_message}}"
+git push -u origin {{branch_name}}
+```
+PR creation is MANDATORY for docs tasks — docs go through the same review flow as code.
+Use the appropriate repo-host tooling for PR creation. For Azure DevOps, prefer the
+`az` CLI first and use the ADO MCP only as a fallback.
+## Rules
+- Do NOT modify product code unless the task explicitly asks for it.
+- Do NOT add comments that restate what the code does.
+- Do NOT invent features that don't exist; verify against current code.
+- Read `notes.md` for all team rules before starting.
+## When to Stop
+Your task is complete once the doc accurately reflects current code, the PR is created
+with the changed doc files, and any doc-validation tests pass. Do not continue editing
+adjacent docs that weren't part of the task.
+## Team Decisions
+{{notes_content}}

package/playbooks/fix.md CHANGED Viewed

@@ -45,7 +45,7 @@ Before pushing, prove the review fix did not break the branch:
 - Fix regressions you introduced. If failures are pre-existing or unrelated, capture the evidence and include it in the PR comment.
 - Do not push code that breaks existing tests or the build because of your changes.
-> ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+Long builds, dependency installs, and tests may be quiet for several minutes. Let the normal CLI command run naturally; do not add artificial heartbeat output or split commands just to show progress.
 ## Publish & Comment on PR

package/playbooks/implement-shared.md CHANGED Viewed

@@ -66,7 +66,7 @@ Before publishing, prove the shared branch still works with your change included
 - Fix regressions you introduced. If failures are pre-existing or caused by earlier branch work, capture the evidence and say so clearly.
 - Do not push code with a broken build or failing tests that you introduced.
-> ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+Long builds, dependency installs, and tests may be quiet for several minutes. Let the normal CLI command run naturally; do not add artificial heartbeat output or split commands just to show progress.
 ## Publish

package/playbooks/implement.md CHANGED Viewed

@@ -20,15 +20,11 @@ Implement PRD item **{{item_id}}: {{item_name}}**
 {{checkpoint_context}}
-## Projects
+## Project Scope
 Primary repo: **{{repo_name}}** ({{ado_org}}/{{ado_project}}) at `{{project_path}}`
-If this feature spans multiple projects, you may need to:
-1. Read code from all listed project paths to understand integration points
-2. Make changes in the primary project (your worktree)
-3. If changes are needed in other projects, create separate worktrees and PRs for each
-4. Note cross-repo dependencies in PR descriptions (e.g., "Requires office-bohemia PR #123")
+If this feature spans multiple projects, inspect the relevant repos, make changes where they belong, and call out any cross-repo PR dependencies in PR descriptions.
 ## Health Check
@@ -62,7 +58,7 @@ Before publishing, prove the change with the repo's own documented checks:
 - Fix regressions you introduced. If failures are pre-existing or outside the task, capture the evidence and make that explicit in the PR.
 - Do not publish changes with a broken build or failing tests that you introduced.
-> ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+Long builds, dependency installs, and tests may be quiet for several minutes. Let the normal CLI command run naturally; do not add artificial heartbeat output or split commands just to show progress.
 ## Publish

package/playbooks/shared-rules.md CHANGED Viewed

@@ -1,17 +1,6 @@
-## Quality Standard
+## Operating Principle
-Codex will review your changes — make sure your implementation is thorough and not lazy.
-## Reasoning and Teaching Posture
-- Act like you've already explained this yesterday. Do not ramble, re-teach obvious basics, or pad the answer. Get to the point fast.
-- You are an IQ 150 software engineering specialist. If the reasoning is average, vague, or hand-wavy, it is wrong.
-- You have a student who is eagerly trying to learn from you. Display model behavior: be rigorous, teach cleanly, and show what good engineering thinking looks like.
-- Explain concepts like you are teaching a packed auditorium. If the structure is weak or the example is forgettable, the explanation failed.
-- Always verify your claims. If you state something as true, earn it.
-- Treat every answer like there is $100 on the line. Sloppy logic, missed edge cases, and fake confidence lose the bet.
-- Assume another CLI is going to review the code and try to prove you wrong. Close every hole before you answer.
-- Leave no stone unturned when implementing or explaining. Half-checks, shallow analysis, and partial reasoning are not acceptable.
+Treat a Minions assignment like the user typed the same task directly into a capable CLI agent. Optimize for the requested outcome and use the repo's own tools, conventions, and acceptance criteria.
 ## Context Window Awareness
@@ -61,39 +50,9 @@ Treat a Minions assignment like the user typed the same task directly into a cap
   Do **not** create a skill for one-off bug fixes, isolated command outputs, obvious repo facts, or anything already covered by existing docs/playbooks/skills.
 - Do TDD where it makes sense — write failing tests first, then implement, then verify tests pass. Especially for bug fixes (write a test that reproduces the bug) and new utility functions.
-## Long-Running Build / Test Commands (Heartbeat Safety)
-The engine kills agents that produce no stdout for `heartbeatTimeout` (default **300s / 5 min**). A blocking shell call with zero stdout for that long is treated as hung. Cold builds (Gradle daemon spin-up, MSBuild restore, fresh `npm install`, `cargo build`) routinely exceed this with no intermediate output.
-**Two approved patterns — pick one when a command may exceed 4 minutes of silence:**
-### Pattern A — Stream output via background process + Monitor (preferred)
-```
-1. Bash({ command: "./gradlew test", run_in_background: true })       # returns a bash_id immediately
-2. Monitor({ bash_id: "<id>" })                                       # streams each stdout line as a notification
-```
-Why: each line that the build emits arrives as a notification, which resets the heartbeat. You see live progress in the dashboard. The Monitor call itself is recognised by the engine as a blocking tool (heartbeat extended ~30 min).
-> ⚠️ **Never use `Monitor({ command: "tail -F <file> | grep ..." })` for long builds.** It looks tidy — only the lines you care about — but it is a heartbeat trap. Cold Gradle / MSBuild / `cargo build` spend 3–8 minutes in a startup + dependency-resolution phase that produces output that **does not match** typical filter terms (`BUILD SUCCESSFUL`, `BUILD FAILED`, `error:`). The grep filter swallows every line, Monitor emits zero notifications, the heartbeat fires at 300s, and the engine kills the agent mid-build. Always pass `bash_id` directly — every output line resets the heartbeat, and noisy output is the *whole point* of the pattern.
-### Pattern B — Single Bash call with explicit `timeout`
-```
-Bash({ command: "./gradlew test", timeout: 600000 })                  # max 600000 = 10 min
-```
-The engine reads `input.timeout` from the tool call and extends the heartbeat to `timeout + 60s` for that turn. **The extension is opt-in** — without an explicit `timeout`, the agent is killed at `heartbeatTimeout`. PowerShell tool follows the same rule.
-### What NOT to do
-- Do NOT run `./gradlew`, `mvn`, `dotnet test`, or any cold-cache build as a default `Bash` call (no `timeout`, no `run_in_background`). It will hit the 120s Bash default, then the 300s heartbeat, and the engine will kill you.
-- Do NOT use `Monitor({ command: "tail | grep ..." })` for any build that has a silent startup phase (cold Gradle, MSBuild, fresh `npm install`, `cargo build`). The grep filter suppresses Gradle's startup output, Monitor emits nothing, heartbeat fires at 300s, agent is killed. Use `Monitor({ bash_id })` instead — noisy output is better than a dead agent.
-- Do NOT loop `sleep` to "wait it out" — sleep produces no stdout and looks identical to a hang.
-- Do NOT pipe through `tee` thinking that helps — heartbeat reads agent stdout, not the underlying file.
+## Long-Running Commands
-If you don't know how long a command takes, default to **Pattern A** — there is no downside to streaming.
+Builds, dependency installs, tests, and local servers can be quiet for long periods. Run the repo's normal CLI commands and let them finish; do not add artificial progress output, heartbeat loops, or command-specific workarounds just to keep Minions active.
 ## Checking PR and Build Status

package/playbooks/test.md CHANGED Viewed

@@ -15,60 +15,37 @@ Team root: {{team_root}}
 {{additional_context}}
-## Instructions
+## Mission
-This is a **test/build/run task**. Your goal is to build, run, test, or verify something.
+Build, run, test, or verify the requested target. This is a verification task unless the description explicitly asks for code or test changes.
-1. **Navigate** to the correct project directory
-2. You are already in the correct working directory. If you need a specific branch, use `git checkout` — do NOT create additional worktrees.
-3. **Build** the project — follow the repo's build instructions (check CLAUDE.md, package.json, README)
-4. **Run** if the task asks for it (e.g., `yarn start`, `yarn dev`, docker-compose, etc.)
-5. **Test** if the task asks for it (e.g., `yarn test`, `pytest`, etc.)
-6. **Write tests** if the task asks you to add or improve test coverage — commit and open a PR when done
-7. **Report results** — what worked, what failed, build output, test results, localhost URL if running
+## Long-Running Commands
-## Working Style
+Builds, tests, dependency installs, and server startups can be silent for several minutes. Run the normal CLI commands and wait for them to finish; do not add progress pings or extra logging just to keep the engine active.
-Use subagents only for genuinely parallel, independent tasks. For building, testing, and reporting results, work directly — do not spawn subagents.
+## Approach
-## Rules
+Work from the current project checkout prepared by the engine. Follow the repo's own instructions (`CLAUDE.md`, README, package files, Makefiles, project scripts) and run the smallest sensible set of commands that proves the requested behavior.
-- **Do NOT modify production code** unless the task explicitly asks for a fix
-- **Create a PR** if the task involves writing or modifying files (new tests, test fixes, etc.) — follow the same PR conventions as implement tasks
-- **Do NOT push or create a PR** for pure build/run/verify tasks that make no file changes
-- Use PowerShell for build commands on Windows if applicable
-- If a build or test fails, report the error clearly — don't try to fix it unless asked
-- If running a local server, report the URL (e.g., http://localhost:3000)
+If the task asks you to add or modify files, commit those focused changes, push the branch, and create a PR using the same conventions as implement tasks. For pure build/run/verify tasks, do not push or create a PR.
-## Run Command
+If a build or test fails, report the error clearly instead of fixing it unless the task explicitly asks for a fix.
-When the build succeeds and the task involves running a server or app, output a ready-to-paste run command using **absolute paths** so the user can launch it from any terminal. Format it exactly like this:
+If the task involves a local server or app, report the URL and a ready-to-paste run command with absolute paths:
-```
+```text
 ## Run Command
 cd <absolute-path-to-project-or-worktree> && <exact command to start the server>
 ```
-Example: `cd C:/Users/you/my-project && yarn dev`
-The agent process terminates after completion, so any dev server you start will die with it. The user needs this command to run it themselves.
-## After Successful Completion
-Write your findings to `{{team_root}}/notes/inbox/{{agent_id}}-{{item_id}}-{{date}}.md` **only after a successful run**: all requested build/test/run steps completed, required checks passed, and any required PR was created.
-If any requested build, test, run, or file-change step fails or is blocked, do **not** write an inbox note. Report the failure clearly in your final response, including the command, error summary, and any next action needed.
-Include:
-- Build status
-- Test results if applicable
-- Any errors or warnings
-- The run command (absolute paths, copy-pasteable from any terminal)
-- Localhost URL if applicable
+## Findings
+Write findings to `{{team_root}}/notes/inbox/{{agent_id}}-{{item_id}}-{{date}}.md` only after successful completion.
-## When to Stop
+Include build status, test results, errors or warnings, run command, localhost URL if applicable, and PR URL if file changes were made.
-Your task is complete once you have run the tests, written the success findings to the inbox file, and (if the task involved file changes) created and submitted a PR. If the run failed or was blocked, stop after reporting the failure in your final response; do not write an inbox file.
+## Constraints
-Do NOT remove worktrees — the engine handles cleanup automatically.
+- Do not modify production code unless explicitly asked.
+- Use PowerShell for build commands on Windows if applicable.
+- Do not remove worktrees; the engine handles cleanup automatically.

package/playbooks/verify.md CHANGED Viewed

@@ -11,168 +11,56 @@ Repo: {{repo_name}} | Org: {{ado_org}} | ADO Project: {{ado_project}}
 {{task_description}}
-## Your Task
+## Mission
-Verify that a set of related changes work correctly together. You must **figure out** how to build, test, and run this specific project — do not assume any particular language, framework, or tooling. Follow this process:
+Verify the completed plan as a whole: build and test the relevant project worktrees, identify how to run the user-facing app if there is one, create or update aggregate E2E PRs when needed, and write a transparent testing guide.
-1. **Set up worktrees** with all PR branches merged
-2. **Understand the project** — read its docs to learn how to build, test, and run it
-3. **Build and test** from each worktree
-4. **Start the application** if applicable (keep it running detached)
-5. **Write a transparent verification report and testing guide**
-6. **Create E2E pull requests**
+## Long-Running Commands
-## Step 1: Set Up Worktrees
+Builds, dependency installs, tests, and app startup can be silent for several minutes. Run the normal CLI commands and let them finish; do not emit artificial progress pings or heartbeat output.
-The description above contains setup commands that create **one worktree per project** and merge all PR branches into it. Run them.
+## Approach
-If any merge conflicts occur:
-- Resolve them, preferring the PR branch changes
-- Commit the resolution in the worktree
+Use the setup information in the plan details to create or enter the aggregate worktrees and merge the relevant PR branches. Resolve merge conflicts only when needed to make the verification branch buildable, and record what was resolved.
-After setup, all changes for a project are in a single directory — no switching between branches.
+For each relevant worktree, follow the repo's own instructions (`CLAUDE.md`, README, package files, Makefiles, project scripts), install/restore dependencies as needed, build, and run tests where they exist. If a project fails, report the error and continue verifying the rest; do not turn this into an implementation/fix task.
-## Step 2: Understand the Project
+If there is a user-facing app, identify the normal command and URL. If the app needs to keep running after the agent exits, start it detached in the worktree, save the PID, verify the URL responds, and include restart/stop commands with absolute paths.
-For each project worktree, **read its documentation** to understand:
-- What language/framework it uses
-- How to install dependencies
-- How to build it
-- How to run tests
-- How to start it (if it has a runnable application)
+## Testing Guide
-Check these files: `CLAUDE.md`, `README.md`, `package.json`, `Makefile`, `Cargo.toml`, `pyproject.toml`, `build.gradle`, `CMakeLists.txt`, `docker-compose.yml`, `Podfile`, `build.gradle.kts`, `*.xcodeproj`, `*.xcworkspace`, or whatever build system the project uses.
+Be transparent: clearly state what passed, what failed, what was not run, and why.
-Treat the repository's own docs, scripts, and config as the source of truth for build, test, and run commands. If the repo provides multiple options, choose the standard local verification path it recommends.
+Always create the permanent guide linked from the dashboard:
-**Do not assume any specific platform.** The project could be a web app, mobile app (Android/iOS/React Native/Flutter), backend service, CLI tool, library, monorepo, or anything else. Adapt your verification approach to what the project actually is.
+`{{team_root}}/prd/guides/verify-{{plan_slug}}.md`
-## Step 3: Build and Test
+Create the inbox copy only after a successful verification run:
-For each project worktree:
-1. `cd` into the worktree path
-2. Install dependencies using the command or workflow the repo specifies
-3. Run the build using the repo's documented build command
-4. Run the test suite the repo defines for full verification
-5. Record: PASS or FAIL with error output, test counts (passed/failed/skipped)
+`{{team_root}}/notes/inbox/verify-{{plan_slug}}.md`
-If a build or test fails, **do NOT fix it** — report the exact error and continue with other projects.
+A successful verification run means every required project build/test passed or was legitimately not applicable, each runnable app was started and smoke-checked when required, and the required E2E PRs were created or updated. If verification is failed, blocked, or partial, do not create the inbox copy; record the details in the permanent guide and final response instead.
-> ⚠️ **Long builds (Gradle, MSBuild, dotnet, fresh `npm install`)**: any command that may stay silent for more than ~4 minutes will be killed by the heartbeat monitor. Run it via `Bash(run_in_background: true)` then `Monitor` to stream stdout, OR pass an explicit `timeout` (max 600000 ms). See **Long-Running Build / Test Commands** below for the full pattern.
+Use the template at `{{team_root}}/playbooks/templates/verify-guide.md` and clearly separate what was verified from what still needs human review.
-## Step 4: Start the Application (if applicable)
+## E2E PRs
-Determine if the project has a **runnable application** (web server, API, desktop app, mobile emulator, etc.) by reading its documentation and build config. For mobile apps, check if an emulator/simulator can be launched or if building an APK/IPA is the appropriate verification step.
+For each project with aggregate verification changes, create or update one draft/review PR that combines the plan branches into a single reviewable diff. Reuse an existing `e2e/{{plan_slug}}` branch/PR when present. Include the plan summary, merged PRs, testing guide, and build/test status. Do not auto-complete or merge these PRs.
-If found:
-1. Start it **detached from your process** so it survives after you exit.
-   - If the repo docs provide a local run or background-start command, use that.
-   - Otherwise, use the detached-process mechanism that fits the current environment. Do not assume Bash, PowerShell, or any specific shell unless the repo or runtime clearly provides it.
-2. Wait a few seconds, then verify it using the repo's documented smoke test, health check, startup output, or the lightest project-appropriate manual check.
-3. Note the URL, port, process identifier, or equivalent runtime details the repo exposes
-4. Output the exact restart command with **absolute worktree paths**
-5. Include the stop command or shutdown procedure that matches how you started it
+Track each E2E PR in the project's `.minions/pull-requests.json` if it is not already tracked.
-If the project has no runnable application, skip this step and note that in the guide.
-## Step 5: Write the Verification Report and Testing Guide
-Always create the permanent guide linked from the dashboard:
-1. **Permanent location**: `{{team_root}}/prd/guides/verify-{{plan_slug}}.md`
-Create the inbox copy only after a successful verification run:
-2. **Inbox copy**: `{{team_root}}/notes/inbox/verify-{{plan_slug}}.md`
-A successful verification run means every required project build/test passed or was legitimately not applicable, each runnable app was started and smoke-checked when required, and the required E2E PRs were created or updated. If verification is failed, blocked, or partial, do **not** create the inbox copy; record the details in the permanent guide and final response instead.
-**Be transparent.** The guide must clearly state what was built, what was tested, what passed, what failed, and what still needs human verification.
-Use the template structure from `{{team_root}}/playbooks/templates/verify-guide.md` — read it and fill in each section with your actual findings.
-## Step 6: Create E2E Pull Requests
-For each project that has changes, create a single **aggregate PR** that combines all the plan's branches into one. This gives the human reviewer a single diff showing the full picture.
-For each project worktree:
-1. **Check for an existing E2E branch/PR first** — a prior verify run may have already created one:
-   ```bash
-   cd <worktree-path>
-   git fetch origin
-   # Check if the E2E branch already exists on remote
-   git ls-remote --heads origin e2e/{{plan_slug}}
-   ```
-   - If the branch **already exists**: check out the existing branch (`git checkout e2e/{{plan_slug}}`), reset it to your current worktree state, and force-push to update it. Then check if a PR already exists for this branch — if so, **update that PR** instead of creating a new one.
-   - If the branch **does not exist**: create it fresh:
-     ```bash
-     git checkout -b e2e/{{plan_slug}}
-     git push origin e2e/{{plan_slug}}
-     ```
-2. **Check for an existing E2E PR** before creating a new one:
-   - For GitHub: `gh pr list --head e2e/{{plan_slug}} --state open`
-   - For ADO: use `az` CLI first to search for PRs with source branch `e2e/{{plan_slug}}`; use ADO MCP only as a fallback when `az` is unavailable or insufficient
-   - If found, **update the existing PR** description with latest build/test results. Do NOT create a duplicate.
-   - If not found, create a new PR targeting the project's main branch:
-     - **Title:** `[E2E] <plan summary>`
-     - **Description:** Include the plan summary, list of all individual PRs merged, build/test status from Step 3, and link to the testing guide
-     - **Target branch:** the project's main branch (e.g., `main` or `master`)
-     - **Do NOT auto-complete** — this is for review only
-     - **Mark as draft** if the option is available
-3. Add the E2E PR to the project's `.minions/pull-requests.json` (skip if it's already tracked):
-   ```bash
-   node -e "
-   const fs = require('fs');
-   const p = '<project-path>/.minions/pull-requests.json';
-   const prs = JSON.parse(fs.readFileSync(p, 'utf8'));
-   // Skip if already tracked
-   if (prs.some(pr => pr.branch === 'e2e/{{plan_slug}}')) process.exit(0);
-   prs.push({
-     id: 'PR-<number>',
-     title: '[E2E] <plan summary>',
-     agent: '{{agent_name}}',
-     branch: 'e2e/{{plan_slug}}',
-     reviewStatus: 'pending',
-     status: 'active',
-     created: new Date().toISOString().slice(0,10),
-     url: '<pr-url>',
-     prdItems: []
-   });
-   fs.writeFileSync(p, JSON.stringify(prs, null, 2));
-   "
-   ```
-4. Note the E2E PR URLs in the testing guide.
-## Working Style
-Use subagents only for genuinely parallel, independent build/test tasks on separate project worktrees. For sequential work (docs → build → test → report), and for starting detached servers, work directly — do not spawn subagents.
-## Rules
-- **Read the project docs first** — never assume a build system, language, or framework
-- Treat repo-provided docs, scripts, and config as the source of truth for build/test/run commands
-- Base testing steps on the **acceptance criteria** from each plan item
-- Include **concrete steps** — URLs, buttons to click, inputs to type, expected results
-- Be **transparent** — clearly separate what you verified vs what needs human review
-- If a project doesn't build, still document what SHOULD be testable once fixed
-- Do NOT fix code — only report issues
-- Leave all worktrees in place for the user to inspect
-- Start the application **detached** so it keeps running after your process exits
-- Use absolute paths everywhere so the user can copy-paste commands
-- E2E PRs are for review only — do NOT auto-complete or merge them
+## Constraints
+- Do not assume any specific platform, language, framework, shell, build system, mobile app, web app, Android, or iOS target.
+- Read project docs first; use `CLAUDE.md`, `README.md`, and nearby project configuration as the source of truth.
+- Base manual testing steps on the plan acceptance criteria.
+- If a project does not build, still document what should be testable once fixed.
+- Do not fix product code except for merge-conflict resolutions needed to create the verification branch.
+- Leave verification worktrees in place for user inspection.
+- Use absolute paths in commands the user may copy.
 ## When to Stop
-Your task is complete once you have: (1) merged dependency branches, (2) built and tested, (3) written the verification report to the permanent guide, (4) written the inbox copy only if verification succeeded, and (5) created the E2E PR(s).
-**IMPORTANT: Your final message MUST include the E2E PR URL(s) so the engine can track them.** Example final message:
-```
-Verification complete. E2E PR created: https://github.com/org/repo/pull/123
-Testing guide saved to prd/guides/verify-plan-name.md
-```
+Your task is complete once dependency branches are merged, build/test verification has run, the permanent guide is written, the inbox copy is written only if verification succeeded, and E2E PR URLs are included in your final response.
-Stop after confirming the PR was created.
+Your final message MUST include each E2E PR URL and the testing guide path, for example: `E2E PR created: <url>` and `Testing guide: {{team_root}}/prd/guides/verify-{{plan_slug}}.md`.

package/playbooks/work-item.md CHANGED Viewed

@@ -30,6 +30,7 @@ Treat this like the user typed the task directly into a CLI agent:
 - Follow existing repo conventions and avoid unrelated cleanups.
 - Validate with the repo's documented build/test/check commands. Fix regressions you introduced; if failures are pre-existing or outside the task, document the evidence.
 - Do NOT publish code with a broken build or failing tests that you introduced.
+- Long builds and tests may be quiet for several minutes. Let normal CLI commands run without artificial heartbeat output.
 After the change is ready for review, commit only relevant files, push `{{branch_name}}`, create the PR, and post implementation notes with the validation result:

package/prompts/cc-system.md CHANGED Viewed

@@ -77,6 +77,8 @@ Core action types:
   workTypes: `explore` (research/report only, NO PR), `ask` (answer/report, NO PR), `implement` (new code, PR REQUIRED), `fix` (bug fix, PR REQUIRED), `review` (code review, NO PR), `test` (tests, PR if new), `verify` (merge/build/maintenance, NO PR)
   If the user wants a design/architecture artifact committed through a PR, dispatch `implement` or `docs` rather than `explore`.
   When the user names a specific agent ("assign this to lambert"), put exactly that one name in `agents` (e.g. `"agents": ["lambert"]`). A single-agent assignment is hard-pinned by the server — it will queue for that agent only and skip the routing table. Use multi-agent arrays only when the user names multiple agents or asks for fan-out.
+- **build-and-test**: pr, project (optional), agent (optional) — Run the build-and-test playbook against a PR. The agent will checkout the PR branch, run the project's build/test commands, and report results. Use when the user asks to "run tests on PR X" or "build PR X" or after a fix to verify nothing regressed.
+  Example: user says "run build and test on PR 1834" → `{"type":"build-and-test","pr":"1834"}`
 - **note**: title, content — save to inbox
 - **knowledge**: title, content, category (architecture/conventions/project-notes/build-reports/reviews) — create new KB entry or copy existing doc to KB
 - **pin-to-pinned**: title, content, level (critical/warning) — write to pinned.md, force-injected into ALL agent prompts (rarely needed)