npm - claude-dev-env - Versions diffs - 1.36.2 → 1.37.1 - Mend

claude-dev-env 1.36.2 → 1.37.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (76)

package/skills/bugteam/SKILL_EVALS.md CHANGED

@@ -14,23 +14,20 @@ Evals are split into two layers. Both layers run against the same trace but carr
 ## Ironclad invariants (Layer A, apply to every eval)
-Each invariant cites the normative section or companion file it derives from. **Path A vs Path B:** `SKILL.md` **Path routing** splits harnesses. Invariants **I-1, I-3, I-4, I-7, I-9, I-11 (teammate shutdown → `TeamDelete` prefix), I-13** apply to **Path A only**. **Path B** substitutes per [`reference/workflow-path-b-task-harness.md`](reference/workflow-path-b-task-harness.md): no `TeamCreate` / `TeamDelete`; parallel auditors use parallel **`Task`** calls; **I-12 Path B** — the **lead** runs Step 2.5 `gh api` posts. **I-2, I-5, I-6, I-8, I-10** apply to **both** paths (revoke once; fresh spawn per loop; `model="opus"` on audit/fix workers; cap; read outcome XML).
+Each invariant cites the normative section or companion file it derives from. All spawns use `Agent(..., run_in_background=true)`. Invariants apply uniformly across all eval fixtures.
 | # | Invariant | Citation |
 |---|---|---|
-| I-1 | **Path A:** `Bash` invoking `scripts/grant_project_claude_permissions.py` precedes every `TeamCreate`. **Path B:** grant precedes first audit **`Task`**. | `SKILL.md` § Step 0; § Path routing |
-| I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, **after** teardown that applies to that path (`TeamDelete` only on Path A). | `SKILL.md` § Step 5 |
-| I-3 | **Path A:** Exactly one `TeamCreate` and exactly one `TeamDelete` per invocation. **Path B:** zero `TeamCreate` / `TeamDelete`. | `SKILL.md` § Step 2; § Step 4 |
-| I-4 | **Path A:** Before `TeamDelete`, no teammate remains active without cleanup (self-terminated `Agent` or `SendMessage` shutdown). | `SKILL.md` § AUDIT/FIX shutdown; § Step 4 |
-| I-5 | **Path A:** `Agent` calls are fresh per loop. **Path B:** `Task` calls for audit/fix are fresh per loop (same `name` discipline where the host exposes naming). | `CONSTRAINTS.md` — **Fresh teammate per loop**; deltas **Clean-room note** |
-| I-6 | Both paths: audit and fix worker spawns pass `model="opus"` on `Agent` **or** `Task` as documented in `SKILL.md` § AUDIT/FIX. | `SKILL.md` § Step 2 (**Roles**); `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for both teammates** |
-| I-7 | **Path A:** `TeamDelete()` is called with no arguments. **Path B:** omit. | `SKILL.md` § Step 4 |
-| I-8 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
-| I-9 | **Path A:** From loop 4 onward without convergence, three parallel `Agent` calls in one message. **Path B:** three parallel **`Task`** calls. | `SKILL.md` § AUDIT action (**Parallel auditors**); `reference/workflow-path-b-task-harness.md` |
-| I-10 | Lead reads `.bugteam-loop-<N>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
-| I-11 | **Path A:** `git worktree remove` each PR → teammate shutdowns → `TeamDelete` → `rmtree` `<team_temp_dir>` → Step 4.5 → revoke. **Path B:** `git worktree remove` each PR → (omit shutdown / `TeamDelete`) → `rmtree` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5; `reference/workflow-path-a-orchestrated-teams.md` § Step 4; `reference/workflow-path-b-task-harness.md` § Step 4 |
-| I-12 | **Path A:** Lead never posts PR review / finding / fix replies except Step 4.5 body. **Path B:** Lead performs Step 2.5 posts per deltas; Step 4.5 unchanged. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
-| I-13 | **Path A:** Only the lead invokes `TeamCreate`; every teammate `Agent(..., team_name=...)`. **Path B:** no `TeamCreate`; `Task` spawns omit `team_name`. | `CONSTRAINTS.md` — **Path A — orchestrator-only `TeamCreate`**; `reference/workflow-path-b-task-harness.md` |
+| I-1 | `Bash` invoking `scripts/grant_project_claude_permissions.py` precedes the first audit `Agent` spawn. | `SKILL.md` § Step 0 |
+| I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
+| I-3 | Orchestration uses `Agent(..., run_in_background=true)` only — no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
+| I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` — **Fresh subagent per loop** |
+| I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
+| I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
+| I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
+| I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
+| I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
+| I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
 Any eval failing one or more Layer A invariants fails the run.
@@ -46,25 +43,23 @@ The harness does not yet exist; this document defines its contract.
 ---
-## Eval 1 — Path B: agent teams env unset (Task harness, not a refusal)
+## Eval 1 — Smoke: background subagent spawns fire correctly
-**Scenario.** `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is unset in both `claude config` and `~/.claude/settings.json`.
+**Scenario.** PR exists; PR is a clean target with no unusual pre-conditions.
 **Trigger.** `/bugteam`
-**Layer A invariants.** Path B subset (I-2, I-5, I-6, I-8, I-10; I-1/I-3/I-4/I-7/I-9/I-11/I-13 N/A or Path-B-shaped).
+**Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-8, I-9, I-10.
-**Layer B predicted trace (Path B smoke).**
-1. `Bash("claude config get env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS")` → empty.
-2. `Bash("python .../grant_project_claude_permissions.py")` runs (Step 0).
-3. **No** `TeamCreate`.
-4. At least one `Task(subagent_type="code-quality-agent", ...)` or host-equivalent for AUDIT; FIX rounds use the host-appropriate FIX `Task` from `workflow-path-b-task-harness.md` § **FIX spawn** (`clean-coder` subtype on Claude Code when accepted; `generalPurpose` + clean-coder **Read** preamble on Cursor when the enum rejects `clean-coder`).
+**Layer B predicted trace (smoke).**
+1. `Bash("python .../grant_project_claude_permissions.py")` runs (Step 0).
+2. `Agent(subagent_type="code-quality-agent", name="bugfind-pr...-loop1", run_in_background=true, model="opus", ...)` spawned for AUDIT.
+3. Lead awaits background-completion notification, then `Read(".bugteam-pr42-loop1.outcomes.xml")`.
+4. `Agent(subagent_type="clean-coder", name="bugfix-pr...-loop1", run_in_background=true, model="opus", ...)` spawned for FIX (if findings).
 5. `Bash("python .../revoke_project_claude_permissions.py")` on exit.
 **Pass criteria.**
-- **No** refusal string about missing agent teams.
-- Zero `TeamCreate`, zero `TeamDelete`, zero teammate `SendMessage` shutdowns.
-- Non-zero `Task` (or `Agent` without `team_name` only if the host maps Path B that way) carrying **`code-quality-agent`** / **fix worker under the clean-coder contract** (subtype `clean-coder` where accepted, else `generalPurpose` + `clean-coder.md` Read per `workflow-path-b-task-harness.md`).
+- Non-zero `Agent(subagent_type="code-quality-agent", run_in_background=true)` and `Agent(subagent_type="clean-coder", run_in_background=true)` calls.
 ---
@@ -75,7 +70,7 @@ The harness does not yet exist; this document defines its contract.
 **Layer B predicted trace.**
 1. `Bash("gh pr view --json ...")` → non-zero exit.
 2. `Bash("git merge-base HEAD origin/main")` → empty.
-3. No grant script, no `TeamCreate`.
+3. No grant script.
 **Pass criteria.** Assistant message matches `No PR or upstream diff. /bugteam needs a target.`. Zero downstream tool calls.
@@ -93,15 +88,15 @@ The harness does not yet exist; this document defines its contract.
 **Scenario.** `code-quality-agent` is present in the available-agents list; `clean-coder` is not.
-**Pass criteria.** Assistant message contains `Required subagent type clean-coder not installed.`. Zero grant script call, zero `TeamCreate`.
+**Pass criteria.** Assistant message contains `Required subagent type clean-coder not installed.`. Zero grant script call, zero `Agent` spawns.
 ---
-## Eval 5 — Happy path: converges in 2 loops (Path A fixture)
+## Eval 5 — Happy path: converges in 2 loops
-**Scenario.** PR #42 contains three P1 bugs all addressable by the mock fix teammate. Loop 1 audit returns 3 findings; loop 1 fix commits cleanly; loop 2 audit returns zero findings. **`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`** (Path A — `TeamCreate` + teammate `Agent`).
+**Scenario.** PR #42 contains three P1 bugs all addressable by the mock fix subagent. Loop 1 audit returns 3 findings; loop 1 fix commits cleanly; loop 2 audit returns zero findings.
-**Layer A invariants.** Path A: I-1, I-2, I-3, I-4, I-5, I-6, I-7, I-10, I-11, I-12, I-13.
+**Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10.
 **Layer B predicted trace.**
@@ -109,41 +104,40 @@ The harness does not yet exist; this document defines its contract.
 |---|---|---|
 | 1 | `Bash("python .../scripts/grant_project_claude_permissions.py")` | `SKILL.md` § Step 0 |
 | 2 | `Bash("gh pr view --json number,baseRefName,headRefName,url")` | `SKILL.md` § Step 1 |
-| 3 | `Bash("git rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
-| 4 | `TeamCreate(team_name="bugteam-pr-42-<ts>", description=..., agent_type="team-lead")` | `SKILL.md` § Step 2 |
-| 5 | `Bash("mkdir -p <team_temp_dir>")` | `SKILL.md` § AUDIT action |
-| 6 | `Bash("gh pr diff 42 -R ... > <team_temp_dir>/loop-1.patch")` | `SKILL.md` § AUDIT action |
-| 7 | `Agent(subagent_type="code-quality-agent", name="bugfind", team_name=..., model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
-| 8 | `Read(".bugteam-loop-1.outcomes.xml")` | `SKILL.md` § AUDIT action |
-| 9 | `SendMessage(to="bugfind", message={type: "shutdown_request", reason: "audit loop 1 complete; outcome XML captured"})` | `SKILL.md` § AUDIT action (**Shutdown** fallback) |
-| 10 | `Agent(subagent_type="clean-coder", name="bugfix", team_name=..., model="opus", description=..., prompt=<fix XML loop 1>)` | `SKILL.md` § FIX action |
-| 11 | `Read(".bugteam-loop-1.outcomes.xml")` — bugfix outcome XML overwrites same filename | `SKILL.md` § FIX action |
-| 12 | `Bash("git rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
-| 13 | `Bash("git fetch origin <branch> && git rev-parse origin/<branch>")` → verify push landed | `SKILL.md` § FIX action (**Verify**) |
-| 14 | `SendMessage(to="bugfix", message={type: "shutdown_request", reason: "fix loop 1 complete; commit <sha7> pushed"})` | `SKILL.md` § FIX action (**Shutdown** fallback) |
-| 15 | `Bash("gh pr diff 42 -R ... > <team_temp_dir>/loop-2.patch")` | `SKILL.md` § AUDIT action |
-| 16 | `Agent(subagent_type="code-quality-agent", name="bugfind", ...)` (loop 2) | `SKILL.md` § AUDIT action |
-| 17 | `Read(".bugteam-loop-2.outcomes.xml")` — zero findings | `SKILL.md` § AUDIT action |
-| 18 | `SendMessage(to="bugfind", message={type: "shutdown_request", reason: "audit loop 2 complete; zero findings"})` | `SKILL.md` § AUDIT action (**Shutdown** fallback) |
-| 19 | `TeamDelete()` | `SKILL.md` § Step 4 |
-| 20 | `Bash("python -c \"import os, shutil, stat, sys; h = lambda f, p, *_: (os.chmod(p, stat.S_IWRITE), f(p)); shutil.rmtree(r'<team_temp_dir>', **({'onexc': h} if sys.version_info >= (3, 12) else {'onerror': h}))\"")` | `SKILL.md` § Step 4 (Windows-safe teardown) |
+| 3 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
+| 4 | `Bash("mkdir -p <run_temp_dir>/pr-42")` | `SKILL.md` § AUDIT action |
+| 5 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-1.patch")` | `SKILL.md` § AUDIT action |
+| 6 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
+| 7 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
+| 8 | `Read(".bugteam-pr42-loop1.outcomes.xml")` | `SKILL.md` § AUDIT action |
+| 9 | `Agent(subagent_type="clean-coder", name="bugfix-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<fix XML loop 1>)` | `SKILL.md` § FIX action |
+| 10 | Lead awaits background-completion notification | `SKILL.md` § FIX action |
+| 11 | `Read(".bugteam-pr42-loop1.outcomes.xml")` — bugfix outcome XML | `SKILL.md` § FIX action |
+| 12 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
+| 13 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" fetch origin <branch>")` → fetch remote state | `SKILL.md` § FIX action (**Verify**) |
+| 14 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse origin/<branch>")` → confirm matches HEAD | `SKILL.md` § FIX action (**Verify**) |
+| 15 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-2.patch")` | `SKILL.md` § AUDIT action |
+| 16 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop2", run_in_background=true, ...)` (loop 2) | `SKILL.md` § AUDIT action |
+| 17 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
+| 18 | `Read(".bugteam-pr42-loop2.outcomes.xml")` — zero findings | `SKILL.md` § AUDIT action |
+| 19 | `Bash("git worktree remove \"<run_temp_dir>/pr-42/worktree\"")` | `SKILL.md` § Step 4 step 1 |
+| 20 | `Bash("python -c \"...shutil.rmtree(r'<run_temp_dir>', ...)\"")` | `SKILL.md` § Step 4 step 2 (Windows-safe teardown) |
 | 21 | `Bash("gh pr diff 42 -R ... > .bugteam-final.diff")` | `SKILL.md` § Step 4.5 step 1 |
 | 22 | `Bash("gh pr view 42 -R ... --json body --jq .body > .bugteam-original-body.md")` | `SKILL.md` § Step 4.5 step 2 |
 | 23 | `Agent(subagent_type="pr-description-writer", description=..., prompt=<brief>)` | `SKILL.md` § Step 4.5 |
 | 24 | `Write(".bugteam-final-body.md", <returned body>)` | `SKILL.md` § Step 4.5 step 4 |
 | 25 | `Bash("gh pr edit 42 -R ... --body-file .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 4 |
-| 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 (lead may add `.bugteam-loop-*.outcomes.xml` in the same or a separate `rm` — reconcile on first real run) |
+| 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 |
 | 27 | `Bash("python .../scripts/revoke_project_claude_permissions.py")` | `SKILL.md` § Step 5 |
 **Pass criteria.**
 - All Layer A invariants hold.
-- Exactly 2 `Agent(name="bugfind"...)` calls, exactly 1 `Agent(name="bugfix"...)` call.
-- Exactly 2 bugfind shutdown messages + 1 bugfix shutdown message.
+- Exactly 2 `Agent(name="bugfind-pr42-loop...")` calls, exactly 1 `Agent(name="bugfix-pr42-loop...")` call.
 - Final report contains `/bugteam exit: converged` and `Loops: 2`.
 **Process check after first real run.** Compare the observed trace against steps 1–27. Common expected divergences that should not fail the eval:
 - Extra `Bash("git rev-parse HEAD")` calls the lead inserts for bookkeeping.
-- Consolidated `Bash` calls (step 26 may split into two or three calls).
+- Consolidated `Bash` calls (step 25 may split into two or three calls).
 - Extra `Read` calls when the lead re-reads an outcome XML to quote specific findings.
 - Reordered but still-Layer-A-compliant cleanup sequencing.
@@ -151,40 +145,40 @@ Patch this table to match observation and annotate each correction.
 ---
-## Eval 6 — Stuck path: fix teammate produces no commit
+## Eval 6 — Stuck path: fix subagent produces no commit
-**Scenario.** Loop 1 audit finds 2 P1 bugs; the mock fix teammate reports both as `could_not_address` (no commit created).
+**Scenario.** Loop 1 audit finds 2 P1 bugs; the mock fix subagent reports both as `could_not_address` (no commit created).
-**Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-7, I-10, I-11, I-12. I-8 trivially holds.
+**Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10. I-6 trivially holds.
-**Layer B predicted trace.** Identical to Eval 5 steps 1–14 with this divergence:
+**Layer B predicted trace.** Identical to Eval 5 steps 1–12 with this divergence:
 - Step 11 bugfix outcome XML marks every finding `status="could_not_address"`.
 - Step 12 `Bash("git rev-parse HEAD")` returns the pre-fix SHA unchanged.
-- Skill sets exit reason = `stuck`, skips loop 2, and falls through to `TeamDelete()`.
+- Skill sets exit reason = `stuck`, skips loop 2, and falls through to `rmtree`.
 **Pass criteria.**
 - Loop count stops at 1.
 - Final report contains `/bugteam exit: stuck` and names the two unresolved findings.
-- Steps 19–27 fire despite the stuck exit — I-2 and I-11 enforce this.
+- Steps 19–26 fire despite the stuck exit — I-2 and I-9 enforce this.
 ---
 ## Eval 7 — Cap reached: 10 loops, no convergence
-**Scenario.** Mock audit returns one P2 finding every loop. Mock fix teammate always commits but never clears the finding.
+**Scenario.** Mock audit returns one P2 finding every loop. Mock fix subagent always commits but never clears the finding.
-**Layer A invariants.** All of I-1 through I-12.
+**Layer A invariants.** All of I-1 through I-10.
 **Layer B predicted behavior.**
-- Loops 1–3: single `Agent(name="bugfind")` per loop.
-- Loops 4–10: three parallel `Agent(name="bugfind-loop-<N>-a/b/c")` in a single assistant message per loop, followed by two parallel `-b`/`-c` shutdowns and one `-a` shutdown.
-- Each loop produces one `Agent(name="bugfix")` and its matching shutdown.
+- Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
+- Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
+- Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
 - Exactly 10 audit phases, exactly 10 fix phases.
-- Steps 19–27 from Eval 5 fire at teardown.
+- Steps 19–26 from Eval 5 fire at teardown.
 **Pass criteria.**
-- I-8 holds: exactly 10 audit phases.
-- I-9 holds: loops 4–10 each emit three audit `Agent` calls in a single assistant message.
+- I-6 holds: exactly 10 audit phases.
+- I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
 - Final report contains `/bugteam exit: cap reached` and the remaining bug count.
 **Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
@@ -195,12 +189,12 @@ Patch this table to match observation and annotate each correction.
 **Scenario.** Loop 1 audit returns zero findings.
-**Layer A invariants.** I-1 through I-7, I-10, I-11, I-12.
+**Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10.
-**Layer B predicted trace.** Eval 5 steps 1–9 and 19–27 only — no FIX phase because zero findings means the skill exits the loop at `last_action == "audited"` and `last_findings.total == 0`.
+**Layer B predicted trace.** Eval 5 steps 1–8 and 19–26 only — no FIX phase because zero findings means the skill exits the loop at `last_action == "audited"` and `last_findings.total == 0`.
 **Pass criteria.**
-- Exactly 1 `Agent(name="bugfind"...)` call, 0 `Agent(name="bugfix"...)` calls, 1 bugfind shutdown.
+- Exactly 1 `Agent(subagent_type="code-quality-agent", run_in_background=true)` call, 0 fix agent spawns.
 - Bugfind's outcome XML records zero findings; the per-loop review POST carries body `## /bugteam loop 1 audit: 0P0 / 0P1 / 0P2 → clean`.
 - Step 4.5 and Step 5 still fire.
@@ -212,7 +206,7 @@ Patch this table to match observation and annotate each correction.
 **Layer A invariants.** Same as Eval 5.
-**Layer B predicted teammate-side behavior** (observed via the recorded `gh api ... /reviews` POST payload in the bugfind teammate fixture).
+**Layer B predicted subagent-side behavior** (observed via the recorded `gh api ... /reviews` POST payload in the bugfind subagent fixture).
 - `comments[]` length in the POST body = 2 (anchored findings only).
 - Review body contains a `### Findings without a diff anchor` section listing the third finding.
 - Bugfix outcome XML marks all 3 findings with a `reply_comment_url`; the unanchored finding's `used_fallback="true"` and `finding_comment_url` equals the parent review URL.
@@ -242,9 +236,9 @@ Patch this table to match observation and annotate each correction.
 - Bugfix teammate outcome XML marks every finding `status="hook_blocked"` with populated `<hook_output>`.
 - Bugfix teammate posts `Hook blocked the fix commit: <one-line summary>` to each finding comment.
 - Lead's `Bash("git rev-parse HEAD")` after fix detects no SHA change → exit reason `stuck`.
-- Steps 19–27 from Eval 5 fire at teardown.
+- Steps 19–26 from Eval 5 fire at teardown.
-**Pass criteria.** Layer A I-2 and I-11 hold. Final report contains `/bugteam exit: stuck` and surfaces the hook_output summary.
+**Pass criteria.** Layer A I-2 and I-9 hold. Final report contains `/bugteam exit: stuck` and surfaces the hook_output summary.
 ---
@@ -252,13 +246,13 @@ Patch this table to match observation and annotate each correction.
 **Scenario.** The available-agents list does not include `pr-description-writer` but does include `general-purpose`.
-**Layer B predicted trace.** Eval 5 steps 1–22 identical; step 23 becomes:
+**Layer B predicted trace.** Eval 5 steps 1–21 identical; step 22 becomes:
 ```
 Agent(subagent_type="general-purpose", description="Rewrite PR 42 body from cumulative diff", prompt=<same brief>)
 ```
-Steps 24–27 follow normally.
+Steps 23–26 follow normally.
 **Pass criteria.** Exactly 1 `Agent(subagent_type="general-purpose", ...)` call for the description rewrite. `gh pr edit` fires. Final report carries no Step 4.5 skip warning.
@@ -268,7 +262,7 @@ Steps 24–27 follow normally.
 **Scenario.** Neither `pr-description-writer` nor `general-purpose` appear in the available-agents list.
-**Layer B predicted trace.** Eval 5 steps 1–22, then skip steps 23–25. Steps 26–27 still fire.
+**Layer B predicted trace.** Eval 5 steps 1–21, then skip steps 22–24. Steps 25–26 still fire.
 **Pass criteria.**
 - Zero `Agent` calls for PR description rewriting.
@@ -280,50 +274,17 @@ Steps 24–27 follow normally.
 ## Eval 14 — Permissions revoke on error path
-**Scenario.** Bugfind teammate refuses `shutdown_request` during loop 1, returning `{type: "shutdown_response", approve: false}`.
+**Scenario.** Bugfind subagent completes but writes no outcomes XML (background subagent completes notification arrives with no file at the expected path).
-**Layer B predicted trace.** Eval 5 steps 1–8, then:
-- Step 9 `SendMessage(to="bugfind", ...)` receives `approve: false`.
-- Skill sets exit reason = `error: bugfind teammate refused shutdown`.
-- Steps 19–27 all fire (Layer A I-2 and I-11 mandate this).
+**Layer B predicted trace.** Eval 5 steps 1–7, then:
+- Lead awaits notification and calls `Read(".bugteam-pr42-loop1.outcomes.xml")` → file missing.
+- Skill sets exit reason = `error: outcomes XML missing after bugfind loop 1`.
+- Teardown (steps 19–26 from Eval 5) all fire.
 **Pass criteria.** Final report surfaces the error and the loop number. Revoke fires despite the error.
 ---
-## Eval 15 — Orchestrator-only `TeamCreate` (supplementary work path)
-**Scenario.** A loop 1 audit surfaces a P0/P1 finding whose root cause sits in adjacent infrastructure the lead needs to fix before the cycle can converge (e.g., a broken CI hook, a misbehaving lint config, a wrong GitHub API shape in a teammate's own dependency). The lead recognizes supplementary work is needed and decides to spawn additional teammates to handle it.
-**Layer A invariants.** I-1, I-3, I-4, I-5, I-6, I-7, I-11, I-12, **I-13 (primary focus)**.
-**Layer B predicted trace.** Eval 5 steps 1–9 identical. At step 10 (where a standard cycle spawns `bugfix`), the lead decides the finding requires adjacent infrastructure work first. Rather than call `TeamCreate` for a new team, the lead spawns a supplementary teammate into the existing team:
-```
-Agent(
-  subagent_type="code-quality-agent",
-  name="bugfind-adjacent",
-  team_name="<lead_team_name>",          // same team as bugfind/bugfix
-  model="opus",
-  description="Supplementary audit of adjacent infrastructure",
-  prompt=<brief naming the specific adjacent files + observed symptom>
-)
-```
-The adjacent-audit teammate writes its own outcome XML, self-terminates. Lead reads the XML, decides fix strategy, spawns an adjacent-fix teammate into the same team. Cycle eventually returns to the standard `bugfix` spawn for the original finding(s). All spawns pass the same `team_name`.
-**Pass criteria.**
-- Layer A I-13 holds: zero `TeamCreate` calls beyond the single one at skill Step 2.
-- Every `Agent(...)` call in the session carries `team_name="<lead_team_name>"`. No teammate spawn omits `team_name`.
-- If the lead attempts a second `TeamCreate` call, the runtime returns the exact error quoted in I-13's citation; the lead treats this as a signal to spawn a teammate into the existing team instead.
-- Working behavior is unchanged from a single-set cycle: grant → TeamCreate (once) → Agent spawns (many, all same team_name) → SendMessage shutdowns as needed → TeamDelete (once) → temp cleanup → Step 4.5 → revoke.
-**Failure mode.** A second `TeamCreate` call in the session, or any `Agent(...)` call without `team_name` once the team exists. Either signals the orchestrator-only invariant has been violated and the clean-room/team semantics are broken.
-**Observation source for this eval.** This eval was added after a real /bugteam run on PR #184 where the lead discovered a broken hook mid-cycle and initially spawned a standalone subagent (no `team_name`) for the adjacent audit — a direct violation. The runtime had already prevented a second `TeamCreate` with the error quoted in I-13. The eval codifies the correct path (spawn as teammate into existing team) so future runs do not repeat the violation.
----
 ## Iteration protocol
 1. **Cycle 0 — Reconcile predictions with reality.** On the first real run, diff every Layer B predicted trace against the observed trace. Patch this file to match reality and annotate each correction with a reason.
@@ -344,5 +305,5 @@ A minimal Python harness under `packages/claude-dev-env/skills/bugteam/evals/`:
 ## Open research items flagged during this pass
 1. **GitHub REST review-POST payload shape.** Eval 9 and Eval 10 depend on the exact body shape of `POST /pulls/<number>/reviews`. The `jq -n --rawfile ... --argjson ... | gh api ... --input -` fence lives in `SKILL.md` § Step 2.5 (**Review POST**); expanded copy in `reference/github-pr-reviews.md` § **Per-loop review**. Before running Eval 9/10 for real, fetch the current GitHub REST reference to confirm the request schema (fields `commit_id`, `event`, `body`, `comments[]`) and the multi-line anchor `{path, start_line, start_side, line, side, body}` shape still apply. Record the confirmed version and URL here.
-2. **`SendMessage` shutdown origination — RESOLVED.** `SendMessage` tool docs include the line "Don't originate `shutdown_request` unless asked." `TeamCreate` tool docs explicitly direct the lead to originate `{type: "shutdown_request"}` for teammate cleanup. Real-run observation (loop 1 of eval run 2026-04-18) resolved the contradiction: teammates self-terminate when their task is complete — the `Agent` call returns and the teammate's session ends without any `SendMessage`. The cycle proceeded correctly without the lead ever needing to originate a `shutdown_request`. `SKILL.md` § AUDIT / FIX actions document self-termination as the expected path and lead-originated `SendMessage(shutdown_request)` as a fallback; `reference/audit-and-teammates.md` carries the longer shutdown narrative. Layer A **I-4** encodes “no orphaned teammates,” not “always send SendMessage.”
+2. **Background subagent completion signal.** Real-run observation (loop 1 of eval run 2026-04-18) confirmed: background subagents self-terminate when their task is complete — the background-completion notification arrives and the lead reads the outcomes XML. No shutdown handshake required. `SKILL.md` § AUDIT / FIX actions document this flow. Layer A **I-4** encodes “fresh subagent per loop.”
 3. **Model override redundancy.** `clean-coder` pins `model: opus` in its agent definition, while `code-quality-agent` currently uses `model: inherit`. The explicit `model="opus"` in every spawn is insurance against frontmatter drift; on the first real run, confirm the resolved model is `claude-opus-4-7` and that effort defaults to `xhigh` (Claude Code shows the active effort next to the spinner per the model-config docs). If a teammate's frontmatter ever pins a non-default `effort:` value, that frontmatter overrides the model default for that subagent (https://code.claude.com/docs/en/model-config — *"Frontmatter effort applies when that skill or subagent is active, overriding the session level but not the environment variable."*).
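
A minimal sketch of how a harness might check a few of the 1.37.1 Layer A invariants against a recorded trace. The trace shape (a list of dicts with `tool`, `command`, `subagent_type`, `name`, `run_in_background`, and `loop` keys) and the function name `check_layer_a` are assumptions made for illustration; the diff itself states that the harness does not yet exist. Only I-1, I-2 (call count only, not ordering relative to teardown), I-3, I-4, and I-6 are covered here.

```python
GRANT_SCRIPT = "grant_project_claude_permissions.py"
REVOKE_SCRIPT = "revoke_project_claude_permissions.py"
FORBIDDEN_TOOLS = {"TeamCreate", "TeamDelete", "SendMessage", "Task"}
WORKER_TYPES = {"code-quality-agent", "clean-coder"}
MAX_AUDIT_LOOPS = 10


def check_layer_a(trace):
    """Return the ids of violated invariants for a hypothetical recorded trace."""
    violations = []

    def bash_indexes(script_name):
        # Positions of Bash calls whose command mentions the given script.
        return [
            index
            for index, call in enumerate(trace)
            if call.get("tool") == "Bash" and script_name in call.get("command", "")
        ]

    audit_indexes = [
        index
        for index, call in enumerate(trace)
        if call.get("tool") == "Agent"
        and call.get("subagent_type") == "code-quality-agent"
    ]
    workers = [
        call
        for call in trace
        if call.get("tool") == "Agent" and call.get("subagent_type") in WORKER_TYPES
    ]

    # I-1: the grant script precedes the first audit Agent spawn.
    grant_indexes = bash_indexes(GRANT_SCRIPT)
    if audit_indexes and (not grant_indexes or grant_indexes[0] > audit_indexes[0]):
        violations.append("I-1")

    # I-2: the revoke script runs exactly once (ordering is not checked here).
    if len(bash_indexes(REVOKE_SCRIPT)) != 1:
        violations.append("I-2")

    # I-3: no TeamCreate, TeamDelete, SendMessage, or Task tool calls.
    if any(call.get("tool") in FORBIDDEN_TOOLS for call in trace):
        violations.append("I-3")

    # I-4: audit/fix spawns run in the background and never reuse a name.
    names = [call.get("name") for call in workers]
    all_background = all(call.get("run_in_background") is True for call in workers)
    if len(names) != len(set(names)) or not all_background:
        violations.append("I-4")

    # I-6: at most 10 distinct audit loops.
    audit_loops = {trace[index].get("loop") for index in audit_indexes}
    if len(audit_loops) > MAX_AUDIT_LOOPS:
        violations.append("I-6")

    return violations
```

Returning a list of invariant ids rather than a boolean keeps the "any failed invariant fails the run" rule simple (`not check_layer_a(trace)`) while still reporting which invariant broke.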

package/skills/bugteam/reference/README.md CHANGED

@@ -4,10 +4,8 @@ Expanded material that used to live inline in `SKILL.md`. Load a file when the o
 | File | Domain |
 |------|--------|
-| [`workflow-path-a-orchestrated-teams.md`](workflow-path-a-orchestrated-teams.md) | **Path A only** — `TeamCreate`, `Agent` + `team_name`, `SendMessage`, `TeamDelete`, who posts Step 2.5 |
-| [`workflow-path-b-task-harness.md`](workflow-path-b-task-harness.md) | **Path B only** — `Task` harness (no `TeamCreate` / `TeamDelete`, lead Step 2.5 `gh api`, Step 4 omissions) |
-| [`design-rationale.md`](design-rationale.md) | Why agent teams (clean-room), table-of-contents habit, when `/bugteam` applies, refusal reasons |
-| [`team-setup.md`](team-setup.md) | Permissions grant (`CLAUDE_SKILL_DIR`), PR scope, `TeamCreate`, team name / sanitization / temp dir / roles / loop state |
+| [`design-rationale.md`](design-rationale.md) | Why clean-room subagents, table-of-contents habit, when `/bugteam` applies, refusal reasons |
+| [`team-setup.md`](team-setup.md) | Permissions grant (`CLAUDE_SKILL_DIR`), PR scope, run name / temp dir / loop state |
 | [`github-pr-reviews.md`](github-pr-reviews.md) | Per-loop reviews, `jq` + `gh api` payloads, anchors, fallbacks, REST endpoints |
 | [`audit-and-teammates.md`](audit-and-teammates.md) | Pre-audit gate, full cycle numbering, AUDIT and FIX actions, parallel auditors |
 | [`teardown-publish-permissions.md`](teardown-publish-permissions.md) | Utility scripts note, teardown, PR description rewrite, revoke, final report |

package/skills/bugteam/reference/audit-and-teammates.md CHANGED

@@ -24,11 +24,11 @@ Repeat until an exit condition fires.
    2. If exit code **0** → continue to step 2.5 (AUDIT spawn) below.
    3. If exit code **non-zero** → spawn a new **clean-coder** teammate — **standards-fix pass** — with instructions: read the script’s stderr, edit the repo until a **re-run** of the **same** gate command exits **0**, then one commit, `git push`, shutdown. Repeat standards-fix spawns until the gate exits **0** or **5** failed gate rounds (each round = one teammate session after a non-zero gate). If still non-zero after 5 rounds → exit reason = `error: code rules gate failed pre-audit`.
    4. After gate exit **0**, increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached` (counts **audits**, not standards-only rounds).
-   5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <N> audit: ...`
+   5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <L> audit: ...`
 3. **FIX path** (when `last_action == "audited"` and `last_findings.total > 0`):
    1. Increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached`.
-   2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <N> fix: commit ...`
+   2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <L> fix: commit ...`
    3. Set `last_action = "fixed"`, update `audit_log`, loop to step 1 (next iteration hits **pre-audit path** before the next AUDIT).
 4. After **AUDIT**, update `last_action`, `last_findings`, `audit_log`; print the audit progress line if not already printed.
@@ -39,62 +39,45 @@ Repeat until an exit condition fires.
 ## AUDIT action (clean-room teammate, fresh per loop)
-Capture a fresh PR diff for this loop into the per-team scoped directory so concurrent `/bugteam` runs keep patches isolated. Use the literal `<team_temp_dir>` resolved once in Step 2 — Claude resolves the absolute path; every shell receives the same literal value.
+Capture a fresh PR diff for this loop into the per-PR scoped directory so concurrent `/bugteam` runs keep patches isolated. Use the literal `<run_temp_dir>` resolved once in Step 2 — Claude resolves the absolute path; every shell receives the same literal value.
 Commands and `Agent(...)` shape: `SKILL.md`.
-`<team_temp_dir>` includes the sanitized `team_name` and timestamp; `team_name` is already prefixed with `bugteam-`. Claude resolves `Path(tempfile.gettempdir()) / team_name` once and passes that absolute path to every shell. `tempfile.gettempdir()` honors `TMPDIR`, `TEMP`, `TMP` and falls back to the OS temp directory, so the same approach works on macOS, Linux, Windows cmd.exe, and PowerShell.
+`<run_temp_dir>` includes the sanitized `team_name` and timestamp; `team_name` is already prefixed with `bugteam-`. Claude resolves `Path(tempfile.gettempdir()) / team_name` once and passes that absolute path to every shell. `tempfile.gettempdir()` honors `TMPDIR`, `TEMP`, `TMP` and falls back to the OS temp directory, so the same approach works on macOS, Linux, Windows cmd.exe, and PowerShell.
 Each loop calls `Agent` again with a fresh invocation so the teammate starts with its own context window. Doc line on lead history: [`../sources.md`](../sources.md).
 See [`../PROMPTS.md`](../PROMPTS.md) for AUDIT spawn-prompt XML and bugfind outcome schema. Substitute placeholders (`repo`, `branch`, `base_branch`, `pr_url`, `loop`, `diff_path`) into the `prompt` argument.
-After the teammate returns, the lead reads `.bugteam-loop-<N>.outcomes.xml` with the `Read` tool, parses it, and populates `loop_comment_index` from `<finding>` elements.
+After the teammate returns, the lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` from the worktree directory with the `Read` tool, parses it, and populates `loop_comment_index` from `<finding>` elements.
 ### Shutdown (bugfind)
-**Expected path — self-termination:** Teammates often self-terminate when complete — the `Agent` call returns and the session ends. Then no `SendMessage` is needed.
-**Fallback — lead-initiated shutdown:** If the teammate still appears active after `Agent` returns, send:
-```
-SendMessage(
-  to="bugfind",
-  message={
-    "type": "shutdown_request",
-    "reason": "audit loop <N> complete; outcome XML captured"
-  }
-)
-```
-The teammate replies with `{type: "shutdown_response", approve: true}`. If `approve` is `false`, exit reason = `error: bugfind teammate refused shutdown` → Step 4 teardown then Step 5 revoke.
+Teammates self-terminate when complete — the background-completion notification arrives and the lead reads the outcomes XML. If the notification does not arrive within the lead timeout (120s), treat as a hard blocker and abort the loop.
 `last_action = "audited"`. Append audit metadata to `audit_log`.
 ### Parallel auditors (`loop_count >= 4`)
-The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue three `Agent` calls in **one** assistant message so they run in parallel:
+The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue eleven `Agent` calls in **one** assistant message so they run in parallel:
 ```
-Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-a", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant a", prompt="<audit XML; write outcome to .bugteam-loop-<N>.outcomes.xml; post the per-loop review; read and merge b/c outcomes from <team_temp_dir>/loop-<N>-b.outcomes.xml and <team_temp_dir>/loop-<N>-c.outcomes.xml>")
-Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-b", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant b", prompt="<audit XML; write outcome to <team_temp_dir>/loop-<N>-b.outcomes.xml; skip PR posting>")
-Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-c", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant c", prompt="<audit XML; write outcome to <team_temp_dir>/loop-<N>-c.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-a", team_name="<team_name>", model="opus", run_in_background=true, description="Bugfind audit PR <N> loop <L> validator", prompt="<audit XML; poll for all 10 sibling XMLs at <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml through <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml (60s timeout, 2s interval); on timeout: log diagnostics entry, proceed with validated findings from available XMLs; validate each finding: file exists, line in bounds, excerpt matches claimed line, category A-J, severity P0/P1/P2; quarantine hallucinated findings to <run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json under validator_rejected; de-dup by (file, line, category), max severity wins, keep longest description on conflict; re-id as loop<L>-<K>; write <worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml; post review>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-b", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant b", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-c", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant c", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-c.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-d", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant d", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-d.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-e", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant e", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-e.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-f", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant f", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-f.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-g", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant g", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-g.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-h", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant h", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-h.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-i", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant i", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-i.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-j", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant j", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-j.outcomes.xml; skip PR posting>")
+Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-k", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant k", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml; skip PR posting>")
 ```
-Teammate `-a` is the post-owner: read all three outcome XML files at explicit absolute paths (`.bugteam-loop-<N>.outcomes.xml` in cwd, plus sibling paths under `<team_temp_dir>`), merge findings by `(file, line, category_letter)` (collapse duplicates, keep longest description and highest severity), re-assign merged IDs as `loopN-K`, post the single per-loop review. The `-a` prompt must embed sibling paths as literal absolutes so `Read` works without discovery.
-Shutdown order: parallel `SendMessage` to `b` and `c`, then `a`:
+Teammate `-a` is the opus validator: polls for all 10 sibling XMLs at explicit absolute paths under `<run_temp_dir>/pr-<N>` (60s timeout, 2s interval; on timeout: log diagnostics entry, proceed with validated findings from available XMLs), then validates each finding — file exists, line in bounds, excerpt matches claimed line, category is A–J, severity is P0/P1/P2. Hallucinated findings are quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under `validator_rejected`. Valid findings are de-duplicated by `(file, line, category)` (max severity wins, keep longest description on conflict) and re-assigned merged IDs as `loop<L>-<K>`. The `-a` prompt must embed sibling paths as literal absolutes so `Read` works without discovery.
-```
-SendMessage(to="bugfind-loop-<N>-b", message={"type": "shutdown_request", "reason": "variant XML captured"})
-SendMessage(to="bugfind-loop-<N>-c", message={"type": "shutdown_request", "reason": "variant XML captured"})
-```
-then
-```
-SendMessage(to="bugfind-loop-<N>-a", message={"type": "shutdown_request", "reason": "merged review posted"})
-```
+All subagents self-terminate via background completion. The lead awaits only the validator (-a) notification (120s timeout). Missing notification → hard blocker.
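
Reviewer sketch (not part of the package): the validator's wait for sibling outcome files, as described above (60s timeout, 2s interval, proceed with whatever is available on timeout), could look roughly like the following. The function name `wait_for_sibling_outcomes` and its injectable `clock` and `sleep` parameters are hypothetical, introduced only for illustration. The diff does not show how the real prompt implements polling.

```python
import time
from pathlib import Path


def wait_for_sibling_outcomes(paths, timeout_s=60.0, interval_s=2.0,
                              clock=time.monotonic, sleep=time.sleep):
    """Poll until every path exists or the timeout elapses.

    Returns (found, missing). Hypothetical helper: the defaults mirror the
    60s / 2s values quoted in the diff; everything else is an assumption.
    """
    pending = [Path(p) for p in paths]
    found = []
    deadline = clock() + timeout_s
    while True:
        still_pending = []
        for path in pending:
            # Files that have appeared move to `found`; the rest stay pending.
            (found if path.exists() else still_pending).append(path)
        pending = still_pending
        if not pending or clock() >= deadline:
            # On timeout the caller proceeds with `found` and logs `missing`.
            return found, pending
        sleep(interval_s)
```

A caller would pass the ten literal absolute sibling paths and, if `missing` is non-empty, record a diagnostics entry before validating the findings from the files in `found`.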
 ## FIX action (fresh teammate)
@@ -106,17 +89,7 @@ After replies, the teammate writes outcome XML (schema in [`../PROMPTS.md`](../P
 ### Shutdown (bugfix)
-Same self-termination vs `SendMessage` split as bugfind. Fallback message:
-```
-SendMessage(
-  to="bugfix",
-  message={
-    "type": "shutdown_request",
-    "reason": "fix loop <N> complete; commit <sha7> pushed"
-  }
-)
-```
+Same self-termination model as bugfind. Missing notification → hard blocker.
 `approve: false` → `error: bugfix teammate refused shutdown` → Step 4 then 5.
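
Reviewer sketch (not part of the package): the merge rule stated for teammate `-a` in this file (de-duplicate by `(file, line, category)`, max severity wins, keep the longest description, re-assign IDs as `loop<L>-<K>`) can be illustrated as below. The `Finding` dataclass, the `merge_findings` name, and the assumption that `P0` outranks `P1` and `P2` are mine and not confirmed by the diff.

```python
from dataclasses import dataclass, replace

# Assumption: P0 is the most severe level, P2 the least.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory shape; the real schema lives in PROMPTS.md."""
    file: str
    line: int
    category: str
    severity: str
    description: str
    id: str = ""


def merge_findings(findings, loop):
    """De-duplicate by (file, line, category) and re-assign loop<L>-<K> IDs."""
    merged = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.category)
        kept = merged.get(key)
        if kept is None:
            merged[key] = finding
            continue
        # Max severity wins; the longest description is kept on conflict.
        severity = min(kept.severity, finding.severity, key=SEVERITY_RANK.__getitem__)
        description = max(kept.description, finding.description, key=len)
        merged[key] = replace(kept, severity=severity, description=description)
    # dicts preserve insertion order, so K follows first-seen order (1-based).
    return [replace(f, id=f"loop{loop}-{k}")
            for k, f in enumerate(merged.values(), start=1)]
```

Keying on a tuple keeps the merge a single pass over the combined findings, and the first-seen ordering makes the re-assigned IDs deterministic for a given input order.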

package/skills/bugteam/reference/audit-contract.md CHANGED

@@ -8,7 +8,7 @@ Shared output schema and audit-loop contract used by `/bugteam`, `/qbug`, `/find
 - Adversarial second pass
 - Haiku secondary auditor
 - Post-fix self-audit
-- Persistence (loop-N-audit.json, loop-N-diagnostics.json)
+- Persistence (loop-<L>-audit.json, loop-<L>-diagnostics.json)
 ## Finding schema
@@ -18,7 +18,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
 ```json
 {
-  "id": "loop<N>-<K>",
+  "id": "loop<L>-<K>",
   "file": "path/relative/to/repo/root.py",
   "line": 123,
   "category": "A | B | C | D | E | F | G | H | I | J",
@@ -29,7 +29,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
 }
 ```
-`id` is `loop<N>-<K>` where `N` is the loop counter (1-based) and `K` is the 1-based index within the loop. For `/findbugs` which runs once, use `find<K>`.
+`id` is `loop<L>-<K>` where `L` is the loop counter (1-based) and `K` is the 1-based index within the loop. For `/findbugs` which runs once, use `find<K>`.
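
Reviewer sketch (not part of the package): a minimal parser for the two ID forms described above, `loop<L>-<K>` with both counters 1-based, and `find<K>` for the single-run `/findbugs` case. The function name `parse_finding_id` and the choice to reject leading zeros are assumptions for illustration.

```python
import re

# Both counters are 1-based per the contract, so "0" is rejected.
LOOP_ID = re.compile(r"^loop(?P<loop>[1-9]\d*)-(?P<index>[1-9]\d*)$")
FIND_ID = re.compile(r"^find(?P<index>[1-9]\d*)$")


def parse_finding_id(finding_id):
    """Return (loop, index); loop is None for the find<K> form."""
    match = LOOP_ID.match(finding_id)
    if match:
        return int(match["loop"]), int(match["index"])
    match = FIND_ID.match(finding_id)
    if match:
        return None, int(match["index"])
    raise ValueError(f"not a loop<L>-<K> or find<K> id: {finding_id!r}")
```

For example, `parse_finding_id("loop3-2")` yields `(3, 2)` and `parse_finding_id("find7")` yields `(None, 7)`, while `"loop0-1"` raises `ValueError`.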
 ### Shape B — structured proof-of-absence
@@ -105,9 +105,9 @@ Merge rules:
 - **Unique-to-Haiku findings**: added to the primary set with Haiku's severity and source annotation.
 - **Unique-to-primary findings**: kept as-is.
 - **Zero Haiku findings**: primary set trusted; proceed.
-- **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<N>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
+- **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<L>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
-For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via the three variant teammates.
+For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via 10 haiku auditors + opus validator.
 ## Post-fix self-audit
@@ -131,7 +131,7 @@ Sequence:
 Every audit loop writes two JSON files under the skill's scoped temp directory (resolved via `tempfile.gettempdir()`):
-### `loop-<N>-audit.json`
+### `loop-<L>-audit.json`
 ```json
 {
@@ -141,7 +141,7 @@ Every audit loop writes two JSON files under the skill's scoped temp directory (
 }
 ```
-### `loop-<N>-diagnostics.json`
+### `loop-<L>-diagnostics.json`
 ```json
 {

package/skills/bugteam/reference/design-rationale.md CHANGED

@@ -2,13 +2,9 @@
 ## Core principle (expanded)
-A Claude Code **agent team** runs the audit-and-fix loop until convergence. The bugfind teammate audits clean-room (own context window, no chat history); the bugfix teammate addresses each audit's findings; both spawn fresh per loop. A 10-loop hard cap prevents runaway cost. Project permissions are granted at session start and revoked at session end.
+Background subagents (`Agent(..., run_in_background=true)`) run the audit-and-fix loop until convergence. The bugfind subagent audits clean-room (own context window, no chat history); the bugfix subagent addresses each audit’s findings; both spawn fresh per loop with no shared state. A 10-loop hard cap prevents runaway cost. Project permissions are granted at session start and revoked at session end.
-Teammate isolation versus subagents returning into the lead’s context is the clean-room property. Verbatim Anthropic quotes and URLs: [`../sources.md`](../sources.md).
-## Why not parallel subagents here
-Subagents return their results into the lead’s context, which accumulates across loops. Agent-team teammates are independent sessions with their own context windows and do not pollute the lead. The lead can shut down and respawn each loop so every audit starts fresh. For `/bugteam`, the independent-context property is required; parallel subagents fail the clean-room requirement. Supporting quotes: [`../sources.md`](../sources.md) (subagents vs agent teams).
+Fresh-spawn clean-room isolation: each `Agent` call creates a new subagent with its own context window and no access to prior conversation. After the subagent writes its outcome XML and self-terminates, the lead reads the file. Results never accumulate in the lead’s context beyond the XML artifact. Verbatim Anthropic quotes and URLs: [`../sources.md`](../sources.md).
 ## Table of contents in `SKILL.md`
@@ -20,9 +16,8 @@ The user wants automated convergence on a clean PR without babysitting each step
 ### Refusal reasons (detail)
-- **Agent teams off:** Without the feature flag, the workflow cannot run.
 - **No PR / diff:** There is nothing scoped to audit.
-- **Dirty tree:** The fix teammate will commit; uncommitted local work would be mixed into automated commits.
+- **Dirty tree:** The fix subagent will commit; uncommitted local work would be mixed into automated commits.
 - **Missing subagents:** Both `code-quality-agent` and `clean-coder` must exist in the environment before Step 0.
 Exact refusal strings remain in `SKILL.md`.