claude-dev-env 1.36.2 → 1.37.1

Files changed (76)
  1. package/_shared/pr-loop/scripts/config/preflight_constants.py +29 -8
  2. package/_shared/pr-loop/scripts/preflight.py +242 -20
  3. package/_shared/pr-loop/scripts/tests/test_preflight.py +362 -25
  4. package/_shared/pr-loop/scripts/tests/test_preflight_constants.py +9 -14
  5. package/hooks/blocking/code_rules_enforcer.py +269 -23
  6. package/hooks/blocking/test_code_rules_enforcer_unused_imports.py +157 -1
  7. package/hooks/config/test_unused_module_import_constants.py +48 -0
  8. package/hooks/config/unused_module_import_constants.py +41 -0
  9. package/package.json +1 -1
  10. package/rules/gh-paginate.md +4 -50
  11. package/rules/no-historical-clutter.md +36 -0
  12. package/skills/bg-agent/SKILL.md +69 -0
  13. package/skills/bugteam/CONSTRAINTS.md +10 -19
  14. package/skills/bugteam/PROMPTS.md +21 -14
  15. package/skills/bugteam/SKILL.md +122 -208
  16. package/skills/bugteam/SKILL_EVALS.md +75 -114
  17. package/skills/bugteam/reference/README.md +2 -4
  18. package/skills/bugteam/reference/audit-and-teammates.md +21 -48
  19. package/skills/bugteam/reference/audit-contract.md +7 -7
  20. package/skills/bugteam/reference/design-rationale.md +3 -8
  21. package/skills/bugteam/reference/team-setup.md +11 -19
  22. package/skills/bugteam/reference/teardown-publish-permissions.md +2 -14
  23. package/skills/bugteam/scripts/config/__init__.py +0 -0
  24. package/skills/bugteam/scripts/config/reflow_skill_md_constants.py +12 -0
  25. package/skills/bugteam/scripts/reflow_skill_md.py +51 -47
  26. package/skills/bugteam/sources.md +1 -25
  27. package/skills/bugteam/test_skill_additions.py +4 -13
  28. package/skills/fresh-branch/SKILL.md +71 -0
  29. package/skills/gotcha/SKILL.md +73 -0
  30. package/skills/monitor-open-prs/SKILL.md +4 -37
  31. package/skills/monitor-open-prs/test_skill_contract.py +0 -5
  32. package/skills/pr-converge/SKILL.md +60 -1298
  33. package/skills/pr-converge/reference/convergence-gates.md +122 -0
  34. package/skills/pr-converge/reference/examples.md +76 -0
  35. package/skills/pr-converge/reference/fix-protocol.md +56 -0
  36. package/skills/pr-converge/reference/ground-rules.md +13 -0
  37. package/skills/pr-converge/reference/multi-pr-orchestration.md +204 -0
  38. package/skills/pr-converge/reference/per-tick.md +204 -0
  39. package/skills/pr-converge/reference/state-schema.md +19 -0
  40. package/skills/pr-converge/reference/stop-conditions.md +26 -0
  41. package/skills/pr-converge/scripts/README.md +36 -9
  42. package/skills/pr-converge/scripts/check_pr_mergeability.py +1 -2
  43. package/skills/pr-converge/scripts/config/pr_converge_constants.py +74 -5
  44. package/skills/pr-converge/scripts/config/reflow_skill_md_constants.py +13 -0
  45. package/skills/pr-converge/scripts/config/test_pr_converge_constants.py +0 -24
  46. package/skills/pr-converge/scripts/cursor-agents-continue.ahk +22 -2
  47. package/skills/pr-converge/scripts/fetch_bugbot_inline_comments.py +19 -59
  48. package/skills/pr-converge/scripts/fetch_bugbot_reviews.py +15 -61
  49. package/skills/pr-converge/scripts/fetch_claude_inline_comments.py +70 -0
  50. package/skills/pr-converge/scripts/fetch_claude_reviews.py +61 -0
  51. package/skills/pr-converge/scripts/fetch_copilot_inline_comments.py +19 -61
  52. package/skills/pr-converge/scripts/fetch_copilot_reviews.py +14 -74
  53. package/skills/pr-converge/scripts/reflow_skill_md.py +71 -50
  54. package/skills/pr-converge/scripts/reviewer_fetch_core.py +153 -0
  55. package/skills/pr-converge/scripts/reviewer_specs.py +98 -0
  56. package/skills/pr-converge/scripts/test_cursor_agents_continue.py +65 -0
  57. package/skills/pr-converge/scripts/test_fetch_bugbot_inline_comments.py +107 -6
  58. package/skills/pr-converge/scripts/test_fetch_bugbot_reviews.py +85 -6
  59. package/skills/pr-converge/scripts/test_fetch_claude_inline_comments.py +485 -0
  60. package/skills/pr-converge/scripts/test_fetch_claude_reviews.py +368 -0
  61. package/skills/pr-converge/scripts/test_fetch_copilot_inline_comments.py +74 -6
  62. package/skills/pr-converge/scripts/test_fetch_copilot_reviews.py +94 -8
  63. package/skills/pr-converge/scripts/test_reflow_skill_md.py +162 -0
  64. package/skills/pr-converge/scripts/test_reviewer_fetch_core.py +448 -0
  65. package/skills/pr-converge/scripts/test_reviewer_specs.py +107 -0
  66. package/skills/pr-converge/scripts/test_view_pr_context.py +44 -0
  67. package/skills/pr-converge/scripts/view_pr_context.py +35 -4
  68. package/skills/pr-converge/workflows/schedule-wakeup-loop.md +24 -22
  69. package/skills/bugteam/reference/workflow-path-a-orchestrated-teams.md +0 -113
  70. package/skills/bugteam/reference/workflow-path-b-task-harness.md +0 -48
  71. package/skills/bugteam/test_team_lifecycle.py +0 -103
  72. package/skills/monitor-open-prs/test_team_lifecycle.py +0 -46
  73. package/skills/pr-converge/scripts/open_followup_copilot_pr.py +0 -136
  74. package/skills/pr-converge/scripts/test_open_followup_copilot_pr.py +0 -236
  75. package/skills/pr-converge/test_team_lifecycle.py +0 -56
  76. package/skills/pr-converge/workflows/ahk-auto-continue-loop.md +0 -108
@@ -14,23 +14,20 @@ Evals are split into two layers. Both layers run against the same trace but carr
14
14
 
15
15
  ## Ironclad invariants (Layer A, apply to every eval)
16
16
 
17
- Each invariant cites the normative section or companion file it derives from. **Path A vs Path B:** `SKILL.md` **Path routing** splits harnesses. Invariants **I-1, I-3, I-4, I-7, I-9, I-11 (teammate shutdown → `TeamDelete` prefix), I-13** apply to **Path A only**. **Path B** substitutes per [`reference/workflow-path-b-task-harness.md`](reference/workflow-path-b-task-harness.md): no `TeamCreate` / `TeamDelete`; parallel auditors use parallel **`Task`** calls; **I-12 Path B** — the **lead** runs Step 2.5 `gh api` posts. **I-2, I-5, I-6, I-8, I-10** apply to **both** paths (revoke once; fresh spawn per loop; `model="opus"` on audit/fix workers; cap; read outcome XML).
17
+ Each invariant cites the normative section or companion file it derives from. All spawns use `Agent(..., run_in_background=true)`. Invariants apply uniformly across all eval fixtures.
18
18
 
19
19
  | # | Invariant | Citation |
20
20
  |---|---|---|
21
- | I-1 | **Path A:** `Bash` invoking `scripts/grant_project_claude_permissions.py` precedes every `TeamCreate`. **Path B:** grant precedes first audit **`Task`**. | `SKILL.md` § Step 0; § Path routing |
22
- | I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, **after** teardown that applies to that path (`TeamDelete` only on Path A). | `SKILL.md` § Step 5 |
23
- | I-3 | **Path A:** Exactly one `TeamCreate` and exactly one `TeamDelete` per invocation. **Path B:** zero `TeamCreate` / `TeamDelete`. | `SKILL.md` § Step 2; § Step 4 |
24
- | I-4 | **Path A:** Before `TeamDelete`, no teammate remains active without cleanup (self-terminated `Agent` or `SendMessage` shutdown). | `SKILL.md` § AUDIT/FIX shutdown; § Step 4 |
25
- | I-5 | **Path A:** `Agent` calls are fresh per loop. **Path B:** `Task` calls for audit/fix are fresh per loop (same `name` discipline where the host exposes naming). | `CONSTRAINTS.md` — **Fresh teammate per loop**; deltas **Clean-room note** |
26
- | I-6 | Both paths: audit and fix worker spawns pass `model="opus"` on `Agent` **or** `Task` as documented in `SKILL.md` § AUDIT/FIX. | `SKILL.md` § Step 2 (**Roles**); `CONSTRAINTS.md` **Opus 4.7 at xhigh effort for both teammates** |
27
- | I-7 | **Path A:** `TeamDelete()` is called with no arguments. **Path B:** omit. | `SKILL.md` § Step 4 |
28
- | I-8 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
29
- | I-9 | **Path A:** From loop 4 onward without convergence, three parallel `Agent` calls in one message. **Path B:** three parallel **`Task`** calls. | `SKILL.md` § AUDIT action (**Parallel auditors**); `reference/workflow-path-b-task-harness.md` |
30
- | I-10 | Lead reads `.bugteam-loop-<N>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
31
- | I-11 | **Path A:** `git worktree remove` each PR → teammate shutdowns → `TeamDelete` → `rmtree` `<team_temp_dir>` → Step 4.5 → revoke. **Path B:** `git worktree remove` each PR → (omit shutdown / `TeamDelete`) → `rmtree` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5; `reference/workflow-path-a-orchestrated-teams.md` § Step 4; `reference/workflow-path-b-task-harness.md` § Step 4 |
32
- | I-12 | **Path A:** Lead never posts PR review / finding / fix replies except Step 4.5 body. **Path B:** Lead performs Step 2.5 posts per deltas; Step 4.5 unchanged. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
33
- | I-13 | **Path A:** Only the lead invokes `TeamCreate`; every teammate `Agent(..., team_name=...)`. **Path B:** no `TeamCreate`; `Task` spawns omit `team_name`. | `CONSTRAINTS.md` — **Path A — orchestrator-only `TeamCreate`**; `reference/workflow-path-b-task-harness.md` |
21
+ | I-1 | `Bash` invoking `scripts/grant_project_claude_permissions.py` precedes the first audit `Agent` spawn. | `SKILL.md` § Step 0 |
22
+ | I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
23
+ | I-3 | Orchestration uses `Agent(..., run_in_background=true)` only: no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
24
+ | I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` **Fresh subagent per loop** |
25
+ | I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
26
+ | I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
27
+ | I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
28
+ | I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
29
+ | I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
30
+ | I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` **Audit/fix comment posting** |
34
31
 
35
32
  Any eval failing one or more Layer A invariants fails the run.
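A rough sketch of how a harness might apply this rule to a recorded trace, covering only I-1, I-2, and I-3 from the new table. The trace shape (a list of dicts with `tool`, `command`, and `subagent_type` keys) and the name `check_layer_a` are assumptions made for illustration; the document itself states that the harness does not yet exist. The sketch also assumes I-2 only applies once the grant script has run, and it does not check that revoke comes after teardown.

```python
from typing import Callable, Dict, List, Optional

# Tool names ruled out by I-3 in the new invariant table.
FORBIDDEN_TOOLS = {"TeamCreate", "TeamDelete", "SendMessage", "Task"}
GRANT_SCRIPT = "grant_project_claude_permissions.py"
REVOKE_SCRIPT = "revoke_project_claude_permissions.py"


def _runs_script(call: Dict, script_name: str) -> bool:
    """True when the call is a Bash call whose command mentions the script."""
    return call.get("tool") == "Bash" and script_name in call.get("command", "")


def _first_index(trace: List[Dict], predicate: Callable[[Dict], bool]) -> Optional[int]:
    """Index of the first call matching the predicate, or None."""
    return next((i for i, call in enumerate(trace) if predicate(call)), None)


def check_layer_a(trace: List[Dict]) -> List[str]:
    """Return IDs of violated invariants; an empty list means the checks pass.

    Hypothetical trace format: each entry is a dict such as
    {"tool": "Bash", "command": "..."} or
    {"tool": "Agent", "subagent_type": "code-quality-agent"}.
    """
    violated = []
    grant_at = _first_index(trace, lambda c: _runs_script(c, GRANT_SCRIPT))
    first_audit_at = _first_index(
        trace,
        lambda c: c.get("tool") == "Agent"
        and c.get("subagent_type") == "code-quality-agent",
    )

    # I-1: the grant script precedes the first audit Agent spawn.
    if first_audit_at is not None and (grant_at is None or grant_at > first_audit_at):
        violated.append("I-1")

    # I-2: revoke runs exactly once (assumed to apply only after a grant).
    revoke_count = sum(1 for c in trace if _runs_script(c, REVOKE_SCRIPT))
    if grant_at is not None and revoke_count != 1:
        violated.append("I-2")

    # I-3: no team or Task tool calls anywhere in the trace.
    if any(c.get("tool") in FORBIDDEN_TOOLS for c in trace):
        violated.append("I-3")

    return violated
```

An eval would fail as soon as `check_layer_a` returns a non-empty list; the remaining invariants would need per-loop grouping of the trace, which this sketch leaves out.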
36
33
 
@@ -46,25 +43,23 @@ The harness does not yet exist; this document defines its contract.
46
43
 
47
44
  ---
48
45
 
49
- ## Eval 1 — Path B: agent teams env unset (Task harness, not a refusal)
46
+ ## Eval 1 — Smoke: background subagent spawns fire correctly
50
47
 
51
- **Scenario.** `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is unset in both `claude config` and `~/.claude/settings.json`.
48
+ **Scenario.** PR exists; PR is a clean target with no unusual pre-conditions.
52
49
 
53
50
  **Trigger.** `/bugteam`
54
51
 
55
- **Layer A invariants.** Path B subset (I-2, I-5, I-6, I-8, I-10; I-1/I-3/I-4/I-7/I-9/I-11/I-13 N/A or Path-B-shaped).
52
+ **Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-8, I-9, I-10.
56
53
 
57
- **Layer B predicted trace (Path B smoke).**
58
- 1. `Bash("claude config get env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS")` → empty.
59
- 2. `Bash("python .../grant_project_claude_permissions.py")` runs (Step 0).
60
- 3. **No** `TeamCreate`.
61
- 4. At least one `Task(subagent_type="code-quality-agent", ...)` or host-equivalent for AUDIT; FIX rounds use the host-appropriate FIX `Task` from `workflow-path-b-task-harness.md` § **FIX spawn** (`clean-coder` subtype on Claude Code when accepted; `generalPurpose` + clean-coder **Read** preamble on Cursor when the enum rejects `clean-coder`).
54
+ **Layer B predicted trace (smoke).**
55
+ 1. `Bash("python .../grant_project_claude_permissions.py")` runs (Step 0).
56
+ 2. `Agent(subagent_type="code-quality-agent", name="bugfind-pr...-loop1", run_in_background=true, model="opus", ...)` spawned for AUDIT.
57
+ 3. Lead awaits background-completion notification, then `Read(".bugteam-pr42-loop1.outcomes.xml")`.
58
+ 4. `Agent(subagent_type="clean-coder", name="bugfix-pr...-loop1", run_in_background=true, model="opus", ...)` spawned for FIX (if findings).
62
59
  5. `Bash("python .../revoke_project_claude_permissions.py")` on exit.
63
60
 
64
61
  **Pass criteria.**
65
- - **No** refusal string about missing agent teams.
66
- - Zero `TeamCreate`, zero `TeamDelete`, zero teammate `SendMessage` shutdowns.
67
- - Non-zero `Task` (or `Agent` without `team_name` only if the host maps Path B that way) carrying **`code-quality-agent`** / **fix worker under the clean-coder contract** (subtype `clean-coder` where accepted, else `generalPurpose` + `clean-coder.md` Read per `workflow-path-b-task-harness.md`).
62
+ - Non-zero `Agent(subagent_type="code-quality-agent", run_in_background=true)` and `Agent(subagent_type="clean-coder", run_in_background=true)` calls.
68
63
 
69
64
  ---
70
65
 
@@ -75,7 +70,7 @@ The harness does not yet exist; this document defines its contract.
75
70
  **Layer B predicted trace.**
76
71
  1. `Bash("gh pr view --json ...")` → non-zero exit.
77
72
  2. `Bash("git merge-base HEAD origin/main")` → empty.
78
- 3. No grant script, no `TeamCreate`.
73
+ 3. No grant script.
79
74
 
80
75
  **Pass criteria.** Assistant message matches `No PR or upstream diff. /bugteam needs a target.`. Zero downstream tool calls.
81
76
 
@@ -93,15 +88,15 @@ The harness does not yet exist; this document defines its contract.
93
88
 
94
89
  **Scenario.** `code-quality-agent` is present in the available-agents list; `clean-coder` is not.
95
90
 
96
- **Pass criteria.** Assistant message contains `Required subagent type clean-coder not installed.`. Zero grant script call, zero `TeamCreate`.
91
+ **Pass criteria.** Assistant message contains `Required subagent type clean-coder not installed.`. Zero grant script call, zero `Agent` spawns.
97
92
 
98
93
  ---
99
94
 
100
- ## Eval 5 — Happy path: converges in 2 loops (Path A fixture)
95
+ ## Eval 5 — Happy path: converges in 2 loops
101
96
 
102
- **Scenario.** PR #42 contains three P1 bugs all addressable by the mock fix teammate. Loop 1 audit returns 3 findings; loop 1 fix commits cleanly; loop 2 audit returns zero findings. **`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`** (Path A — `TeamCreate` + teammate `Agent`).
97
+ **Scenario.** PR #42 contains three P1 bugs all addressable by the mock fix subagent. Loop 1 audit returns 3 findings; loop 1 fix commits cleanly; loop 2 audit returns zero findings.
103
98
 
104
- **Layer A invariants.** Path A: I-1, I-2, I-3, I-4, I-5, I-6, I-7, I-10, I-11, I-12, I-13.
99
+ **Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10.
105
100
 
106
101
  **Layer B predicted trace.**
107
102
 
@@ -109,41 +104,40 @@ The harness does not yet exist; this document defines its contract.
109
104
  |---|---|---|
110
105
  | 1 | `Bash("python .../scripts/grant_project_claude_permissions.py")` | `SKILL.md` § Step 0 |
111
106
  | 2 | `Bash("gh pr view --json number,baseRefName,headRefName,url")` | `SKILL.md` § Step 1 |
112
- | 3 | `Bash("git rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
113
- | 4 | `TeamCreate(team_name="bugteam-pr-42-<ts>", description=..., agent_type="team-lead")` | `SKILL.md` § Step 2 |
114
- | 5 | `Bash("mkdir -p <team_temp_dir>")` | `SKILL.md` § AUDIT action |
115
- | 6 | `Bash("gh pr diff 42 -R ... > <team_temp_dir>/loop-1.patch")` | `SKILL.md` § AUDIT action |
116
- | 7 | `Agent(subagent_type="code-quality-agent", name="bugfind", team_name=..., model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
117
- | 8 | `Read(".bugteam-loop-1.outcomes.xml")` | `SKILL.md` § AUDIT action |
118
- | 9 | `SendMessage(to="bugfind", message={type: "shutdown_request", reason: "audit loop 1 complete; outcome XML captured"})` | `SKILL.md` § AUDIT action (**Shutdown** fallback) |
119
- | 10 | `Agent(subagent_type="clean-coder", name="bugfix", team_name=..., model="opus", description=..., prompt=<fix XML loop 1>)` | `SKILL.md` § FIX action |
120
- | 11 | `Read(".bugteam-loop-1.outcomes.xml")` — bugfix outcome XML overwrites same filename | `SKILL.md` § FIX action |
121
- | 12 | `Bash("git rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
122
- | 13 | `Bash("git fetch origin <branch> && git rev-parse origin/<branch>")` → verify push landed | `SKILL.md` § FIX action (**Verify**) |
123
- | 14 | `SendMessage(to="bugfix", message={type: "shutdown_request", reason: "fix loop 1 complete; commit <sha7> pushed"})` | `SKILL.md` § FIX action (**Shutdown** fallback) |
124
- | 15 | `Bash("gh pr diff 42 -R ... > <team_temp_dir>/loop-2.patch")` | `SKILL.md` § AUDIT action |
125
- | 16 | `Agent(subagent_type="code-quality-agent", name="bugfind", ...)` (loop 2) | `SKILL.md` § AUDIT action |
126
- | 17 | `Read(".bugteam-loop-2.outcomes.xml")` → zero findings | `SKILL.md` § AUDIT action |
127
- | 18 | `SendMessage(to="bugfind", message={type: "shutdown_request", reason: "audit loop 2 complete; zero findings"})` | `SKILL.md` § AUDIT action (**Shutdown** fallback) |
128
- | 19 | `TeamDelete()` | `SKILL.md` § Step 4 |
129
- | 20 | `Bash("python -c \"import os, shutil, stat, sys; h = lambda f, p, *_: (os.chmod(p, stat.S_IWRITE), f(p)); shutil.rmtree(r'<team_temp_dir>', **({'onexc': h} if sys.version_info >= (3, 12) else {'onerror': h}))\"")` | `SKILL.md` § Step 4 (Windows-safe teardown) |
107
+ | 3 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
108
+ | 4 | `Bash("mkdir -p <run_temp_dir>/pr-42")` | `SKILL.md` § AUDIT action |
109
+ | 5 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-1.patch")` | `SKILL.md` § AUDIT action |
110
+ | 6 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
111
+ | 7 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
112
+ | 8 | `Read(".bugteam-pr42-loop1.outcomes.xml")` | `SKILL.md` § AUDIT action |
113
+ | 9 | `Agent(subagent_type="clean-coder", name="bugfix-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<fix XML loop 1>)` | `SKILL.md` § FIX action |
114
+ | 10 | Lead awaits background-completion notification | `SKILL.md` § FIX action |
115
+ | 11 | `Read(".bugteam-pr42-loop1.outcomes.xml")` — bugfix outcome XML | `SKILL.md` § FIX action |
116
+ | 12 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
117
+ | 13 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" fetch origin <branch>")` → fetch remote state | `SKILL.md` § FIX action (**Verify**) |
118
+ | 14 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse origin/<branch>")` → confirm matches HEAD | `SKILL.md` § FIX action (**Verify**) |
119
+ | 15 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-2.patch")` | `SKILL.md` § AUDIT action |
120
+ | 16 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop2", run_in_background=true, ...)` (loop 2) | `SKILL.md` § AUDIT action |
121
+ | 17 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
122
+ | 18 | `Read(".bugteam-pr42-loop2.outcomes.xml")` → zero findings | `SKILL.md` § AUDIT action |
123
+ | 19 | `Bash("git worktree remove \"<run_temp_dir>/pr-42/worktree\"")` | `SKILL.md` § Step 4 step 1 |
124
+ | 20 | `Bash("python -c \"...shutil.rmtree(r'<run_temp_dir>', ...)\"")` | `SKILL.md` § Step 4 step 2 (Windows-safe teardown) |
130
125
  | 21 | `Bash("gh pr diff 42 -R ... > .bugteam-final.diff")` | `SKILL.md` § Step 4.5 step 1 |
131
126
  | 22 | `Bash("gh pr view 42 -R ... --json body --jq .body > .bugteam-original-body.md")` | `SKILL.md` § Step 4.5 step 2 |
132
127
  | 23 | `Agent(subagent_type="pr-description-writer", description=..., prompt=<brief>)` | `SKILL.md` § Step 4.5 |
133
128
  | 24 | `Write(".bugteam-final-body.md", <returned body>)` | `SKILL.md` § Step 4.5 step 4 |
134
129
  | 25 | `Bash("gh pr edit 42 -R ... --body-file .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 4 |
135
- | 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 (lead may add `.bugteam-loop-*.outcomes.xml` in the same or a separate `rm` — reconcile on first real run) |
130
+ | 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 |
136
131
  | 27 | `Bash("python .../scripts/revoke_project_claude_permissions.py")` | `SKILL.md` § Step 5 |
137
132
 
138
133
  **Pass criteria.**
139
134
  - All Layer A invariants hold.
140
- - Exactly 2 `Agent(name="bugfind"...)` calls, exactly 1 `Agent(name="bugfix"...)` call.
141
- - Exactly 2 bugfind shutdown messages + 1 bugfix shutdown message.
135
+ - Exactly 2 `Agent(name="bugfind-pr42-loop...")` calls, exactly 1 `Agent(name="bugfix-pr42-loop...")` call.
142
136
  - Final report contains `/bugteam exit: converged` and `Loops: 2`.
143
137
 
144
138
  **Process check after first real run.** Compare the observed trace against steps 1–27. Common expected divergences that should not fail the eval:
145
139
  - Extra `Bash("git rev-parse HEAD")` calls the lead inserts for bookkeeping.
146
- - Consolidated `Bash` calls (step 26 may split into two or three calls).
140
+ - Consolidated `Bash` calls (step 25 may split into two or three calls).
147
141
  - Extra `Read` calls when the lead re-reads an outcome XML to quote specific findings.
148
142
  - Reordered but still-Layer-A-compliant cleanup sequencing.
149
143
 
@@ -151,40 +145,40 @@ Patch this table to match observation and annotate each correction.
151
145
 
152
146
  ---
153
147
 
154
- ## Eval 6 — Stuck path: fix teammate produces no commit
148
+ ## Eval 6 — Stuck path: fix subagent produces no commit
155
149
 
156
- **Scenario.** Loop 1 audit finds 2 P1 bugs; the mock fix teammate reports both as `could_not_address` (no commit created).
150
+ **Scenario.** Loop 1 audit finds 2 P1 bugs; the mock fix subagent reports both as `could_not_address` (no commit created).
157
151
 
158
- **Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-7, I-10, I-11, I-12. I-8 trivially holds.
152
+ **Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10. I-6 trivially holds.
159
153
 
160
- **Layer B predicted trace.** Identical to Eval 5 steps 1–14 with this divergence:
154
+ **Layer B predicted trace.** Identical to Eval 5 steps 1–12 with this divergence:
161
155
  - Step 11 bugfix outcome XML marks every finding `status="could_not_address"`.
162
156
  - Step 12 `Bash("git rev-parse HEAD")` returns the pre-fix SHA unchanged.
163
- - Skill sets exit reason = `stuck`, skips loop 2, and falls through to `TeamDelete()`.
157
+ - Skill sets exit reason = `stuck`, skips loop 2, and falls through to `rmtree`.
164
158
 
165
159
  **Pass criteria.**
166
160
  - Loop count stops at 1.
167
161
  - Final report contains `/bugteam exit: stuck` and names the two unresolved findings.
168
- - Steps 19–27 fire despite the stuck exit — I-2 and I-11 enforce this.
162
+ - Steps 19–26 fire despite the stuck exit — I-2 and I-9 enforce this.
169
163
 
170
164
  ---
171
165
 
172
166
  ## Eval 7 — Cap reached: 10 loops, no convergence
173
167
 
174
- **Scenario.** Mock audit returns one P2 finding every loop. Mock fix teammate always commits but never clears the finding.
168
+ **Scenario.** Mock audit returns one P2 finding every loop. Mock fix subagent always commits but never clears the finding.
175
169
 
176
- **Layer A invariants.** All of I-1 through I-12.
170
+ **Layer A invariants.** All of I-1 through I-10.
177
171
 
178
172
  **Layer B predicted behavior.**
179
- - Loops 1–3: single `Agent(name="bugfind")` per loop.
180
- - Loops 4–10: three parallel `Agent(name="bugfind-loop-<N>-a/b/c")` in a single assistant message per loop, followed by two parallel `-b`/`-c` shutdowns and one `-a` shutdown.
181
- - Each loop produces one `Agent(name="bugfix")` and its matching shutdown.
173
+ - Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
174
+ - Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
175
+ - Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
182
176
  - Exactly 10 audit phases, exactly 10 fix phases.
183
- - Steps 19–27 from Eval 5 fire at teardown.
177
+ - Steps 19–26 from Eval 5 fire at teardown.
184
178
 
185
179
  **Pass criteria.**
186
- - I-8 holds: exactly 10 audit phases.
187
- - I-9 holds: loops 4–10 each emit three audit `Agent` calls in a single assistant message.
180
+ - I-6 holds: exactly 10 audit phases.
181
+ - I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
188
182
  - Final report contains `/bugteam exit: cap reached` and the remaining bug count.
189
183
 
190
184
  **Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
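Under the Layer B behavior predicted for Eval 7 (one audit spawn per loop for loops 1–3, eleven per loop for loops 4–10), the audit-call count can be worked out ahead of the first real run. The function and parameter names below are hypothetical, and the result is only the prediction, not an observed count.

```python
def predicted_audit_spawns(
    loop_cap: int = 10,
    parallel_from_loop: int = 4,
    parallel_width: int = 11,
) -> int:
    """Predicted number of audit Agent spawns when no loop converges.

    Loops before `parallel_from_loop` spawn one auditor each; every later
    loop, up to `loop_cap`, spawns `parallel_width` auditors in one message.
    """
    single_loops = min(loop_cap, parallel_from_loop - 1)
    parallel_loops = max(0, loop_cap - single_loops)
    return single_loops + parallel_loops * parallel_width


print(predicted_audit_spawns())  # 3 + 7 * 11 = 80
```

With the ten fix spawns listed above, that gives a predicted 90 audit and fix `Agent` calls for a cap-reached run, before any Step 4.5 spawn.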
@@ -195,12 +189,12 @@ Patch this table to match observation and annotate each correction.
195
189
 
196
190
  **Scenario.** Loop 1 audit returns zero findings.
197
191
 
198
- **Layer A invariants.** I-1 through I-7, I-10, I-11, I-12.
192
+ **Layer A invariants.** I-1, I-2, I-3, I-4, I-5, I-6, I-8, I-9, I-10.
199
193
 
200
- **Layer B predicted trace.** Eval 5 steps 1–9 and 19–27 only — no FIX phase because zero findings means the skill exits the loop at `last_action == "audited"` and `last_findings.total == 0`.
194
+ **Layer B predicted trace.** Eval 5 steps 1–8 and 19–26 only — no FIX phase because zero findings means the skill exits the loop at `last_action == "audited"` and `last_findings.total == 0`.
201
195
 
202
196
  **Pass criteria.**
203
- - Exactly 1 `Agent(name="bugfind"...)` call, 0 `Agent(name="bugfix"...)` calls, 1 bugfind shutdown.
197
+ - Exactly 1 `Agent(subagent_type="code-quality-agent", run_in_background=true)` call, 0 fix agent spawns.
204
198
  - Bugfind's outcome XML records zero findings; the per-loop review POST carries body `## /bugteam loop 1 audit: 0P0 / 0P1 / 0P2 → clean`.
205
199
  - Step 4.5 and Step 5 still fire.
206
200
 
@@ -212,7 +206,7 @@ Patch this table to match observation and annotate each correction.
212
206
 
213
207
  **Layer A invariants.** Same as Eval 5.
214
208
 
215
- **Layer B predicted teammate-side behavior** (observed via the recorded `gh api ... /reviews` POST payload in the bugfind teammate fixture).
209
+ **Layer B predicted subagent-side behavior** (observed via the recorded `gh api ... /reviews` POST payload in the bugfind subagent fixture).
216
210
  - `comments[]` length in the POST body = 2 (anchored findings only).
217
211
  - Review body contains a `### Findings without a diff anchor` section listing the third finding.
218
212
  - Bugfix outcome XML marks all 3 findings with a `reply_comment_url`; the unanchored finding's `used_fallback="true"` and `finding_comment_url` equals the parent review URL.
@@ -242,9 +236,9 @@ Patch this table to match observation and annotate each correction.
242
236
  - Bugfix teammate outcome XML marks every finding `status="hook_blocked"` with populated `<hook_output>`.
243
237
  - Bugfix teammate posts `Hook blocked the fix commit: <one-line summary>` to each finding comment.
244
238
  - Lead's `Bash("git rev-parse HEAD")` after fix detects no SHA change → exit reason `stuck`.
245
- - Steps 19–27 from Eval 5 fire at teardown.
239
+ - Steps 19–26 from Eval 5 fire at teardown.
246
240
 
247
- **Pass criteria.** Layer A I-2 and I-11 hold. Final report contains `/bugteam exit: stuck` and surfaces the hook_output summary.
241
+ **Pass criteria.** Layer A I-2 and I-9 hold. Final report contains `/bugteam exit: stuck` and surfaces the hook_output summary.
248
242
 
249
243
  ---
250
244
 
@@ -252,13 +246,13 @@ Patch this table to match observation and annotate each correction.
252
246
 
253
247
  **Scenario.** The available-agents list does not include `pr-description-writer` but does include `general-purpose`.
254
248
 
255
- **Layer B predicted trace.** Eval 5 steps 1–22 identical; step 23 becomes:
249
+ **Layer B predicted trace.** Eval 5 steps 1–21 identical; step 22 becomes:
256
250
 
257
251
  ```
258
252
  Agent(subagent_type="general-purpose", description="Rewrite PR 42 body from cumulative diff", prompt=<same brief>)
259
253
  ```
260
254
 
261
- Steps 24–27 follow normally.
255
+ Steps 23–26 follow normally.
262
256
 
263
257
  **Pass criteria.** Exactly 1 `Agent(subagent_type="general-purpose", ...)` call for the description rewrite. `gh pr edit` fires. Final report carries no Step 4.5 skip warning.
264
258
 
@@ -268,7 +262,7 @@ Steps 24–27 follow normally.
268
262
 
269
263
  **Scenario.** Neither `pr-description-writer` nor `general-purpose` appear in the available-agents list.
270
264
 
271
- **Layer B predicted trace.** Eval 5 steps 1–22, then skip steps 23–25. Steps 26–27 still fire.
265
+ **Layer B predicted trace.** Eval 5 steps 1–21, then skip steps 22–24. Steps 25–26 still fire.
272
266
 
273
267
  **Pass criteria.**
274
268
  - Zero `Agent` calls for PR description rewriting.
@@ -280,50 +274,17 @@ Steps 24–27 follow normally.
280
274
 
281
275
  ## Eval 14 — Permissions revoke on error path
282
276
 
283
- **Scenario.** Bugfind teammate refuses `shutdown_request` during loop 1, returning `{type: "shutdown_response", approve: false}`.
277
+ **Scenario.** Bugfind subagent completes but writes no outcomes XML (background subagent completes → notification arrives with no file at the expected path).
284
278
 
285
- **Layer B predicted trace.** Eval 5 steps 1–8, then:
286
- - Step 9 `SendMessage(to="bugfind", ...)` receives `approve: false`.
287
- - Skill sets exit reason = `error: bugfind teammate refused shutdown`.
288
- - Steps 19–27 all fire (Layer A I-2 and I-11 mandate this).
279
+ **Layer B predicted trace.** Eval 5 steps 1–7, then:
280
+ - Lead awaits notification and calls `Read(".bugteam-pr42-loop1.outcomes.xml")` → file missing.
281
+ - Skill sets exit reason = `error: outcomes XML missing after bugfind loop 1`.
282
+ - Teardown (steps 19–26 from Eval 5) all fire.
289
283
 
290
284
  **Pass criteria.** Final report surfaces the error and the loop number. Revoke fires despite the error.
291
285
 
292
286
  ---
293
287
 
294
- ## Eval 15 — Orchestrator-only `TeamCreate` (supplementary work path)
295
-
296
- **Scenario.** A loop 1 audit surfaces a P0/P1 finding whose root cause sits in adjacent infrastructure the lead needs to fix before the cycle can converge (e.g., a broken CI hook, a misbehaving lint config, a wrong GitHub API shape in a teammate's own dependency). The lead recognizes supplementary work is needed and decides to spawn additional teammates to handle it.
297
-
298
- **Layer A invariants.** I-1, I-3, I-4, I-5, I-6, I-7, I-11, I-12, **I-13 (primary focus)**.
299
-
300
- **Layer B predicted trace.** Eval 5 steps 1–9 identical. At step 10 (where a standard cycle spawns `bugfix`), the lead decides the finding requires adjacent infrastructure work first. Rather than call `TeamCreate` for a new team, the lead spawns a supplementary teammate into the existing team:
301
-
302
- ```
303
- Agent(
304
- subagent_type="code-quality-agent",
305
- name="bugfind-adjacent",
306
- team_name="<lead_team_name>", // same team as bugfind/bugfix
307
- model="opus",
308
- description="Supplementary audit of adjacent infrastructure",
309
- prompt=<brief naming the specific adjacent files + observed symptom>
310
- )
311
- ```
312
-
313
- The adjacent-audit teammate writes its own outcome XML, self-terminates. Lead reads the XML, decides fix strategy, spawns an adjacent-fix teammate into the same team. Cycle eventually returns to the standard `bugfix` spawn for the original finding(s). All spawns pass the same `team_name`.
314
-
315
- **Pass criteria.**
316
- - Layer A I-13 holds: zero `TeamCreate` calls beyond the single one at skill Step 2.
317
- - Every `Agent(...)` call in the session carries `team_name="<lead_team_name>"`. No teammate spawn omits `team_name`.
318
- - If the lead attempts a second `TeamCreate` call, the runtime returns the exact error quoted in I-13's citation; the lead treats this as a signal to spawn a teammate into the existing team instead.
319
- - Working behavior is unchanged from a single-set cycle: grant → TeamCreate (once) → Agent spawns (many, all same team_name) → SendMessage shutdowns as needed → TeamDelete (once) → temp cleanup → Step 4.5 → revoke.
320
-
321
- **Failure mode.** A second `TeamCreate` call in the session, or any `Agent(...)` call without `team_name` once the team exists. Either signals the orchestrator-only invariant has been violated and the clean-room/team semantics are broken.
322
-
323
- **Observation source for this eval.** This eval was added after a real /bugteam run on PR #184 where the lead discovered a broken hook mid-cycle and initially spawned a standalone subagent (no `team_name`) for the adjacent audit — a direct violation. The runtime had already prevented a second `TeamCreate` with the error quoted in I-13. The eval codifies the correct path (spawn as teammate into existing team) so future runs do not repeat the violation.
324
-
325
- ---
326
-
327
288
  ## Iteration protocol
328
289
 
329
290
  1. **Cycle 0 — Reconcile predictions with reality.** On the first real run, diff every Layer B predicted trace against the observed trace. Patch this file to match reality and annotate each correction with a reason.
@@ -344,5 +305,5 @@ A minimal Python harness under `packages/claude-dev-env/skills/bugteam/evals/`:
344
305
  ## Open research items flagged during this pass
345
306
 
346
307
  1. **GitHub REST review-POST payload shape.** Eval 9 and Eval 10 depend on the exact body shape of `POST /pulls/<number>/reviews`. The `jq -n --rawfile ... --argjson ... | gh api ... --input -` fence lives in `SKILL.md` § Step 2.5 (**Review POST**); expanded copy in `reference/github-pr-reviews.md` § **Per-loop review**. Before running Eval 9/10 for real, fetch the current GitHub REST reference to confirm the request schema (fields `commit_id`, `event`, `body`, `comments[]`) and the multi-line anchor `{path, start_line, start_side, line, side, body}` shape still apply. Record the confirmed version and URL here.
347
- 2. **`SendMessage` shutdown origination — RESOLVED.** `SendMessage` tool docs include the line "Don't originate `shutdown_request` unless asked." `TeamCreate` tool docs explicitly direct the lead to originate `{type: "shutdown_request"}` for teammate cleanup. Real-run observation (loop 1 of eval run 2026-04-18) resolved the contradiction: teammates self-terminate when their task is complete — the `Agent` call returns and the teammate's session ends without any `SendMessage`. The cycle proceeded correctly without the lead ever needing to originate a `shutdown_request`. `SKILL.md` § AUDIT / FIX actions document self-termination as the expected path and lead-originated `SendMessage(shutdown_request)` as a fallback; `reference/audit-and-teammates.md` carries the longer shutdown narrative. Layer A **I-4** encodes “no orphaned teammates,” not “always send SendMessage.”
308
+ 2. **Background subagent completion signal.** Real-run observation (loop 1 of eval run 2026-04-18) confirmed: background subagents self-terminate when their task is complete — the background-completion notification arrives and the lead reads the outcomes XML. No shutdown handshake required. `SKILL.md` § AUDIT / FIX actions document this flow. Layer A **I-4** encodes “fresh subagent per loop.”
348
309
  3. **Model override redundancy.** `clean-coder` pins `model: opus` in its agent definition, while `code-quality-agent` currently uses `model: inherit`. The explicit `model="opus"` in every spawn is insurance against frontmatter drift; on the first real run, confirm the resolved model is `claude-opus-4-7` and that effort defaults to `xhigh` (Claude Code shows the active effort next to the spinner per the model-config docs). If a teammate's frontmatter ever pins a non-default `effort:` value, that frontmatter overrides the model default for that subagent (https://code.claude.com/docs/en/model-config — *"Frontmatter effort applies when that skill or subagent is active, overriding the session level but not the environment variable."*).
@@ -4,10 +4,8 @@ Expanded material that used to live inline in `SKILL.md`. Load a file when the o
4
4
 
5
5
  | File | Domain |
6
6
  |------|--------|
7
- | [`workflow-path-a-orchestrated-teams.md`](workflow-path-a-orchestrated-teams.md) | **Path A only** — `TeamCreate`, `Agent` + `team_name`, `SendMessage`, `TeamDelete`, who posts Step 2.5 |
8
- | [`workflow-path-b-task-harness.md`](workflow-path-b-task-harness.md) | **Path B only** — `Task` harness (no `TeamCreate` / `TeamDelete`, lead Step 2.5 `gh api`, Step 4 omissions) |
9
- | [`design-rationale.md`](design-rationale.md) | Why agent teams (clean-room), table-of-contents habit, when `/bugteam` applies, refusal reasons |
10
- | [`team-setup.md`](team-setup.md) | Permissions grant (`CLAUDE_SKILL_DIR`), PR scope, `TeamCreate`, team name / sanitization / temp dir / roles / loop state |
7
+ | [`design-rationale.md`](design-rationale.md) | Why clean-room subagents, table-of-contents habit, when `/bugteam` applies, refusal reasons |
8
+ | [`team-setup.md`](team-setup.md) | Permissions grant (`CLAUDE_SKILL_DIR`), PR scope, run name / temp dir / loop state |
11
9
  | [`github-pr-reviews.md`](github-pr-reviews.md) | Per-loop reviews, `jq` + `gh api` payloads, anchors, fallbacks, REST endpoints |
12
10
  | [`audit-and-teammates.md`](audit-and-teammates.md) | Pre-audit gate, full cycle numbering, AUDIT and FIX actions, parallel auditors |
13
11
  | [`teardown-publish-permissions.md`](teardown-publish-permissions.md) | Utility scripts note, teardown, PR description rewrite, revoke, final report |
@@ -24,11 +24,11 @@ Repeat until an exit condition fires.
24
24
  2. If exit code **0** → continue to step 2.5 (AUDIT spawn) below.
25
25
  3. If exit code **non-zero** → spawn a new **clean-coder** teammate — **standards-fix pass** — with instructions: read the script’s stderr, edit the repo until a **re-run** of the **same** gate command exits **0**, then one commit, `git push`, shutdown. Repeat standards-fix spawns until the gate exits **0** or **5** failed gate rounds (each round = one teammate session after a non-zero gate). If still non-zero after 5 rounds → exit reason = `error: code rules gate failed pre-audit`.
26
26
  4. After gate exit **0**, increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached` (counts **audits**, not standards-only rounds).
27
- 5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <N> audit: ...`
27
+ 5. Execute **AUDIT action** (spawn bugfind). Print progress: `Loop <L> audit: ...`
28
28
 
29
29
  3. **FIX path** (when `last_action == "audited"` and `last_findings.total > 0`):
30
30
  1. Increment `loop_count`. If `loop_count > 10`, exit reason = `cap reached`.
31
- 2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <N> fix: commit ...`
31
+ 2. Execute **FIX action** (spawn bugfix clean-coder for audit findings). Print: `Loop <L> fix: commit ...`
32
32
  3. Set `last_action = "fixed"`, update `audit_log`, loop to step 1 (next iteration hits **pre-audit path** before the next AUDIT).
33
33
 
34
34
  4. After **AUDIT**, update `last_action`, `last_findings`, `audit_log`; print the audit progress line if not already printed.
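The pre-audit gate and loop cap in the visible steps can be sketched in Python as follows. This is one reading of the text, not the skill's implementation: `run_gate` and `spawn_standards_fix` are hypothetical callables standing in for the gate script and the standards-fix spawn, and a "round" is taken to mean one fix session followed by a gate re-run.

```python
from typing import Callable, Optional, Tuple

MAX_GATE_ROUNDS = 5   # failed gate rounds allowed before giving up
MAX_LOOPS = 10        # hard cap on loop_count


def pre_audit_gate(run_gate: Callable[[], int],
                   spawn_standards_fix: Callable[[], None]) -> Optional[str]:
    """Return None once the gate exits 0, else the exit reason string."""
    if run_gate() == 0:
        return None
    for _ in range(MAX_GATE_ROUNDS):
        spawn_standards_fix()      # one session per failed round
        if run_gate() == 0:        # re-run the same gate command
            return None
    return "error: code rules gate failed pre-audit"


def next_loop_count(loop_count: int) -> Tuple[int, Optional[str]]:
    """Increment the counter and report `cap reached` past the cap."""
    loop_count += 1
    if loop_count > MAX_LOOPS:
        return loop_count, "cap reached"
    return loop_count, None
```

Standards-fix rounds never touch `loop_count` in this sketch, matching the note that the cap counts audits rather than standards-only rounds.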
@@ -39,62 +39,45 @@ Repeat until an exit condition fires.
39
39
 
40
40
  ## AUDIT action (clean-room teammate, fresh per loop)
41
41
 
42
- Capture a fresh PR diff for this loop into the per-team scoped directory so concurrent `/bugteam` runs keep patches isolated. Use the literal `<team_temp_dir>` resolved once in Step 2 — Claude resolves the absolute path; every shell receives the same literal value.
42
+ Capture a fresh PR diff for this loop into the per-PR scoped directory so concurrent `/bugteam` runs keep patches isolated. Use the literal `<run_temp_dir>` resolved once in Step 2 — Claude resolves the absolute path; every shell receives the same literal value.
43
43
 
44
44
  Commands and `Agent(...)` shape: `SKILL.md`.
45
45
 
46
- `<team_temp_dir>` includes the sanitized `team_name` and timestamp; `team_name` is already prefixed with `bugteam-`. Claude resolves `Path(tempfile.gettempdir()) / team_name` once and passes that absolute path to every shell. `tempfile.gettempdir()` honors `TMPDIR`, `TEMP`, `TMP` and falls back to the OS temp directory, so the same approach works on macOS, Linux, Windows cmd.exe, and PowerShell.
46
+ `<run_temp_dir>` includes the sanitized `team_name` and timestamp; `team_name` is already prefixed with `bugteam-`. Claude resolves `Path(tempfile.gettempdir()) / team_name` once and passes that absolute path to every shell. `tempfile.gettempdir()` honors `TMPDIR`, `TEMP`, `TMP` and falls back to the OS temp directory, so the same approach works on macOS, Linux, Windows cmd.exe, and PowerShell.
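As a small illustration of the resolution described above, the following Python sketch builds the path once so it can be passed as a literal to every shell. The function name `resolve_run_temp_dir` is hypothetical; only `Path(tempfile.gettempdir()) / team_name` comes from the text.

```python
import tempfile
from pathlib import Path


def resolve_run_temp_dir(team_name: str) -> Path:
    """Build the scoped temp directory path for one run.

    `tempfile.gettempdir()` consults TMPDIR, TEMP and TMP before
    falling back to a platform default, so the result is portable.
    """
    return Path(tempfile.gettempdir()) / team_name
```

Resolving once and reusing the resulting string avoids each shell expanding its own temp-directory variable differently.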
47
47
 
48
48
  Each loop calls `Agent` again with a fresh invocation so the teammate starts with its own context window. Doc line on lead history: [`../sources.md`](../sources.md).
49
49
 
50
50
  See [`../PROMPTS.md`](../PROMPTS.md) for AUDIT spawn-prompt XML and bugfind outcome schema. Substitute placeholders (`repo`, `branch`, `base_branch`, `pr_url`, `loop`, `diff_path`) into the `prompt` argument.
51
51
 
52
- After the teammate returns, the lead reads `.bugteam-loop-<N>.outcomes.xml` with the `Read` tool, parses it, and populates `loop_comment_index` from `<finding>` elements.
52
+ After the teammate returns, the lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` from the worktree directory with the `Read` tool, parses it, and populates `loop_comment_index` from `<finding>` elements.
53
53
 
54
54
  ### Shutdown (bugfind)
55
55
 
56
- **Expected path — self-termination:** Teammates often self-terminate when complete — the `Agent` call returns and the session ends. Then no `SendMessage` is needed.
57
-
58
- **Fallback — lead-initiated shutdown:** If the teammate still appears active after `Agent` returns, send:
59
-
60
- ```
61
- SendMessage(
62
- to="bugfind",
63
- message={
64
- "type": "shutdown_request",
65
- "reason": "audit loop <N> complete; outcome XML captured"
66
- }
67
- )
68
- ```
69
-
70
- The teammate replies with `{type: "shutdown_response", approve: true}`. If `approve` is `false`, exit reason = `error: bugfind teammate refused shutdown` → Step 4 teardown then Step 5 revoke.
56
+ Teammates self-terminate when complete — the background-completion notification arrives and the lead reads the outcomes XML. If the notification does not arrive within the lead timeout (120s), treat as a hard blocker and abort the loop.
71
57
 
72
58
  `last_action = "audited"`. Append audit metadata to `audit_log`.
73
59
 
74
60
  ### Parallel auditors (`loop_count >= 4`)
75
61
 
76
- The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue three `Agent` calls in **one** assistant message so they run in parallel:
62
+ The pre-audit gate must pass immediately before this step. After three full audit/fix rounds without convergence, issue eleven `Agent` calls in **one** assistant message so they run in parallel:
77
63
 
78
64
  ```
79
- Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-a", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant a", prompt="<audit XML; write outcome to .bugteam-loop-<N>.outcomes.xml; post the per-loop review; read and merge b/c outcomes from <team_temp_dir>/loop-<N>-b.outcomes.xml and <team_temp_dir>/loop-<N>-c.outcomes.xml>")
80
- Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-b", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant b", prompt="<audit XML; write outcome to <team_temp_dir>/loop-<N>-b.outcomes.xml; skip PR posting>")
81
- Agent(subagent_type="code-quality-agent", name="bugfind-loop-<N>-c", team_name="<team_name>", model="opus", description="Bugfind audit loop <N> variant c", prompt="<audit XML; write outcome to <team_temp_dir>/loop-<N>-c.outcomes.xml; skip PR posting>")
65
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-a", team_name="<team_name>", model="opus", run_in_background=true, description="Bugfind audit PR <N> loop <L> validator", prompt="<audit XML; poll for all 10 sibling XMLs at <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml through <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml (60s timeout, 2s interval); on timeout: log diagnostics entry, proceed with validated findings from available XMLs; validate each finding: file exists, line in bounds, excerpt matches claimed line, category A-J, severity P0/P1/P2; quarantine hallucinated findings to <run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json under validator_rejected; de-dup by (file, line, category), max severity wins, keep longest description on conflict; re-id as loop<L>-<K>; write <worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml; post review>")
66
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-b", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant b", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml; skip PR posting>")
67
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-c", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant c", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-c.outcomes.xml; skip PR posting>")
68
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-d", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant d", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-d.outcomes.xml; skip PR posting>")
69
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-e", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant e", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-e.outcomes.xml; skip PR posting>")
70
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-f", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant f", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-f.outcomes.xml; skip PR posting>")
71
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-g", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant g", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-g.outcomes.xml; skip PR posting>")
72
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-h", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant h", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-h.outcomes.xml; skip PR posting>")
73
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-i", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant i", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-i.outcomes.xml; skip PR posting>")
74
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-j", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant j", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-j.outcomes.xml; skip PR posting>")
75
+ Agent(subagent_type="code-quality-agent", name="bugfind-pr<N>-loop<L>-k", team_name="<team_name>", model="haiku", run_in_background=true, description="Bugfind audit PR <N> loop <L> variant k", prompt="<audit XML; write outcome to <run_temp_dir>/pr-<N>/loop-<L>-k.outcomes.xml; skip PR posting>")
82
76
  ```
83
77
 
84
- Teammate `-a` is the post-owner: read all three outcome XML files at explicit absolute paths (`.bugteam-loop-<N>.outcomes.xml` in cwd, plus sibling paths under `<team_temp_dir>`), merge findings by `(file, line, category_letter)` (collapse duplicates, keep longest description and highest severity), re-assign merged IDs as `loopN-K`, post the single per-loop review. The `-a` prompt must embed sibling paths as literal absolutes so `Read` works without discovery.
85
-
86
- Shutdown order: parallel `SendMessage` to `b` and `c`, then `a`:
78
+ Teammate `-a` is the opus validator: polls for all 10 sibling XMLs at explicit absolute paths under `<run_temp_dir>/pr-<N>` (60s timeout, 2s interval; on timeout: log diagnostics entry, proceed with validated findings from available XMLs), then validates each finding — file exists, line in bounds, excerpt matches claimed line, category is A–J, severity is P0/P1/P2. Hallucinated findings are quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under `validator_rejected`. Valid findings are de-duplicated by `(file, line, category)` (max severity wins, keep longest description on conflict) and re-assigned merged IDs as `loop<L>-<K>`. The `-a` prompt must embed sibling paths as literal absolutes so `Read` works without discovery.
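The de-duplication rule above might look like the following Python sketch. It assumes findings are already validated and held as dicts; the field names `severity` and `description` are assumptions about the schema, and numbering merged IDs in first-seen order is an illustrative choice the text does not specify.

```python
from typing import Dict, List, Tuple

# Lower rank means more severe, so P0 beats P1 beats P2.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


def merge_findings(findings: List[dict], loop: int) -> List[dict]:
    """De-dup by (file, line, category) and re-id as loop<L>-<K>."""
    merged: Dict[Tuple[str, int, str], dict] = {}
    for finding in findings:
        key = (finding["file"], finding["line"], finding["category"])
        kept = merged.get(key)
        if kept is None:
            merged[key] = dict(finding)  # copy so inputs stay untouched
            continue
        # Max severity wins.
        if SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = finding["severity"]
        # Keep the longest description on conflict.
        if len(finding["description"]) > len(kept["description"]):
            kept["description"] = finding["description"]
    result = []
    for index, finding in enumerate(merged.values(), start=1):
        finding["id"] = f"loop{loop}-{index}"
        result.append(finding)
    return result
```

Severity and description are resolved independently here, so a merged finding can take its severity from one auditor and its description from another.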
87
79
 
88
- ```
89
- SendMessage(to="bugfind-loop-<N>-b", message={"type": "shutdown_request", "reason": "variant XML captured"})
90
- SendMessage(to="bugfind-loop-<N>-c", message={"type": "shutdown_request", "reason": "variant XML captured"})
91
- ```
92
-
93
- then
94
-
95
- ```
96
- SendMessage(to="bugfind-loop-<N>-a", message={"type": "shutdown_request", "reason": "merged review posted"})
97
- ```
80
+ All subagents self-terminate via background completion. The lead awaits only the validator (-a) notification (120s timeout). Missing notification → hard blocker.
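The validator's sibling-file polling described above (60s timeout, 2s interval) could be sketched in Python as below. This is an assumption-laden illustration: `poll_for_files` is a hypothetical name, the `clock` and `sleep` parameters exist only to make the loop testable, and treating file existence as "ready" ignores partially written files.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def poll_for_files(paths: Iterable[Path],
                   timeout_seconds: float = 60.0,
                   interval_seconds: float = 2.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep,
                   ) -> Tuple[List[Path], List[Path]]:
    """Wait for files to appear; return (available, missing)."""
    all_paths = list(paths)
    deadline = clock() + timeout_seconds
    pending = list(all_paths)
    while True:
        pending = [path for path in pending if not path.is_file()]
        if not pending or clock() >= deadline:
            break
        sleep(interval_seconds)
    available = [path for path in all_paths if path not in pending]
    return available, pending
```

On timeout the caller would log the `missing` list to the diagnostics file and continue with the findings from `available`, as the validator prompt describes.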
98
81
 
99
82
  ## FIX action (fresh teammate)
100
83
 
@@ -106,17 +89,7 @@ After replies, the teammate writes outcome XML (schema in [`../PROMPTS.md`](../P
106
89
 
107
90
  ### Shutdown (bugfix)
108
91
 
109
- Same self-termination vs `SendMessage` split as bugfind. Fallback message:
110
-
111
- ```
112
- SendMessage(
113
- to="bugfix",
114
- message={
115
- "type": "shutdown_request",
116
- "reason": "fix loop <N> complete; commit <sha7> pushed"
117
- }
118
- )
119
- ```
92
+ Same self-termination model as bugfind. Missing notification → hard blocker.
120
93
 
121
94
  `approve: false` → `error: bugfix teammate refused shutdown` → Step 4 then 5.
122
95
 
@@ -8,7 +8,7 @@ Shared output schema and audit-loop contract used by `/bugteam`, `/qbug`, `/find
8
8
  - Adversarial second pass
9
9
  - Haiku secondary auditor
10
10
  - Post-fix self-audit
11
- - Persistence (loop-N-audit.json, loop-N-diagnostics.json)
11
+ - Persistence (loop-<L>-audit.json, loop-<L>-diagnostics.json)
12
12
 
13
13
  ## Finding schema
14
14
 
@@ -18,7 +18,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
18
18
 
19
19
  ```json
20
20
  {
21
- "id": "loop<N>-<K>",
21
+ "id": "loop<L>-<K>",
22
22
  "file": "path/relative/to/repo/root.py",
23
23
  "line": 123,
24
24
  "category": "A | B | C | D | E | F | G | H | I | J",
@@ -29,7 +29,7 @@ Each finding an audit produces MUST be one of exactly two shapes.
29
29
  }
30
30
  ```
31
31
 
32
- `id` is `loop<N>-<K>` where `N` is the loop counter (1-based) and `K` is the 1-based index within the loop. For `/findbugs` which runs once, use `find<K>`.
32
+ `id` is `loop<L>-<K>` where `L` is the loop counter (1-based) and `K` is the 1-based index within the loop. For `/findbugs` which runs once, use `find<K>`.
33
33
 
34
34
  ### Shape B — structured proof-of-absence
35
35
 
@@ -105,9 +105,9 @@ Merge rules:
105
105
  - **Unique-to-Haiku findings**: added to the primary set with Haiku's severity and source annotation.
106
106
  - **Unique-to-primary findings**: kept as-is.
107
107
  - **Zero Haiku findings**: primary set trusted; proceed.
108
- - **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<N>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
108
+ - **Malformed or non-parseable Haiku output**: lead trusts the primary set, logs the event in `loop-<L>-diagnostics.json` under `haiku_findings` as `[{"parse_error": "<message>"}]`.
109
109
 
110
- For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via the three variant teammates.
110
+ For multi-subagent skills (`/bugteam`) the parallel-auditors pattern in [`audit-and-teammates.md`](audit-and-teammates.md) already provides cross-model coverage via 10 Haiku auditors plus one Opus validator.
111
111
 
112
112
  ## Post-fix self-audit
113
113
 
@@ -131,7 +131,7 @@ Sequence:
131
131
 
132
132
  Every audit loop writes two JSON files under the skill's scoped temp directory (resolved via `tempfile.gettempdir()`):
133
133
 
134
- ### `loop-<N>-audit.json`
134
+ ### `loop-<L>-audit.json`
135
135
 
136
136
  ```json
137
137
  {
@@ -141,7 +141,7 @@ Every audit loop writes two JSON files under the skill's scoped temp directory (
141
141
  }
142
142
  ```
143
143
 
144
- ### `loop-<N>-diagnostics.json`
144
+ ### `loop-<L>-diagnostics.json`
145
145
 
146
146
  ```json
147
147
  {
@@ -2,13 +2,9 @@
2
2
 
3
3
  ## Core principle (expanded)
4
4
 
5
- A Claude Code **agent team** runs the audit-and-fix loop until convergence. The bugfind teammate audits clean-room (own context window, no chat history); the bugfix teammate addresses each audit's findings; both spawn fresh per loop. A 10-loop hard cap prevents runaway cost. Project permissions are granted at session start and revoked at session end.
5
+ Background subagents (`Agent(..., run_in_background=true)`) run the audit-and-fix loop until convergence. The bugfind subagent audits clean-room (own context window, no chat history); the bugfix subagent addresses each audit's findings; both spawn fresh per loop with no shared state. A 10-loop hard cap prevents runaway cost. Project permissions are granted at session start and revoked at session end.
6
6
 
7
- Teammate isolation versus subagents returning into the lead’s context is the clean-room property. Verbatim Anthropic quotes and URLs: [`../sources.md`](../sources.md).
8
-
9
- ## Why not parallel subagents here
10
-
11
- Subagents return their results into the lead’s context, which accumulates across loops. Agent-team teammates are independent sessions with their own context windows and do not pollute the lead. The lead can shut down and respawn each loop so every audit starts fresh. For `/bugteam`, the independent-context property is required; parallel subagents fail the clean-room requirement. Supporting quotes: [`../sources.md`](../sources.md) (subagents vs agent teams).
7
+ Fresh-spawn clean-room isolation: each `Agent` call creates a new subagent with its own context window and no access to prior conversation. After the subagent writes its outcome XML and self-terminates, the lead reads the file. Results never accumulate in the lead’s context beyond the XML artifact. Verbatim Anthropic quotes and URLs: [`../sources.md`](../sources.md).
12
8
 
13
9
  ## Table of contents in `SKILL.md`
14
10
 
@@ -20,9 +16,8 @@ The user wants automated convergence on a clean PR without babysitting each step
20
16
 
21
17
  ### Refusal reasons (detail)
22
18
 
23
- - **Agent teams off:** Without the feature flag, the workflow cannot run.
24
19
  - **No PR / diff:** There is nothing scoped to audit.
25
- - **Dirty tree:** The fix teammate will commit; uncommitted local work would be mixed into automated commits.
20
+ - **Dirty tree:** The fix subagent will commit; uncommitted local work would be mixed into automated commits.
26
21
  - **Missing subagents:** Both `code-quality-agent` and `clean-coder` must exist in the environment before Step 0.
27
22
 
28
23
  Exact refusal strings remain in `SKILL.md`.