claude-dev-env 1.37.0 → 1.38.0

Files changed (95)
  1. package/CLAUDE.md +3 -0
  2. package/_shared/pr-loop/audit-contract.md +4 -3
  3. package/_shared/pr-loop/fix-protocol.md +2 -0
  4. package/_shared/pr-loop/gh-payloads.md +38 -37
  5. package/_shared/pr-loop/scripts/README.md +0 -1
  6. package/_shared/pr-loop/scripts/preflight.py +2 -1
  7. package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +2 -2
  8. package/_shared/pr-loop/scripts/tests/test_preflight.py +22 -0
  9. package/_shared/pr-loop/state-schema.md +10 -10
  10. package/agents/clean-coder.md +4 -0
  11. package/agents/code-quality-agent.md +23 -85
  12. package/agents/groq-coder.md +8 -6
  13. package/hooks/blocking/__init__.py +0 -0
  14. package/hooks/blocking/hedging_language_blocker.py +2 -2
  15. package/hooks/blocking/state_description_blocker.py +243 -0
  16. package/hooks/blocking/tdd_enforcer.py +94 -0
  17. package/hooks/blocking/test_hedging_language_blocker.py +1 -1
  18. package/hooks/blocking/test_state_description_blocker.py +618 -0
  19. package/hooks/blocking/test_tdd_enforcer.py +152 -0
  20. package/hooks/config/state_description_blocker_constants.py +130 -0
  21. package/hooks/hooks.json +10 -0
  22. package/package.json +1 -1
  23. package/rules/gh-paginate.md +4 -50
  24. package/rules/no-historical-clutter.md +57 -0
  25. package/scripts/config/groq_bugteam_config.py +13 -5
  26. package/skills/bugteam/CONSTRAINTS.md +20 -27
  27. package/skills/bugteam/EXAMPLES.md +1 -1
  28. package/skills/bugteam/PROMPTS.md +78 -42
  29. package/skills/bugteam/SKILL.md +76 -63
  30. package/skills/bugteam/SKILL_EVALS.md +12 -12
  31. package/skills/bugteam/reference/audit-and-teammates.md +21 -48
  32. package/skills/bugteam/reference/audit-contract.md +7 -7
  33. package/skills/bugteam/reference/github-pr-reviews.md +31 -31
  34. package/skills/bugteam/reference/team-setup.md +1 -1
  35. package/skills/bugteam/reference/teardown-publish-permissions.md +4 -4
  36. package/skills/copilot-review/SKILL.md +7 -14
  37. package/skills/findbugs/SKILL.md +2 -2
  38. package/skills/fixbugs/SKILL.md +1 -1
  39. package/skills/monitor-open-prs/SKILL.md +6 -6
  40. package/skills/pr-converge/SKILL.md +7 -6
  41. package/skills/pr-converge/reference/convergence-gates.md +46 -44
  42. package/skills/pr-converge/reference/examples.md +4 -4
  43. package/skills/pr-converge/reference/fix-protocol.md +8 -8
  44. package/skills/pr-converge/reference/multi-pr-orchestration.md +10 -10
  45. package/skills/pr-converge/reference/per-tick.md +24 -36
  46. package/skills/pr-converge/reference/stop-conditions.md +7 -7
  47. package/skills/pr-converge/scripts/README.md +65 -117
  48. package/skills/pr-review-responder/EXAMPLES.md +2 -2
  49. package/skills/pr-review-responder/PRINCIPLES.md +2 -8
  50. package/skills/pr-review-responder/README.md +7 -48
  51. package/skills/pr-review-responder/SKILL.md +2 -3
  52. package/skills/pr-review-responder/TESTING.md +8 -65
  53. package/skills/qbug/SKILL.md +10 -16
  54. package/_shared/pr-loop/scripts/config/gh_util_constants.py +0 -31
  55. package/_shared/pr-loop/scripts/gh_util.py +0 -193
  56. package/_shared/pr-loop/scripts/tests/test_gh_util.py +0 -257
  57. package/_shared/pr-loop/scripts/tests/test_gh_util_constants.py +0 -61
  58. package/skills/pr-converge/scripts/check_pr_mergeability.py +0 -78
  59. package/skills/pr-converge/scripts/config/pr_converge_constants.py +0 -118
  60. package/skills/pr-converge/scripts/config/test_pr_converge_constants.py +0 -152
  61. package/skills/pr-converge/scripts/fetch_bugbot_inline_comments.py +0 -70
  62. package/skills/pr-converge/scripts/fetch_bugbot_reviews.py +0 -57
  63. package/skills/pr-converge/scripts/fetch_claude_inline_comments.py +0 -70
  64. package/skills/pr-converge/scripts/fetch_claude_reviews.py +0 -61
  65. package/skills/pr-converge/scripts/fetch_copilot_inline_comments.py +0 -70
  66. package/skills/pr-converge/scripts/fetch_copilot_reviews.py +0 -61
  67. package/skills/pr-converge/scripts/mark_pr_ready.py +0 -54
  68. package/skills/pr-converge/scripts/post-bugbot-run.helpers.ps1 +0 -49
  69. package/skills/pr-converge/scripts/post-bugbot-run.ps1 +0 -33
  70. package/skills/pr-converge/scripts/reply_to_inline_comment.py +0 -84
  71. package/skills/pr-converge/scripts/request_copilot_review.py +0 -71
  72. package/skills/pr-converge/scripts/resolve_pr_head.py +0 -58
  73. package/skills/pr-converge/scripts/review_field_helpers.py +0 -43
  74. package/skills/pr-converge/scripts/reviewer_fetch_core.py +0 -153
  75. package/skills/pr-converge/scripts/reviewer_specs.py +0 -98
  76. package/skills/pr-converge/scripts/test_check_pr_mergeability.py +0 -126
  77. package/skills/pr-converge/scripts/test_fetch_bugbot_inline_comments.py +0 -443
  78. package/skills/pr-converge/scripts/test_fetch_bugbot_reviews.py +0 -299
  79. package/skills/pr-converge/scripts/test_fetch_claude_inline_comments.py +0 -485
  80. package/skills/pr-converge/scripts/test_fetch_claude_reviews.py +0 -368
  81. package/skills/pr-converge/scripts/test_fetch_copilot_inline_comments.py +0 -440
  82. package/skills/pr-converge/scripts/test_fetch_copilot_reviews.py +0 -366
  83. package/skills/pr-converge/scripts/test_mark_pr_ready.py +0 -69
  84. package/skills/pr-converge/scripts/test_post_bugbot_run.py +0 -195
  85. package/skills/pr-converge/scripts/test_reply_to_inline_comment.py +0 -159
  86. package/skills/pr-converge/scripts/test_request_copilot_review.py +0 -101
  87. package/skills/pr-converge/scripts/test_resolve_pr_head.py +0 -79
  88. package/skills/pr-converge/scripts/test_review_field_helpers.py +0 -80
  89. package/skills/pr-converge/scripts/test_reviewer_fetch_core.py +0 -448
  90. package/skills/pr-converge/scripts/test_reviewer_specs.py +0 -107
  91. package/skills/pr-converge/scripts/test_trigger_bugbot.py +0 -139
  92. package/skills/pr-converge/scripts/test_view_pr_context.py +0 -111
  93. package/skills/pr-converge/scripts/trigger_bugbot.py +0 -77
  94. package/skills/pr-converge/scripts/view_pr_context.py +0 -47
  95. package/skills/pr-review-responder/scripts/respond_to_reviews.py +0 -376
@@ -10,21 +10,40 @@ Keep the spawn prompt self-contained: reference only the PR scope, audit rubric,
10
10
  <branch>head ref</branch>
11
11
  <base_branch>base ref</base_branch>
12
12
  <pr_url>full URL</pr_url>
13
- <loop>N</loop>
13
+ <loop>L</loop>
14
14
  <pr_number>N</pr_number>
15
15
  <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
16
16
  </context>
17
17
 
18
- cd into `<worktree_path>` before any git, gh, or file operation.
18
+ cd into `<worktree_path>` before any git or file operation.
19
19
 
20
20
  <scope>
21
21
  <diff_path>Absolute path to the per-PR patch file: <run_temp_dir>/pr-<N>/loop-<L>.patch (same path as gh pr diff redirect in AUDIT)</diff_path>
22
22
  <scope_rule>Audit only lines added or modified in the diff. Pre-existing code on untouched lines is out of scope.</scope_rule>
23
+ <changed_files_rule>Build the list of changed file paths from the diff. Open each one with Read and audit cross-file consistency. Read every changed test file and cross-reference test assertions, expected values, and mock setup against the production code's config constants and function signatures. When a test file asserts a value that diverges from config, file a finding under category J.</changed_files_rule>
23
24
  </scope>
24
25
 
25
26
  <bug_categories>
26
- Investigate each category explicitly. For each, return either at least
27
- one finding OR a verified-clean entry with the evidence used to clear it:
27
+ Investigate each of the eleven categories (A–K) explicitly. For each,
28
+ return either at least one finding OR a verified-clean entry with the
29
+ evidence used to clear it. A category is verified-clean only when one
30
+ complete execution path through the changed code has been traced from
31
+ entry to exit. Surface-level scanning is insufficient evidence. The
32
+ evidence field must name (1) the specific function examined, (2) the
33
+ code path traced from entry to exit, and (3) the specific check performed.
34
+ Generic phrases such as "verified clean", "no issues found",
35
+ "pattern appears correct", "looks good", "seems fine", and
36
+ "no problems detected" do not satisfy the verified-clean requirement.
37
+ When evidence contains any of these phrases, the category is not
38
+ verified-clean -- re-audit with a concrete trace.
39
+
40
+ Categories A–K (one-line summary; full rubric and sub-bucket decomposition
41
+ for each is in `packages/claude-dev-env/audit-rubrics/category_rubrics/`;
42
+ ready-to-send Variant C prompts — each with a PR/repo-independent
43
+ generalized skeleton above a `---` separator and a worked example against
44
+ an authentic PR below — are in
45
+ `packages/claude-dev-env/audit-rubrics/prompts/`):
46
+
28
47
  A. API contract verification (signatures, return types, async/await correctness)
29
48
  B. Selector / query / engine compatibility
30
49
  C. Resource cleanup and lifecycle (file handles, connections, processes, locks)
@@ -35,6 +54,10 @@ cd into `<worktree_path>` before any git, gh, or file operation.
35
54
  H. Security boundaries (injection, path traversal, auth bypass, secret leakage)
36
55
  I. Concurrency hazards (race conditions, missing awaits, shared mutable state)
37
56
  J. Magic values and configuration drift
57
+ K. Codebase conflicts — a change updates one site of a pattern but a parallel
58
+ site in unchanged code stays stale, producing contradictory behavior;
59
+ the diff is internally consistent, the bug emerges only against unchanged
60
+ code (canonical example: jl-cmd/claude-code-config PR #397 r3210166636)
38
61
  </bug_categories>
39
62
 
40
63
  <constraints>
@@ -42,51 +65,57 @@ cd into `<worktree_path>` before any git, gh, or file operation.
42
65
  - Cite file:line for every finding.
43
66
  - When the diff alone does not provide enough context to confirm a bug,
44
67
  list it under "Open questions" rather than assert it.
68
+ - For every finding, search `git grep` for all callers of the targeted function. When the obvious fix would silently change behavior for other call paths, include a fix constraint that preserves them.
45
69
  </constraints>
46
70
 
47
71
  <comment_posting>
48
- 1. Audit the diff against the 10 categories above. Buffer the findings
49
- in memory; all posting happens at step 6 once anchors are validated.
50
- 2. Assign each finding a stable finding_id of exactly the form `loopN-K`
51
- where K is 1-based within this loop.
52
- 3. Validate every finding's (file, line) against the captured diff. Split
53
- findings into two buckets: anchored (line is in the diff) and
54
- unanchored (line is not in the diff goes into the review body's
55
- "Findings without a diff anchor" section per Step 2.5).
56
- 4. Build the review body per Step 2.5's review-body shape, filling in the
57
- P0/P1/P2 counts and the unanchored-findings list (if any).
58
- 5. For each anchored finding, write its body to its own temp file:
72
+ Sibling auditors (-b through -k): run only steps 1–3 (audit, assign IDs,
73
+ capture excerpt, validate anchors), then write outcome XML per <output_format> and return.
74
+ Skip steps 4–5; sibling auditors do not post PR reviews.
75
+
76
+ Validator (-a) and single-opus auditors: run all steps below.
77
+
78
+ 1. Audit the diff against the 11 categories above. Buffer the findings
79
+ in memory; all posting happens at step 4 once anchors are validated.
80
+ 2. Assign each finding a stable finding_id of exactly the form `loop<L>-<K>`
81
+ where <K> is 1-based within this loop.
82
+ 3. For each finding, capture a verbatim excerpt from the target file at the cited
83
+ line. Populate the `<excerpt>` element in the outcome XML with it. Validate
84
+ every finding's (file, line) against the captured diff. Split findings into two
85
+ buckets: anchored (line is in the diff) and unanchored (line is not in the diff
86
+ — goes into the review body's "Findings without a diff anchor" section per
87
+ Step 2.5). Format each finding body as:
59
88
 
60
89
  **[severity] one-line title**
61
90
  Category: <letter> (<category name>)
62
91
  <2-3 sentence description with concrete trace>
63
92
 
64
- _From /bugteam audit loop N._
65
-
66
- 6. Post ONE review via Step 2.5's per-loop review CLI shape. Harvest the
67
- parent review `html_url` from the response JSON and the `comments[]`
68
- child entries (each with its own `id` and `html_url`). Match child
69
- entries to anchored findings in index order.
70
- 7. If the review POST itself fails, use Step 2.5's Review POST failure
71
- fallback (single issue comment with full body and all findings inline).
72
- 8. Write every body (review body, each finding body, any fallback body)
73
- to its own temp file. Load each file into the JSON payload via jq's
74
- `--rawfile` or `-Rs`, then pipe the jq output to `gh api ... --input -`
75
- so every body reaches GitHub as file contents inside the JSON payload.
93
+ _From /bugteam audit loop <L>._
94
+
95
+ 4. Post ONE review via `pull_request_review_write(method="create",
96
+ event="COMMENT", body=<review_body>, owner=<O>, repo=<R>,
97
+ pullNumber=<N>, comments=[...])`. See Step 2.5 in SKILL.md for the full
98
+ parameter shape. Harvest the parent review `html_url` from the response
99
+ and the `comments[]` child entries (each with its own `id` and `html_url`).
100
+ Match child entries to anchored findings in index order.
101
+ 5. If the review POST fails, use `add_issue_comment(owner=<O>, repo=<R>,
102
+ issueNumber=<N>, body=<full_text>)` as fallback.
103
+ Body text is passed directly as string parameters to the MCP tool calls --
104
+ no temp files, no jq, no shell pipes.
76
105
  </comment_posting>
77
106
 
78
107
  <output_format>
79
- For the primary (-a) auditor: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
80
- the PR's worktree directory (<worktree_path>). For sibling auditors (-b/-c): write to <run_temp_dir>/pr-<N>/loop-<L>-{b,c}.outcomes.xml (absolute path passed in prompt). Return only that path on stdout. The schema:
108
+ For the (-a) validator: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
109
+ the PR's worktree directory (<worktree_path>). For sibling auditors (-b through -k): write to <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml (absolute path passed in prompt). Sibling auditors do not post PR reviews; set review_url, finding_comment_id, and finding_comment_url to empty strings, and used_fallback to "false". Omit unanchored findings from sibling output — only the validator handles those. Return only that path on stdout. The schema:
81
110
  </output_format>
82
111
  ```
83
112
 
84
113
  ## AUDIT outcome XML schema (bugfind writes this)
85
114
 
86
115
  ```xml
87
- <bugteam_audit loop="<N>" review_url="<url>">
116
+ <bugteam_audit loop="<L>" review_url="<url>">
88
117
  <finding
89
- finding_id="loop<N>-<index>"
118
+ finding_id="loop<L>-<K>"
90
119
  severity="P0|P1|P2"
91
120
  category="<letter>"
92
121
  file="<path>"
@@ -96,6 +125,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
96
125
  used_fallback="true|false"
97
126
  >
98
127
  <title>one-line title</title>
128
+ <excerpt>verbatim source line or snippet from the file at the cited line</excerpt>
99
129
  <description>2-3 sentence description with concrete trace</description>
100
130
  </finding>
101
131
  <verified_clean>
@@ -114,17 +144,17 @@ After the teammate writes the XML and returns, the lead reads `.bugteam-pr<N>-lo
114
144
  <branch>head</branch>
115
145
  <base_branch>base</base_branch>
116
146
  <pr_url>url</pr_url>
117
- <loop>N</loop>
147
+ <loop>L</loop>
118
148
  <pr_number>N</pr_number>
119
149
  <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
120
150
  </context>
121
151
 
122
- cd into `<worktree_path>` before any git, gh, or file operation.
152
+ cd into `<worktree_path>` before any git or file operation.
123
153
 
124
154
  <bugs_to_fix>
125
155
  [for each P0/P1/P2 finding from last_findings:]
126
156
  <bug
127
- finding_id="loop<N>-<index>"
157
+ finding_id="loop<L>-<K>"
128
158
  severity="P0|P1|P2"
129
159
  file="<path>"
130
160
  line="<int>"
@@ -140,25 +170,28 @@ cd into `<worktree_path>` before any git, gh, or file operation.
140
170
  1. Read each referenced file before editing.
141
171
  2. Apply each fix you can address.
142
172
  3. Run `python -m py_compile` (or language-equivalent) on every modified file.
143
- 4. git add by explicit path, then git commit with a message summarizing the bugs fixed.
173
+ 4. Run the project's test suite and confirm all existing tests pass. If a test fails, diagnose the regression and fix it before committing.
174
+ 5. Read the previous loop's outcome XML (`<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`) and obtain its total finding count. If this is the first loop (L <= 1) or the file does not exist, skip this comparison. Otherwise, re-read each changed file and count any new violations. Compute the post-fix total: previous total minus bugs fixed in this round plus new violations. If the post-fix total exceeds the previous total, flag all new findings as same-loop fix-targets and revise before committing.
175
+ 6. git add by explicit path, then git commit with a message summarizing the bugs fixed.
144
176
  - If the commit fails because a git hook (pre-commit, commit-msg, etc.) blocked it,
145
177
  capture the hook's stderr, write status=hook_blocked for every finding in this loop
146
178
  (the commit was atomic; if it failed, no finding was applied), populate hook_output
147
179
  on each outcome, and return WITHOUT retrying. The lead will treat this loop as no-progress.
148
- 5. git push with a plain fast-forward push (the default, no flag overrides).
149
- 6. For each bug, post a fix reply to its finding_comment_id via the
150
- Step 2.5 reply CLI shape:
180
+ 7. git push with a plain fast-forward push (the default, no flag overrides).
181
+ 8. For each bug, post a fix reply to its finding_comment_id via
182
+ `add_reply_to_pull_request_comment(commentId=<id>, body=<reply_text>,
183
+ owner=<O>, repo=<R>, pullNumber=<N>)`:
151
184
  - "Fixed in <commit_sha>" if the bug was addressed by your commit
152
185
  - "Could not address this loop: <one-line reason>" if you skipped or failed it
153
186
  - "Hook blocked the fix commit: <one-line summary>" if the commit was hook-blocked
154
- Use the Fix reply CLI shape from Step 2.5 (`jq -Rs | gh api .../comments/<id>/replies --input -`). Write every reply body to a temp file first.
155
- 7. Write `.bugteam-pr<N>-loop<L>.outcomes.xml` inside `<worktree_path>` (schema below) and return its path.
187
+ Body text is passed directly as string parameters -- no temp files, no jq, no shell pipes.
188
+ 9. Write `.bugteam-pr<N>-loop<L>.fix-outcomes.xml` inside `<worktree_path>` (schema below) and return its path.
156
189
  </execution>
157
190
 
158
191
  <outcome_xml_schema>
159
- <bugteam_fix loop="<N>" commit_sha="<sha or empty if no commit>">
192
+ <bugteam_fix loop="<L>" commit_sha="<sha or empty if no commit>">
160
193
  <outcome
161
- finding_id="loop<N>-<index>"
194
+ finding_id="loop<L>-<K>"
162
195
  status="fixed|could_not_address|hook_blocked"
163
196
  commit_sha="<sha if fixed, empty otherwise>"
164
197
  reply_comment_id="<id of the reply posted>"
@@ -179,5 +212,8 @@ cd into `<worktree_path>` before any git, gh, or file operation.
179
212
  - git add by explicit path — name each file being staged.
180
213
  - Preserve existing comments on lines you do not modify.
181
214
  - Type hints on every signature you touch.
215
+ - **Narrow scope.** Fix only the exact defect at the specified file:line. No restructuring, no inlining helpers, no renames, no "while I'm here" cleanup.
216
+ - **Preserve helpers.** Do not remove or inline existing helper functions unless the finding explicitly names the helper as the problem.
217
+ - **No regression.** Before committing, re-read each changed file and count any new violations. Compare the post-fix total (previous total minus bugs fixed plus new violations) against the previous loop's total finding count (from `<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`). On the first loop (L <= 1) or when the file does not exist, skip this guard. The post-fix total must be flat or decreased relative to the previous loop. An increase means the fix introduced new bugs — revise before committing. Do not commit a regression.
182
218
  </constraints>
183
219
  ```
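The "No regression" guard in the fix prompt above comes down to a small piece of arithmetic. The following is a minimal sketch of that arithmetic, assuming the previous loop's outcome XML follows the `<bugteam_audit>` / `<finding>` shape shown in the schema. The function names are hypothetical and are not part of the package.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def count_findings(outcome_xml_text: str) -> int:
    """Count <finding> children of the root <bugteam_audit> element."""
    root = ET.fromstring(outcome_xml_text)
    return len(root.findall("finding"))


def post_fix_total(previous_total: int, bugs_fixed: int, new_violations: int) -> int:
    """Previous total minus bugs fixed in this round plus new violations."""
    return previous_total - bugs_fixed + new_violations


def is_regression(previous_total: Optional[int], bugs_fixed: int, new_violations: int) -> bool:
    """Return True when the post-fix total exceeds the previous loop's total.

    Pass previous_total=None on the first loop, or when the previous outcome
    XML does not exist; the guard is skipped in that case.
    """
    if previous_total is None:
        return False
    return post_fix_total(previous_total, bugs_fixed, new_violations) > previous_total
```

A flat total (as many new violations as bugs fixed) is not treated as a regression here, matching the "flat or decreased" wording of the constraint.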
@@ -120,9 +120,9 @@ Non-zero → stop. Revoke in Step 5 on every exit path.
120
120
 
121
121
  ### Step 1: Resolve PR scope (once)
122
122
 
123
- Accept one or more PR numbers from the invocation. For each PR, run `gh pr view
124
- --json number,baseRefName,headRefName,url` (falling back to the merge-base diff
125
- path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
123
+ Accept one or more PR numbers from the invocation. For each PR, call
124
+ `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` (falling back
125
+ to the merge-base diff path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
126
126
  headRef, url}, ...]`. A single-PR invocation produces a one-element list and
127
127
  follows the same downstream rules.
128
128
 
@@ -184,43 +184,36 @@ only PR write before Step 4.5 is the final description rewrite.
184
184
 
185
185
  Order: audit → buffer → validate anchors vs diff → single review POST.
186
186
  Review body states counts; zero findings → still one review, `comments: []`,
187
- body `## /bugteam loop <N> audit: 0P0 / 0P1 / 0P2 → clean`.
187
+ body `## /bugteam loop <L> audit: 0P0 / 0P1 / 0P2 → clean`.
188
188
 
189
- **Payloads:** build JSON with `jq --rawfile` / `-Rs`, pipe to `gh api ...
190
- --input -` (avoids shell-quoting; satisfies `gh-body-backtick-guard`). Write
191
- each markdown body to a temp file first.
189
+ **Payloads:** Use MCP tool calls (see below). Body text with markdown (backticks,
190
+ newlines, quotes) passes through safely as string parameters — no temp files, no
191
+ jq, no shell pipes.
192
192
 
193
193
  **Review POST** (one `comments[]` object per anchored finding; single-line
194
194
  `{path, line, side: "RIGHT", body}`; multi-line add `start_line`, `start_side:
195
195
  "RIGHT"`):
196
196
 
197
197
  ```
198
- jq -n \
199
- --rawfile review_body <tmp_review_body.md> \
200
- --arg commit_id "$(git rev-parse HEAD)" \
201
- --rawfile finding_body_1 <tmp_finding_1.md> \
202
- --arg path_1 "<file_1>" \
203
- --argjson line_1 <line_1> \
204
- [... one finding_body_K / path_K / line_K triple per finding ...] \
205
- '{
206
- commit_id: $commit_id,
207
- event: "COMMENT",
208
- body: $review_body,
209
- comments: [
210
- {path: $path_1, line: $line_1, side: "RIGHT", body: $finding_body_1}
211
- [, ... ]
212
- ]
213
- }' \
214
- | gh api repos/<owner>/<repo>/pulls/<number>/reviews -X POST --input -
198
+ pull_request_review_write(
199
+ method="create",
200
+ event="COMMENT",
201
+ body=<review_body_text>,
202
+ commitID=<head_sha_at_post_time>,
203
+ owner=<owner>, repo=<repo>, pullNumber=<number>,
204
+ comments=[
205
+ {path: <path_1>, line: <line_1>, side: "RIGHT", body: <finding_body_1>}
206
+ [, ... ]
207
+ ]
208
+ )
215
209
  ```
216
210
 
217
- **Fix reply:** `jq -Rs '{body: .}' <tmp_reply.md | gh api
218
- repos/<owner>/<repo>/pulls/<number>/comments/<finding_comment_id>/replies -X
219
- POST --input -`
211
+ **Fix reply:** `add_reply_to_pull_request_comment(commentId=<finding_comment_id>,
212
+ body=<reply_text>, owner=<owner>, repo=<repo>, pullNumber=<number>)`
220
213
 
221
- **Review POST fails:** issue comment fallback: `jq -Rs '{body: .}'
222
- <tmp_fallback.md | gh api repos/<owner>/<repo>/issues/<number>/comments -X POST
223
- --input -`
214
+ **Review POST fails:** issue comment fallback:
215
+ `add_issue_comment(owner=<owner>, repo=<repo>, issueNumber=<number>,
216
+ body=<fallback_text>)`
224
217
 
225
218
  `<head_sha_at_post_time>`: `git rev-parse HEAD` in subagent cwd immediately
226
219
  before POST.
@@ -228,7 +221,7 @@ before POST.
228
221
  **Review body template (`<tmp_review_body.md>`):**
229
222
 
230
223
  ```
231
- ## /bugteam loop <N> audit: <P0>P0 / <P1>P1 / <P2>P2
224
+ ## /bugteam loop <L> audit: <P0>P0 / <P1>P1 / <P2>P2
232
225
 
233
226
  ### Findings without a diff anchor
234
227
  (only if needed)
@@ -263,10 +256,16 @@ and before iteration begins, when `last_action == "fresh"`). A re-invocation of
263
256
  cleaned this HEAD (short-circuit) and otherwise records that prior loops were
264
257
  dirty so the AUDIT runs against the latest diff with that signal in mind:
265
258
 
266
- ```bash
267
- dirty_review_count=0
268
- gh api "repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100" --paginate --slurp \
269
- | jq '[.[][] | select((.body // "") | startswith("## /bugteam loop "))] | sort_by(.submitted_at) | reverse'
259
+ ```python
260
+ dirty_review_count = 0
261
+ all_reviews = pull_request_read(
262
+ method="get_reviews", pullNumber=N, owner=O, repo=R
263
+ )
264
+ prior_reviews = [
265
+ rev for rev in all_reviews
266
+ if (rev.get("body") or "").startswith("## /bugteam loop ")
267
+ ]
268
+ prior_reviews.sort(key=lambda rev: rev["submitted_at"], reverse=True)
270
269
  ```
271
270
 
272
271
  Iterate from index 0 (most recent) toward older entries:
@@ -313,16 +312,16 @@ Iterate from index 0 (most recent) toward older entries:
313
312
  Lead only; merge-base / diff semantics:
314
313
  [`../../_shared/pr-loop/code-rules-gate.md`][path-code-rules]; shared script
315
314
  inventory: [`../../_shared/pr-loop/scripts/README.md`][path-scripts-readme].
316
- Non-zero → spawn **clean-coder** standards-fix (read stderr, edit, re-run
315
+ Non-zero → spawn **clean-coder** standards-fix (`mode="bypassPermissions"`) (read stderr, edit, re-run
317
316
  **this same** command, one commit, `git push`, shutdown) until exit **0** or
318
317
  **5**
319
318
  failed gate rounds → `error: code rules gate failed pre-audit`. After **0**:
320
319
  `loop_count += 1`; if `loop_count > 10` → `cap reached`. Then **AUDIT**
321
- (bugfind); print `Loop <N> audit: ...`.
320
+ (bugfind); print `Loop <L> audit: ...`.
322
321
 
323
322
  3. **FIX** (`last_action == "audited"` and `last_findings.total > 0`):
324
323
  `loop_count += 1`; if `loop_count > 10` → `cap reached`; **FIX** (bugfix);
325
- print `Loop <N> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
324
+ print `Loop <L> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
326
325
  to step 1.
327
326
 
328
327
  4. After **AUDIT**: update `last_action`, `last_findings`, `audit_log`; print
@@ -335,12 +334,10 @@ before the next AUDIT.
335
334
 
336
335
  ### AUDIT action
337
336
 
338
- ```bash
339
- mkdir -p "<run_temp_dir>/pr-<N>"
340
- gh pr diff <N> -R <owner>/<repo> > "<run_temp_dir>/pr-<N>/loop-<L>.patch"
341
- ```
342
-
343
- **Spawn:**
337
+ 1. Create the directory: `mkdir -p "<run_temp_dir>/pr-<N>"`.
338
+ 2. Call `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)`
339
+ to capture the diff text, then write it to
340
+ `"<run_temp_dir>/pr-<N>/loop-<L>.patch"` using the `Write` tool.
344
341
 
345
342
  ```
346
343
  Agent(
@@ -361,18 +358,31 @@ background-completion notification, then reads
361
358
  `last_action = "audited"`; append audit line to `audit_log`.
362
359
 
363
360
  **Parallel auditors (`loop_count >= 4`):** gate passes immediately before;
364
- after three full audit/fix rounds without convergence, issue three `Agent`
365
- calls in one assistant message (`run_in_background=true`): `-a` posts the
366
- review and merges outcomes from `-b`/`-c` (read
367
- `.bugteam-pr<N>-loop<L>.outcomes.xml` plus
368
- `<run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml` and `...-c...`); merge key
369
- `(file, line, category_letter)`; re-id `loopN-K`. `-b`/`-c` write sibling XML
370
- only; prompts must pass literal absolute sibling paths. Output path
371
- contract: `-b`/`-c` write to `<run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml`
372
- and `<run_temp_dir>/pr-<N>/loop-<L>-c.outcomes.xml`; `-a` writes to
373
- `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`.
374
- Lead awaits all three background-completion notifications before merging
375
- outcomes.
361
+ after three full audit/fix rounds without convergence, issue eleven `Agent`
362
+ calls in one assistant message (`run_in_background=true`):
363
+
364
+ - **10 haiku auditors (`-b` through `-k`):** `subagent_type="code-quality-agent"`,
365
+ `model="haiku"`, write sibling XML to
366
+ `<run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml`, skip PR posting.
367
+ Prompts must pass literal absolute sibling paths.
368
+ - **1 opus validator (`-a`):** `subagent_type="code-quality-agent"`,
369
+ `model="opus"`:
370
+ - Polls for all 10 sibling XMLs before proceeding (60s timeout, 2s interval). On timeout: log diagnostics entry, proceed with validated findings from available XMLs, report count in validator output.
371
+ - Validates each finding: file exists, line in bounds, excerpt contains the exact
372
+ text of the cited line, category is A–K, severity is P0/P1/P2.
373
+ - Hallucinated findings → quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under
374
+ `validator_rejected` (added alongside the required diagnostics keys defined in the shared audit contract).
375
+ - De-dups by `(file, line, category)`, max severity wins; on conflict, keep longest description text.
376
+ - Re-ids as `loop<L>-<K>`.
377
+ - Writes `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`, posts review.
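The validator's de-dup and re-id steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the `Finding` dataclass and the function names are hypothetical, and "on conflict, keep longest description text" is read here as the tie-breaker between duplicates of equal severity.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Tuple

# Lower rank means more severe, so P0 beats P1 beats P2.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory form of one <finding> element."""
    file: str
    line: int
    category: str
    severity: str
    description: str
    finding_id: str = ""


def _beats(candidate: Finding, current: Finding) -> bool:
    """Higher severity wins; on equal severity the longer description wins."""
    candidate_rank = SEVERITY_RANK[candidate.severity]
    current_rank = SEVERITY_RANK[current.severity]
    if candidate_rank != current_rank:
        return candidate_rank < current_rank
    return len(candidate.description) > len(current.description)


def dedupe_and_reid(findings: Iterable[Finding], loop: int) -> List[Finding]:
    """De-dup by (file, line, category), then assign ids of the form loop<L>-<K>."""
    winners: Dict[Tuple[str, int, str], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.category)
        current = winners.get(key)
        if current is None or _beats(finding, current):
            winners[key] = finding
    # Dict insertion order is preserved, so K follows first-seen order (1-based).
    return [
        replace(finding, finding_id=f"loop{loop}-{index}")
        for index, finding in enumerate(winners.values(), start=1)
    ]
```

Keeping first-seen order for `<K>` is a design choice of this sketch; the source only requires that `<K>` be 1-based within the loop.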
378
+
379
+ Lead awaits the opus validator (-a) background-completion notification (120s
380
+ timeout). The validator independently polls all 10 sibling XMLs; the lead does
381
+ not gate on haiku peer completion. On lead timeout: the validator did not post
382
+ a merged review — treat as a hard blocker and abort the loop.
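The validator's wait for sibling XMLs (60s timeout, 2s interval) described above can be sketched as a simple polling loop. This is a hypothetical helper, assuming the sibling outputs are plain files on disk; the name `wait_for_sibling_outputs` and the injectable `sleep` and `clock` parameters are illustrative only.

```python
import time
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def wait_for_sibling_outputs(
    paths: Sequence[str],
    timeout_seconds: float = 60.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], List[str]]:
    """Poll until every path exists or the timeout elapses.

    Returns (found, missing). A non-empty `missing` list means the timeout
    was hit and the caller should proceed with the available XMLs only.
    """
    deadline = clock() + timeout_seconds
    while True:
        missing = [path for path in paths if not Path(path).is_file()]
        if not missing or clock() >= deadline:
            found = [path for path in paths if path not in missing]
            return found, missing
        sleep(interval_seconds)
```

On timeout the caller would log a diagnostics entry and report the count of available XMLs, as the validator bullet describes; that logging is left out of the sketch.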
383
+
384
+ The sibling-output paths in [`PROMPTS.md`](PROMPTS.md) must cover the full
385
+ `-b` through `-k` range.
376
386
 
377
387
  ### FIX action
378
388
 
@@ -398,6 +408,8 @@ advanced; `git -C "<run_temp_dir>/pr-<N>/worktree" fetch origin <branch> && git
398
408
  `HEAD`. Unchanged HEAD →
399
409
  `stuck — bugfix subagent could not address findings`.
400
410
 
411
+ **Scope verification.** Run `git diff HEAD~1 --name-only` and compare against the set of files referenced in bugs_to_fix. When the commit touches any file NOT in the bugs_to_fix list, downgrade the outcome to `unverified_fixed` with reason "commit touched unexpected files: <list>".
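The scope check described above is a set comparison between the files in the fix commit and the files named in bugs_to_fix. A minimal sketch follows, assuming both sides use repo-relative paths with forward slashes, as `git diff --name-only` prints them. The helper names and the `"fixed"` status for the in-scope case are assumptions for illustration.

```python
import subprocess
from typing import Iterable, List, Tuple


def committed_files(worktree_path: str) -> List[str]:
    """List files touched by the last commit via `git diff HEAD~1 --name-only`."""
    result = subprocess.run(
        ["git", "-C", worktree_path, "diff", "HEAD~1", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def verify_scope(commit_files: Iterable[str], bug_files: Iterable[str]) -> Tuple[str, str]:
    """Return (status, reason); downgrade when the commit touched unexpected files."""
    unexpected = sorted(set(commit_files) - set(bug_files))
    if unexpected:
        reason = "commit touched unexpected files: " + ", ".join(unexpected)
        return "unverified_fixed", reason
    return "fixed", ""
```

For example, `verify_scope(committed_files(worktree), bug_files)` could run right after the fix subagent returns, where `worktree` and `bug_files` come from the lead's own state.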
412
+
401
413
  ### Step 4: Teardown
402
414
 
403
415
  1. For each PR in `all_prs`: `git worktree remove
@@ -418,16 +430,17 @@ else {'onerror': h}))"
418
430
  ### Step 4.5: PR description
419
431
 
420
432
  Lead only; cumulative product narrative (not process). Delegate body to
421
- `pr-description-writer` via `Agent` (else `general-purpose`) so the
422
- mandatory-pr-description hook accepts `gh pr edit`.
433
+ `pr-description-writer` via `Agent` (`mode="bypassPermissions"`) (else `general-purpose`) so the
434
+ mandatory-pr-description hook accepts `update_pull_request`.
423
435
 
424
- 1. `gh pr diff <number> -R <owner>/<repo> > .bugteam-final.diff`
425
- 2. `gh pr view <number> -R <owner>/<repo> --json body --jq .body >
426
- .bugteam-original-body.md`
436
+ 1. `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)` → write
437
+ output to `.bugteam-final.diff` with `Write` tool.
438
+ 2. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → extract
439
+ `.body` from response, write to `.bugteam-original-body.md` with `Write` tool.
427
440
  3. Agent brief: paths + branch names; describe merge-ready change from diff;
428
441
  keep curated original sections intact; return markdown body.
429
- 4. Write `.bugteam-final-body.md`; `gh pr edit <number> -R <owner>/<repo>
430
- --body-file .bugteam-final-body.md`
442
+ 4. Write `.bugteam-final-body.md`; `update_pull_request(pullNumber=N, owner=O,
443
+ repo=R, body=<body_text>)`.
431
444
  5. Delete the three temp files.
432
445
 
433
446
  On failure: log in final report; continue to Step 5.
@@ -22,9 +22,9 @@ Each invariant cites the normative section or companion file it derives from. Al
22
22
  | I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
23
23
  | I-3 | Orchestration uses `Agent(..., run_in_background=true)` only — no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
24
24
  | I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` — **Fresh subagent per loop** |
25
- | I-5 | Audit and fix spawns pass `model="opus"` on every `Agent` call. | `SKILL.md` § AUDIT action; § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for both subagents** |
25
+ | I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
26
26
  | I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
27
- | I-7 | From loop 4 onward without convergence, three parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
27
+ | I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
28
28
  | I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
29
29
  | I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
30
30
  | I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
@@ -68,7 +68,7 @@ The harness does not yet exist; this document defines its contract.
68
68
  **Scenario.** Current branch is `main` with no PR and no upstream difference.
69
69
 
70
70
  **Layer B predicted trace.**
71
- 1. `Bash("gh pr view --json ...")` → non-zero exit.
71
+ 1. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → fails / no matching PR.
72
72
  2. `Bash("git merge-base HEAD origin/main")` → empty.
73
73
  3. No grant script.
74
74
 
@@ -103,10 +103,10 @@ The harness does not yet exist; this document defines its contract.
103
103
  | # | Tool call | Source |
104
104
  |---|---|---|
105
105
  | 1 | `Bash("python .../scripts/grant_project_claude_permissions.py")` | `SKILL.md` § Step 0 |
106
- | 2 | `Bash("gh pr view --json number,baseRefName,headRefName,url")` | `SKILL.md` § Step 1 |
106
+ | 2 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` | `SKILL.md` § Step 1 |
107
107
  | 3 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
108
108
  | 4 | `Bash("mkdir -p <run_temp_dir>/pr-42")` | `SKILL.md` § AUDIT action |
109
- | 5 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-1.patch")` | `SKILL.md` § AUDIT action |
109
+ | 5 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-1.patch` | `SKILL.md` § AUDIT action |
110
110
  | 6 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
111
111
  | 7 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
112
112
  | 8 | `Read(".bugteam-pr42-loop1.outcomes.xml")` | `SKILL.md` § AUDIT action |
@@ -116,17 +116,17 @@ The harness does not yet exist; this document defines its contract.
116
116
  | 12 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
117
117
  | 13 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" fetch origin <branch>")` → fetch remote state | `SKILL.md` § FIX action (**Verify**) |
118
118
  | 14 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse origin/<branch>")` → confirm matches HEAD | `SKILL.md` § FIX action (**Verify**) |
119
- | 15 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-2.patch")` | `SKILL.md` § AUDIT action |
119
+ | 15 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-2.patch` | `SKILL.md` § AUDIT action |
120
120
  | 16 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop2", run_in_background=true, ...)` (loop 2) | `SKILL.md` § AUDIT action |
121
121
  | 17 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
122
122
  | 18 | `Read(".bugteam-pr42-loop2.outcomes.xml")` — zero findings | `SKILL.md` § AUDIT action |
123
123
  | 19 | `Bash("git worktree remove \"<run_temp_dir>/pr-42/worktree\"")` | `SKILL.md` § Step 4 step 1 |
124
124
  | 20 | `Bash("python -c \"...shutil.rmtree(r'<run_temp_dir>', ...)\"")` | `SKILL.md` § Step 4 step 2 (Windows-safe teardown) |
125
- | 21 | `Bash("gh pr diff 42 -R ... > .bugteam-final.diff")` | `SKILL.md` § Step 4.5 step 1 |
126
- | 22 | `Bash("gh pr view 42 -R ... --json body --jq .body > .bugteam-original-body.md")` | `SKILL.md` § Step 4.5 step 2 |
125
+ | 21 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `.bugteam-final.diff` | `SKILL.md` § Step 4.5 step 1 |
126
+ | 22 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` → extract `.body`, write to `.bugteam-original-body.md` | `SKILL.md` § Step 4.5 step 2 |
127
127
  | 23 | `Agent(subagent_type="pr-description-writer", description=..., prompt=<brief>)` | `SKILL.md` § Step 4.5 |
128
128
  | 24 | `Write(".bugteam-final-body.md", <returned body>)` | `SKILL.md` § Step 4.5 step 4 |
129
- | 25 | `Bash("gh pr edit 42 -R ... --body-file .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 4 |
129
+ | 25 | `update_pull_request(pullNumber=42, owner=..., repo=..., body=...)` | `SKILL.md` § Step 4.5 step 4 |
130
130
  | 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 |
131
131
  | 27 | `Bash("python .../scripts/revoke_project_claude_permissions.py")` | `SKILL.md` § Step 5 |
132
132
 
@@ -171,14 +171,14 @@ Patch this table to match observation and annotate each correction.
171
171
 
172
172
  **Layer B predicted behavior.**
173
173
  - Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
174
- - Loops 4–10: three parallel `Agent(name="bugfind-pr<N>-loop<L>-[abc]", run_in_background=true)` in a single assistant message per loop; lead awaits all three notifications then merges outcomes.
174
+ - Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
175
175
  - Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
176
176
  - Exactly 10 audit phases, exactly 10 fix phases.
177
177
  - Steps 19–26 from Eval 5 fire at teardown.
178
178
 
179
179
  **Pass criteria.**
180
180
  - I-6 holds: exactly 10 audit phases.
181
- - I-7 holds: loops 4–10 each emit three audit `Agent` calls in a single assistant message.
181
+ - I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
182
182
  - Final report contains `/bugteam exit: cap reached` and the remaining bug count.
183
183
 
184
184
  **Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
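As a starting point for that formula, the predicted count can be worked out from the loop rules stated above: one audit `Agent` call per loop for loops 1–3 and eleven per loop from loop 4 onward. The sketch below is only a prediction under those rules; `predicted_audit_calls` and the constant names are hypothetical.

```python
SINGLE_AUDITOR_LOOPS = 3   # loops 1-3 spawn one auditor each
LOOP_CAP = 10              # I-6: an 11th audit never fires
PARALLEL_AUDITORS = 11     # I-7: 10 haiku siblings + 1 opus validator


def predicted_audit_calls(loops_run: int) -> int:
    """Predict distinct audit Agent calls after `loops_run` non-converging loops."""
    if not 0 <= loops_run <= LOOP_CAP:
        raise ValueError(f"loops_run must be between 0 and {LOOP_CAP}")
    single = min(loops_run, SINGLE_AUDITOR_LOOPS)
    parallel = max(loops_run - SINGLE_AUDITOR_LOOPS, 0)
    return single + parallel * PARALLEL_AUDITORS


# Cap-reached run: 3 * 1 + 7 * 11 = 80 audit calls.
print(predicted_audit_calls(LOOP_CAP))
```

If the first real run records a different number, the constants or the formula would need to be corrected to match observation.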
@@ -224,7 +224,7 @@ Patch this table to match observation and annotate each correction.
224
224
  - Every finding's outcome XML carries `used_fallback="true"` and the issue-comment URL as `finding_comment_url`.
225
225
  - Cycle continues to the FIX action without aborting.
226
226
 
227
- **Open item for the real run.** The issue-comments fallback shape is `jq -Rs | gh api .../issues/<number>/comments --input -` (`SKILL.md` § Step 2.5 **Review POST fails**; full narrative in `reference/github-pr-reviews.md` § **Review POST failure fallback**). Before running Eval 10 for real, confirm the teammate obeys this shape — the fixture must assert the endpoint path and the `--input -` pattern.
227
+ **Open item for the real run.** The issue-comments fallback uses `add_issue_comment(owner=..., repo=..., issueNumber=42, body=...)` (`SKILL.md` § Step 2.5 **Review POST fails**; full narrative in `reference/github-pr-reviews.md` § **Review POST failure fallback**). Before running Eval 10 for real, confirm the teammate obeys this shape — the fixture must assert the `add_issue_comment` tool call.
228
228
 
229
229
  ---
230
230