npm - claude-dev-env - Versions diffs - 1.37.0 → 1.38.0 - Mend

claude-dev-env 1.37.0 → 1.38.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (95)

package/CLAUDE.md +3 -0
package/_shared/pr-loop/audit-contract.md +4 -3
package/_shared/pr-loop/fix-protocol.md +2 -0
package/_shared/pr-loop/gh-payloads.md +38 -37
package/_shared/pr-loop/scripts/README.md +0 -1
package/_shared/pr-loop/scripts/preflight.py +2 -1
package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +2 -2
package/_shared/pr-loop/scripts/tests/test_preflight.py +22 -0
package/_shared/pr-loop/state-schema.md +10 -10
package/agents/clean-coder.md +4 -0
package/agents/code-quality-agent.md +23 -85
package/agents/groq-coder.md +8 -6
package/hooks/blocking/__init__.py +0 -0
package/hooks/blocking/hedging_language_blocker.py +2 -2
package/hooks/blocking/state_description_blocker.py +243 -0
package/hooks/blocking/tdd_enforcer.py +94 -0
package/hooks/blocking/test_hedging_language_blocker.py +1 -1
package/hooks/blocking/test_state_description_blocker.py +618 -0
package/hooks/blocking/test_tdd_enforcer.py +152 -0
package/hooks/config/state_description_blocker_constants.py +130 -0
package/hooks/hooks.json +10 -0
package/package.json +1 -1
package/rules/gh-paginate.md +4 -50
package/rules/no-historical-clutter.md +57 -0
package/scripts/config/groq_bugteam_config.py +13 -5
package/skills/bugteam/CONSTRAINTS.md +20 -27
package/skills/bugteam/EXAMPLES.md +1 -1
package/skills/bugteam/PROMPTS.md +78 -42
package/skills/bugteam/SKILL.md +76 -63
package/skills/bugteam/SKILL_EVALS.md +12 -12
package/skills/bugteam/reference/audit-and-teammates.md +21 -48
package/skills/bugteam/reference/audit-contract.md +7 -7
package/skills/bugteam/reference/github-pr-reviews.md +31 -31
package/skills/bugteam/reference/team-setup.md +1 -1
package/skills/bugteam/reference/teardown-publish-permissions.md +4 -4
package/skills/copilot-review/SKILL.md +7 -14
package/skills/findbugs/SKILL.md +2 -2
package/skills/fixbugs/SKILL.md +1 -1
package/skills/monitor-open-prs/SKILL.md +6 -6
package/skills/pr-converge/SKILL.md +7 -6
package/skills/pr-converge/reference/convergence-gates.md +46 -44
package/skills/pr-converge/reference/examples.md +4 -4
package/skills/pr-converge/reference/fix-protocol.md +8 -8
package/skills/pr-converge/reference/multi-pr-orchestration.md +10 -10
package/skills/pr-converge/reference/per-tick.md +24 -36
package/skills/pr-converge/reference/stop-conditions.md +7 -7
package/skills/pr-converge/scripts/README.md +65 -117
package/skills/pr-review-responder/EXAMPLES.md +2 -2
package/skills/pr-review-responder/PRINCIPLES.md +2 -8
package/skills/pr-review-responder/README.md +7 -48
package/skills/pr-review-responder/SKILL.md +2 -3
package/skills/pr-review-responder/TESTING.md +8 -65
package/skills/qbug/SKILL.md +10 -16
package/_shared/pr-loop/scripts/config/gh_util_constants.py +0 -31
package/_shared/pr-loop/scripts/gh_util.py +0 -193
package/_shared/pr-loop/scripts/tests/test_gh_util.py +0 -257
package/_shared/pr-loop/scripts/tests/test_gh_util_constants.py +0 -61
package/skills/pr-converge/scripts/check_pr_mergeability.py +0 -78
package/skills/pr-converge/scripts/config/pr_converge_constants.py +0 -118
package/skills/pr-converge/scripts/config/test_pr_converge_constants.py +0 -152
package/skills/pr-converge/scripts/fetch_bugbot_inline_comments.py +0 -70
package/skills/pr-converge/scripts/fetch_bugbot_reviews.py +0 -57
package/skills/pr-converge/scripts/fetch_claude_inline_comments.py +0 -70
package/skills/pr-converge/scripts/fetch_claude_reviews.py +0 -61
package/skills/pr-converge/scripts/fetch_copilot_inline_comments.py +0 -70
package/skills/pr-converge/scripts/fetch_copilot_reviews.py +0 -61
package/skills/pr-converge/scripts/mark_pr_ready.py +0 -54
package/skills/pr-converge/scripts/post-bugbot-run.helpers.ps1 +0 -49
package/skills/pr-converge/scripts/post-bugbot-run.ps1 +0 -33
package/skills/pr-converge/scripts/reply_to_inline_comment.py +0 -84
package/skills/pr-converge/scripts/request_copilot_review.py +0 -71
package/skills/pr-converge/scripts/resolve_pr_head.py +0 -58
package/skills/pr-converge/scripts/review_field_helpers.py +0 -43
package/skills/pr-converge/scripts/reviewer_fetch_core.py +0 -153
package/skills/pr-converge/scripts/reviewer_specs.py +0 -98
package/skills/pr-converge/scripts/test_check_pr_mergeability.py +0 -126
package/skills/pr-converge/scripts/test_fetch_bugbot_inline_comments.py +0 -443
package/skills/pr-converge/scripts/test_fetch_bugbot_reviews.py +0 -299
package/skills/pr-converge/scripts/test_fetch_claude_inline_comments.py +0 -485
package/skills/pr-converge/scripts/test_fetch_claude_reviews.py +0 -368
package/skills/pr-converge/scripts/test_fetch_copilot_inline_comments.py +0 -440
package/skills/pr-converge/scripts/test_fetch_copilot_reviews.py +0 -366
package/skills/pr-converge/scripts/test_mark_pr_ready.py +0 -69
package/skills/pr-converge/scripts/test_post_bugbot_run.py +0 -195
package/skills/pr-converge/scripts/test_reply_to_inline_comment.py +0 -159
package/skills/pr-converge/scripts/test_request_copilot_review.py +0 -101
package/skills/pr-converge/scripts/test_resolve_pr_head.py +0 -79
package/skills/pr-converge/scripts/test_review_field_helpers.py +0 -80
package/skills/pr-converge/scripts/test_reviewer_fetch_core.py +0 -448
package/skills/pr-converge/scripts/test_reviewer_specs.py +0 -107
package/skills/pr-converge/scripts/test_trigger_bugbot.py +0 -139
package/skills/pr-converge/scripts/test_view_pr_context.py +0 -111
package/skills/pr-converge/scripts/trigger_bugbot.py +0 -77
package/skills/pr-converge/scripts/view_pr_context.py +0 -47
package/skills/pr-review-responder/scripts/respond_to_reviews.py +0 -376

package/skills/bugteam/PROMPTS.md CHANGED

@@ -10,21 +10,40 @@ Keep the spawn prompt self-contained: reference only the PR scope, audit rubric,
   <branch>head ref</branch>
   <base_branch>base ref</base_branch>
   <pr_url>full URL</pr_url>
-  <loop>N</loop>
+  <loop>L</loop>
   <pr_number>N</pr_number>
   <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
 </context>
-cd into `<worktree_path>` before any git, gh, or file operation.
+cd into `<worktree_path>` before any git or file operation.
 <scope>
   <diff_path>Absolute path to the per-PR patch file: <run_temp_dir>/pr-<N>/loop-<L>.patch (same path as gh pr diff redirect in AUDIT)</diff_path>
   <scope_rule>Audit only lines added or modified in the diff. Pre-existing code on untouched lines is out of scope.</scope_rule>
+  <changed_files_rule>Build the list of changed file paths from the diff. Open each one with Read and audit cross-file consistency. Read every changed test file and cross-reference test assertions, expected values, and mock setup against the production code's config constants and function signatures. When a test file asserts a value that diverges from config, file a finding under category J.</changed_files_rule>
 </scope>
 <bug_categories>
-  Investigate each category explicitly. For each, return either at least
-  one finding OR a verified-clean entry with the evidence used to clear it:
+  Investigate each of the eleven categories (A–K) explicitly. For each,
+  return either at least one finding OR a verified-clean entry with the
+  evidence used to clear it. A category is verified-clean only when one
+  complete execution path through the changed code has been traced from
+  entry to exit. Surface-level scanning is insufficient evidence. The
+  evidence field must name (1) the specific function examined, (2) the
+  code path traced from entry to exit, and (3) the specific check performed.
+  Generic phrases such as "verified clean", "no issues found",
+  "pattern appears correct", "looks good", "seems fine", and
+  "no problems detected" do not satisfy the verified-clean requirement.
+  When evidence contains any of these phrases, the category is not
+  verified-clean -- re-audit with a concrete trace.
+  Categories A–K (one-line summary; full rubric and sub-bucket decomposition
+  for each is in `packages/claude-dev-env/audit-rubrics/category_rubrics/`;
+  ready-to-send Variant C prompts — each with a PR/repo-independent
+  generalized skeleton above a `---` separator and a worked example against
+  an authentic PR below — are in
+  `packages/claude-dev-env/audit-rubrics/prompts/`):
   A. API contract verification (signatures, return types, async/await correctness)
   B. Selector / query / engine compatibility
   C. Resource cleanup and lifecycle (file handles, connections, processes, locks)
@@ -35,6 +54,10 @@ cd into `<worktree_path>` before any git, gh, or file operation.
   H. Security boundaries (injection, path traversal, auth bypass, secret leakage)
   I. Concurrency hazards (race conditions, missing awaits, shared mutable state)
   J. Magic values and configuration drift
+  K. Codebase conflicts — a change updates one site of a pattern but a parallel
+     site in unchanged code stays stale, producing contradictory behavior;
+     the diff is internally consistent, the bug emerges only against unchanged
+     code (canonical example: jl-cmd/claude-code-config PR #397 r3210166636)
 </bug_categories>
 <constraints>
@@ -42,51 +65,57 @@ cd into `<worktree_path>` before any git, gh, or file operation.
   - Cite file:line for every finding.
   - When the diff alone does not provide enough context to confirm a bug,
     list it under "Open questions" rather than assert it.
+  - For every finding, search `git grep` for all callers of the targeted function. When the obvious fix would silently change behavior for other call paths, include a fix constraint that preserves them.
 </constraints>
 <comment_posting>
-  1. Audit the diff against the 10 categories above. Buffer the findings
-     in memory; all posting happens at step 6 once anchors are validated.
-  2. Assign each finding a stable finding_id of exactly the form `loopN-K`
-     where K is 1-based within this loop.
-  3. Validate every finding's (file, line) against the captured diff. Split
-     findings into two buckets: anchored (line is in the diff) and
-     unanchored (line is not in the diff — goes into the review body's
-     "Findings without a diff anchor" section per Step 2.5).
-  4. Build the review body per Step 2.5's review-body shape, filling in the
-     P0/P1/P2 counts and the unanchored-findings list (if any).
-  5. For each anchored finding, write its body to its own temp file:
+  Sibling auditors (-b through -k): run only steps 1–2 (audit, assign IDs,
+  capture excerpt, validate anchors), then write outcome XML per <output_format> and return.
+  Skip steps 3–5 — sibling auditors do not post PR reviews.
+  Validator (-a) and single-opus auditors: run all steps below.
+  1. Audit the diff against the 11 categories above. Buffer the findings
+     in memory; all posting happens at step 4 once anchors are validated.
+  2. Assign each finding a stable finding_id of exactly the form `loop<L>-<K>`
+     where <K> is 1-based within this loop.
+  3. For each finding, capture a verbatim excerpt from the target file at the cited
+     line. Populate the `<excerpt>` element in the outcome XML with it. Validate
+     every finding's (file, line) against the captured diff. Split findings into two
+     buckets: anchored (line is in the diff) and unanchored (line is not in the diff
+     — goes into the review body's "Findings without a diff anchor" section per
+     Step 2.5). Format each finding body as:
        **[severity] one-line title**
        Category: <letter> (<category name>)
        <2-3 sentence description with concrete trace>
-       _From /bugteam audit loop N._
-  6. Post ONE review via Step 2.5's per-loop review CLI shape. Harvest the
-     parent review `html_url` from the response JSON and the `comments[]`
-     child entries (each with its own `id` and `html_url`). Match child
-     entries to anchored findings in index order.
-  7. If the review POST itself fails, use Step 2.5's Review POST failure
-     fallback (single issue comment with full body and all findings inline).
-  8. Write every body (review body, each finding body, any fallback body)
-     to its own temp file. Load each file into the JSON payload via jq's
-     `--rawfile` or `-Rs`, then pipe the jq output to `gh api ... --input -`
-     so every body reaches GitHub as file contents inside the JSON payload.
+       _From /bugteam audit loop <L>._
+  4. Post ONE review via `pull_request_review_write(method="create",
+     event="COMMENT", body=<review_body>, owner=<O>, repo=<R>,
+     pullNumber=<N>, comments=[...])`. See Step 2.5 in SKILL.md for the full
+     parameter shape. Harvest the parent review `html_url` from the response
+     and the `comments[]` child entries (each with its own `id` and `html_url`).
+     Match child entries to anchored findings in index order.
+  5. If the review POST fails, use `add_issue_comment(owner=<O>, repo=<R>,
+     issueNumber=<N>, body=<full_text>)` as fallback.
+  Body text is passed directly as string parameters to the MCP tool calls —
+  no temp files, no jq, no shell pipes.
 </comment_posting>
 <output_format>
-  For the primary (-a) auditor: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
-  the PR's worktree directory (<worktree_path>). For sibling auditors (-b/-c): write to <run_temp_dir>/pr-<N>/loop-<L>-{b,c}.outcomes.xml (absolute path passed in prompt). Return only that path on stdout. The schema:
+  For the (-a) validator: write the outcome XML below to .bugteam-pr<N>-loop<L>.outcomes.xml inside
+  the PR's worktree directory (<worktree_path>). For sibling auditors (-b through -k): write to <run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml (absolute path passed in prompt). Sibling auditors do not post PR reviews; set review_url, finding_comment_id, and finding_comment_url to empty strings, and used_fallback to "false". Omit unanchored findings from sibling output — only the validator handles those. Return only that path on stdout. The schema:
 </output_format>
 ```
 ## AUDIT outcome XML schema (bugfind writes this)
 ```xml
-<bugteam_audit loop="<N>" review_url="<url>">
+<bugteam_audit loop="<L>" review_url="<url>">
   <finding
-    finding_id="loop<N>-<index>"
+    finding_id="loop<L>-<K>"
     severity="P0|P1|P2"
     category="<letter>"
     file="<path>"
@@ -96,6 +125,7 @@ cd into `<worktree_path>` before any git, gh, or file operation.
     used_fallback="true|false"
   >
     <title>one-line title</title>
+    <excerpt>verbatim source line or snippet from the file at the cited line</excerpt>
     <description>2-3 sentence description with concrete trace</description>
   </finding>
   <verified_clean>
@@ -114,17 +144,17 @@ After the teammate writes the XML and returns, the lead reads `.bugteam-pr<N>-lo
   <branch>head</branch>
   <base_branch>base</base_branch>
   <pr_url>url</pr_url>
-  <loop>N</loop>
+  <loop>L</loop>
   <pr_number>N</pr_number>
   <worktree_path>absolute path from Step 1 per-PR workspace</worktree_path>
 </context>
-cd into `<worktree_path>` before any git, gh, or file operation.
+cd into `<worktree_path>` before any git or file operation.
 <bugs_to_fix>
   [for each P0/P1/P2 finding from last_findings:]
   <bug
-    finding_id="loop<N>-<index>"
+    finding_id="loop<L>-<K>"
     severity="P0|P1|P2"
     file="<path>"
     line="<int>"
@@ -140,25 +170,28 @@ cd into `<worktree_path>` before any git, gh, or file operation.
   1. Read each referenced file before editing.
   2. Apply each fix you can address.
   3. Run `python -m py_compile` (or language-equivalent) on every modified file.
-  4. git add by explicit path, then git commit with a message summarizing the bugs fixed.
+  4. Run the project's test suite and confirm all existing tests pass. If a test fails, diagnose the regression and fix it before committing.
+  5. Read the previous loop's outcome XML (`<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`) and obtain its total finding count. If this is the first loop (L <= 1) or the file does not exist, skip this comparison. Otherwise, re-read each changed file and count any new violations. Compute the post-fix total: previous total minus bugs fixed in this round plus new violations. If the post-fix total exceeds the previous total, flag all new findings as same-loop fix-targets and revise before committing.
+  6. git add by explicit path, then git commit with a message summarizing the bugs fixed.
      - If the commit fails because a git hook (pre-commit, commit-msg, etc.) blocked it,
        capture the hook's stderr, write status=hook_blocked for every finding in this loop
        (the commit was atomic; if it failed, no finding was applied), populate hook_output
        on each outcome, and return WITHOUT retrying. The lead will treat this loop as no-progress.
-  5. git push with a plain fast-forward push (the default, no flag overrides).
-  6. For each bug, post a fix reply to its finding_comment_id via the
-     Step 2.5 reply CLI shape:
+  7. git push with a plain fast-forward push (the default, no flag overrides).
+  8. For each bug, post a fix reply to its finding_comment_id via
+     `add_reply_to_pull_request_comment(commentId=<id>, body=<reply_text>,
+     owner=<O>, repo=<R>, pullNumber=<N>)`:
      - "Fixed in <commit_sha>" if the bug was addressed by your commit
      - "Could not address this loop: <one-line reason>" if you skipped or failed it
      - "Hook blocked the fix commit: <one-line summary>" if the commit was hook-blocked
-     Use the Fix reply CLI shape from Step 2.5 (`jq -Rs | gh api .../comments/<id>/replies --input -`). Write every reply body to a temp file first.
-  7. Write `.bugteam-pr<N>-loop<L>.outcomes.xml` inside `<worktree_path>` (schema below) and return its path.
+     Body text is passed directly as string parameters -- no temp files, no jq, no shell pipes.
+  9. Write `.bugteam-pr<N>-loop<L>.fix-outcomes.xml` inside `<worktree_path>` (schema below) and return its path.
 </execution>
 <outcome_xml_schema>
-  <bugteam_fix loop="<N>" commit_sha="<sha or empty if no commit>">
+  <bugteam_fix loop="<L>" commit_sha="<sha or empty if no commit>">
     <outcome
-      finding_id="loop<N>-<index>"
+      finding_id="loop<L>-<K>"
       status="fixed|could_not_address|hook_blocked"
       commit_sha="<sha if fixed, empty otherwise>"
       reply_comment_id="<id of the reply posted>"
@@ -179,5 +212,8 @@ cd into `<worktree_path>` before any git, gh, or file operation.
   - git add by explicit path — name each file being staged.
   - Preserve existing comments on lines you do not modify.
   - Type hints on every signature you touch.
+  - **Narrow scope.** Fix only the exact defect at the specified file:line. No restructuring, no inlining helpers, no renames, no "while I'm here" cleanup.
+  - **Preserve helpers.** Do not remove or inline existing helper functions unless the finding explicitly names the helper as the problem.
+  - **No regression.** Before committing, re-read each changed file and count any new violations. Compare the post-fix total (previous total minus bugs fixed plus new violations) against the previous loop's total finding count (from `<worktree_path>/.bugteam-pr<N>-loop<L-1>.outcomes.xml`). On the first loop (L <= 1) or when the file does not exist, skip this guard. The post-fix total must be flat or decreased relative to the previous loop. An increase means the fix introduced new bugs — revise before committing. Do not commit a regression.
 </constraints>
 ```
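
The "No regression" guard added to the fix prompt above is plain arithmetic, so a small sketch may make it easier to follow. This is an illustration only, not code from the package: the function names `compute_post_fix_total` and `may_commit` are hypothetical, and `previous_total=None` is an assumed way to represent "the previous loop's outcome XML does not exist".

```python
from typing import Optional


def compute_post_fix_total(
    previous_total: int, fixed_count: int, new_violation_count: int
) -> int:
    """Previous total minus bugs fixed in this round plus new violations."""
    return previous_total - fixed_count + new_violation_count


def may_commit(
    loop_number: int,
    previous_total: Optional[int],
    fixed_count: int,
    new_violation_count: int,
) -> bool:
    """Return True when the fix may be committed under the no-regression guard.

    Hypothetical helper: the guard is skipped on the first loop (L <= 1) or
    when the previous loop's outcome file is missing (modelled here as None).
    """
    if loop_number <= 1 or previous_total is None:
        return True
    post_fix_total = compute_post_fix_total(
        previous_total, fixed_count, new_violation_count
    )
    # Flat or decreased totals pass; an increase means the fix added bugs.
    return post_fix_total <= previous_total
```

Under this reading, a loop that fixes two of four findings and introduces one new violation ends at three and may commit, while fixing one and introducing two ends at five and must be revised first.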

package/skills/bugteam/SKILL.md CHANGED

@@ -120,9 +120,9 @@ Non-zero → stop. Revoke in Step 5 on every exit path.
 ### Step 1: Resolve PR scope (once)
-Accept one or more PR numbers from the invocation. For each PR, run `gh pr view
---json number,baseRefName,headRefName,url` (falling back to the merge-base diff
-path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
+Accept one or more PR numbers from the invocation. For each PR, call
+`pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` (falling back
+to the merge-base diff path when no PR exists). Capture `all_prs = [{number, owner, repo, baseRef,
 headRef, url}, ...]`. A single-PR invocation produces a one-element list and
 follows the same downstream rules.
@@ -184,43 +184,36 @@ only PR write before Step 4.5 is the final description rewrite.
 Order: audit → buffer → validate anchors vs diff → single review POST.
 Review body states counts; zero findings → still one review, `comments: []`,
-body `## /bugteam loop <N> audit: 0P0 / 0P1 / 0P2 → clean`.
+body `## /bugteam loop <L> audit: 0P0 / 0P1 / 0P2 → clean`.
-**Payloads:** build JSON with `jq --rawfile` / `-Rs`, pipe to `gh api ...
---input -` (avoids shell-quoting; satisfies `gh-body-backtick-guard`). Write
-each markdown body to a temp file first.
+**Payloads:** Use MCP tool calls (see below). Body text with markdown (backticks,
+newlines, quotes) passes through safely as string parameters — no temp files, no
+jq, no shell pipes.
 **Review POST** (one `comments[]` object per anchored finding; single-line
 `{path, line, side: "RIGHT", body}`; multi-line add `start_line`, `start_side:
 "RIGHT"`):
 ```
-jq -n \
---rawfile review_body <tmp_review_body.md> \
---arg commit_id "$(git rev-parse HEAD)" \
---rawfile finding_body_1 <tmp_finding_1.md> \
---arg path_1 "<file_1>" \
---argjson line_1 <line_1> \
-[... one finding_body_K / path_K / line_K triple per finding ...] \
-'{
-commit_id: $commit_id,
-event: "COMMENT",
-body: $review_body,
-comments: [
-{path: $path_1, line: $line_1, side: "RIGHT", body: $finding_body_1}
-[, ... ]
-]
-}' \
-| gh api repos/<owner>/<repo>/pulls/<number>/reviews -X POST --input -
+pull_request_review_write(
+  method="create",
+  event="COMMENT",
+  body=<review_body_text>,
+  commitID=<head_sha_at_post_time>,
+  owner=<owner>, repo=<repo>, pullNumber=<number>,
+  comments=[
+    {path: <path_1>, line: <line_1>, side: "RIGHT", body: <finding_body_1>}
+    [, ... ]
+  ]
+)
 ```
-**Fix reply:** `jq -Rs '{body: .}' <tmp_reply.md | gh api
-repos/<owner>/<repo>/pulls/<number>/comments/<finding_comment_id>/replies -X
-POST --input -`
+**Fix reply:** `add_reply_to_pull_request_comment(commentId=<finding_comment_id>,
+body=<reply_text>, owner=<owner>, repo=<repo>, pullNumber=<number>)`
-**Review POST fails:** issue comment fallback: `jq -Rs '{body: .}'
-<tmp_fallback.md | gh api repos/<owner>/<repo>/issues/<number>/comments -X POST
---input -`
+**Review POST fails:** issue comment fallback:
+`add_issue_comment(owner=<owner>, repo=<repo>, issueNumber=<number>,
+body=<fallback_text>)`
 `<head_sha_at_post_time>`: `git rev-parse HEAD` in subagent cwd immediately
 before POST.
@@ -228,7 +221,7 @@ before POST.
 **Review body template (`<tmp_review_body.md>`):**
 ```
-## /bugteam loop <N> audit: <P0>P0 / <P1>P1 / <P2>P2
+## /bugteam loop <L> audit: <P0>P0 / <P1>P1 / <P2>P2
 ### Findings without a diff anchor
 (only if needed)
@@ -263,10 +256,16 @@ and before iteration begins, when `last_action == "fresh"`). A re-invocation of
 cleaned this HEAD (short-circuit) and otherwise records that prior loops were
 dirty so the AUDIT runs against the latest diff with that signal in mind:
-```bash
-dirty_review_count=0
-gh api "repos/<owner>/<repo>/pulls/<number>/reviews?per_page=100" --paginate --slurp \
-  | jq '[.[][] | select((.body // "") | startswith("## /bugteam loop "))] | sort_by(.submitted_at) | reverse'
+```python
+dirty_review_count = 0
+all_reviews = pull_request_read(
+    method="get_reviews", pullNumber=N, owner=O, repo=R
+)
+prior_reviews = [
+    rev for rev in all_reviews
+    if rev.get("body", "").startswith("## /bugteam loop ")
+]
+prior_reviews.sort(key=lambda rev: rev["submitted_at"], reverse=True)
 ```
 Iterate from index 0 (most recent) toward older entries:
@@ -313,16 +312,16 @@ Iterate from index 0 (most recent) toward older entries:
 Lead only; merge-base / diff semantics:
 [`../../_shared/pr-loop/code-rules-gate.md`][path-code-rules]; shared script
 inventory: [`../../_shared/pr-loop/scripts/README.md`][path-scripts-readme].
-Non-zero → spawn **clean-coder** standards-fix (read stderr, edit, re-run
+Non-zero → spawn **clean-coder** standards-fix (`mode="bypassPermissions"`) (read stderr, edit, re-run
 **this same** command, one commit, `git push`, shutdown) until exit **0** or
 **5**
 failed gate rounds → `error: code rules gate failed pre-audit`. After **0**:
 `loop_count += 1`; if `loop_count > 10` → `cap reached`. Then **AUDIT**
-(bugfind); print `Loop <N> audit: ...`.
+(bugfind); print `Loop <L> audit: ...`.
 3. **FIX** (`last_action == "audited"` and `last_findings.total > 0`):
    `loop_count += 1`; if `loop_count > 10` → `cap reached`; **FIX** (bugfix);
-   print `Loop <N> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
+   print `Loop <L> fix: ...`; `last_action = "fixed"`, update `audit_log`; loop
    to step 1.
 4. After **AUDIT**: update `last_action`, `last_findings`, `audit_log`; print
@@ -335,12 +334,10 @@ before the next AUDIT.
 ### AUDIT action
-```bash
-mkdir -p "<run_temp_dir>/pr-<N>"
-gh pr diff <N> -R <owner>/<repo> > "<run_temp_dir>/pr-<N>/loop-<L>.patch"
-```
-**Spawn:**
+1. Create the directory: `mkdir -p "<run_temp_dir>/pr-<N>"`.
+2. Call `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)`
+   to capture the diff text, then write it to
+   `"<run_temp_dir>/pr-<N>/loop-<L>.patch"` using the `Write` tool.
 ```
 Agent(
@@ -361,18 +358,31 @@ background-completion notification, then reads
 `last_action = "audited"`; append audit line to `audit_log`.
 **Parallel auditors (`loop_count >= 4`):** gate passes immediately before;
-after three full audit/fix rounds without convergence, issue three `Agent`
-calls in one assistant message (`run_in_background=true`): `-a` posts the
-review and merges outcomes from `-b`/`-c` (read
-`.bugteam-pr<N>-loop<L>.outcomes.xml` plus
-`<run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml` and `...-c...`); merge key
-`(file, line, category_letter)`; re-id `loopN-K`. `-b`/`-c` write sibling XML
-only; prompts must pass literal absolute sibling paths. Output path
-contract: `-b`/`-c` write to `<run_temp_dir>/pr-<N>/loop-<L>-b.outcomes.xml`
-and `<run_temp_dir>/pr-<N>/loop-<L>-c.outcomes.xml`; `-a` writes to
-`<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`.
-Lead awaits all three background-completion notifications before merging
-outcomes.
+after three full audit/fix rounds without convergence, issue eleven `Agent`
+calls in one assistant message (`run_in_background=true`):
+- **10 haiku auditors (`-b` through `-k`):** `subagent_type="code-quality-agent"`,
+  `model="haiku"`, write sibling XML to
+  `<run_temp_dir>/pr-<N>/loop-<L>-<letter>.outcomes.xml`, skip PR posting.
+  Prompts must pass literal absolute sibling paths.
+- **1 opus validator (`-a`):** `subagent_type="code-quality-agent"`,
+  `model="opus"`:
+  - Polls for all 10 sibling XMLs before proceeding (60s timeout, 2s interval). On timeout: log diagnostics entry, proceed with validated findings from available XMLs, report count in validator output.
+  - Validates each finding: file exists, line in bounds, excerpt contains the exact
+    text of the cited line, category is A–J, severity is P0/P1/P2.
+  - Hallucinated findings → quarantined to `<run_temp_dir>/pr-<N>/loop-<L>-diagnostics.json` under
+    `validator_rejected` (added alongside the required diagnostics keys defined in the shared audit contract).
+  - De-dups by `(file, line, category)`, max severity wins; on conflict, keep longest description text.
+  - Re-ids as `loop<L>-<K>`.
+  - Writes `<worktree_path>/.bugteam-pr<N>-loop<L>.outcomes.xml`, posts review.
+Lead awaits the opus validator (-a) background-completion notification (120s
+timeout). The validator independently polls all 10 sibling XMLs; the lead does
+not gate on haiku peer completion. On lead timeout: the validator did not post
+a merged review — treat as a hard blocker and abort the loop.
+The sibling-output paths in [`PROMPTS.md`](PROMPTS.md) must cover the full
+`-b` through `-k` range.
 ### FIX action
@@ -398,6 +408,8 @@ advanced; `git -C "<run_temp_dir>/pr-<N>/worktree" fetch origin <branch> && git
 `HEAD`. Unchanged HEAD →
 `stuck — bugfix subagent could not address findings`.
+**Scope verification.** Run `git diff HEAD~1 --name-only` and compare against the set of files referenced in bugs_to_fix. When the commit touches any file NOT in the bugs_to_fix list, downgrade the outcome to `unverified_fixed` with reason "commit touched unexpected files: <list>".
 ### Step 4: Teardown
 1. For each PR in `all_prs`: `git worktree remove
@@ -418,16 +430,17 @@ else {'onerror': h}))"
 ### Step 4.5: PR description
 Lead only; cumulative product narrative (not process). Delegate body to
-`pr-description-writer` via `Agent` (else `general-purpose`) so the
-mandatory-pr-description hook accepts `gh pr edit`.
+`pr-description-writer` via `Agent` (`mode="bypassPermissions"`) (else `general-purpose`) so the
+mandatory-pr-description hook accepts `update_pull_request`.
-1. `gh pr diff <number> -R <owner>/<repo> > .bugteam-final.diff`
-2. `gh pr view <number> -R <owner>/<repo> --json body --jq .body >
-   .bugteam-original-body.md`
+1. `pull_request_read(method="get_diff", pullNumber=N, owner=O, repo=R)` → write
+   output to `.bugteam-final.diff` with `Write` tool.
+2. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → extract
+   `.body` from response, write to `.bugteam-original-body.md` with `Write` tool.
 3. Agent brief: paths + branch names; describe merge-ready change from diff;
    keep curated original sections intact; return markdown body.
-4. Write `.bugteam-final-body.md`; `gh pr edit <number> -R <owner>/<repo>
-   --body-file .bugteam-final-body.md`
+4. Write `.bugteam-final-body.md`; `update_pull_request(pullNumber=N, owner=O,
+   repo=R, body=<body_text>)`.
 5. Delete the three temp files.
 On failure: log in final report; continue to Step 5.
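
The validator's merge rule in the Parallel auditors hunk above (de-dup by `(file, line, category)`, max severity wins, then re-id as `loop<L>-<K>`) can be sketched as follows. This is a hedged illustration, not the package's implementation: the `Finding` dataclass and `merge_findings` function are hypothetical, P0 is assumed to be the most severe level, and "on conflict, keep longest description text" is read here as the tie-break when two duplicates share the same severity.

```python
from dataclasses import dataclass, replace

# Assumption: P0 is the most severe level, so a lower rank wins.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory shape of one <finding> element."""

    finding_id: str
    severity: str
    category: str
    file: str
    line: int
    description: str


def merge_findings(findings: list[Finding], loop_number: int) -> list[Finding]:
    """De-duplicate sibling findings and assign fresh loop<L>-<K> ids."""
    best_by_key: dict[tuple[str, int, str], Finding] = {}
    for candidate in findings:
        key = (candidate.file, candidate.line, candidate.category)
        current = best_by_key.get(key)
        if current is None:
            best_by_key[key] = candidate
            continue
        candidate_rank = SEVERITY_RANK[candidate.severity]
        current_rank = SEVERITY_RANK[current.severity]
        more_severe = candidate_rank < current_rank
        # Assumed tie-break: same severity, longer description wins.
        longer_on_tie = candidate_rank == current_rank and len(
            candidate.description
        ) > len(current.description)
        if more_severe or longer_on_tie:
            best_by_key[key] = candidate
    # Dict insertion order is preserved, so ids follow first-seen order.
    return [
        replace(finding, finding_id=f"loop{loop_number}-{index}")
        for index, finding in enumerate(best_by_key.values(), start=1)
    ]
```

Keying on the tuple keeps the merge to a single pass over the sibling outputs, and re-assigning ids only after de-duplication keeps `<K>` contiguous and 1-based within the loop.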

package/skills/bugteam/SKILL_EVALS.md CHANGED

@@ -22,9 +22,9 @@ Each invariant cites the normative section or companion file it derives from. Al
 | I-2 | `Bash` invoking `scripts/revoke_project_claude_permissions.py` runs exactly once per invocation on every exit path, after teardown. | `SKILL.md` § Step 5 |
 | I-3 | Orchestration uses `Agent(..., run_in_background=true)` only — no `TeamCreate`, `TeamDelete`, `SendMessage`, or `Task` tool calls. | `SKILL.md` § Step 2; § Step 4 |
 | I-4 | `Agent` calls are fresh per loop (`run_in_background=true`; new `name` each loop). | `CONSTRAINTS.md` — **Fresh subagent per loop** |
-| I-5 | Audit and fix spawns pass `model="opus"` on every `Agent` call. | `SKILL.md` § AUDIT action; § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for both subagents** |
+| I-5 | Audit sibling spawns pass `model="haiku"`; validator and fix spawns pass `model="opus"`. | `SKILL.md` § AUDIT action (parallel auditors); § FIX action; `CONSTRAINTS.md` — **Opus 4.7 at xhigh effort for validator and fix subagents** |
 | I-6 | Loop count ≤ 10 audits. 11th audit never fires. | `SKILL.md` YAML `description` (10-loop cap); § Step 3 (**Pre-audit** / **FIX** increment rules) |
-| I-7 | From loop 4 onward without convergence, three parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
+| I-7 | From loop 4 onward without convergence, eleven parallel `Agent(..., run_in_background=true)` calls in one message for audit. | `SKILL.md` § AUDIT action (**Parallel auditors**) |
 | I-8 | Lead reads `.bugteam-pr<N>-loop<L>.outcomes.xml` with the `Read` tool after each audit, before the next action. | `SKILL.md` § AUDIT action |
 | I-9 | Teardown sequence: `git worktree remove` each PR → `rmtree` `<run_temp_dir>` → Step 4.5 → revoke. | `SKILL.md` § Step 4; § Step 4.5; § Step 5 |
 | I-10 | The bugfind subagent posts ONE per-loop review; the bugfix subagent posts fix replies. The lead's only PR-write action is the Step 4.5 description rewrite. | `CONSTRAINTS.md` — **Audit/fix comment posting** |
@@ -68,7 +68,7 @@ The harness does not yet exist; this document defines its contract.
 **Scenario.** Current branch is `main` with no PR and no upstream difference.
 **Layer B predicted trace.**
-1. `Bash("gh pr view --json ...")` → non-zero exit.
+1. `pull_request_read(method="get", pullNumber=N, owner=O, repo=R)` → fails / no matching PR.
 2. `Bash("git merge-base HEAD origin/main")` → empty.
 3. No grant script.
@@ -103,10 +103,10 @@ The harness does not yet exist; this document defines its contract.
 | # | Tool call | Source |
 |---|---|---|
 | 1 | `Bash("python .../scripts/grant_project_claude_permissions.py")` | `SKILL.md` § Step 0 |
-| 2 | `Bash("gh pr view --json number,baseRefName,headRefName,url")` | `SKILL.md` § Step 1 |
+| 2 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` | `SKILL.md` § Step 1 |
 | 3 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → captures `starting_sha` | `SKILL.md` § Step 2 — **Loop state** block |
 | 4 | `Bash("mkdir -p <run_temp_dir>/pr-42")` | `SKILL.md` § AUDIT action |
-| 5 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-1.patch")` | `SKILL.md` § AUDIT action |
+| 5 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-1.patch` | `SKILL.md` § AUDIT action |
 | 6 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop1", run_in_background=true, model="opus", description=..., prompt=<audit XML loop 1>)` | `SKILL.md` § AUDIT action |
 | 7 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
 | 8 | `Read(".bugteam-pr42-loop1.outcomes.xml")` | `SKILL.md` § AUDIT action |
@@ -116,17 +116,17 @@ The harness does not yet exist; this document defines its contract.
 | 12 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse HEAD")` → verify HEAD advanced | `SKILL.md` § FIX action (**Verify**) |
 | 13 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" fetch origin <branch>")` → fetch remote state | `SKILL.md` § FIX action (**Verify**) |
 | 14 | `Bash("git -C \"<run_temp_dir>/pr-42/worktree\" rev-parse origin/<branch>")` → confirm matches HEAD | `SKILL.md` § FIX action (**Verify**) |
-| 15 | `Bash("gh pr diff 42 -R ... > <run_temp_dir>/pr-42/loop-2.patch")` | `SKILL.md` § AUDIT action |
+| 15 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `<run_temp_dir>/pr-42/loop-2.patch` | `SKILL.md` § AUDIT action |
 | 16 | `Agent(subagent_type="code-quality-agent", name="bugfind-pr42-loop2", run_in_background=true, ...)` (loop 2) | `SKILL.md` § AUDIT action |
 | 17 | Lead awaits background-completion notification | `SKILL.md` § AUDIT action |
 | 18 | `Read(".bugteam-pr42-loop2.outcomes.xml")` — zero findings | `SKILL.md` § AUDIT action |
 | 19 | `Bash("git worktree remove \"<run_temp_dir>/pr-42/worktree\"")` | `SKILL.md` § Step 4 step 1 |
 | 20 | `Bash("python -c \"...shutil.rmtree(r'<run_temp_dir>', ...)\"")` | `SKILL.md` § Step 4 step 2 (Windows-safe teardown) |
-| 21 | `Bash("gh pr diff 42 -R ... > .bugteam-final.diff")` | `SKILL.md` § Step 4.5 step 1 |
-| 22 | `Bash("gh pr view 42 -R ... --json body --jq .body > .bugteam-original-body.md")` | `SKILL.md` § Step 4.5 step 2 |
+| 21 | `pull_request_read(method="get_diff", pullNumber=42, owner=..., repo=...)` → write to `.bugteam-final.diff` | `SKILL.md` § Step 4.5 step 1 |
+| 22 | `pull_request_read(method="get", pullNumber=42, owner=..., repo=...)` → extract `.body`, write to `.bugteam-original-body.md` | `SKILL.md` § Step 4.5 step 2 |
 | 23 | `Agent(subagent_type="pr-description-writer", description=..., prompt=<brief>)` | `SKILL.md` § Step 4.5 |
 | 24 | `Write(".bugteam-final-body.md", <returned body>)` | `SKILL.md` § Step 4.5 step 4 |
-| 25 | `Bash("gh pr edit 42 -R ... --body-file .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 4 |
+| 25 | `update_pull_request(pullNumber=42, owner=..., repo=..., body=...)` | `SKILL.md` § Step 4.5 step 4 |
 | 26 | `Bash("rm .bugteam-final.diff .bugteam-original-body.md .bugteam-final-body.md")` | `SKILL.md` § Step 4.5 step 5 |
 | 27 | `Bash("python .../scripts/revoke_project_claude_permissions.py")` | `SKILL.md` § Step 5 |
@@ -171,14 +171,14 @@ Patch this table to match observation and annotate each correction.
 **Layer B predicted behavior.**
 - Loops 1–3: single `Agent(name="bugfind-pr<N>-loop<L>", run_in_background=true)` per loop.
-- Loops 4–10: three parallel `Agent(name="bugfind-pr<N>-loop<L>-[abc]", run_in_background=true)` in a single assistant message per loop; lead awaits all three notifications then merges outcomes.
+- Loops 4–10: eleven parallel `Agent(name="bugfind-pr<N>-loop<L>-[a..k]", run_in_background=true)` in a single assistant message per loop (10 haiku + 1 opus validator); lead awaits the validator notification.
 - Each loop produces one `Agent(name="bugfix-pr<N>-loop<L>", run_in_background=true)`.
 - Exactly 10 audit phases, exactly 10 fix phases.
 - Steps 19–26 from Eval 5 fire at teardown.
 **Pass criteria.**
 - I-6 holds: exactly 10 audit phases.
-- I-7 holds: loops 4–10 each emit three audit `Agent` calls in a single assistant message.
+- I-7 holds: loops 4–10 each emit eleven audit `Agent` calls in a single assistant message.
 - Final report contains `/bugteam exit: cap reached` and the remaining bug count.
 **Process check.** The distinct `Agent(name=...)` audit-call count is a prediction. On the first real run, record the exact count and rewrite the formula here.
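The I-6 and I-7 pass criteria above could be checked mechanically once a harness exists. The following is a minimal sketch only, assuming a hypothetical trace format: a list of dicts with `tool`, `name`, and `message` (the index of the assistant message that emitted the call). The record shape, the function names, and the default thresholds are illustrative assumptions, not part of the package.

```python
import re
from collections import defaultdict

# Assumption: audit names follow the pattern quoted in this eval,
# "bugfind-pr<N>-loop<L>" with an optional "-a" .. "-k" suffix.
AUDIT_NAME = re.compile(r"^bugfind-pr(?P<pr>\d+)-loop(?P<loop>\d+)(?:-(?P<suffix>[a-k]))?$")


def audit_calls_by_loop(trace):
    """Group audit Agent calls as {loop number: [(message index, name), ...]}."""
    grouped = defaultdict(list)
    for call in trace:
        if call.get("tool") != "Agent":
            continue
        match = AUDIT_NAME.match(call.get("name", ""))
        if match:
            grouped[int(match.group("loop"))].append((call["message"], call["name"]))
    return dict(grouped)


def check_loop_invariants(trace, parallel_count=11, parallel_from=4, loop_cap=10):
    """Return a list of violation strings; an empty list means I-6 and I-7 hold."""
    violations = []
    for loop, calls in sorted(audit_calls_by_loop(trace).items()):
        if loop > loop_cap:
            # I-6: no audit may fire past the cap.
            violations.append(f"I-6: audit fired in loop {loop}")
            continue
        expected = parallel_count if loop >= parallel_from else 1
        if len(calls) != expected:
            violations.append(
                f"loop {loop}: {len(calls)} audit calls, expected {expected}"
            )
        if len({message for message, _ in calls}) > 1:
            # I-7: parallel audits must share one assistant message.
            violations.append(f"loop {loop}: audit calls span several messages")
    return violations
```

Because the audit-call count is itself a prediction, the thresholds are parameters rather than constants, so the first real run can adjust them without touching the logic.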
@@ -224,7 +224,7 @@ Patch this table to match observation and annotate each correction.
 - Every finding's outcome XML carries `used_fallback="true"` and the issue-comment URL as `finding_comment_url`.
 - Cycle continues to the FIX action without aborting.
-**Open item for the real run.** The issue-comments fallback shape is `jq -Rs | gh api .../issues/<number>/comments --input -` (`SKILL.md` § Step 2.5 **Review POST fails**; full narrative in `reference/github-pr-reviews.md` § **Review POST failure fallback**). Before running Eval 10 for real, confirm the teammate obeys this shape — the fixture must assert the endpoint path and the `--input -` pattern.
+**Open item for the real run.** The issue-comments fallback uses `add_issue_comment(owner=..., repo=..., issueNumber=42, body=...)` (`SKILL.md` § Step 2.5 **Review POST fails**; full narrative in `reference/github-pr-reviews.md` § **Review POST failure fallback**). Before running Eval 10 for real, confirm the teammate obeys this shape — the fixture must assert the `add_issue_comment` tool call.
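One way the fixture assertion for the `add_issue_comment` fallback might look, as a hedged sketch: the trace record shape (`tool` plus an `args` dict) and the helper name `find_fallback_comments` are hypothetical, while the argument names `owner`, `repo`, `issueNumber`, and `body` are taken from the call shape quoted above.

```python
def find_fallback_comments(trace, owner, repo, issue_number):
    """Return the add_issue_comment calls that target the given issue with a non-empty body.

    Assumption: each trace entry is a dict like
    {"tool": "add_issue_comment", "args": {"owner": ..., "repo": ..., "issueNumber": ..., "body": ...}}.
    """
    matches = []
    for call in trace:
        if call.get("tool") != "add_issue_comment":
            continue
        args = call.get("args", {})
        target = (args.get("owner"), args.get("repo"), args.get("issueNumber"))
        if target == (owner, repo, issue_number) and args.get("body"):
            matches.append(call)
    return matches
```

A fixture could then assert that the returned list has one entry per finding that fell back, and that no entry in the trace is a `Bash` call containing `gh api`.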
 ---